ChatGPT 

 

ChatGPT (Generative Pre-trained Transformer)[1] is a chatbot launched by OpenAI in November 2022. It is built on top of OpenAI's GPT-3.5 family of large language models, and is fine-tuned with both supervised and reinforcement learning techniques.

ChatGPT was launched as a prototype on November 30, 2022, and quickly garnered attention for its detailed responses and articulate answers across many domains of knowledge; however, its uneven factual accuracy was identified as a significant drawback.[2] Following the release of ChatGPT, OpenAI was reportedly valued at $29 billion.[3]

Training

 
 

ChatGPT was fine-tuned on top of GPT-3.5 using supervised learning as well as reinforcement learning.[4] Both approaches used human trainers to improve the model's performance. In the case of supervised learning, the model was provided with conversations in which the trainers played both sides: the user and the AI assistant. In the reinforcement step, human trainers first ranked responses that the model had created in a previous conversation. These rankings were used to create 'reward models', against which the model was further fine-tuned using several iterations of Proximal Policy Optimization (PPO).[5][6] PPO algorithms offer a cost-effective alternative to trust region policy optimization algorithms: they avoid many of the computationally expensive operations while running faster.[7][8] The models were trained in collaboration with Microsoft on its Azure supercomputing infrastructure.
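PPO is commonly described in terms of a clipped surrogate objective, which is what lets it avoid the costlier trust-region machinery. The following is a minimal sketch of that per-sample objective, not OpenAI's training code; the function name, the `epsilon` default of 0.2, and the sample values are assumptions chosen for illustration.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective (a quantity to maximize).

    ratio     -- new-policy probability divided by old-policy probability
    advantage -- estimated advantage of the sampled action
    epsilon   -- clip range; 0.2 is an illustrative assumption
    """
    # Keep the probability ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to move the policy too far.
    return min(ratio * advantage, clipped_ratio * advantage)


# A large policy change with a positive advantage is capped at 1 + epsilon.
capped_gain = clipped_surrogate(1.5, 1.0)
# With a negative advantage, the more pessimistic (clipped) term is kept.
capped_loss = clipped_surrogate(0.5, -1.0)
```

The clipping replaces the explicit constraint used by trust region methods with a simple `min`/`max`, which is the sense in which PPO is cheaper to compute.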

In addition, OpenAI continues to gather data from ChatGPT users that could be used to further train and fine-tune ChatGPT. Users are allowed to upvote or downvote the responses they receive from ChatGPT; upon upvoting or downvoting, they can also fill out a text field with additional feedback.[9][10][11]

Features and limitations

While the core function of a chatbot is to mimic a human conversationalist, journalists have also noted ChatGPT's versatility and improvisation skills, including its ability to write and debug computer programs; to compose music, teleplays, fairy tales, and student essays; to answer test questions (sometimes, depending on the test, at a level above the average human test-taker);[12] to write poetry and song lyrics;[13] to emulate a Linux system; to simulate an entire chat room; to play games like tic-tac-toe; and to simulate an ATM.[14]

In comparison to its predecessor, InstructGPT, ChatGPT attempts to reduce harmful and deceitful responses;[15] in one example, while InstructGPT accepts the prompt "Tell me about when Christopher Columbus came to the US in 2015" as truthful, ChatGPT uses information about Columbus' voyages and information about the modern world, including perceptions of Columbus, to construct an answer that assumes what would happen if Columbus came to the U.S. in 2015.[5] ChatGPT's training data includes man pages and information about Internet phenomena and programming languages, such as bulletin board systems and the Python programming language.[14]

Unlike most chatbots, ChatGPT remembers previous prompts given to it in the same conversation; journalists have suggested that this will allow ChatGPT to be used as a personalized therapist.[16] To prevent offensive content from being presented to or produced by ChatGPT, queries are filtered through OpenAI's company-wide[17][18] moderation API, and potentially racist or sexist prompts are dismissed.[5][16]

ChatGPT suffers from multiple limitations. OpenAI acknowledged that ChatGPT "sometimes writes plausible-sounding but incorrect or nonsensical answers".[5] The reward model of ChatGPT, designed around human oversight, can be over-optimized and thus hinder performance, a phenomenon known as Goodhart's law.[19] ChatGPT has limited knowledge of events that occurred after 2021. According to the BBC, as of December 2022 ChatGPT is not allowed to "express political opinions or engage in political activism".[20] Yet research suggests that ChatGPT exhibits a pro-environmental, left-libertarian orientation when prompted to take a stance on political statements from two established voting advice applications.[21] In training ChatGPT, human reviewers preferred longer answers, irrespective of actual comprehension or factual content.[5] The training data also suffers from algorithmic bias, which may be revealed when ChatGPT responds to prompts that include descriptors of people. In one instance, ChatGPT generated a rap indicating that women and scientists of color were inferior to white and male scientists.[22][23]

Service

ChatGPT was launched on November 30, 2022, by San Francisco-based OpenAI, the creator of DALL·E 2 and Whisper. The service was initially free to the public, with plans to monetize it later.[24] By December 4, OpenAI estimated ChatGPT already had over one million users.[9] CNBC wrote on December 15, 2022, that the service "still goes down from time to time".[25] The service works best in English, but is also able to function in some other languages, with varying degrees of success.[13] Unlike some other recent high-profile advances in AI, as of December 2022, there is no sign of an official peer-reviewed technical paper about ChatGPT.[26]

According to OpenAI guest researcher Scott Aaronson, OpenAI is working on a tool to attempt to watermark its text generation systems so as to combat bad actors using their services for academic plagiarism or for spam.[27][28] The New York Times relayed in December 2022 that the next version of GPT, GPT-4, has been "rumored" to be launched sometime in 2023.[16]

Reception, criticism and issues

Positive reactions

ChatGPT was met in December 2022 with generally positive reviews; The New York Times labeled it "the best artificial intelligence chatbot ever released to the general public".[29] Samantha Lock of Britain's The Guardian newspaper noted that it was able to generate "impressively detailed" and "human-like" text.[30] Technology writer Dan Gillmor used ChatGPT on a student assignment, and found its generated text was on par with what a good student would deliver and opined that "academia has some very serious issues to confront".[31] Alex Kantrowitz of Slate magazine lauded ChatGPT's pushback to questions related to Nazi Germany, including the claim that Adolf Hitler built highways in Germany, which was met with information regarding Nazi Germany's use of forced labor.[32]

In The Atlantic's "Breakthroughs of the Year" for 2022, Derek Thompson included ChatGPT as part of "the generative-AI eruption" that "may change our mind about how we work, how we think, and what human creativity really is".[33]

Kelsey Piper of the Vox website wrote that "ChatGPT is the general public's first hands-on introduction to how powerful modern AI has gotten, and as a result, many of us are [stunned]" and that "ChatGPT is smart enough to be useful despite its flaws". Paul Graham of Y Combinator tweeted that "The striking thing about the reaction to ChatGPT is not just the number of people who are blown away by it, but who they are. These are not people who get excited by every shiny new thing. Clearly something big is happening."[34] Elon Musk wrote that "ChatGPT is scary good. We are not far from dangerously strong AI".[35] Musk paused OpenAI's access to a Twitter database pending better understanding of OpenAI's plans, stating that "OpenAI was started as open-source and non-profit. Neither are still true."[36][37] Musk had co-founded OpenAI in 2015, in part to address existential risk from artificial intelligence, but had resigned in 2018.[37]

In December 2022 Google internally expressed alarm at the unexpected strength of ChatGPT and the newly discovered potential of large language models to disrupt the search engine business, and CEO Sundar Pichai "upended" and reassigned teams within multiple departments to aid in its artificial intelligence products, according to The New York Times.[38] The Information reported on January 3, 2023 that Microsoft Bing was planning to add optional ChatGPT functionality into its public search engine, possibly around March 2023.[39][40]

Negative reactions

In a December 2022 opinion piece, economist Paul Krugman wrote that ChatGPT would affect the demand for knowledge workers.[41] The Verge's James Vincent saw the viral success of ChatGPT as evidence that artificial intelligence had gone mainstream.[6] Journalists have commented on ChatGPT's tendency to "hallucinate".[42] Mike Pearl of Mashable tested ChatGPT with multiple questions. In one example, he asked ChatGPT for "the largest country in Central America that isn't Mexico". ChatGPT responded with Guatemala, when the answer is instead Nicaragua.[43] When CNBC asked ChatGPT for the lyrics to "The Ballad of Dwight Fry", ChatGPT supplied invented lyrics rather than the actual lyrics.[25] Researchers cited by The Verge compared ChatGPT to a "stochastic parrot",[44] as did Professor Anton Van Den Hengel of the Australian Institute for Machine Learning.[45]

In December 2022, the question and answer website Stack Overflow banned the use of ChatGPT for generating answers to questions, citing the factually ambiguous nature of ChatGPT's responses.[2] In January 2023, the International Conference on Machine Learning banned any undocumented use of ChatGPT or other large language models to generate any text in submitted papers.[46]

Economist Tyler Cowen expressed concerns regarding its effects on democracy, citing the ability to generate automated comments in order to influence the decision process for new regulations.[47] The Guardian questioned whether any content found on the Internet after ChatGPT's release "can be truly trusted" and called for government regulation.[48]

Implications for cybersecurity

Check Point Research and others noted that ChatGPT was capable of writing phishing emails and malware, especially when combined with OpenAI Codex.[49] The CEO of ChatGPT creator OpenAI, Sam Altman, wrote that advancing software could pose "(for example) a huge cybersecurity risk" and went on to predict that "we could get to real AGI (artificial general intelligence) in the next decade, so we have to take the risk of that extremely seriously". Altman argued that, while ChatGPT is "obviously not close to AGI", one should "trust the exponential. Flat looking backwards, vertical looking forwards."[9]

Implications for education

 

In The Atlantic magazine, Stephen Marche noted that its effect on academia and especially application essays is yet to be understood.[50] California high school teacher and author Daniel Herman wrote that ChatGPT would usher in "The End of High School English".[51]

In the Nature journal, Chris Stokel-Walker pointed out that teachers should be concerned about students using ChatGPT to outsource their writing but that education providers will adapt to enhance critical thinking or reasoning.[52]

Emma Bowman with NPR wrote of the danger of students plagiarizing through an AI tool that may output biased or nonsensical text with an authoritative tone: "There are still many cases where you ask it a question and it'll give you a very impressive-sounding answer that's just dead wrong."[53]

Joanna Stern with The Wall Street Journal described using the tool to cheat in an American high school English class by submitting a generated essay.[54] Professor Darren Hick of Furman University described noticing ChatGPT's "style" in a paper submitted by a student. An online GPT detector claimed the paper was 99.9% likely to be computer-generated, but Hick had no hard proof. However, the student in question confessed to using GPT when confronted, and as a consequence failed the course.[55] Hick suggested a policy of giving an ad-hoc individual oral exam on the paper topic if a student is strongly suspected of submitting an AI-generated paper.[56] Edward Tian, a senior undergraduate student at Princeton University, claimed that he had created a program, named "GPTZero", that detects whether an essay is human-written or not, in order to combat academic plagiarism.[57][58]

As of January 4, 2023, the New York City Department of Education has restricted access to ChatGPT from its public school internet and devices.[59][60]

Jailbreaks

ChatGPT attempts to reject prompts that may violate its content policy. However, in early December 2022 some users managed to jailbreak ChatGPT, using various prompt engineering techniques to bypass these restrictions, and successfully tricked it into giving instructions for how to create a Molotov cocktail or a nuclear bomb, or into generating arguments in the style of a Neo-Nazi.[61] A Toronto Star reporter had uneven personal success in getting ChatGPT to make inflammatory statements shortly after launch: ChatGPT readily endorsed the Russian invasion of Ukraine, but even when asked to play along with a fictional scenario, ChatGPT balked at generating arguments for why Canadian Prime Minister Justin Trudeau was guilty of treason.[62][63]

Source https://en.wikipedia.org/wiki/ChatGPT

 

Fundamentals of Database part 1

The difference between data and information

Why Databases?

  • Databases solve many of the problems encountered in data management
  • Used in almost all modern settings involving data management: business, research, and administration
  • Important to understand how databases work and interact with other applications

Data vs. Information

  • Data are raw facts
  • Information is the result of processing raw data to reveal meaning
  • Information requires context to reveal meaning
  • Raw data must be formatted for storage, processing, and presentation
  • Data are the foundation of information, which is the bedrock of knowledge
  • Data: building blocks of information
  • Information produced by processing data
  • Information used to reveal meaning in data
  • Accurate, relevant, timely information is the key to good decision making
  • Good decision making is the key to organizational survival
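The distinction between data and information can be made concrete with a small sketch. The sales records, the product names, and the `revenue_by_product` helper below are all hypothetical, chosen only to show raw facts being processed into something that supports a decision.

```python
from collections import defaultdict

# Raw data: hypothetical sales records (product, units sold, unit price).
raw_sales = [
    ("coffee", 3, 2.50),
    ("tea", 1, 2.00),
    ("coffee", 2, 2.50),
    ("tea", 4, 2.00),
]


def revenue_by_product(records):
    """Process raw records into total revenue per product."""
    totals = defaultdict(float)
    for product, units, unit_price in records:
        totals[product] += units * unit_price
    return dict(totals)


# Information: the processed result reveals meaning the raw rows do not.
information = revenue_by_product(raw_sales)
best_seller = max(information, key=information.get)
```

The individual rows are data; the per-product totals and the identified best seller are information, because they carry context a manager can act on.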

What is 'Data Mining'

Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing.

BREAKING DOWN 'Data Mining'

Grocery stores are well-known users of data mining techniques. Many supermarkets offer free loyalty cards to customers that give them access to reduced prices not available to non-members. The cards make it easy for stores to track who is buying what, when they are buying it and at what price. The stores can then use this data, after analyzing it, for multiple purposes, such as offering customers coupons targeted to their buying habits and deciding when to put items on sale or when to sell them at full price. Data mining can be a cause for concern when only selected information, which is not representative of the overall sample group, is used to prove a certain hypothesis.

Data Warehousing

When companies centralize their data into one database or program, it is called data warehousing. With a data warehouse, an organization may spin off segments of the data for specific users to analyze and utilize. However, in other cases, analysts may start with the type of data they want and create a data warehouse based on those specs. Regardless of how businesses and other entities organize their data, they use it to support management's decision-making processes.

Data Mining Software

Data mining programs analyze relationships and patterns in data based on what users request. For example, data mining software can be used to create classes of information. To illustrate, imagine a restaurant wants to use data mining to determine when it should offer certain specials. It looks at the information it has collected and creates classes based on when customers visit and what they order.

In other cases, data miners find clusters of information based on logical relationships, or they look at associations and sequential patterns to draw conclusions about trends in consumer behavior.
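The restaurant example can be sketched in a few lines. This is only an illustration of grouping records into classes, not how any particular data mining product works; the visit log, the hour boundaries in `meal_period`, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical visit log: (hour of day in 24-hour time, item ordered).
visits = [
    (8, "pancakes"), (9, "pancakes"),
    (12, "burger"), (13, "salad"), (13, "burger"),
    (19, "steak"), (20, "steak"), (20, "burger"),
]


def meal_period(hour):
    """Assign a visit to a class based on when the customer came in."""
    if hour < 11:
        return "breakfast"
    if hour < 16:
        return "lunch"
    return "dinner"


def top_item_by_period(log):
    """For each class of visit, find the most frequently ordered item."""
    counts = {}
    for hour, item in log:
        counts.setdefault(meal_period(hour), Counter())[item] += 1
    return {period: counter.most_common(1)[0][0]
            for period, counter in counts.items()}


specials = top_item_by_period(visits)
```

Each class (breakfast, lunch, dinner) now has a most-ordered item, which is the kind of pattern the restaurant could use when scheduling specials.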

Data Mining Process

The data mining process breaks down into five steps. First, organizations collect data and load it into their data warehouses. Next, they store and manage the data, either on in-house servers or in the cloud. Business analysts, management teams and information technology professionals access the data and determine how they want to organize it. Then, application software sorts the data based on the user's results, and finally, the end user presents the data in an easy-to-share format, such as a graph or table.

For more detail:

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

http://www.investopedia.com/terms/d/datamining.asp

https://en.wikipedia.org/wiki/Data_mining

Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

Normalization involves arranging attributes in tables based on dependencies between attributes, ensuring that the dependencies are properly enforced by database integrity constraints. Normalization is accomplished through applying some formal rules either by a process of synthesis or decomposition. Synthesis creates a normalized database design based on a known set of dependencies. Decomposition takes an existing (insufficiently normalized) database design and improves it based on the known set of dependencies.

Edgar F. Codd, the inventor of the relational model (RM), introduced the concept of normalization and what we now know as the First normal form (1NF) in 1970.[1] Codd went on to define the Second normal form (2NF) and Third normal form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Informally, a relational database table is often described as "normalized" if it meets Third Normal Form.[4] Most 3NF tables are free of insertion, update, and deletion anomalies.

Normalization Rule

Normalization rules are divided into the following normal forms:

  1. First Normal Form
  2. Second Normal Form
  3. Third Normal Form
  4. BCNF

 

First normal form (1NF). This is the "basic" level of database normalization, and it generally corresponds to the definition of any database, namely:

  • It contains two-dimensional tables with rows and columns.
  • Each column corresponds to a subobject or an attribute of the object represented by the entire table.
  • Each row represents a unique instance of that subobject or attribute and must be different in some way from any other row (that is, no duplicate rows are possible).
  • All entries in any column must be of the same kind. For example, in the column labeled "Customer," only customer names or numbers are permitted.
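Two of the conditions above, no duplicate rows and one kind of entry per column, can be checked mechanically. The sketch below is a rough illustration under simplifying assumptions: a table is modeled as a list of tuples, and Python types stand in for the "kind" of a value. The function name and sample rows are hypothetical.

```python
def check_first_normal_form(rows):
    """Return a list of 1NF problems found in a table given as tuples."""
    problems = []
    # Every row must differ in some way from every other row.
    if len(set(rows)) != len(rows):
        problems.append("duplicate rows")
    # A two-dimensional table needs the same number of columns in each row.
    if len({len(row) for row in rows}) > 1:
        problems.append("rows have different numbers of columns")
        return problems
    # All entries in a column must be of the same kind.
    for index, column in enumerate(zip(*rows)):
        if len({type(value) for value in column}) > 1:
            problems.append(f"column {index} mixes kinds of values")
    return problems


good_table = [(1, "Ann"), (2, "Bob")]
bad_table = [(1, "Ann"), (1, "Ann"), ("x", "Cy")]
```

Calling `check_first_normal_form(good_table)` returns an empty list, while `bad_table` is flagged both for its repeated row and for mixing numbers and text in the first column.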

Second normal form (2NF). At this level of normalization, each column in a table that is not a determiner of the contents of another column must itself be a function of the other columns in the table; in the standard formulation, every non-key column must depend on the whole primary key rather than on only part of it. For example, in a table with three columns containing the customer ID, the product sold and the price of the product when sold, the price would be a function of the customer ID (entitled to a discount) and the specific product.
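The three-column example can be tested against sample data. The sketch below checks whether one set of columns determines another column in a given set of rows; the `determines` helper, the column names, and the prices are hypothetical. Note that sample data can only refute a dependency, never prove that it holds in general.

```python
def determines(rows, determinant, dependent):
    """Check whether the determinant columns fix the dependent column
    for every row in this sample."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = row[dependent]
        # The same determinant values must always map to the same value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sales: customer 2 is entitled to a discount.
sales = [
    {"customer_id": 1, "product": "widget", "price": 10.0},
    {"customer_id": 2, "product": "widget", "price": 9.0},
    {"customer_id": 1, "product": "gadget", "price": 25.0},
    {"customer_id": 2, "product": "gadget", "price": 22.5},
]

whole_key = determines(sales, ("customer_id", "product"), "price")
product_only = determines(sales, ("product",), "price")
customer_only = determines(sales, ("customer_id",), "price")
```

Here the price is fixed only by the customer ID and product together, not by either column alone, so in this sample it depends on the whole key.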

Third normal form (3NF). At the second normal form, modification anomalies are still possible because a change to one row in a table may affect data that refers to this information from another table. For example, using the customer table just cited, removing a row describing a customer purchase (because of a return, perhaps) will also remove the fact that the product has a certain price. In the third normal form, these tables would be divided into two tables so that product pricing would be tracked separately.
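The split described here can be sketched with Python's built-in `sqlite3` module. The table and column names are assumptions, and for simplicity the sketch treats the price as depending on the product alone, ignoring customer discounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Product pricing is tracked in its own table.
    CREATE TABLE products (
        product TEXT PRIMARY KEY,
        price   REAL NOT NULL
    );
    -- Purchases refer to products instead of repeating the price.
    CREATE TABLE purchases (
        customer_id INTEGER NOT NULL,
        product     TEXT NOT NULL REFERENCES products(product),
        PRIMARY KEY (customer_id, product)
    );
    INSERT INTO products VALUES ('widget', 10.0);
    INSERT INTO purchases VALUES (1, 'widget');
""")

# The customer returns the widget, so the purchase row is removed.
conn.execute(
    "DELETE FROM purchases WHERE customer_id = ? AND product = ?",
    (1, "widget"),
)

# The price survives because it no longer lives in the purchase row.
remaining_price = conn.execute(
    "SELECT price FROM products WHERE product = ?", ("widget",)
).fetchone()[0]
purchase_count = conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0]
```

After the deletion there are no purchases left, yet the widget's price is still recorded, which is the deletion anomaly the two-table design avoids.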

Extensions of basic normal forms include the domain/key normal form, in which a key uniquely identifies each row in a table, and the Boyce-Codd normal form, which refines and enhances the techniques used in the 3NF to handle some types of anomalies.

Database normalization's ability to avoid or reduce data anomalies, data redundancies and data duplications, while improving data integrity, has made it an important part of the data developer's toolkit for many years. It has been one of the hallmarks of the relational data model.

The relational model arose in an era when business records were, first and foremost, on paper. Its use of tables was, in some part, an effort to mirror the type of tables used on paper that acted as the original representation of the (mostly accounting) data. The need to support that type of representation has waned as digital-first representations of data have replaced paper-first records.

But other factors have also contributed to challenging the dominance of database normalization.

 

Sources:

https://en.wikipedia.org/wiki/Database_normalization

http://databases.about.com/od/specificproducts/a/normalization.htm

http://searchsqlserver.techtarget.com/definition/normalization

http://www.studytonight.com/dbms/database-normalization.php