Fundamentals of Databases, Part 1

The difference between data and information

Why Databases?

Databases solve many of the problems encountered in data management

Used in almost all modern settings involving data management:

Business

Research

Administration

Important to understand how databases work and interact with other applications

Data vs. Information

  • Data are raw facts
  • Information is the result of processing raw data to reveal meaning
  • Information requires context to reveal meaning
  • Raw data must be formatted for storage, processing, and presentation
  • Data are the foundation of information, which is the bedrock of knowledge
  • Data: building blocks of information
  • Information produced by processing data
  • Information used to reveal meaning in data
  • Accurate, relevant, timely information is the key to good decision making
  • Good decision making is the key to organizational survival
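
As a small illustration of the distinction (the regions and sales figures are invented), the Python sketch below shows raw facts on their own carrying little meaning, while processing them in context yields information a decision maker can act on.

```python
# Raw data: isolated facts with no context attached.
raw_sales = [
    ("North", 1200), ("South", 800), ("North", 950),
    ("South", 1100), ("North", 700),
]

# Processing the raw data in context (total and average sales per region)
# turns it into information that can support a decision.
totals = {}
counts = {}
for region, amount in raw_sales:
    totals[region] = totals.get(region, 0) + amount
    counts[region] = counts.get(region, 0) + 1

for region in sorted(totals):
    avg = totals[region] / counts[region]
    print(f"{region}: total = {totals[region]}, average sale = {avg:.1f}")
```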

Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

Normalization involves arranging attributes in tables based on dependencies between attributes, ensuring that the dependencies are properly enforced by database integrity constraints. Normalization is accomplished through applying some formal rules either by a process of synthesis or decomposition. Synthesis creates a normalized database design based on a known set of dependencies. Decomposition takes an existing (insufficiently normalized) database design and improves it based on the known set of dependencies.

Edgar F. Codd, the inventor of the relational model (RM), introduced the concept of normalization and what we now know as the First normal form (1NF) in 1970.[1] Codd went on to define the Second normal form (2NF) and Third normal form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Informally, a relational database table is often described as "normalized" if it meets Third Normal Form.[4] Most 3NF tables are free of insertion, update, and deletion anomalies.

Normalization Rules

Normalization rules are divided into the following normal forms:

  1. First Normal Form
  2. Second Normal Form
  3. Third Normal Form
  4. Boyce-Codd Normal Form (BCNF)

 

First normal form (1NF). This is the "basic" level of database normalization, and it generally corresponds to the definition of any database, namely:

  • It contains two-dimensional tables with rows and columns.
  • Each column corresponds to a subobject or an attribute of the object represented by the entire table.
  • Each row represents a unique instance of the object described by the table and must be different in some way from any other row (that is, no duplicate rows are possible).
  • All entries in any column must be of the same kind. For example, in the column labeled "Customer," only customer names or numbers are permitted.
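
As a rough sketch of these 1NF rules (the customer and phone tables and all values are invented for illustration), the example below uses Python's built-in sqlite3 module: the first design packs several phone numbers into a single column, while the 1NF design stores one atomic value per column in each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Not in 1NF: the "phones" column holds a repeating group of values.
cur.execute("CREATE TABLE customer_bad (id INTEGER PRIMARY KEY, name TEXT, phones TEXT)")
cur.execute("INSERT INTO customer_bad VALUES (1, 'Alice', '555-1111, 555-2222')")

# 1NF: every column holds a single atomic value, and no two rows are identical.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE customer_phone (
                   customer_id INTEGER REFERENCES customer(id),
                   phone TEXT,
                   PRIMARY KEY (customer_id, phone))""")
cur.execute("INSERT INTO customer VALUES (1, 'Alice')")
cur.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                [(1, "555-1111"), (1, "555-2222")])

print(cur.execute("SELECT * FROM customer_phone").fetchall())
conn.close()
```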

Second normal form (2NF). At this level of normalization, the table must already be in 1NF, and every column that is not part of the key must be a function of the whole key rather than of just part of it. For example, in a table with three columns containing the customer ID, the product sold and the price of the product when sold, the price is a function of both the customer ID (a given customer may be entitled to a discount) and the specific product, so it depends on the entire composite key.
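
A minimal sketch of the idea, again using sqlite3 with invented table and column names: in the first design, product_name depends only on the product, i.e. on part of the composite key, which 2NF forbids; moving it to its own product table removes the partial dependency, while the price (which depends on both customer and product) stays with the full key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Violates 2NF: product_name depends only on product_id,
# i.e. on part of the composite key (customer_id, product_id).
cur.execute("""CREATE TABLE sale_bad (
                   customer_id  INTEGER,
                   product_id   INTEGER,
                   product_name TEXT,
                   price        REAL,
                   PRIMARY KEY (customer_id, product_id))""")

# 2NF fix: move the partially dependent column to its own table.
cur.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT)")
cur.execute("""CREATE TABLE sale (
                   customer_id INTEGER,
                   product_id  INTEGER REFERENCES product(product_id),
                   price       REAL,   -- depends on the whole key
                   PRIMARY KEY (customer_id, product_id))""")

conn.close()
```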

Third normal form (3NF). At second normal form, anomalies are still possible because a column that is not part of the key may depend on another non-key column, so changing or removing one row can destroy facts that have nothing to do with that row. For example, using the customer table just cited, removing a row describing a customer purchase (because of a return, perhaps) would also remove the fact that the product has a certain price. In third normal form, this table would be divided into two tables so that product pricing is tracked separately.
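
Here is a hedged sketch of that example (table names and the sample price are made up): once pricing lives in its own table, deleting a returned purchase no longer erases the price fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Before 3NF: the product's list price rides along on every purchase row,
# so deleting a returned purchase also deletes the price fact.
cur.execute("""CREATE TABLE purchase_bad (
                   customer_id INTEGER, product TEXT, list_price REAL)""")

# 3NF: pricing is tracked separately, so purchases can come and go freely.
cur.execute("CREATE TABLE product_price (product TEXT PRIMARY KEY, list_price REAL)")
cur.execute("""CREATE TABLE purchase (
                   customer_id INTEGER,
                   product TEXT REFERENCES product_price(product))""")

cur.execute("INSERT INTO product_price VALUES ('Widget', 9.99)")
cur.execute("INSERT INTO purchase VALUES (1, 'Widget')")
cur.execute("DELETE FROM purchase WHERE customer_id = 1")      # the return
print(cur.execute("SELECT * FROM product_price").fetchall())   # price fact survives
conn.close()
```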

Extensions of the basic normal forms include Boyce-Codd normal form (BCNF), which tightens 3NF to handle some types of anomalies it misses, and domain/key normal form (DKNF), in which every constraint on the table follows from its domain constraints and key constraints.
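
As a brief illustration of the kind of anomaly BCNF addresses, consider the textbook-style student/course/teacher case (not taken from the sources above): if each teacher teaches exactly one course, then teacher determines course even though teacher is not a key, and the table can satisfy 3NF while still violating BCNF. A rough sqlite3 sketch of the decomposition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In 3NF but not BCNF: each teacher teaches exactly one course
# (teacher -> course), yet "teacher" is not a key of this table.
cur.execute("""CREATE TABLE enrollment_bad (
                   student TEXT, course TEXT, teacher TEXT,
                   PRIMARY KEY (student, course))""")

# BCNF decomposition: every determinant becomes the key of its own table.
cur.execute("CREATE TABLE teacher_course (teacher TEXT PRIMARY KEY, course TEXT)")
cur.execute("""CREATE TABLE enrollment (
                   student TEXT,
                   teacher TEXT REFERENCES teacher_course(teacher),
                   PRIMARY KEY (student, teacher))""")
conn.close()
```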

Database normalization's ability to avoid or reduce data anomalies, redundancy and duplication while improving data integrity has made it an important part of the data developer's toolkit for many years, and it has been one of the hallmarks of the relational data model.

The relational model arose in an era when business records were, first and foremost, on paper. Its use of tables was, in some part, an effort to mirror the type of tables used on paper that acted as the original representation of the (mostly accounting) data. The need to support that type of representation has waned as digital-first representations of data have replaced paper-first records.

But other factors have also contributed to challenging the dominance of database normalization.

 

Sources:

https://en.wikipedia.org/wiki/Database_normalization

http://databases.about.com/od/specificproducts/a/normalization.htm

http://searchsqlserver.techtarget.com/definition/normalization

http://www.studytonight.com/dbms/database-normalization.php

 

What Is 'Data Mining'?

Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing.

Breaking Down 'Data Mining'

Grocery stores are well-known users of data mining techniques. Many supermarkets offer customers free loyalty cards that give them access to reduced prices not available to non-members. The cards make it easy for stores to track who is buying what, when they are buying it and at what price. After analyzing this data, stores can use it for multiple purposes, such as offering customers coupons targeted to their buying habits and deciding when to put items on sale or when to sell them at full price. Data mining can be a cause for concern when only selected information, which is not representative of the overall sample group, is used to prove a certain hypothesis.

Data Warehousing

When companies centralize their data into one database or program, it is called data warehousing. With a data warehouse, an organization may spin off segments of the data for specific users to analyze and utilize. However, in other cases, analysts may start with the type of data they want and create a data warehouse based on those specs. Regardless of how businesses and other entities organize their data, they use it to support management's decision-making processes.

Data Mining Software

Data mining programs analyze relationships and patterns in data based on what users request. For example, data mining software can be used to create classes of information. To illustrate, imagine a restaurant wants to use data mining to determine when they should offer certain specials. It looks at the information it has collected and creates classes based on when customers visit and what they order.

In other cases, data miners find clusters of information based on logical relationships, or they look at associations and sequential patterns to draw conclusions about trends in consumer behavior.
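
As a very small sketch of the restaurant example (the visit records below are invented), grouping orders by time of day is one simple way to build the kind of "classes" described above; real data mining software uses far more sophisticated classification and clustering algorithms.

```python
from collections import Counter, defaultdict

# Invented sample of (visit_hour, item_ordered) records.
orders = [
    (12, "burger"), (12, "salad"), (13, "burger"), (18, "pasta"),
    (19, "pasta"), (19, "steak"), (20, "pasta"), (12, "burger"),
]

# Group visits into coarse "classes" by time of day.
by_period = defaultdict(Counter)
for hour, item in orders:
    period = "lunch" if hour < 16 else "dinner"
    by_period[period][item] += 1

# The most common item per period suggests when to run which special.
for period, counts in by_period.items():
    item, n = counts.most_common(1)[0]
    print(f"{period}: top item is {item} ({n} orders)")
```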

Data Mining Process

The data mining process breaks down into five steps. First, organizations collect data and load it into their data warehouses. Next, they store and manage the data, either on in-house servers or in the cloud. Business analysts, management teams and information technology professionals then access the data and determine how they want to organize it. Then, application software sorts the data according to the user's requirements, and finally, the end user presents the data in an easy-to-share format, such as a graph or table.
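
The five steps can be pictured as a toy pipeline. The sketch below is only an illustration with made-up records and function names, not a real data mining system.

```python
# 1. Collect: gather raw records (a hard-coded list stands in for real sources).
def collect():
    return [{"store": "A", "item": "milk", "qty": 2},
            {"store": "B", "item": "milk", "qty": 1},
            {"store": "A", "item": "bread", "qty": 5}]

# 2. Store and manage: load the records into a simple in-memory "warehouse".
def store(records):
    return list(records)

# 3. Organize: analysts decide to group sales by item.
def organize(warehouse):
    grouped = {}
    for rec in warehouse:
        grouped[rec["item"]] = grouped.get(rec["item"], 0) + rec["qty"]
    return grouped

# 4. Sort: application software orders the result for the user.
def sort_result(grouped):
    return sorted(grouped.items(), key=lambda kv: kv[1], reverse=True)

# 5. Present: the end user shares an easy-to-read table.
for item, qty in sort_result(organize(store(collect()))):
    print(f"{item:>6}: {qty}")
```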

For more detail:

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

http://www.investopedia.com/terms/d/datamining.asp

https://en.wikipedia.org/wiki/Data_mining

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:

Volume. Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased the burden.

Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.

Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

At SAS, we consider two additional dimensions when it comes to big data:

Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data.

Complexity. Today's data comes from multiple sources, which makes it difficult to link, match, cleanse and transform data across systems. However, it’s necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control.

Why Is Big Data Important?

The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:

  • Determining root causes of failures, issues and defects in near-real time.
  • Generating coupons at the point of sale based on the customer’s buying habits.
  • Recalculating entire risk portfolios in minutes.
  • Detecting fraudulent behavior before it affects your organization.