Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

Normalization involves arranging attributes in tables based on dependencies between attributes, ensuring that the dependencies are properly enforced by database integrity constraints. Normalization is accomplished through applying some formal rules either by a process of synthesis or decomposition. Synthesis creates a normalized database design based on a known set of dependencies. Decomposition takes an existing (insufficiently normalized) database design and improves it based on the known set of dependencies.

Edgar F. Codd, the inventor of the relational model (RM), introduced the concept of normalization and what we now know as the First normal form (1NF) in 1970.[1] Codd went on to define the Second normal form (2NF) and Third normal form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Informally, a relational database table is often described as "normalized" if it meets Third Normal Form.[4] Most 3NF tables are free of insertion, update, and deletion anomalies.

Normalization Rule

Normalization rule are divided into following normal form.

  1. First Normal Form
  2. Second Normal Form
  3. Third Normal Form
  4. BCNF


First normal form (1NF). This is the "basic" level of database normalization, and it generally corresponds to the definition of any database, namely:

  • It contains two-dimensional tables with rows and columns.
  • Each column corresponds to a subobject or an attribute of the object represented by the entire table.
  • Each row represents a unique instance of that subobject or attribute and must be different in some way from any other row (that is, no duplicate rows are possible).
  • All entries in any column must be of the same kind. For example, in the column labeled "Customer," only customer names or numbers are permitted.

Second normal form (2NF). At this level of normalization, each column in a table that is not a determiner of the contents of another column must itself be a function of the other columns in the table. For example, in a table with three columns containing the customer ID, the product sold and the price of the product when sold, the price would be a function of the customer ID (entitled to a discount) and the specific product.

Third normal form (3NF). At the second normal form, modifications are still possible because a change to one row in a table may affect data that refers to this information from another table. For example, using the customer table just cited, removing a row describing a customer purchase (because of a return, perhaps) will also remove the fact that the product has a certain price. In the third normal form, these tables would be divided into two tables so that product pricing would be tracked separately.

Extensions of basic normal forms include the domain/key normal form, in which a key uniquely identifies each row in a table, and the Boyce-Codd normal form, which refines and enhances the techniques used in the 3NF to handle some types of anomalies.

Database normalization's ability to avoid or reduce data anomalies, data redundancies and data duplications, while improving data integrity, have made it an important part of the data developer's toolkit for many years. It has been one of the hallmarks of the relational data model.

The relational model arose in an era when business records were, first and foremost, on paper. Its use of tables was, in some part, an effort to mirror the type of tables used on paper that acted as the original representation of the (mostly accounting) data. The need to support that type of representation has waned as digital-first representations of data have replaced paper-first records.

But other factors have also contributed to challenging the dominance of database normalization.


sources  :