Normalization in SQL: 1NF, 2NF, 3NF, and BCNF in DBMS

Normalization in SQL: 1NF, 2NF, 3NF, and BCNF in DBMS

Table of content

Show More

Normalization is a procedure used to reduce dependency and redundancy in a relational database.  This method ensures that the data stored in a database is well-organized, accurate, and easy to maintain. Normalization in SQL involves dividing a database into two or more tables and defining the relationship between the tables. If you work with databases and want to understand what normalization is, you must go through this blog. Let’s explore normalization in detail. 

Watch this video on Normalization in SQL for a detailed explanation

Video Thumbnail

What is Normalization in SQL?

In SQL, the process of structuring and resolving data in a database is known as normalization. It is achieved by applying a set of rules and procedures to a database, known as normalization forms. 

There are several normalization forms, including the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), and the Fourth Normal Form (4NF). Each form builds on the previous one, providing a set of guidelines for organizing data in a way that meets the specific requirements of a database.

KEY in SQL

In SQL, a KEY is a field or combination of fields that uniquely identifies each record in a table. There are different types of keys:

  • Primary Key: The primary key in Excel is a unique identifier for each row/record in a table. It can be a single column or multiple columns defined as a composite primary key. A table should have only one primary key.
  • Foreign Key: A foreign key links two tables together by referencing the primary key in another table. It helps enforce relational integrity between tables by preventing orphan records.
  • Candidate Key: A candidate key is a column or set of columns that can uniquely identify rows like a primary key. However, only one candidate key can be designated as the primary key.
  • Alternate Key: An alternate key is a unique identifier for rows, similar to a primary key, but it is not designated as the primary key. It can be used as a foreign key in other tables.
  • Composite Key: When multiple columns are combined to uniquely identify rows, it is called a composite key. The columns together must be unique, but individual columns may have duplicate values.

Types of Normalization in SQL

There are various levels of normalization, each with its own set of regulations. The most commonly used types of normalization in SQL are the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF), and Boyce-Codd Normal Form.

Types of Normalization in SQL

First Normal Form (1NF)

1NF ensures that each column in a table contains atomic (indivisible) values, avoiding multiple values within a single cell. Let’s look at a simplified table that shows the data about the students:

Student IDNameCourses
001JohnMath, Physics
002SarahChemistry
003MichaelMath, Biology

This table shows a violation of 1NF due to the “Courses” column containing multiple values, breaching the atomicity principle. To align with 1NF, the table should be restructured:

Student IDNameCourse
001JohnMath
001JohnPhysics
002SarahChemistry
003MichaelMath
003MichaelBiology

In this revised structure, each row represents a single student-course pairing, following the atomic value requirement of 1NF. By splitting the original “Courses” column into distinct rows, the data now meets the 1NF criteria, enabling smoother data retrieval and manipulation.

Second Normal Form (2NF)

2NF revolves around ensuring data dependency within a table. To understand this, let’s consider a hypothetical table showing information about courses, professors, and departments:

Course IDCourse NameProfessorDepartment
101BiologyDr. SmithBiology
102ChemistryDr. JohnsonChemistry
103PhysicsDr. WhitePhysics

At first glance, this table appears straightforward. However, it doesn’t satisfy 2NF because the “Professor” column is functionally dependent only on the “Course ID” but not on the “Department.” To follow 2NF, let’s refine the structure:

Course IDCourse NameProfessor IDProfessor NameDepartment
101Biology001Dr. SmithBiology
102Chemistry002Dr. JohnsonChemistry
103Physics003Dr. WhitePhysics
Professor IDProfessor NameDepartment
001Dr. SmithBiology
002Dr. JohnsonChemistry
003Dr. WhitePhysics

By breaking down the original table into two separate tables, one collecting course-related details and the other focusing only on professors and departments, we follow 2NF. This separation resolves the issue of partial dependency, ensuring each table’s columns are functionally dependent on the primary keys.

Get 100% Hike!

Master Most in Demand Skills Now!

Third Normal Form (3NF)

Imagine a table that stores information about customers and their orders:

Order IDCustomer NameCustomer PhoneProduct IDProduct NameProduct Category
001John123-456-7890101LaptopElectronics
002Sarah987-654-3210102HeadphonesElectronics
003John123-456-7890103ShirtClothing

At first glance, this table seems fine, but it doesn’t meet the 3NF criteria due to transitive dependency. The “Product Category” is dependent not only on the “Product ID” but also indirectly on the “Customer Name” and “Customer Phone.” To align with 3NF, let’s restructure this:

Order IDCustomer IDProduct ID
001001101
002002102
003001103
Customer IDCustomer NameCustomer Phone
001John123-456-7890
002Sarah987-654-3210
Product IDProduct NameProduct Category
101LaptopElectronics
102HeadphonesElectronics
103ShirtClothing

By splitting the original table into three separate tables: one for orders, one for customers, and another for products, so that the transitive dependency issue gets resolved. Each table now holds information based on its primary key, eliminating redundancy and ensuring data integrity.

Boyce-Codd Normal Form (BCNF)

Consider a table capturing information about employees and their projects:

Employee IDEmployee NameProject IDProject NameProject Supervisor
001John101Project AJane
002Sarah102Project BJohn
003Michael101Project AJane

At first glance, this table seems reasonable, but it doesn’t comply with BCNF due to the dependency between non-prime attributes. Specifically, the “Project Supervisor” is dependent on the “Project ID” but also indirectly on the “Employee ID.” To align with BCNF, let’s restructure the table:

Employee IDEmployee Name
001John
002Sarah
003Michael
Project IDProject Name
101Project A
102Project B
Project IDProject Supervisor
101Jane
102John

By dividing the initial table into three separate tables: employees, projects, and project supervisors, the BCNF requirements are met. Each table now contains information based on its primary key, eliminating any dependencies among non-prime attributes.

Fourth Normal Form (4NF)

Let’s imagine a table that holds information about employees and their projects:

Employee IDEmployee NameProjectHours Worked
001JohnA20
002SarahA25
003MichaelB30
003MichaelC15

This table may seem okay, but it doesn’t quite meet the 4NF criteria due to a multi-valued dependency. The “Hours Worked” column isn’t just dependent on the employee or the project but it’s also dependent on the combination of both.

In order to follow 4NF, we can split the table into two:

Employee IDEmployee Name
001John
002Sarah
003Michael
ProjectHours Worked
A20
A25
B30
C15

By separating the employee information from the project details, we resolve the multi-valued dependency issue. Each table now contains data that depends only on its primary key, avoiding redundancy and maintaining data integrity.

Advantages of Normalization

Normalization provides several benefits to database design and management. Following are some of the benefits of normalization:

Improved Data Consistency

Normalization helps to improve data consistency by eliminating data anomalies. This ensures that data is well-organized, making it easier to maintain and update.

Improved Data Security

Normalization helps to improve data security by reducing the amount of data that needs to be stored in a single table. This in turn reduces the risk of data loss or corruption. 

Reduced Data Redundancy

Normalization helps to eliminate data redundancy by breaking down the data into small, manageable tables. This helps to reduce the amount of duplicate data stored in the database and improves the overall efficiency of the database.

Ease of Maintenance

Normalization makes it easier to maintain a database by reducing the amount of data that needs to be updated. Making changes to the database without affecting the other parts also becomes easy. 

Ease of Querying

Normalization makes it easier to query data from a database by reducing the amount of data stored in a single table. This makes it easy to retrieve the data that is needed for a specific task.

Disadvantages of Normalizations

How to achieve Normalization in SQL?

Achieving Normalization in SQL involves following a set of rules and procedures to ensure that the data in a database is organized optimally. Below are the steps to achieve normalization in SQL:

Identify the Entities and Relationships

Start by identifying the main entities and relationships in your database, and creating a rough table structure which reflects these dependencies.

Choose a Primary Key

Select a unique identifier for each table in your database, which will serve as the primary key.

Remove Repeating Groups

Look for any repeating groups in your tables, and remove them by creating separate tables for each group and establishing relationships between the new tables and the original table.

Eliminate Redundancy

Remove any redundant data in your tables by ensuring that each piece of data is stored only once.

Apply Normalization Rules

Apply the rules for the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and the higher forms, such as BCNF or 4NF, to ensure that your tables are organized optimally and consistently.

Test and Refine

Test your database design by inserting and retrieving data, and refine the architecture as needed to ensure that your data is organized in a way that meets your requirements.

Document your Design

Document your database design, including the structure of each table, the relationships between tables, and any normalization rules that were applied.

Conclusion

Normalization is an essential aspect of database design in SQL. By organizing data in a structured and consistent manner, normalization helps to improve data integrity and reduces the chances of data anomalies and inconsistencies. It is important to understand the various normalization forms and how to apply them to your data. 

By taking the effort to understand and use normalization in SQL, you can make sure that your database is set up in a way that meets your requirements and serves as a strong foundation for your data-driven applications.

Our SQL Courses Duration and Fees

Program Name
Start Date
Fees
Cohort starts on 14th Jan 2025
₹15,048
Cohort starts on 21st Jan 2025
₹15,048

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.