Normalization in SQL with Example: 1NF, 2NF, 3NF and BCNF

Normalization is a procedure used to reduce dependency and redundancy in a relational database. This method ensures that the data stored in a database is well-organized, accurate, and easy to maintain. Normalization in SQL involves dividing a database into two or more tables and defining the relationship between the tables. If you work with databases and want to understand what normalization is, you must go through this blog. Let’s explore normalization in detail.

What is Normalization in SQL?
Key in SQL
Types of Normalization in SQL
Advantages of Normalization
Disadvantages of Normalizations
How to achieve Normalization in SQL?
What is denormalization in SQL?
How is Normalization and Denormalization different from each other?
Conclusion

What is Normalization in SQL?

In SQL, the process of structuring and resolving data in a database is known as normalization. It is achieved by applying a set of rules and procedures to a database, known as normalization forms.

There are several normalization forms, including the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), and the Fourth Normal Form (4NF). Each form builds on the previous one, providing a set of guidelines for organizing data in a way that meets the specific requirements of a database.

Key in SQL

In SQL, a KEY is a field or combination of fields that uniquely identifies each record in a table. There are different types of keys:

1. Primary Key

The primary key in Excel is a unique identifier for each row/record in a table. It can be a single column or multiple columns defined as a composite primary key. A table should have only one primary key.

2. Foreign Key

A foreign key links two tables together by referencing the primary key in another table. It helps enforce relational integrity between tables by preventing orphan records.

3. Candidate Key

A candidate key is a column or set of columns that can uniquely identify rows like a primary key. However, only one candidate key can be designated as the primary key.

4. Alternate Key

An alternate key is a unique identifier for rows, similar to a primary key, but it is not designated as the primary key. It can be used as a foreign key in other tables.

5. Composite Key

When multiple columns are combined to uniquely identify rows, it is called a composite key. The columns together must be unique, but individual columns may have duplicate values.

Types of Normalization in SQL

There are various levels of normalization, each with its own set of regulations. The most commonly used types of normalization in SQL are the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF), and Boyce-Codd Normal Form.

1. First Normal Form (1NF)

1NF ensures that each column in a table contains atomic (indivisible) values, avoiding multiple values within a single cell. Let’s look at a simplified table that shows the data about the students:

Student ID	Name	Courses
001	John	Math, Physics
002	Sarah	Chemistry
003	Michael	Math, Biology

This table shows a violation of 1NF due to the “Courses” column containing multiple values, breaching the atomicity principle. To align with 1NF, the table should be restructured:

Student ID	Name	Course
001	John	Math
001	John	Physics
002	Sarah	Chemistry
003	Michael	Math
003	Michael	Biology

In this revised structure, each row represents a single student-course pairing, following the atomic value requirement of 1NF. By splitting the original “Courses” column into distinct rows, the data now meets the 1NF criteria, enabling smoother data retrieval and manipulation.

2. Second Normal Form (2NF)

2NF revolves around ensuring data dependency within a table. To understand this, let’s consider a hypothetical table showing information about courses, professors, and departments:

Course ID	Course Name	Professor	Department
101	Biology	Dr. Smith	Biology
102	Chemistry	Dr. Johnson	Chemistry
103	Physics	Dr. White	Physics

At first glance, this table appears straightforward. However, it doesn’t satisfy 2NF because the “Professor” column is functionally dependent only on the “Course ID” but not on the “Department.” To follow 2NF, let’s refine the structure:

Course ID	Course Name	Professor ID	Professor Name	Department
101	Biology	001	Dr. Smith	Biology
102	Chemistry	002	Dr. Johnson	Chemistry
103	Physics	003	Dr. White	Physics

Professor ID	Professor Name	Department
001	Dr. Smith	Biology
002	Dr. Johnson	Chemistry
003	Dr. White	Physics

By breaking down the original table into two separate tables, one collecting course-related details and the other focusing only on professors and departments, we follow 2NF. This separation resolves the issue of partial dependency, ensuring each table’s columns are functionally dependent on the primary keys.

Get 100% Hike!

Master Most in Demand Skills Now!

3. Third Normal Form (3NF)

Imagine a table that stores information about customers and their orders:

Order ID	Customer Name	Customer Phone	Product ID	Product Name	Product Category
001	John	123-456-7890	101	Laptop	Electronics
002	Sarah	987-654-3210	102	Headphones	Electronics
003	John	123-456-7890	103	Shirt	Clothing

At first glance, this table seems fine, but it doesn’t meet the 3NF criteria due to transitive dependency. The “Product Category” is dependent not only on the “Product ID” but also indirectly on the “Customer Name” and “Customer Phone.” To align with 3NF, let’s restructure this:

Order ID	Customer ID	Product ID
001	001	101
002	002	102
003	001	103

Customer ID	Customer Name	Customer Phone
001	John	123-456-7890
002	Sarah	987-654-3210

Product ID	Product Name	Product Category
101	Laptop	Electronics
102	Headphones	Electronics
103	Shirt	Clothing

By splitting the original table into three separate tables: one for orders, one for customers, and another for products, so that the transitive dependency issue gets resolved. Each table now holds information based on its primary key, eliminating redundancy and ensuring data integrity.

4. Boyce-Codd Normal Form (BCNF)

Consider a table capturing information about employees and their projects:

Employee ID	Employee Name	Project ID	Project Name	Project Supervisor
001	John	101	Project A	Jane
002	Sarah	102	Project B	John
003	Michael	101	Project A	Jane

At first glance, this table seems reasonable, but it doesn’t comply with BCNF due to the dependency between non-prime attributes. Specifically, the “Project Supervisor” is dependent on the “Project ID” but also indirectly on the “Employee ID.” To align with BCNF, let’s restructure the table:

Employee ID	Employee Name
001	John
002	Sarah
003	Michael

Project ID	Project Name
101	Project A
102	Project B

Project ID	Project Supervisor
101	Jane
102	John

By dividing the initial table into three separate tables: employees, projects, and project supervisors, the BCNF requirements are met. Each table now contains information based on its primary key, eliminating any dependencies among non-prime attributes.

5. Fourth Normal Form (4NF)

Let’s imagine a table that holds information about employees and their projects:

Employee ID	Employee Name	Project	Hours Worked
001	John	A	20
002	Sarah	A	25
003	Michael	B	30
003	Michael	C	15

This table may seem okay, but it doesn’t quite meet the 4NF criteria due to a multi-valued dependency. The “Hours Worked” column isn’t just dependent on the employee or the project but it’s also dependent on the combination of both.

In order to follow 4NF, we can split the table into two:

Employee ID	Employee Name
001	John
002	Sarah
003	Michael

Project	Hours Worked
A	20
A	25
B	30
C	15

By separating the employee information from the project details, we resolve the multi-valued dependency issue. Each table now contains data that depends only on its primary key, avoiding redundancy and maintaining data integrity.

Advantages of Normalization

Normalization provides several benefits to database design and management. Following are some of the benefits of normalization:

1. Improved Data Consistency

Normalization helps to improve data consistency by eliminating data anomalies. This ensures that data is well-organized, making it easier to maintain and update.

2. Improved Data Security

Normalization helps to improve data security by reducing the amount of data that needs to be stored in a single table. This in turn reduces the risk of data loss or corruption.

3. Reduced Data Redundancy

Normalization helps to eliminate data redundancy by breaking down the data into small, manageable tables. This helps to reduce the amount of duplicate data stored in the database and improves the overall efficiency of the database.

4. Ease of Maintenance

Normalization makes it easier to maintain a database by reducing the amount of data that needs to be updated. Making changes to the database without affecting the other parts also becomes easy.

5. Ease of Querying

Normalization makes it easier to query data from a database by reducing the amount of data stored in a single table. This makes it easy to retrieve the data that is needed for a specific task.

Disadvantages of Normalizations

1. Complex Queries

The priorities of an organization can create significant challenges in writing and managing queries, given how information is organized across multiple tables.

2. Slower Performance

Scanning multiple tables to find necessary information in a database can significantly impact performance, potentially causing substantial slowdowns with large datasets.

3. Increased CPU and Memory Usage

Additional strain will be placed on the database due to the increased number of optimally connected tasks that require memory and processing power, with mid-range experiencing greater challenges during periods of heightened activity.

4. Difficult Data Retrieval

Tasks like creating reports require numerous queries and are time-intensive because of how information is combined across multiple tables.

5. Higher Maintenance Effort

It will no longer be as straightforward to change certain portions of information, now requiring increased complexity and time, as a change in one table will lead to changes in several dependent tables.

How to achieve Normalization in SQL?

Achieving Normalization in SQL involves following a set of rules and procedures to ensure that the data in a database is organized optimally. Below are the steps to achieve normalization in SQL:

1. Identify the Entities and Relationships

Start by identifying the main entities and relationships in your database and creating a rough table structure which reflects these dependencies.

2. Choose a Primary Key

Select a unique identifier for each table in your database, which will serve as the primary key.

3. Remove Repeating Groups

Look for any repeating groups in your tables, and remove them by creating separate tables for each group and establishing relationships between the new tables and the original table.

3. Eliminate Redundancy

Remove any redundant data in your tables by ensuring that each piece of data is stored only once.

5. Apply Normalization Rules

Apply the rules for the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and the higher forms, such as BCNF or 4NF, to ensure that your tables are organized optimally and consistently.

6. Test and Refine

Test your database design by inserting and retrieving data, and refine the architecture as needed to ensure that your data is organized in a way that meets your requirements.

7. Document your Design

Document your database design, including the structure of each table, the relationships between tables, and any normalization rules that were applied.

What is denormalization in SQL?

As a database optimization approach, denormalization involves adding redundant data to one or more tables. In a relational database, this can assist us in avoiding expensive joins. Be aware that denormalization does not imply “not to normalize” or “reversing normalization.” Following normalization, this optimization method is used.

Denormalization is essentially the process of taking a normalised schema and turning it non-normalized. Designers utilise this technique to adjust system performance to accommodate time-sensitive tasks.

How is Normalization and Denormalization different from each other?

Both normalisation and denormalisation are database techniques, however they operate in opposition to one another. While denormalisation adds redundancy to normalised tables to improve functionality, reduce running time of database queries (such as join operations), and optimise for performance and query simplicity, one-side normalisation reduces or removes redundancy, meaning that there will be no duplicate data or entries in the same table and also optimises for data integrity and efficient storage.

We virtually always use aspects of both normalized and deformalized databases in a system that requires scalability, such as any large software corporation.

Conclusion

Normalization is an essential aspect of database design in SQL. By organizing data in a structured and consistent manner, normalization helps to improve data integrity and reduces the chances of data anomalies and inconsistencies. It is important to understand the various normalization forms and how to apply them to your data.

By taking the effort to understand and use normalization in SQL, you can make sure that your database is set up in a way that meets your requirements and serves as a strong foundation for your data-driven applications. Master SQL with Intellipaat’s Microsoft SQL Server Certification Training and boost your database skills!