Each table in a database typically holds a certain sort of data. As a result, in a database, there are sometimes hundreds of tables that are connected to one another. This means that SQL joins are required. Using the Link keyword, you may join various tables based on their common columns. A table can also be joined to itself, which is known as a SQL self join.
Get 100% Hike!
Master Most in Demand Skills Now!
Structured Query Language (SQL) is a programming language that is used to maintain relational databases and execute different operations on the data contained inside them. SQL, which was developed in the 1970s, is now widely used not just by database administrators but also by developers building data integration scripts and data analysts wanting to set up and run analytical queries.
In this blog, we will define a self-join, explain how it works, and when you should use it in your SQL queries.
If you are interested in SQL databases, Refer to this youtube video
What is Self Join in SQL?
As the names indicate, the self join joins a table to itself. To utilize a self join, the table must include a column (let’s call it X) that operates as the primary key and another column (let’s call it Y) that holds data that can be matched up with the values in Column X. Columns X and Y do not have to have the same value for every given row, and Column Y may even be null.
Using table abbreviations, we can do Self Join. The table abbreviations prevent us from using the same table name again in a single sentence. It will produce an error if we use the same table name more than once in a single query without using table aliases.
Table abbreviations allow us to utilize the temporary name of the table that will be used in the query. The following explanation will help us know the table abbreviations.
Syntax and Query in Self Join SQL
SELECT e.employee_name, m.manager_name
FROM employees AS e
JOIN employees AS m
ON e.manager_id = m.employee_id;
The syntax shows the structure of joining a table to itself by aliasing the table name in each reference (t1 and t2). The ON clause matches a common column between the two instances.
Best Example of Self Join in SQL
When evaluating a hierarchy, the self join is utilized generally. A hierarchy, as we saw previously, assigns one row in a table to another row inside the same table. You may consider it to have parent and child rows.
Here’s an example of the members and their management. Let’s have look at the members table:
Each member has his/her unique id, which we call “Column X.” The column Teamlead Id holds the Id of the member or teamlead; this is our “Column Y.” If we use these columns to track the member-teamlead relationships in this table:
Id | Full Name | Salary | Teamlead Id |
1 | Chirs Hemsworth | 200000 | 5 |
2 | Tom Holland | 250000 | 5 |
3 | Ben Affleck | 120000 | 1 |
4 | Christian Bale | 150000 | |
5 | Gal Gadot | 300000 | 4 |
- The teamlead of the member Chirs Hemsworth is the member with Id 5, i.e., Gal Gadot.
- The teamlead of the member Tom Holland is the member with Id 5, i.e., Gal Gadot.
- The teamlead of the member Ben Affleck is the member with Id 1, i.e., Chirs Hemsworth.
- The member Christian Bale does not have a teamlead; his teamleadId is null.
- The teamlead of the member Gal Gadot is the member with Id 1, i.e., Christian Bale.
In hierarchies, this form of table arrangement is particularly common. Now, we can execute the following query to display the teamlead’s name for each member in the same row:
Code:
SELECT
member. Id,
member.FullName,
member.teamleadId,
teamlead.FullName as teamleadName
FROM members member
JOIN members teamlead
ON member.teamleadId = teamlead.Id
which returns the following result:
Id | FullName | Teamlead Id | Teamlead Name |
1 | Chirs Hemsworth | 5 | Gal Galdot |
2 | Tom Holland | 5 | Gal Galdot |
3 | Ben Affleck | 1 | Chirs Hemsworth |
5 | Gal Galdot | 4 | Christian Bale |
The query collects the columns Id, FullName, and teamleadId from the member table. It also chooses the FullName column of the table changes are taking place teamlead and labels it as teamleadName. As an outcome, the ID and name of every member who has a teamlead are produced.
The members table is connected with itself in this query and performs two functions:
- Role1: It stores employee information (alias employee).
- Role2: It saves management data (alias manager).
By doing so, we are treating the two copies of the Employees table as if they were two separate tables, one for the employees and one for the supervisors.
- A self join may be used to generate pairs of rows based on the criteria in the ON clause. Let’s Begin with a basic example that creates all possible pairings of coworkers. Consider the table below, partner:
Id | FullName | Age |
1 | Gal Galdot | 25 |
2 | Chirs Evans | 70 |
3 | Tom Holland | 35 |
4 | Jon Snow | 38 |
Assume we need to find all possible pairings of coworkers so that everyone gets an opportunity to meet everyone else at the business introducing the event. The SQL code is as follows:
Code:
SELECT
playerteammate1.FullName as playerTeammate1FullName,
playerteammate1.Age as playerTeammate1Age,
playerteammate2.FullName as playerTeammate2FullName,
playerteammate2.Age as playerTeammate2Age
FROM partnerColleagues playerteammate1
CROSS JOIN partnerColleagues playerteammate2
ON playerteammate1.FullName <> playerteammate2.FullName
The result:
Player 1FullName | Player 1Age | Player 2FullName | Player 2Age |
Chirs Evans | 70 | Gal Galdot | 25 |
Tom Holland | 35 | Gal Galdot | 25 |
Jon Snow | 38 | Gal Galdot | 25 |
Gal Galdot | 25 | Chirs Evans | 70 |
Tom Holland | 35 | Chirs Evans | 70 |
Jon Snow | 38 | Chirs Evans | 70 |
Gal Galdot | 25 | Tom Holland | 35 |
Chirs Evans | 70 | Tom Holland | 35 |
Jon Snow | 38 | Tom Holland | 35 |
Gal Galdot | 25 | Jon Snow | 38 |
Chirs Evans | 70 | Jon Snow | 38 |
Tom Holland | 35 | Jon Snow | 38 |
Every single individual in the table is matched with the outcome. We don’t want anyone partnered with himself or herself, therefore we have the ON clause condition player1.FullName <> player2.FullName.Because there are four partners at this event, each participant will be partnered with three other partners.
- In SQL, a self join can be used in connection with one or more separate tables. While this is not a clean self join, it is quite frequent in practice.
A real-world example is flight information in airports, which generates massive amounts of data each hour. Assume we want to look for a flight identification number as well as information about the departure and destination airports. Consider the tables below:
AirportId | Country | City |
1 | India | Delhi |
2 | Germany | Hamburg |
3 | Canada | ottawa |
4 | France | Paris |
5 | Bangladesh | Dhaka |
Table Flight:
FlightId | PlaneId | intialTimesStamp | finalTimesStamp | intialAirportId | finalAirportId |
1 | 889191 | 2020-01-14 13:00:00 | 2020-01-14 15:00:00 | 3 | 4 |
2 | 868821 | 2020-02-04 01:00:00 | 2020-02-04 16:00:00 | 1 | 5 |
3 | 836755 | 2020-02-15 09:00:00 | 2020-02-15 12:00:00 | 5 | 4 |
4 | 897867 | 2020-02-24 03:00:00 | 2020-02-24 19:00:00 | 4 | 2 |
5 | 834567 | 2020-03-25 10:00:00 | 2020-03-25 12:00:00 | 2 | 1 |
6 | 899905 | 2020-04-01 00:00:00 | 2020-04-01 17:00:00 | 3 | 1 |
7 | 989898 | 2020-01-01 02:00:00 | 2020-01-01 04:00:00 | 1 | 2 |
Make mention that the Airport table’s AirportId column is the foreign key to the Flight table’s intialAirportId and finalAirportId columns. We will connect the Airport table to the Flight table twice, as follows:
- Airport serves as the table with the beginning airports in the first JOIN.
- Airport serves as the table with the destination airports in the second JOIN.
The inquiry is as follows:
flight.FlightId,
flight.PlaneId,
flight.intialAirportId,
intialairport.Country as intialAirportCountry,
intialairport.City as intialAirportCity,
flight.finalAirportId,
finalairport.Country as finalAirportCountry,
finalairport.City as finalAirportCity
FROM Flight flight
JOIN Airport intialairport
ON flight.intialAirportId = intialairport.AirportId
JOIN Airport finalairport
ON flight.finalAirportId = finalairport.AirportId
And the result of the query looks like this:
FlightId | PlaneId | intialAirportId | intialAirportCountry | intialAirportcity | finalAirportId | finalAirportCountry | finalAirportcity |
1 | 889191 | 3 | Canada | Ottawa | 4 | France | Paris |
2 | 868821 | 1 | India | Delhi | 5 | Bangladesh | Dhaka |
3 | 836755 | 5 | Bangladesh | Dhaka | 4 | France | Paris |
4 | 897867 | 4 | France | Paris | 2 | Germany | Hamburg |
5 | 834567 | 2 | Germany | Hamburg | 1 | India | Delhi |
6 | 899905 | 3 | Canada | Ottawa | 1 | India | Delhi |
7 | 989898 | 1 | India | Delhi | 2 | Germany | Hamburg |
Let us now review the outcome. The first three columns are from a simpleSELECT of the Flight table. The following two columns are from Airport, which serves as the intialing airport table; The rows are matched using in AirportId and intialAirportId from the Airport and Flight tables, respectively.
Conclusion
As we’ve learned, the self join SQL is a significant subset of the join. We’ve seen examples of self join applications ranging from processing a hierarchy in a database to matching rows inside a table. We can join the same table numerous times, but each reference must have an alias that defines its purpose. These table abbreviations are used to retrieve columns from this single table based on the role that it is associated with.