SQL Self Join

SQL Self Join

Each table in a database typically holds a certain sort of data. As a result, in a database, there are sometimes hundreds of tables that are connected to one another. This means that SQL joins are required. Using the Link keyword, you may join various tables based on their common columns. A table can also be joined to itself, which is known as a SQL self join.

Get 100% Hike!

Master Most in Demand Skills Now!

Structured Query Language (SQL) is a programming language that is used to maintain relational databases and execute different operations on the data contained inside them. SQL, which was developed in the 1970s, is now widely used not just by database administrators but also by developers building data integration scripts and data analysts wanting to set up and run analytical queries.

In this blog, we will define a self-join, explain how it works, and when you should use it in your SQL queries.

If you are interested in SQL databases, Refer to this youtube video

Video Thumbnail

What is Self Join in SQL?

As the names indicate, the self join joins a table to itself. To utilize a self join, the table must include a column (let’s call it X) that operates as the primary key and another column (let’s call it Y) that holds data that can be matched up with the values in Column X. Columns X and Y do not have to have the same value for every given row, and Column Y may even be null.

Using table abbreviations, we can do Self Join. The table abbreviations prevent us from using the same table name again in a single sentence. It will produce an error if we use the same table name more than once in a single query without using table aliases.

Table abbreviations allow us to utilize the temporary name of the table that will be used in the query. The following explanation will help us know the table abbreviations.

Syntax and Query in Self Join SQL

SELECT e.employee_name, m.manager_name
FROM employees AS e
JOIN employees AS m
ON e.manager_id = m.employee_id;

The syntax shows the structure of joining a table to itself by aliasing the table name in each reference (t1 and t2). The ON clause matches a common column between the two instances.

Best Example of Self Join in SQL

When evaluating a hierarchy, the self join is utilized generally. A hierarchy, as we saw previously, assigns one row in a table to another row inside the same table. You may consider it to have parent and child rows.

Here’s an example of the members and their management. Let’s have look at the members table:

Each member has his/her unique id, which we call “Column X.” The column Teamlead Id holds the Id of the member or teamlead; this is our “Column Y.” If we use these columns to track the member-teamlead relationships in this table:

Id  Full NameSalaryTeamlead Id
1 Chirs Hemsworth 2000005
2 Tom Holland2500005
3Ben Affleck1200001
4Christian Bale150000 
5Gal Gadot3000004
  • The teamlead of the member Chirs Hemsworth is the member with Id 5, i.e., Gal Gadot.
  • The teamlead of the member Tom Holland is the member with Id 5, i.e., Gal Gadot.
  • The teamlead of the member Ben Affleck is the member with Id 1, i.e., Chirs Hemsworth.
  • The member Christian Bale does not have a teamlead; his teamleadId is null.
  • The teamlead of the member Gal Gadot is the member with Id 1, i.e., Christian Bale.

In hierarchies, this form of table arrangement is particularly common. Now, we can execute the following query to display the teamlead’s name for each member in the same row:

Code:

SELECT
    member. Id,
        member.FullName,
        member.teamleadId,
        teamlead.FullName as teamleadName
FROM members member
JOIN members teamlead
ON member.teamleadId = teamlead.Id

which returns the following result:

IdFullNameTeamlead IdTeamlead Name
1Chirs Hemsworth5Gal Galdot
2Tom Holland5Gal Galdot
3Ben Affleck1Chirs Hemsworth
5Gal Galdot4Christian Bale

The query collects the columns Id, FullName, and teamleadId from the member table. It also chooses the FullName column of the table changes are taking place teamlead and labels it as teamleadName. As an outcome, the ID and name of every member who has a teamlead are produced.

The members table is connected with itself in this query and performs two functions:

  • Role1: It stores employee information (alias employee).
  • Role2: It saves management data  (alias manager).

By doing so, we are treating the two copies of the Employees table as if they were two separate tables, one for the employees and one for the supervisors.

  • A self join may be used to generate pairs of rows based on the criteria in the ON clause. Let’s  Begin with a basic example that creates all possible pairings of coworkers. Consider the table below, partner:
IdFullNameAge
1Gal Galdot25
2Chirs Evans70
3Tom Holland35
4Jon Snow38

Assume we need to find all possible pairings of coworkers so that everyone gets an opportunity to meet everyone else at the business introducing the event. The SQL code is as follows:

Code:

SELECT
    playerteammate1.FullName as playerTeammate1FullName,
    playerteammate1.Age as playerTeammate1Age,
        playerteammate2.FullName as playerTeammate2FullName,
    playerteammate2.Age as playerTeammate2Age
FROM partnerColleagues playerteammate1
CROSS JOIN partnerColleagues playerteammate2
ON playerteammate1.FullName <> playerteammate2.FullName

The result:

Player 1FullNamePlayer 1AgePlayer 2FullNamePlayer 2Age
Chirs Evans70Gal Galdot25
Tom Holland35Gal Galdot25
Jon Snow38Gal Galdot25
Gal Galdot                25Chirs Evans70
Tom Holland35Chirs Evans70
Jon Snow38Chirs Evans70
Gal Galdot                25Tom Holland35
Chirs Evans70Tom Holland35
Jon Snow38Tom Holland35
Gal Galdot                25Jon Snow38
Chirs Evans70Jon Snow38
Tom Holland35Jon Snow38

Every single individual in the table is matched with the outcome. We don’t want anyone partnered with himself or herself, therefore we have the ON clause condition player1.FullName <> player2.FullName.Because there are four partners at this event, each participant will be partnered with three other partners.

  • In SQL, a self join can be used in connection with one or more separate tables. While this is not a clean self join, it is quite frequent in practice.

A real-world example is flight information in airports, which generates massive amounts of data each hour. Assume we want to look for a flight identification number as well as information about the departure and destination airports. Consider the tables below:

AirportIdCountryCity
1IndiaDelhi
2GermanyHamburg
3Canadaottawa
4FranceParis
5BangladeshDhaka

Table Flight:

FlightIdPlaneIdintialTimesStampfinalTimesStampintialAirportIdfinalAirportId
18891912020-01-14 13:00:002020-01-14 15:00:0034
28688212020-02-04 01:00:002020-02-04 16:00:0015
38367552020-02-15 09:00:002020-02-15 12:00:0054
48978672020-02-24 03:00:002020-02-24 19:00:0042
58345672020-03-25 10:00:002020-03-25 12:00:0021
68999052020-04-01 00:00:002020-04-01 17:00:0031
79898982020-01-01 02:00:002020-01-01 04:00:0012

Make mention that the Airport table’s AirportId column is the foreign key to the Flight table’s intialAirportId and finalAirportId columns. We will connect the Airport table to the Flight table twice, as follows:

  • Airport serves as the table with the beginning airports in the first JOIN.
  • Airport serves as the table with the destination airports in the second JOIN.

The inquiry is as follows:

flight.FlightId,
    	flight.PlaneId,
    	flight.intialAirportId,
    	intialairport.Country as intialAirportCountry,
    	intialairport.City as intialAirportCity,
    	flight.finalAirportId,
    	finalairport.Country as finalAirportCountry,
    	finalairport.City as finalAirportCity
FROM Flight flight
JOIN Airport intialairport
ON flight.intialAirportId = intialairport.AirportId
JOIN Airport finalairport
ON flight.finalAirportId = finalairport.AirportId

And the result of the query looks like this:

FlightIdPlaneIdintialAirportIdintialAirportCountryintialAirportcityfinalAirportIdfinalAirportCountryfinalAirportcity
18891913CanadaOttawa4FranceParis
28688211IndiaDelhi5BangladeshDhaka
38367555BangladeshDhaka4FranceParis
48978674FranceParis2GermanyHamburg
58345672GermanyHamburg1IndiaDelhi
68999053CanadaOttawa1IndiaDelhi
79898981IndiaDelhi2GermanyHamburg

Let us now review the outcome. The first three columns are from a simpleSELECT of the Flight table. The following two columns are from Airport, which serves as the intialing airport table; The rows are matched using in AirportId and intialAirportId from the Airport and Flight tables, respectively.

Become a Database Architect

Conclusion

As we’ve learned, the self join SQL is a significant subset of the join. We’ve seen examples of self join applications ranging from processing a hierarchy in a database to matching rows inside a table. We can join the same table numerous times, but each reference must have an alias that defines its purpose. These table abbreviations are used to retrieve columns from this single table based on the role that it is associated with.

Our SQL Courses Duration and Fees

Program Name
Start Date
Fees
Cohort starts on 14th Jan 2025
₹15,048
Cohort starts on 21st Jan 2025
₹15,048

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.