SQL Subquery: A Step-By-Step comprehensive Guide

SQL subqueries are one of the most helpful tools for working with databases. It allows you to put one query inside another query to retrieve or even change the data in a much more efficient way. Regardless of which database system you are using, SQL subqueries are important when writing optimized SQL code.

In this article, we are going to learn about SQL subqueries and their applications with some examples.

Table of Contents

What is an SQL Subquery?

A SQL Subquery, also known as a nested query, is a query that is kept inside another SQL query. In Microsoft SQL Server, the subquery runs first, and its results are then used by the outer query to perform specific tasks like filtering, aggregating, or joining the data.

Example

SELECT EmployeeName
FROM Intellipaat_Employees
WHERE DepartmentID = (
    SELECT DepartmentID
    FROM Intellipaat_Departments
    WHERE DepartmentName = 'Sales'
);

In the above SQL Subquery example, it retrieves the `DepartmentID` for the “Sales” department.

The outer query extracts the names of employees who belong to the “Sales” department.

Example:

SELECT EmployeeName
FROM Intellipaat_Employees
WHERE Salary > (
    SELECT AVG(Salary)
    FROM Intellipaat_Employees
);

Types of Subquery in SQL

Let us analyze each type of subquery using the Intellipaat Employees table. As already said, subqueries can be categorized based on how many rows and columns they output.

1. Single-Row Subqueries

This type of subquery returns only one row, which is not too vast for a singular. Any single row can be returned, as is often handled with comparison operators, whether it is =, >, or <. These operators allow you to zone in on perfect conditions in larger queries.

Example:

SELECT EmployeeName
FROM Intellipaat_Employees
WHERE Salary > (
    SELECT AVG(Salary)
    FROM Intellipaat_Employees
);

The above query will retrieve those employees whose salary is greater than the average salary of all employees.

2. Multiple-Row Subqueries

This is a more expansive version; therefore, these subqueries return more than one row as they usually do. These subqueries are mostly used with operators like IN, ANY, ALL or EXISTS, as with these, the method of comparing a list of values or conditions is fast and simple.

Example with `IN`:

SELECT EmployeeName

FROM Intellipaat_Employees

WHERE DepartmentID IN (

    SELECT DepartmentID

    FROM Intellipaat_Departments

    WHERE Location = 'New York'

);

The above query will retrieve those employees who work in a department that are located in New York.

Example with `EXISTS`:

SELECT DepartmentName

FROM Intellipaat_Departments d

WHERE EXISTS (

    SELECT 1

    FROM Intellipaat_Employees e

    WHERE e.DepartmentID = d.DepartmentID

);

The above query will list departments that have at least one employee.

3. Correlated Subqueries

A correlated subquery is one in which ‘the nested query’ retrieves data from ‘the outer query’. Since the outer query retrieves the data, the subquery will have to execute for each row returned by the main query. Thus, correlated subqueries are executed multiple times and they rely on the outer query to stand.

Example:

SELECT EmployeeName

FROM Intellipaat_Employees e

WHERE Salary > (

    SELECT AVG(Salary)

    FROM Intellipaat_Employees

    WHERE DepartmentID = e.DepartmentID

);

Here, the subquery calculates the average salary for each department, and the outer query compares each employee’s salary to this average.

4. Scalar Subqueries

This type of subquery acts as a mini query that produces a better solution when used with SELECT statements or WHERE clauses to retrieve one row and one column, as a singular scalar value is sufficient for comparison or calculation.

Example:

SELECT DepartmentName,
       (SELECT COUNT(*)
        FROM Intellipaat_Employees
        WHERE Intellipaat_Employees.DepartmentID = Intellipaat_Departments.DepartmentID) AS EmployeeCount
FROM Intellipaat_Departments;

This query retrieves the name of each department along with the count of employees in that department.

5. Derived Tables

A derived table is a subquery that is found within the FROM clause of a query. It enables you to create a new table that is accessible only during the execution of the query, allowing for better data management and retrieval as if it were a normal table.

 Example:

SELECT AvgSalary

FROM (

    SELECT DepartmentID, AVG(Salary) AS AvgSalary

    FROM Intellipaat_Employees

    GROUP BY DepartmentID

) AS DepartmentAvg

WHERE AvgSalary > 60000;

In this example, the above subquery calculates and stores the average salary per department in a new table. The outer query then filters out all departments that have lower average salaries than 60,000.

Using Subqueries in UPDATE and DELETE Statements

For example, update the values based on a subquery

UPDATE Intellipaat_Employees
SET Salary = Salary * 1.10
WHERE DepartmentID IN (
    SELECT DepartmentID
    FROM Intellipaat_Departments
    WHERE DepartmentName = 'IT'
);

For example, delete the records using a subquery.

DELETE FROM Intellipaat_Employees
WHERE DepartmentID = (
    SELECT DepartmentID
    FROM Intellipaat_Departments
    WHERE DepartmentName = 'HR'
);

Subquery Advantages in SQL

Subqueries offer several advantages, making them indispensable for database developers:

  • Modularity: It allows you to break down a challenging problem into more manageable and smaller parts.
  • Dynamic Filtering: They allow you to filter information based on the outcome of another query.
  • Reusability: The same subquery can be executed in various locations according to their needs.
  • Flexibility: They aid in performing calculations and certain types of aggregates that are difficult to accomplish in a single query.

A subquery is a good method for finding, for example, employees earning above the average salary in the department. It also helps in SQL query optimization by increasing the effectiveness and maintainability of your SQL stats.

Best Practices for Writing Subqueries in SQL

While subqueries are powerful, they should be used judiciously to ensure optimal performance and readability. Here are some best practices:

1. Avoid Unnecessary Nesting: Use a JOIN instead of a subquery to allow for better improvement in efficiency and speed.

2. Use Correlated Subqueries Sparingly: Because correlated subqueries are executed for each row in the main query, they should be used very sparingly.

3. Index Optimization: Subquery columns should have indexes to enhance the performance of the query.

4. Test and Optimize: Use actual data to adjust and eliminate slow running queries.

Real-World Use Cases of SQL Subqueries

Relational subqueries are particularly important in data analysis and reporting tasks because they assist in simplifying problem statements into concise and precise queries. Here are two examples that explain the point.

1. Identifying the Best Employees Within Each Department

Think about the case where you are interested in finding out which employee is paid the most in each department from the company’s database. This can, however, be complicated without the use of subqueries, but it can be solved like this.

SELECT DepartmentName
FROM Intellipaat_Departments
WHERE DepartmentID IN (
    SELECT DepartmentID
    FROM Intellipaat_Employees
    GROUP BY DepartmentID
    HAVING COUNT(*) > 10
);

How It Works:

  • The subquery retrieves the highest payment for each department.
  • The outer query filters the result for payment equal to the retrieved highest payment.
  • This simplifies the process for HR or managers in locating prominent staff members in each department.

2. Using Subqueries for Reports and Dashboards

Subqueries are common in business intelligence (BI) systems in dashboards to show summarized data. For example, in a given dashboard that is supposed to represent all the departments with more than ten employees, it can be seen that subqueries perform this task flawlessly.

SELECT DepartmentName
FROM Intellipaat_Departments
WHERE DepartmentID IN (
    SELECT DepartmentID
    FROM Intellipaat_Employees
    GROUP BY DepartmentID
    HAVING COUNT(*) > 10
);

How It Works:

  • The subquery summed up all the employees per department using the clause HAVING COUNT(*) >10.
  • The outer query displays the selected department’s names.
  • This helps the users understand at a glance which departments have grown.

Why This Matters?

  • Improves Decision Making: Managers can assess the changing patterns of employee productivity and department effectiveness over time.
  • Enhances Dashboard Efficiency: The data can be adjusted upfront without the need for reporting.
  • Saves Time: A single subquery gets the specific needed information instead of writing complicated, lengthy sentences.

Conclusion

In SQL, subqueries are essential elements that make your queries more efficient and flexible. Once you learn the types of subqueries and how to use them, you will be able to solve complex problems related to databases very easily. Subqueries can be helpful for filtering data, making calculations, and updating records. By learning the different types of subqueries and how to utilize them, you will greatly improve your SQL skills.

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.

business intelligence professional