How to Select the Top Row of Each Group in SQL?

How to Select the Top Row of Each Group in SQL?

Finding the most recent order per customer, the highest salary per department, etc., requires SQL Server to retrieve the top row of each group. ROW_NUMBER() and TOP WITH TIES are two of the several methods that SQL Server offers to accomplish this, and each is appropriate for a particular use case. In this blog, let’s explore the different approaches to getting each group’s top row in SQL Server.

Table of Contents:

Methods to Select the Top Row of Each Group in SQL

Depending on performance and use cases, SQL Server offers a variety of methods to effectively retrieve the top row per group, each with a unique advantage. ROW_NUMBER, TOP WITH TIES, and CROSS APPLY are the main strategies.

Before exploring the methods let us create an ORDERS table and insert some values into it. Let this ORDERS table be used as an example for the following methods.

--Create an ORDERS table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderAmount DECIMAL(10,2)
);
--Insert some data into it
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderAmount) VALUES 
(1001, 101, '2024-02-10', 500.00),
(1002, 101, '2024-02-15', 750.00),  
(1003, 102, '2024-02-08', 400.00),
(1004, 102, '2024-02-12', 650.00),  
(1005, 103, '2024-02-05', 300.00),
(1006, 103, '2024-02-18', 900.00);  

--To display the ORDERS table
Select * from Orders;
Methods to Select the Top Row of Each Group in SQL

This is how the table looks like when it is created and inserted with the data.

Method 1: Using GROUP BY with Aggregate Function in SQL Server

We can group rows by a particular column and retrieve the maximum or most recent value from another column by combining GROUP BY with an aggregate function like MAX().

Syntax:

SELECT <group_column>, MAX(<sorting_column>) AS LatestValue
FROM <table_name>
GROUP BY <group_column>;

Example:

--To get the most recent order per customer
SELECT CustomerID, MAX(OrderDate) AS LatestOrderDate
FROM Orders
GROUP BY CustomerID;

Output:

Using GROUP BY with Aggregate Function in SQL Server Output

Explanation: ‘GROUP BY CustomerID’ groups the records by CustomerID, and MAX (OrderDate) gets the most recent OrderDate for each Customer.

Method 2: Using ROW_NUMBER Window Function in SQL Server

Based on a specific sorting column, the ROW_NUMBER function gives each row in a group a distinct ranking.

Syntax:

WITH RankedData AS (
    SELECT *, 
           ROW_NUMBER() OVER (PARTITION BY <group_column> ORDER BY <sorting_column> DESC) AS rn
    FROM <table_name>
)
SELECT * FROM RankedData WHERE rn = 1;

Example:

--To get the most recent order per customer
WITH RankedOrders AS (
    SELECT OrderID, CustomerID, OrderDate, OrderAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rn
    FROM Orders
)
SELECT OrderID, CustomerID, OrderDate, OrderAmount
FROM RankedOrders WHERE rn = 1;

Output:

Window Function in SQL Server Output

Explanation: ROW_NUMBER() sorts the orders by OrderDate and assigns a unique ranking to each customer’s order. Only the most recent order for each customer is returned by the WHERE Clause.

Method 3: Using TOP WITH TIES in SQL Server

The TOP WITH TIES Clause guarantees that several records with the same ranking are included while permitting the retrieval of the first row per group.

Syntax:

SELECT TOP 1 WITH TIES <columns>
FROM <table_name>
ORDER BY ROW_NUMBER() OVER (PARTITION BY <group_column> ORDER BY <sorting_column> DESC);

Example:

--To get the lowest order amount per customer 
WITH RankedOrders AS (
    SELECT OrderID, CustomerID, OrderDate, OrderAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderAmount ASC) AS rn
    FROM Orders
)
SELECT TOP 1 WITH TIES OrderID, CustomerID, OrderDate, OrderAmount
FROM RankedOrders
ORDER BY rn;

Output:

Using TOP WITH TIES in SQL Server Output

Explanation: Here, each customer’s order is given a unique ranking by ROW_NUMBER(), which sorts them in ascending order by OrderDate. The WHERE Clause returns with the lowest amount per customer.

Method 4: Using CROSS APPLY in SQL Server

Each row from the main table can be joined with the top matching row from a subquery using the CROSS APPLY operator.

Syntax:

SELECT o.<columns>
FROM (SELECT DISTINCT <group_column> FROM <table_name>) c
CROSS APPLY (
    SELECT TOP 1 <columns>
    FROM <table_name> o
    WHERE o.<group_column> = c.<group_column>
    ORDER BY <sorting_column> DESC
) o;

Example:

SELECT o.OrderID, o.CustomerID, o.OrderDate, o.OrderAmount
FROM (SELECT DISTINCT CustomerID FROM Orders) c
CROSS APPLY (
    SELECT TOP 1 OrderID, OrderDate, OrderAmount, CustomerID
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
    ORDER BY OrderDate DESC
) o;

Output:

Using CROSS APPLY in SQL Server

Explanation: The most recent order for each customer is retrieved by the Cross App (top one order date DESC).

Method 5: Using OUTER APPLY in SQL Server

The OUTER APPLY operator returns every row from the left table, even if the right table contains no matching rows.

Syntax:

SELECT <columns>
FROM <main_table> m
OUTER APPLY (
    SELECT TOP 1 <columns>
    FROM <related_table> r
    WHERE r.<group_column> = m.<group_column>
    ORDER BY <sorting_column> DESC
) alias_name;

Example:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100)
);
INSERT INTO Customers (CustomerID, CustomerName) VALUES
(101, 'Alice'),
(102, 'Bob'),
(103, 'Charlie'),
(104, 'David');  -- No orders for this customer
--To get the most recent order per customer (including customers with no orders)
SELECT c.CustomerID, o.OrderID, o.OrderDate, o.OrderAmount
FROM Customers c
OUTER APPLY (
    SELECT TOP 1 OrderID, OrderDate, OrderAmount
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
    ORDER BY OrderDate DESC
) o;

Output:

Using OUTER APPLY in SQL Server Output

Explanation: The most recent order for each customer, arranged by OrderDate DESC, can be found using the OUTER APPLY. CustomerID = 104 with no orders is also included..

Method 6: Using DISTINCT WITH FIRST_VALUE() in SQL Server

By using FIRST_VALUE() inside a function, the DISTINCT WITH FIRST_VALUE enables you to choose the first row of each group.

Syntax:

SELECT DISTINCT column1, 
       FIRST_VALUE(column2) OVER (PARTITION BY column1 ORDER BY column3 DESC) AS column2,
       FIRST_VALUE(column4) OVER (PARTITION BY column1 ORDER BY column3 DESC) AS column4
FROM table_name;

Example:

--To retrieve the latest order per customer
SELECT DISTINCT CustomerID, 
       FIRST_VALUE(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderID,
       FIRST_VALUE(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderDate,
       FIRST_VALUE(OrderAmount) OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderAmount
FROM Orders;

Output:

Using DISTINCT WITH FIRST_VALUE() in SQL Server Output

Explanation: Using OrderDate DESC, FIRST_VALUE() chooses each customer’s most recent order. Only one row per customer will show up in the final result thanks to DISTINCT.

Method 7: Using DISTINCT WITH MAX() in SQL Server

The most recent record for each group is extracted using the MAX() aggregate function in this DISTINCT WITH MAX(). The original table is then joined to retrieve the pertinent row.

Syntax:

SELECT DISTINCT t1.column1, t1.column2, t1.column3, ...
FROM table_name t1
JOIN (
    SELECT column_group, MAX(column_to_sort) AS latest_value
    FROM table_name
    GROUP BY column_group
) t2 ON t1.column_group = t2.column_group AND t1.column_to_sort = t2.latest_value;

Example:

--Retrieving the latest order per customer
SELECT DISTINCT o.CustomerID, o.OrderID, o.OrderDate, o.OrderAmount
FROM Orders o
JOIN (
    SELECT CustomerID, MAX(OrderDate) AS LatestOrderDate
    FROM Orders
    GROUP BY CustomerID
) latest ON o.CustomerID = latest.CustomerID AND o.OrderDate = latest.LatestOrderDate;

Output:

Using DISTINCT WITH MAX() in SQL Server Output

Explanation: The subquery finds the most recent OrderDate for each CustomerID using MAX(OrderDate). The main query links the subquery to the original Orders table in order to retrieve the entries that match the most recent order date.

Method 8: Using PARTITION BY WITH RANK() in SQL Server

This method finds the top row for each group using the RANK() window function. RANK(), as opposed to ROW_NUMBER(), assigns the same rank to rows with identical values and skips numbers for succeeding ranks.

Syntax:

SELECT column1, column2, column3, ...
FROM (
    SELECT column1, column2, column3, 
           RANK() OVER (PARTITION BY column_group ORDER BY column_to_sort DESC) AS rnk
    FROM table_name
) ranked
WHERE rnk = 1;

Example:

--To retrieve the latest order per customer using RANK()
SELECT OrderID, CustomerID, OrderDate, OrderAmount
FROM (
    SELECT OrderID, CustomerID, OrderDate, OrderAmount,
           RANK() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rnk
    FROM Orders
) ranked
WHERE rnk = 1;

Output:

Using PARTITION BY WITH RANK() in SQL Server Output

Explanation: The inner query uses RANK() to assign a rank to each CustomerID group. The outer query only filters rows where rnk = 1 to ensure that only the most recent order for each client is obtained.

Performance Comparison of Each Method

Method Use Case Pros Cons
Using GROUP BY with Aggregate functionBest for simple queries where only the latest record is neededWorks well on small datasets.It just returns the total values and does not return the complete row.
Using ROW_NUMBERIt works best when each group requires exactly one row.ensures that each group has a single row.Sorting has made it a little slower than GROUP BY.
Using TOP WITH TIESBest when multiple records have the same rankRecords for indexed datasets are simple to use and retrieve.If there are ties, it can return many rows for each group.
Using CROSS APPLYThe effective way to efficiently retrieve the most recent row per groupCan be faster than ROW_NUMBER in some casesMight not function properly with large datasets.
Using OUTER APPLYWhen there are missing values in the group, it can be applied.Helps in situations where not every organization has a record.Not as simple as CROSS APPLY
Using DISTINCT WITH FIRST_VALUE()Best for retrieving the first value per group without rankingWorks well only one value column is neededDoes not ensure that each group will have exactly one row.
Using DISTINCT WITH MAX()Though it returns more columns, it is comparable to GROUP BY.Effective at retrieving the most recent row of dataJOIN is necessary to obtain more columns.
Using RANK()Useful when there are several recent recordsHandles ties data properlyMay return more than one row per group

Real-world Examples

1. Customer Management System: To display each customer’s most recent purchase, an e-commerce system must fetch the most recent order.

Example:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderAmount DECIMAL(10,2)
);
--Insert sample data 
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderAmount) VALUES
(1001, 101, '2024-02-10', 500.00),
(1002, 101, '2024-02-15', 750.00),
(1003, 102, '2024-02-08', 400.00),
(1004, 102, '2024-02-12', 650.00),
(1005, 103, '2024-02-05', 300.00),
(1006, 103, '2024-02-18', 900.00);
--Retrieve the latest record order per customer
SELECT OrderID, CustomerID, OrderDate, OrderAmount
FROM (
    SELECT OrderID, CustomerID, OrderDate, OrderAmount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rn
    FROM Orders
) ranked
WHERE rn = 1;

Output:

Real-world Examples Output

Explanation: ROW_NUMBER() assigns a unique rank to every order per customer ordered by OrderDate. The WHERE Clause returns only the latest order per customer.

2. HR System: The HR Department wants to track and obtain the most recent salary of each employee from their salary history

Example:

CREATE TABLE EmployeeSalaries (
    SalaryID INT PRIMARY KEY,
    EmployeeID INT,
    SalaryAmount DECIMAL(10,2),
    EffectiveDate DATE
);
--Insert sample data
INSERT INTO EmployeeSalaries (SalaryID, EmployeeID, SalaryAmount, EffectiveDate) VALUES
(1, 201, 60000, '2023-01-01'),
(2, 201, 65000, '2024-01-01'),
(3, 202, 55000, '2023-02-01'),
(4, 202, 60000, '2024-02-01'),
(5, 203, 70000, '2023-03-01'),
(6, 203, 75000, '2024-03-01');
--To retrieve the most recent salary per employee
SELECT SalaryID, EmployeeID, SalaryAmount, EffectiveDate
FROM (
    SELECT SalaryID, EmployeeID, SalaryAmount, EffectiveDate,
           DENSE_RANK() OVER (PARTITION BY EmployeeID ORDER BY EffectiveDate DESC) AS rnk
    FROM EmployeeSalaries
) ranked
WHERE rnk = 1;

Output:

HR System Output

Explanation: The DENSE_RANK()  just ranks without skipping the number of ties, and the outer query filters rnk = 1, which fetches the latest salary per employee.

Conclusion

In SQL Server, retrieving the top row for each group is a very basic yet vital query for data management. Different options are available: ROW_NUMBER(), TOP WITH TIES, CROSS APPLY, etc. The decision on which to choose should be based on the specific performance requirements and data needs. Selecting the appropriate one is very important in keeping the accuracy and efficiency in working with grouped data. In this blog, you might have learned useful techniques for getting the top row per group in SQL Server.

To learn more about SQL functions, check out this SQL course and also explore SQL Interview Questions prepared by industry experts.

How to Select the Top Row of Each Group in SQL? – FAQs

1. How can I get the latest record for each group in SQL Server?

You can use the ROW_NUMBER function with PARTITION BY to assign a rank to each group and filter only the first row

2. What is the fastest method to get the top row per group?

CROSS APPLY is often faster than any other Windows function when dealing with indexed tables.

3. Can I use GROUP BY to get the latest record per group?

Yes, but GROUP BY is most useful when used with MAX() to determine the what the latest date is, and then you have to join it back to retrieve other columns.

4. What happens if the top value of several rows is the same?

Using DENSE_RANK() instead of ROW_NUMBER() ensures that the tied values get the same rank.

5. What is the difference between TOP 1 and TOP WITH TIES?

Yes, TOP 1 returns only one row, while TOP WITH TIES includes all rows that share the same sorting value as the first row.

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.

business intelligence professional