How to Select the First Row in Each GROUP BY Group in SQL?

How to Select the First Row in Each GROUP BY Group in SQL?

Answer: To select the first row in each Group By group you can use the aggregate function like min().

Selecting the first row in each Group By group allows you to find the earliest dataset and will let you select the first row of the specific column in each group. In this blog let’s explore the different approaches in SQL for selecting the first row in each GROUP By group.

Table of Contents:

Methods to Select the First Row in each Group By group in SQL

First, let’s create a dataset to select the first row in each GroupBy group, and use this table as an example for all the methods.

Example:

--Creation of the students table:
 CREATE TABLE students (
    id INT PRIMARY KEY,
    student_name VARCHAR(50),
    sport VARCHAR(50),
    enrollment_date DATE
);
CREATE TABLE students (
    id INT,
    student_name VARCHAR(100),
    sport VARCHAR(50),
    enrollment_date DATE
);
INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Chahar', 'Basketball', '2023-01-20'),
(2, 'Deepak', 'Basketball', '2023-03-05'),
(3, 'Arun', 'Football', '2023-01-10'),
(4, 'Bhuvan', 'Football', '2023-02-15'),
(5, 'Ram', 'Kabaddi', '2023-02-01');
SELECT * FROM students;

Output:

Method 1: Using aggregate function min() in SQL 

The min() aggregate function in SQL retrieves the minimum value from the group.

Example:

SELECT s.id, s.student_name, s.sport, s.enrollment_date
FROM students s
JOIN (
    SELECT sport, MIN(enrollment_date) AS first_enrollment
    FROM students
    GROUP BY sport
) t ON s.sport = t.sport AND s.enrollment_date = t.first_enrollment;

Output:

Explanation: The MIN(enrollment_date) AS first_enrollment fetched the student data who enrolled first in sports.

Method 2: The FETCH command with DISTINCT ON in SQL

The FETCH command with DISTINCT ON will retrieve the first row from the Group BY group table sorted data. The Fetch with DISTINCT ON will effectively work on Postgresql. 

Example:

SELECT DISTINCT ON (sport) id, student_name, sport, enrollment_date
FROM students
ORDER BY sport, enrollment_date ASC
FETCH FIRST 1 ROW ONLY;

Output:

Explanation: The DISTINCT ON (sport) ID fetches the data of the sport that was enrolled first, here it is basketball.

Method 3: Using JOIN with a subquery that ranks students in SQL

The JOIN with subquery in SQL involves assigning the values to each row based on a specific condition specified by the user. It will effectively work on Postgresql. 

Example:

-- Query to rank students by enrollment date and join to find the first enrollment in each sport
SELECT s.id, s.student_name, s.sport, s.enrollment_date
FROM students s
JOIN (
    SELECT 
        id, 
        sport, 
        ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date) AS rank
    FROM students
) ranked_students ON s.id = ranked_students.id
WHERE ranked_students.rank = 1;

Output:

Explanation: This (PARTITION BY sport ORDER BY enrollment_date ASC) subquery fetched the data of students who ranked first in each group based on their enrollment date. 

Method 4: Using ROW_NUMBERS in SQL 

The ROW_NUMBER() in SQL assigns a unique row number to each row in the group after sorting. It will effectively work on Postgresql

Example:

WITH RankedStudents AS (
    SELECT id, student_name, sport, enrollment_date,
           ROW_NUMBER() OVER (ORDER BY enrollment_date ASC) AS row_num
    FROM students
)
SELECT id, student_name, sport, enrollment_date
FROM RankedStudents
WHERE row_num = 1;

Output:

Explanation: 

The data of students is fetched based on the condition “rank.” The student who enrolled first is ranked first. 

Using ROW_NUMBERS with Partition By command in SQL

The ROW_NUMBER with Partition By will partition the group into smaller datasets and then fetch the first row from each group. It will effectively work on Postgresql

Example:

WITH RankedStudents AS (
    SELECT id, student_name, sport, enrollment_date,
        ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS row_num
    FROM students
)
SELECT id, student_name, sport, enrollment_date
FROM RankedStudents
WHERE row_num = 1;

Output: 

Explanation: This partition by command will fetch the first row of each group; here Chahar, Arun, and Ram were the first ones to enroll in their respective sports. 

Method 5: Using FILTER command with min() in SQL

The FILTER command in SQL uses aggregate functions like min() to select the first row in each Group by group in SQL. This will calculate the values based on the condition given. It will effectively work on Postgresql

Example:

SELECT
    sport,
    MIN(student_name) FILTER (
        WHERE enrollment_date = (SELECT MIN(enrollment_date) 
                                 FROM students s2 
                                 WHERE s2.sport = s1.sport)
    ) AS first_student,
    MIN(enrollment_date) AS first_enrollment
FROM students s1
GROUP BY sport;

Output:

Explanation: Using the FILTER WHERE enrollment_date, the data of the students who enrolled in the sport first is fetched.

What should you do when handling ties in the dataset? 

Using the command KEEP (DENSE_RANK FIRST), this does not need any subqueries like the ROW_NUMBER command. It is very useful when we don’t use the subqueries to fetch the first row. This method will work effectively in the Oracle SQL database. 

Example:

-- 1. Create the 'students' table
CREATE TABLE students (
    id INT,
    student_name VARCHAR2(100),
    sport VARCHAR2(50),
    enrollment_date DATE
);
-- 2. Insert the data into the 'students' table
INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Chahar', 'Basketball', TO_DATE('2023-01-20', 'YYYY-MM-DD')),
(2, 'Deepak', 'Basketball', TO_DATE('2023-03-05', 'YYYY-MM-DD')),
(3, 'Arun', 'Football', TO_DATE('2023-01-10', 'YYYY-MM-DD')),
(4, 'Bhuvan', 'Football', TO_DATE('2023-02-15', 'YYYY-MM-DD')),
(5, 'Ram', 'Kabaddi', TO_DATE('2023-02-01', 'YYYY-MM-DD'));

Output:

SELECT 
    sport, 
    student_name, 
    TO_CHAR(enrollment_date, 'DD-MON-YYYY') AS enrollment_date
FROM (
    SELECT 
           sport, 
           student_name, 
           enrollment_date,
           DENSE_RANK() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS rank
    FROM students
)
WHERE rank = 1;

Explanation: This  KEEP (DENSE_RANK FIRST) method will fetch the data of students who enrolled first in each sport. It compared the students of the same sports and gave the output of the one who enrolled first. It created a separate table for Kabbadi because it only has one student in it.

Alternative method using PL/SQL

If the dataset is huge, PL/SQL can fetch the first row in each GroupBy group efficiently. If you need to fetch the first row from multiple rows, Partition By, ORDER BY subqueries will be useful. It can handle large datasets efficiently. 

Example:

CREATE TABLE students (
    id INT PRIMARY KEY,
    student_name VARCHAR(50),
    sport VARCHAR(50),
    enrollment_date DATE
);

INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Arun', 'Football', TO_DATE('2023-01-10', 'YYYY-MM-DD')),
(2, 'Bhuvan', 'Football', TO_DATE('2023-02-15', 'YYYY-MM-DD')),
(3, 'Chahar', 'Basketball', TO_DATE('2023-01-20', 'YYYY-MM-DD')),
(4, 'Deepak', 'Basketball', TO_DATE('2023-03-05', 'YYYY-MM-DD')),
(5, 'Ram', 'Kabaddi', TO_DATE('2023-02-01', 'YYYY-MM-DD'));

SET SERVEROUTPUT ON;
DECLARE
    CURSOR c_students IS 
        SELECT sport, student_name, enrollment_date
        FROM (
            SELECT sport, student_name, enrollment_date,
                   ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS row_num
            FROM students
        )
        WHERE row_num = 1;
BEGIN
    FOR student_rec IN c_students LOOP
        DBMS_OUTPUT.PUT_LINE(student_rec.sport || ': ' || student_rec.student_name);
    END LOOP;
END;

Output:

Explanation: Using PL/SQL, we fetched the data of first-enrolled people from the student row along with matched data from the sports row. 

Real-world Use Cases of GROUP BY Group in SQL

1. Healthcare & Patient Records: This is used in hospitals for the patient’s medical record. Using these methods, we can fetch the first appointment date for each patient.

Example:

--Create the appointments table
CREATE TABLE appointments (
    patient_id INT,
    appointment_date DATE
);

--Insert data into the appointments table
INSERT INTO appointments (patient_id, appointment_date) VALUES
(101, TO_DATE('2023-03-15', 'YYYY-MM-DD')),
(102, TO_DATE('2023-02-10', 'YYYY-MM-DD')),
(103, TO_DATE('2023-04-20', 'YYYY-MM-DD'));

--Query to display the patient_id and the first (earliest) appointment date
SELECT patient_id, appointment_date AS first_appointment
FROM appointments
WHERE appointment_date = (SELECT MIN(appointment_date) FROM appointments);

Output: 

Explanation: This will be useful while getting the data of people who got an appointment first.     MIN(appointment_date) AS first_appointment  This query gets the first-person data from the appointment row. This works efficiently in the PL/SQL server.

2. In Banking & Finance: The query will let you display the first transaction details, followed by other transactions. 

Example:

CREATE TABLE transactions (
    account_id INT,
    transaction_date DATE
);

--Insert data into the transactions table
INSERT INTO transactions (account_id, transaction_date) VALUES
(1001, TO_DATE('2023-05-12', 'YYYY-MM-DD')),
(1002, TO_DATE('2023-03-08', 'YYYY-MM-DD')),
(1003, TO_DATE('2023-06-25', 'YYYY-MM-DD'));

-- Query to display the patient_id and the first (earliest) appointment date
SELECT account_id, transaction_date AS first_transaction
FROM transactions
WHERE transaction_date = (SELECT MIN(transaction_date) FROM transactions);

Output:

Explanation: This fetches the transaction data and filters it based on their transaction ID.

Conclusion

Selecting the first row in each GroupBy group is essential when you are working on large data sets or time-based records like appointments and transactions. There are methods to get the first row in each GroupBy group, like FETCH, MIN(), ROW_NUMBER, and FILTER with subqueries. These methods will help us filter or fetch the earliest database, which will help you save time. By understanding these methods, you can effectively select the first row in each GROUP BY group in SQL.

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.

business intelligence professional