Answer: To select the first row in each Group By group you can use the aggregate function like min().
Selecting the first row in each Group By group allows you to find the earliest dataset and will let you select the first row of the specific column in each group. In this blog let’s explore the different approaches in SQL for selecting the first row in each GROUP By group.
Table of Contents:
Methods to Select the First Row in each Group By group in SQL
First, let’s create a dataset to select the first row in each GroupBy group, and use this table as an example for all the methods.
Example:
--Creation of the students table:
CREATE TABLE students (
id INT PRIMARY KEY,
student_name VARCHAR(50),
sport VARCHAR(50),
enrollment_date DATE
);
CREATE TABLE students (
id INT,
student_name VARCHAR(100),
sport VARCHAR(50),
enrollment_date DATE
);
INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Chahar', 'Basketball', '2023-01-20'),
(2, 'Deepak', 'Basketball', '2023-03-05'),
(3, 'Arun', 'Football', '2023-01-10'),
(4, 'Bhuvan', 'Football', '2023-02-15'),
(5, 'Ram', 'Kabaddi', '2023-02-01');
SELECT * FROM students;
Output:
Method 1: Using aggregate function min() in SQL
The min() aggregate function in SQL retrieves the minimum value from the group.
Example:
SELECT s.id, s.student_name, s.sport, s.enrollment_date
FROM students s
JOIN (
SELECT sport, MIN(enrollment_date) AS first_enrollment
FROM students
GROUP BY sport
) t ON s.sport = t.sport AND s.enrollment_date = t.first_enrollment;
Output:
Explanation: The MIN(enrollment_date) AS first_enrollment fetched the student data who enrolled first in sports.
Method 2: The FETCH command with DISTINCT ON in SQL
The FETCH command with DISTINCT ON will retrieve the first row from the Group BY group table sorted data. The Fetch with DISTINCT ON will effectively work on Postgresql.
Example:
SELECT DISTINCT ON (sport) id, student_name, sport, enrollment_date
FROM students
ORDER BY sport, enrollment_date ASC
FETCH FIRST 1 ROW ONLY;
Output:
Explanation: The DISTINCT ON (sport) ID fetches the data of the sport that was enrolled first, here it is basketball.
Method 3: Using JOIN with a subquery that ranks students in SQL
The JOIN with subquery in SQL involves assigning the values to each row based on a specific condition specified by the user. It will effectively work on Postgresql.
Example:
-- Query to rank students by enrollment date and join to find the first enrollment in each sport
SELECT s.id, s.student_name, s.sport, s.enrollment_date
FROM students s
JOIN (
SELECT
id,
sport,
ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date) AS rank
FROM students
) ranked_students ON s.id = ranked_students.id
WHERE ranked_students.rank = 1;
Output:
Explanation: This (PARTITION BY sport ORDER BY enrollment_date ASC) subquery fetched the data of students who ranked first in each group based on their enrollment date.
Method 4: Using ROW_NUMBERS in SQL
The ROW_NUMBER() in SQL assigns a unique row number to each row in the group after sorting. It will effectively work on Postgresql
Example:
WITH RankedStudents AS (
SELECT id, student_name, sport, enrollment_date,
ROW_NUMBER() OVER (ORDER BY enrollment_date ASC) AS row_num
FROM students
)
SELECT id, student_name, sport, enrollment_date
FROM RankedStudents
WHERE row_num = 1;
Output:
Explanation:
The data of students is fetched based on the condition “rank.” The student who enrolled first is ranked first.
Using ROW_NUMBERS with Partition By command in SQL
The ROW_NUMBER with Partition By will partition the group into smaller datasets and then fetch the first row from each group. It will effectively work on Postgresql
Example:
WITH RankedStudents AS (
SELECT id, student_name, sport, enrollment_date,
ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS row_num
FROM students
)
SELECT id, student_name, sport, enrollment_date
FROM RankedStudents
WHERE row_num = 1;
Output:
Explanation: This partition by command will fetch the first row of each group; here Chahar, Arun, and Ram were the first ones to enroll in their respective sports.
Method 5: Using FILTER command with min() in SQL
The FILTER command in SQL uses aggregate functions like min() to select the first row in each Group by group in SQL. This will calculate the values based on the condition given. It will effectively work on Postgresql
Example:
SELECT
sport,
MIN(student_name) FILTER (
WHERE enrollment_date = (SELECT MIN(enrollment_date)
FROM students s2
WHERE s2.sport = s1.sport)
) AS first_student,
MIN(enrollment_date) AS first_enrollment
FROM students s1
GROUP BY sport;
Output:
Explanation: Using the FILTER WHERE enrollment_date, the data of the students who enrolled in the sport first is fetched.
What should you do when handling ties in the dataset?
Using the command KEEP (DENSE_RANK FIRST), this does not need any subqueries like the ROW_NUMBER command. It is very useful when we don’t use the subqueries to fetch the first row. This method will work effectively in the Oracle SQL database.
Example:
-- 1. Create the 'students' table
CREATE TABLE students (
id INT,
student_name VARCHAR2(100),
sport VARCHAR2(50),
enrollment_date DATE
);
-- 2. Insert the data into the 'students' table
INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Chahar', 'Basketball', TO_DATE('2023-01-20', 'YYYY-MM-DD')),
(2, 'Deepak', 'Basketball', TO_DATE('2023-03-05', 'YYYY-MM-DD')),
(3, 'Arun', 'Football', TO_DATE('2023-01-10', 'YYYY-MM-DD')),
(4, 'Bhuvan', 'Football', TO_DATE('2023-02-15', 'YYYY-MM-DD')),
(5, 'Ram', 'Kabaddi', TO_DATE('2023-02-01', 'YYYY-MM-DD'));
Output:
SELECT
sport,
student_name,
TO_CHAR(enrollment_date, 'DD-MON-YYYY') AS enrollment_date
FROM (
SELECT
sport,
student_name,
enrollment_date,
DENSE_RANK() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS rank
FROM students
)
WHERE rank = 1;
Explanation: This KEEP (DENSE_RANK FIRST) method will fetch the data of students who enrolled first in each sport. It compared the students of the same sports and gave the output of the one who enrolled first. It created a separate table for Kabbadi because it only has one student in it.
Alternative method using PL/SQL
If the dataset is huge, PL/SQL can fetch the first row in each GroupBy group efficiently. If you need to fetch the first row from multiple rows, Partition By, ORDER BY subqueries will be useful. It can handle large datasets efficiently.
Example:
CREATE TABLE students (
id INT PRIMARY KEY,
student_name VARCHAR(50),
sport VARCHAR(50),
enrollment_date DATE
);
INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Arun', 'Football', TO_DATE('2023-01-10', 'YYYY-MM-DD')),
(2, 'Bhuvan', 'Football', TO_DATE('2023-02-15', 'YYYY-MM-DD')),
(3, 'Chahar', 'Basketball', TO_DATE('2023-01-20', 'YYYY-MM-DD')),
(4, 'Deepak', 'Basketball', TO_DATE('2023-03-05', 'YYYY-MM-DD')),
(5, 'Ram', 'Kabaddi', TO_DATE('2023-02-01', 'YYYY-MM-DD'));
SET SERVEROUTPUT ON;
DECLARE
CURSOR c_students IS
SELECT sport, student_name, enrollment_date
FROM (
SELECT sport, student_name, enrollment_date,
ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS row_num
FROM students
)
WHERE row_num = 1;
BEGIN
FOR student_rec IN c_students LOOP
DBMS_OUTPUT.PUT_LINE(student_rec.sport || ': ' || student_rec.student_name);
END LOOP;
END;
Output:
Explanation: Using PL/SQL, we fetched the data of first-enrolled people from the student row along with matched data from the sports row.
Real-world Use Cases of GROUP BY Group in SQL
1. Healthcare & Patient Records: This is used in hospitals for the patient’s medical record. Using these methods, we can fetch the first appointment date for each patient.
Example:
--Create the appointments table
CREATE TABLE appointments (
patient_id INT,
appointment_date DATE
);
--Insert data into the appointments table
INSERT INTO appointments (patient_id, appointment_date) VALUES
(101, TO_DATE('2023-03-15', 'YYYY-MM-DD')),
(102, TO_DATE('2023-02-10', 'YYYY-MM-DD')),
(103, TO_DATE('2023-04-20', 'YYYY-MM-DD'));
--Query to display the patient_id and the first (earliest) appointment date
SELECT patient_id, appointment_date AS first_appointment
FROM appointments
WHERE appointment_date = (SELECT MIN(appointment_date) FROM appointments);
Output:
Explanation: This will be useful while getting the data of people who got an appointment first. MIN(appointment_date) AS first_appointment This query gets the first-person data from the appointment row. This works efficiently in the PL/SQL server.
2. In Banking & Finance: The query will let you display the first transaction details, followed by other transactions.
Example:
CREATE TABLE transactions (
account_id INT,
transaction_date DATE
);
--Insert data into the transactions table
INSERT INTO transactions (account_id, transaction_date) VALUES
(1001, TO_DATE('2023-05-12', 'YYYY-MM-DD')),
(1002, TO_DATE('2023-03-08', 'YYYY-MM-DD')),
(1003, TO_DATE('2023-06-25', 'YYYY-MM-DD'));
-- Query to display the patient_id and the first (earliest) appointment date
SELECT account_id, transaction_date AS first_transaction
FROM transactions
WHERE transaction_date = (SELECT MIN(transaction_date) FROM transactions);
Output:
data:image/s3,"s3://crabby-images/80cb8/80cb82bb7f8debaea8a66f3e95711de93eaf8a95" alt=""
Explanation: This fetches the transaction data and filters it based on their transaction ID.
Conclusion
Selecting the first row in each GroupBy group is essential when you are working on large data sets or time-based records like appointments and transactions. There are methods to get the first row in each GroupBy group, like FETCH, MIN(), ROW_NUMBER, and FILTER with subqueries. These methods will help us filter or fetch the earliest database, which will help you save time. By understanding these methods, you can effectively select the first row in each GROUP BY group in SQL.