Window Functions in SQL

Window Functions in SQL

The SQL window function is a crucial feature for everyone because it makes it possible to calculate a specific set of rows while keeping the rows in the dataset intact. This is referred to as “windows”. These help in advanced data analytics and database management.

In this article, we will examine the definition of SQL window functions, their classifications, and real-life examples with reference to how they work.

Table of Contents

What are Window Functions in SQL Server?

SQL window functions let you compute values over a set of records associated with the current record without grouping them. Window functions are different from aggregate functions such as SUM() or AVG() because window functions do not change the total number of rows in the data, rather, they provide more insight into a single record from the data.

Window functions could calculate the running total, rank, or moving average with the full dataset intact.

Types of Window Functions

Window functions in SQL can be categorized into the following three types:

1. Ranking Functions

In SQL, ranking functions are used to assign a position to specific rows in a given order. They provide an order to the dataset without deleting any rows.

For the implementation, we are going to create an employee table with columns employee_id, name, department and salary

CREATE TABLE employees (

    employee_id INT PRIMARY KEY,

    name VARCHAR(100),

    department VARCHAR(50),

    salary DECIMAL(10,2)

);

INSERT INTO employees (employee_id, name, department, salary) VALUES
(1, 'Alice', 'HR', 60000),
(2, 'Bob', 'HR', 75000),
(3, 'Charlie', 'HR', 70000),
(4, 'David', 'Engineering', 90000),
(5, 'Eve', 'Engineering', 85000),
(6, 'Frank', 'Engineering', 95000),
(7, 'Grace', 'Marketing', 72000),
(8, 'Hank', 'Marketing', 78000),
(9, 'Ivy', 'Marketing', 75000);

1. ROW_NUMBER()

This function gives a unique value to each row in a specified order without any matching values. In this case, there are no ties and every row receives a different number, such as 1, 2, 3 and so forth.

SELECT employee_id, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

2. RANK()

This function uniquely ranks each row within a specified order. It also skips numbers in its ranking if there are multiple ties. For example, if there are two rows sharing the same rank value of 2, then the next row would be assigned rank 4.

 SELECT employee_id, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;

3. DENSE_RANK()

This function can be viewed as a variation of the RANK function. It does not skip ranks for duplicate values. For example, if there are two rows sharing the same rank of 2, the next row will receive a rank value of 3.

SELECT employee_id, department, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;

4. NTILE()

This function partitions the data into nearly equal sections (as close as possible) after dividing it into n sections and assigns a group number to each row. For example, if the data is cut into 4 sections, every row shall be assigned a number from 1 to 4.

SELECT employee_id, department, salary,
       NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile

2. Aggregate Functions

In SQL, aggregate functions are unique tools that summarize values from multiple rows into a single result. Aggregate functions work behind the scenes to provide you with totals, averages, counts, etc.

1. SUM()

It calculates the total value of all items in a column. Mostly used when a total has to be computed. For instance, calculate the total revenue of a company by summing all sales.

SELECT employee_id, department, salary,

       SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS running_total

FROM employees;

2. AVG()

It finds the average value of items in a column. Effective when trying to understand patterns. For instance, find the average salary of employees in a department.

SELECT employee_id, department, salary,

       AVG(salary) OVER (PARTITION BY department) AS average_salary

FROM employees;

3. MIN() & MAX()

MIN() –  It gives the smallest value (least amount) found in the column. Useful in determining the smallest value. For instance, get the lowest product price in an inventory.

MAX() – It gives the largest value (most extreme) found in the column. Useful for determining the most valuable assets. For instance, find the highest marks scored in a class.

SELECT employee_id, department, salary,

       MIN(salary) OVER (PARTITION BY department) AS min_salary,

       MAX(salary) OVER (PARTITION BY department) AS max_salary

FROM employees;

3. Analytic Functions

Analytic functions in SQL are parallel to window functions. They perform calculations in a dataset on a defined set of rows, however, they do not alter the data as presented. These functions let one see data in a more intricate multivariate manner while keeping the dataset intact.

1. LEAD()

It lets you access the next row’s value in a column. Useful for comparing the current row with the next one.

Example: Compare the sales of today with tomorrow’s sales.

SELECT employee_id, department, salary,
       LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS next_salary
FROM employees;

2. LAG()

It lets you access the previous row’s value in a column. It is helpful for comparing the current row with the one before it. Example: Compare today’s sales with yesterday’s sales.

SELECT employee_id, department, salary,
       LAG(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS previous_salary
FROM employees;

3. FIRST_VALUE()

It returns the first value in a specified window of rows. Example: Get the first product price from a sorted list.

4. LAST_VALUE()

It returns the last value in a specified window of rows. Example: Get the last order date from a sorted list of orders.

SELECT employee_id, department, salary,
       FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_salary,
       LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY salary DESC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_salary

Use Cases of Window Functions with Examples

1. Calculating Running Totals

Window functions assist you in the tracking of running totals where each row represents the total amount from the start until that row.

SELECT employee_id, sales_date, sales_amount,

       SUM(sales_amount) OVER (PARTITION BY employee_id ORDER BY sales_date) AS running_total

FROM sales;

2. Ranking Top Performers

With window functions, you are able to track the top n data entries like top sales or top employees without the underlying data getting lost.

SELECT employee_id, department, performance_score,

       RANK() OVER (PARTITION BY department ORDER BY performance_score DESC) AS rank

FROM employee_performance

WHERE rank <= 3;

3. Calculating Moving Averages

You are able to track Moving averages to get rid of short-term fluctuations and get an idea of underlying trends.

SELECT sales_date, sales_amount,

       AVG(sales_amount) OVER (ORDER BY sales_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg

FROM sales;

4. Comparing Current and Previous Sales

You may track the current row value against a previous row value instantly, as one can track today’s sales or yesterday’s.

SELECT sales_date, sales_amount,

       LAG(sales_amount, 1) OVER (ORDER BY sales_date) AS previous_sales,

       (sales_amount - LAG(sales_amount, 1) OVER (ORDER BY sales_date)) AS sales_growth

FROM sales;

5. Identifying First and Last Transactions

Window functions help you track the first value or last value in the series, like the first transaction in a series of transactions or the last one in the series.

SELECT customer_id, transaction_date, amount,

       FIRST_VALUE(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS first_transaction,

       LAST_VALUE(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_transaction

FROM transactions;

Performance Considerations of Window Functions

Like aggregation functions, window functions are often not optimized to their full potential. For example, you have to make sure that the column that is being used in PARTITION BY is indexed for fast access. If it’s not indexed, the system will have to go through each row, and this ends up making querying too slow. On top of that, performance may also be affected by sorting (ORDER BY), especially if the dataset is very large. Processing time can also increase when there are multiple window functions in one single query. A way to optimize these functions is to restrict the number of rows processed by applying ROWS BETWEEN instead of checking the entire dataset. If the queries are still slow, break them down into simpler pieces or use materialized views to run queries faster.

Conclusion

SQL makes use of window functions to help keep track of complex computations without losing the details of each individual record. It helps to perform complex tasks such as data ranking, row-wise running total, and inter-row comparisons with ease. These functions, ROW_NUMBER, RANK, LEAD, and LAG, make a powerful enhancement to your analysis. Thus, knowing how to use window functions extends your capacity to work with SQL and analyze data in more intelligent ways.

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.

business intelligence professional