Window Functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LEAD, LAG) in SQL
Window functions in SQL allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result per group, window functions return a result for each row within a partition of the result set. They are particularly useful for tasks such as ranking, calculating running totals, and retrieving data from adjacent rows. In this article, we will explore some of the most commonly used window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LEAD, and LAG.
What Are Window Functions?
Window functions perform calculations over a specific range of rows, known as a "window." These functions are similar to aggregate functions, but they return a value for each row instead of summarizing the data. The result is calculated by partitioning the data into groups (using the PARTITION BY
clause) and ordering it (using the ORDER BY
clause) within those partitions.
Types of Window Functions
1. ROW_NUMBER
The ROW_NUMBER()
function assigns a unique, sequential integer to rows within a partition of the result set, starting at 1 for the first row in each partition.
Syntax
SELECT column1, column2, ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS row_num FROM table_name;
Example
Let’s consider a sales table with the following structure:
CREATE TABLE sales ( sale_id INT, sale_date DATE, amount DECIMAL(10, 2) ); INSERT INTO sales (sale_id, sale_date, amount) VALUES (1, '2024-11-01', 150.00), (2, '2024-11-02', 200.00), (3, '2024-11-03', 250.00), (4, '2024-11-01', 300.00);
To assign a row number based on the sale amount, ordered by sale date, we can use:
SELECT sale_id, sale_date, amount, ROW_NUMBER() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS row_num FROM sales;
2. RANK
The RANK()
function assigns a rank to each row within a partition of the result set. The ranking is based on the order defined by the ORDER BY
clause. If two rows have the same value, they will receive the same rank, and the next rank will be skipped.
Syntax
SELECT column1, column2, RANK() OVER (PARTITION BY column1 ORDER BY column2) AS rank FROM table_name;
Example
Using the sales
table, let’s rank the sales amounts by sale date:
SELECT sale_id, sale_date, amount, RANK() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS rank FROM sales;
3. DENSE_RANK
The DENSE_RANK()
function is similar to RANK()
, but it does not skip ranks when there are ties. If two rows have the same value, they receive the same rank, and the next rank is the very next integer, without any gaps.
Syntax
SELECT column1, column2, DENSE_RANK() OVER (PARTITION BY column1 ORDER BY column2) AS dense_rank FROM table_name;
Example
Using the same sales table, let’s apply DENSE_RANK()
:
SELECT sale_id, sale_date, amount, DENSE_RANK() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS dense_rank FROM sales;
4. NTILE
The NTILE()
function divides the result set into a specified number of groups (or "tiles") and assigns a tile number to each row. This is often used for creating quartiles, deciles, or other grouping systems.
Syntax
SELECT column1, column2, NTILE(n) OVER (PARTITION BY column1 ORDER BY column2) AS tile FROM table_name;
Example
Let’s divide the sales amounts into 2 groups (tiles) based on the amount:
SELECT sale_id, sale_date, amount, NTILE(2) OVER (PARTITION BY sale_date ORDER BY amount DESC) AS tile FROM sales;
5. LEAD
The LEAD()
function allows you to access the value of the next row in the result set without using a self-join. It is useful for comparing a row with the following row.
Syntax
SELECT column1, column2, LEAD(column2) OVER (PARTITION BY column1 ORDER BY column2) AS next_value FROM table_name;
Example
Using the sales
table, we can find the next sale amount for each sale:
SELECT sale_id, sale_date, amount, LEAD(amount) OVER (ORDER BY sale_date) AS next_sale_amount FROM sales;
6. LAG
The LAG()
function is similar to LEAD()
, but instead of accessing the next row, it allows you to access the previous row's value. It is useful for comparing the current row with the preceding row.
Syntax
SELECT column1, column2, LAG(column2) OVER (PARTITION BY column1 ORDER BY column2) AS prev_value FROM table_name;
Example
Using the sales
table, we can find the previous sale amount for each sale:
SELECT sale_id, sale_date, amount, LAG(amount) OVER (ORDER BY sale_date) AS prev_sale_amount FROM sales;
Conclusion
Window functions are a powerful feature in SQL that allow you to perform calculations across a set of rows related to the current row. Functions like ROW_NUMBER
, RANK
, DENSE_RANK
, NTILE
, LEAD
, and LAG
help in ranking, ordering, and comparing rows efficiently without needing complex joins or subqueries. These functions can be used in various scenarios, such as ranking sales data, calculating running totals, or comparing values in adjacent rows.