Window Functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LEAD, LAG) in SQL


Window functions in SQL allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result per group, window functions return a result for each row within a partition of the result set. They are particularly useful for tasks such as ranking, calculating running totals, and retrieving data from adjacent rows. In this article, we will explore some of the most commonly used window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LEAD, and LAG.

What Are Window Functions?

Window functions perform calculations over a specific range of rows, known as a "window." These functions are similar to aggregate functions, but they return a value for each row instead of summarizing the data. The result is calculated by partitioning the data into groups (using the PARTITION BY clause) and ordering it (using the ORDER BY clause) within those partitions.

Types of Window Functions

1. ROW_NUMBER

The ROW_NUMBER() function assigns a unique, sequential integer to rows within a partition of the result set, starting at 1 for the first row in each partition.

Syntax

        SELECT column1, column2, 
               ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS row_num
        FROM table_name;
    

Example

Let’s consider a sales table with the following structure:

        CREATE TABLE sales (
            sale_id INT,
            sale_date DATE,
            amount DECIMAL(10, 2)
        );
        
        INSERT INTO sales (sale_id, sale_date, amount)
        VALUES (1, '2024-11-01', 150.00),
               (2, '2024-11-02', 200.00),
               (3, '2024-11-03', 250.00),
               (4, '2024-11-01', 300.00);
    

To assign a row number based on the sale amount, ordered by sale date, we can use:

        SELECT sale_id, sale_date, amount,
               ROW_NUMBER() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS row_num
        FROM sales;
    

2. RANK

The RANK() function assigns a rank to each row within a partition of the result set. The ranking is based on the order defined by the ORDER BY clause. If two rows have the same value, they will receive the same rank, and the next rank will be skipped.

Syntax

        SELECT column1, column2, 
               RANK() OVER (PARTITION BY column1 ORDER BY column2) AS rank
        FROM table_name;
    

Example

Using the sales table, let’s rank the sales amounts by sale date:

        SELECT sale_id, sale_date, amount,
               RANK() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS rank
        FROM sales;
    

3. DENSE_RANK

The DENSE_RANK() function is similar to RANK(), but it does not skip ranks when there are ties. If two rows have the same value, they receive the same rank, and the next rank is the very next integer, without any gaps.

Syntax

        SELECT column1, column2, 
               DENSE_RANK() OVER (PARTITION BY column1 ORDER BY column2) AS dense_rank
        FROM table_name;
    

Example

Using the same sales table, let’s apply DENSE_RANK():

        SELECT sale_id, sale_date, amount,
               DENSE_RANK() OVER (PARTITION BY sale_date ORDER BY amount DESC) AS dense_rank
        FROM sales;
    

4. NTILE

The NTILE() function divides the result set into a specified number of groups (or "tiles") and assigns a tile number to each row. This is often used for creating quartiles, deciles, or other grouping systems.

Syntax

        SELECT column1, column2, 
               NTILE(n) OVER (PARTITION BY column1 ORDER BY column2) AS tile
        FROM table_name;
    

Example

Let’s divide the sales amounts into 2 groups (tiles) based on the amount:

        SELECT sale_id, sale_date, amount,
               NTILE(2) OVER (PARTITION BY sale_date ORDER BY amount DESC) AS tile
        FROM sales;
    

5. LEAD

The LEAD() function allows you to access the value of the next row in the result set without using a self-join. It is useful for comparing a row with the following row.

Syntax

        SELECT column1, column2, 
               LEAD(column2) OVER (PARTITION BY column1 ORDER BY column2) AS next_value
        FROM table_name;
    

Example

Using the sales table, we can find the next sale amount for each sale:

        SELECT sale_id, sale_date, amount,
               LEAD(amount) OVER (ORDER BY sale_date) AS next_sale_amount
        FROM sales;
    

6. LAG

The LAG() function is similar to LEAD(), but instead of accessing the next row, it allows you to access the previous row's value. It is useful for comparing the current row with the preceding row.

Syntax

        SELECT column1, column2, 
               LAG(column2) OVER (PARTITION BY column1 ORDER BY column2) AS prev_value
        FROM table_name;
    

Example

Using the sales table, we can find the previous sale amount for each sale:

        SELECT sale_id, sale_date, amount,
               LAG(amount) OVER (ORDER BY sale_date) AS prev_sale_amount
        FROM sales;
    

Conclusion

Window functions are a powerful feature in SQL that allow you to perform calculations across a set of rows related to the current row. Functions like ROW_NUMBER, RANK, DENSE_RANK, NTILE, LEAD, and LAG help in ranking, ordering, and comparing rows efficiently without needing complex joins or subqueries. These functions can be used in various scenarios, such as ranking sales data, calculating running totals, or comparing values in adjacent rows.





Advertisement