Reshaping Data: Pivoting with pivot_longer() and pivot_wider()


Overview

This tutorial demonstrates how to reshape data in R with the pivot_longer() and pivot_wider() functions from the tidyr package. These functions convert data between wide and long layouts, a common step when preparing data for analysis or visualization.

Prerequisites

Before starting, ensure you have R and the tidyr package installed. If you don't have tidyr installed, you can install it by running:

    install.packages("tidyr")
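pivot_longer() and pivot_wider() were introduced in tidyr 1.0.0, so an older installation will not have them. A minimal check, assuming tidyr is already installed, is to look at the installed version before continuing (tidyr_version is just an illustrative variable name):

```r
# tidyr_version is an illustrative name; packageVersion() is base R (utils)
tidyr_version <- packageVersion("tidyr")

# pivot_longer() and pivot_wider() require tidyr 1.0.0 or later
if (tidyr_version < "1.0.0") {
  message("tidyr ", tidyr_version, " predates the pivot functions; ",
          "run install.packages(\"tidyr\") to update it")
}

tidyr_version
```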

Step 1: Loading the Data

For this tutorial, we'll use a simple data frame that contains sales data for different products over three months:

    # Load the tidyr package
    library(tidyr)
    
    # Create a sample dataset
    sales_data <- data.frame(
      product = c("Product A", "Product B", "Product C"),
      Jan = c(150, 200, 250),
      Feb = c(160, 210, 240),
      Mar = c(170, 220, 230)
    )
    
    # View the dataset
    sales_data
        

The dataset sales_data looks like this:

        product Jan Feb Mar
    1 Product A 150 160 170
    2 Product B 200 210 220
    3 Product C 250 240 230

Step 2: Pivoting Data with pivot_longer()

The pivot_longer() function converts data from wide format, where the values of one variable (here, month) are spread across several columns, to long format, where each row holds a single observation. Let's reshape our data so that the month names end up in a single column:

    # Pivot the data from wide to long format
    long_data <- sales_data %>%
      pivot_longer(cols = Jan:Mar, 
                   names_to = "month", 
                   values_to = "sales")
    
    # View the reshaped data
    long_data
        

After using pivot_longer(), the data will look like this:

    # A tibble: 9 × 3
      product   month sales
      <chr>     <chr> <dbl>
    1 Product A Jan     150
    2 Product A Feb     160
    3 Product A Mar     170
    4 Product B Jan     200
    5 Product B Feb     210
    6 Product B Mar     220
    7 Product C Jan     250
    8 Product C Feb     240
    9 Product C Mar     230

In this long format, the "month" column contains the month names, and the "sales" column contains the corresponding sales values for each product.
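Naming the range Jan:Mar works here, but cols accepts any tidyselect specification. The sketch below, which assumes the same sales_data, selects every column except product instead. It also converts month to a factor so that months sort in calendar order rather than alphabetically; that factor step is an illustrative addition rather than part of the workflow above, and long_alt is a hypothetical name.

```r
library(tidyr)

sales_data <- data.frame(
  product = c("Product A", "Product B", "Product C"),
  Jan = c(150, 200, 250),
  Feb = c(160, 210, 240),
  Mar = c(170, 220, 230),
  stringsAsFactors = FALSE  # keep product as character on older R versions
)

# long_alt is a hypothetical name; -product means "all columns but product"
long_alt <- pivot_longer(sales_data,
                         cols = -product,
                         names_to = "month",
                         values_to = "sales")

# Sorted as plain text, "Feb" comes before "Jan"; a factor whose levels
# follow the calendar (month.abb is a base R constant) sorts correctly
long_alt$month <- factor(long_alt$month, levels = month.abb[1:3])

# Order rows by month first, then by product
long_alt[order(long_alt$month, long_alt$product), ]
```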

Step 3: Pivoting Data with pivot_wider()

The pivot_wider() function does the reverse: it converts data from long format to wide format, spreading the values of one column across several new columns. Let's reshape the long data back to the original layout, with one column per month:

    # Pivot the data from long to wide format
    wide_data <- long_data %>%
      pivot_wider(names_from = "month", 
                  values_from = "sales")
    
    # View the reshaped data
    wide_data
        

After using pivot_wider(), the data will look like this:

    # A tibble: 3 × 4
      product     Jan   Feb   Mar
      <chr>     <dbl> <dbl> <dbl>
    1 Product A   150   160   170
    2 Product B   200   210   220
    3 Product C   250   240   230

In this wide format, each month is once again a separate column, with one row per product.
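Here every product has a value for every month, so every cell gets filled. When a product/month combination is missing from the long data, pivot_wider() puts NA in that cell. The sketch below deliberately drops one row (Product C, March) to simulate such a gap and compares the default result with the values_fill argument. The names long_gap, wide_na, and wide_filled are hypothetical, and filling with 0 is only appropriate if a missing row genuinely means zero sales.

```r
library(tidyr)

# long_gap: hypothetical long-format data with Product C's March row missing
long_gap <- data.frame(
  product = rep(c("Product A", "Product B", "Product C"), times = c(3, 3, 2)),
  month   = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar", "Jan", "Feb"),
  sales   = c(150, 160, 170, 200, 210, 220, 250, 240),
  stringsAsFactors = FALSE
)

# Default behavior: the missing product/month combination becomes NA
wide_na <- pivot_wider(long_gap, names_from = month, values_from = sales)

# values_fill supplies a replacement for missing cells, keyed by value column
wide_filled <- pivot_wider(long_gap, names_from = month, values_from = sales,
                           values_fill = list(sales = 0))

wide_na
wide_filled
```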

Step 4: Combining Both Pivoting Functions

You can also chain pivot_longer() and pivot_wider() in a single pipeline. For example, you can pivot the data to long format and then immediately pivot it back to wide format:

    # Combine both pivoting functions
    reshaped_data <- sales_data %>%
      pivot_longer(cols = Jan:Mar, 
                   names_to = "month", 
                   values_to = "sales") %>%
      pivot_wider(names_from = "month", 
                  values_from = "sales")
    
    # View the reshaped data
    reshaped_data
        

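This returns the data to its original wide layout. The column names and values match sales_data, although the result is a tibble rather than a plain data frame.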
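A quick way to confirm that the round trip loses nothing is to compare the columns of reshaped_data with those of sales_data. This sketch uses base R's all.equal(); both objects are converted to plain lists first so that the tibble/data.frame class difference does not count as a mismatch. The round_trip_ok name is illustrative.

```r
library(tidyr)  # also provides the %>% pipe

sales_data <- data.frame(
  product = c("Product A", "Product B", "Product C"),
  Jan = c(150, 200, 250),
  Feb = c(160, 210, 240),
  Mar = c(170, 220, 230),
  stringsAsFactors = FALSE
)

reshaped_data <- sales_data %>%
  pivot_longer(cols = Jan:Mar, names_to = "month", values_to = "sales") %>%
  pivot_wider(names_from = "month", values_from = "sales")

# Compare column names, types, and values while ignoring the object class;
# round_trip_ok is an illustrative name
round_trip_ok <- isTRUE(all.equal(as.list(reshaped_data), as.list(sales_data)))
round_trip_ok
```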

Conclusion

In this tutorial, we used pivot_longer() to turn the month columns into rows and pivot_wider() to spread them back into columns. Moving between long and wide formats this way is a routine step when preparing data for analysis or visualization.