Reshaping Data: Pivoting with pivot_longer() and pivot_wider()
Overview
This tutorial demonstrates how to reshape data in R using the pivot_longer() and pivot_wider() functions from the tidyr package. These functions let you switch a dataset between wide and long layouts, which is useful when preparing data for analysis or visualization.
Prerequisites
Before starting, ensure you have R and the tidyr package installed. If you don't have tidyr installed, you can install it by running:
install.packages("tidyr")
Step 1: Loading the Data
For this tutorial, we'll use a simple data frame that contains sales data for different products over three months:
# Load the tidyr package
library(tidyr)

# Create a sample dataset
sales_data <- data.frame(
  product = c("Product A", "Product B", "Product C"),
  Jan = c(150, 200, 250),
  Feb = c(160, 210, 240),
  Mar = c(170, 220, 230)
)

# View the dataset
sales_data
The dataset sales_data looks like this:
    product Jan Feb Mar
1 Product A 150 160 170
2 Product B 200 210 220
3 Product C 250 240 230
Step 2: Pivoting Data with pivot_longer()
The pivot_longer() function converts data from wide format, where values are spread across multiple columns, to long format, where each row holds one name-value pair. Let's reshape our data so that the month names move into a single column:
# Pivot the data from wide to long format
long_data <- sales_data %>%
  pivot_longer(cols = Jan:Mar, names_to = "month", values_to = "sales")

# View the reshaped data
long_data
After using pivot_longer(), the data will look like this:
# A tibble: 9 × 3
  product   month sales
1 Product A Jan     150
2 Product A Feb     160
3 Product A Mar     170
4 Product B Jan     200
5 Product B Feb     210
6 Product B Mar     220
7 Product C Jan     250
8 Product C Feb     240
9 Product C Mar     230
In this long format, the "month" column contains the month names, and the "sales" column contains the corresponding sales values for each product.
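As a variation on the call above, the columns to pivot can also be chosen by exclusion rather than by range. The sketch below assumes the same sales_data frame; the name long_alt is introduced here only for illustration. It also checks that the result has one row per product-month combination.

```r
library(tidyr)

# Same sample data as in Step 1
sales_data <- data.frame(
  product = c("Product A", "Product B", "Product C"),
  Jan = c(150, 200, 250),
  Feb = c(160, 210, 240),
  Mar = c(170, 220, 230)
)

# Pivot every column except product (equivalent here to cols = Jan:Mar)
long_alt <- pivot_longer(
  sales_data,
  cols = !product,
  names_to = "month",
  values_to = "sales"
)

# 3 products x 3 months should give 9 rows and 3 columns
dim(long_alt)
```

Selecting by exclusion can be convenient when new month columns may be added later, since the call does not need to name the first and last month.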
Step 3: Pivoting Data with pivot_wider()
The pivot_wider() function converts data from long format to wide format, spreading the values of one column across multiple columns. Let's reshape the long data back to the original wide format, where each month has its own column:
# Pivot the data from long to wide format
wide_data <- long_data %>%
  pivot_wider(names_from = "month", values_from = "sales")

# View the reshaped data
wide_data
After using pivot_wider(), the data will look like this:
# A tibble: 3 × 4
  product     Jan   Feb   Mar
1 Product A   150   160   170
2 Product B   200   210   220
3 Product C   250   240   230
In this wide format, each month is now a separate column, and the sales values are arranged accordingly for each product.
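The example data is complete, so every product has a value for every month. When a product-month combination is absent from the long data, pivot_wider() leaves that cell as NA unless told otherwise. The following is a minimal sketch of that behaviour; the partial_long data frame is hypothetical and not part of the tutorial's dataset.

```r
library(tidyr)

# Hypothetical long data: Product C has no Feb entry
partial_long <- data.frame(
  product = c("Product A", "Product A", "Product C"),
  month   = c("Jan", "Feb", "Jan"),
  sales   = c(150, 160, 250)
)

# Default behaviour: the missing combination becomes NA
wide_default <- pivot_wider(
  partial_long,
  names_from = month,
  values_from = sales
)

# values_fill supplies a replacement for missing combinations
wide_filled <- pivot_wider(
  partial_long,
  names_from = month,
  values_from = sales,
  values_fill = list(sales = 0)
)

wide_default
wide_filled
```

Whether 0 is an appropriate fill depends on the data: it treats "no record" as "no sales", which may or may not be what the missing row means.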
Step 4: Combining Both Pivoting Functions
You can use pivot_longer() and pivot_wider() together in a single pipeline. For example, you can first pivot the data to long format and then pivot it back to wide format:
# Combine both pivoting functions
reshaped_data <- sales_data %>%
  pivot_longer(cols = Jan:Mar, names_to = "month", values_to = "sales") %>%
  pivot_wider(names_from = "month", values_from = "sales")

# View the reshaped data
reshaped_data
This returns the data to its original wide layout, with the same columns and values as sales_data. The result is a tibble rather than a base data frame.
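One way to confirm the round trip is to compare the result with the starting data. The sketch below assumes that only the columns and values matter, so it converts the tibble back to a base data frame before comparing; the name round_trip_ok is introduced here for illustration.

```r
library(tidyr)

# Same sample data as in Step 1
sales_data <- data.frame(
  product = c("Product A", "Product B", "Product C"),
  Jan = c(150, 200, 250),
  Feb = c(160, 210, 240),
  Mar = c(170, 220, 230)
)

# Wide -> long -> wide
reshaped_data <- sales_data %>%
  pivot_longer(cols = Jan:Mar, names_to = "month", values_to = "sales") %>%
  pivot_wider(names_from = "month", values_from = "sales")

# Drop the tibble class, then compare columns and values with the original
round_trip_ok <- isTRUE(all.equal(as.data.frame(reshaped_data), sales_data))
round_trip_ok
```

A check like this is mainly useful in a longer pipeline, where an intermediate step between the two pivots could silently drop or reorder rows.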
Conclusion
In this tutorial, we learned how to use pivot_longer() and pivot_wider() in R to reshape data between long and wide formats. Moving between the two layouts is a common step when preparing data for analysis or visualization.