Packages: dplyr and tidyr in R Programming
1. Overview of dplyr
The dplyr package is a widely used tool for data manipulation in R. It provides a small set of functions, often called verbs, for filtering, selecting, transforming, sorting, and summarizing data frames.
Installing and Loading dplyr
# Install dplyr
install.packages("dplyr")

# Load dplyr
library(dplyr)
Using dplyr for Data Manipulation
The main functions in dplyr include:
filter(): Filter rows based on conditions
select(): Select specific columns
mutate(): Add or modify columns
arrange(): Sort rows
summarize(): Summarize data
# Example data frame
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80)
)

# Filtering rows where Age > 30
filtered_data <- filter(data, Age > 30)
print(filtered_data)

# Selecting specific columns
selected_data <- select(data, Name, Score)
print(selected_data)

# Adding a new column
mutated_data <- mutate(data, Passed = Score > 80)
print(mutated_data)

# Sorting rows by Age, oldest first
sorted_data <- arrange(data, desc(Age))
print(sorted_data)

# Summarizing data
summary_data <- summarize(data, Average_Score = mean(Score))
print(summary_data)
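The verbs above are usually chained with the pipe operator `%>%`, which dplyr re-exports, so that each step feeds the next. The following is a minimal sketch of that style, not part of the original example: the `scores` data frame and its `Group` column are hypothetical additions for illustration, and `group_by()` and `n()` are dplyr functions not covered in the list above.

```r
library(dplyr)

# Same values as the example above, plus a hypothetical Group column
scores <- data.frame(
  Name  = c("Alice", "Bob", "Charlie", "David"),
  Age   = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80),
  Group = c("A", "B", "A", "B")
)

# Chain the verbs: keep Age >= 30, flag passes, keep three columns, sort by score
pipeline_result <- scores %>%
  filter(Age >= 30) %>%
  mutate(Passed = Score > 80) %>%
  select(Name, Score, Passed) %>%
  arrange(desc(Score))
print(pipeline_result)

# Grouped summary: one row per Group with its mean score and row count
group_summary <- scores %>%
  group_by(Group) %>%
  summarize(Average_Score = mean(Score), Count = n())
print(group_summary)
```

Chaining avoids the intermediate variables used in the step-by-step example, at the cost of making each intermediate result harder to inspect.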
2. Overview of tidyr
The tidyr package is used to clean and organize data. It provides functions to reshape data frames into tidy format, where each variable is a column and each observation is a row.
Installing and Loading tidyr
# Install tidyr
install.packages("tidyr")

# Load tidyr
library(tidyr)
Using tidyr for Data Organization
The main functions in tidyr include:
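gather(): Convert wide data into long format (superseded by pivot_longer() since tidyr 1.0.0, but still available)
spread(): Convert long data into wide format (superseded by pivot_wider() since tidyr 1.0.0, but still available)
unite(): Combine multiple columns into one
separate(): Split one column into multiple columns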
# Example data frame
data <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 90),
  Science = c(95, 80)
)

# Converting wide data to long format (one row per Name/Subject pair)
long_data <- gather(data, Subject, Score, Math:Science)
print(long_data)

# Converting long data to wide format
wide_data <- spread(long_data, Subject, Score)
print(wide_data)

# Combining columns: Full_Name holds Name and Math joined, e.g. "Alice-85"
united_data <- unite(data, Full_Name, Name, Math, sep = "-")
print(united_data)

# Splitting a column; convert = TRUE turns Math back into a number
separated_data <- separate(united_data, Full_Name, into = c("Name", "Math"), sep = "-", convert = TRUE)
print(separated_data)
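In tidyr 1.0.0 and later, the same wide-to-long and long-to-wide reshaping can be written with pivot_longer() and pivot_wider(), which the tidyr documentation recommends over gather() and spread() for new code. The sketch below assumes tidyr 1.0.0 or newer is installed; the `marks` data frame is a renamed copy of the example data, used here only for illustration.

```r
library(tidyr)

# Copy of the example data under a hypothetical name
marks <- data.frame(
  Name    = c("Alice", "Bob"),
  Math    = c(85, 90),
  Science = c(95, 80)
)

# Wide to long: one row per Name/Subject pair
long_marks <- pivot_longer(
  marks,
  cols      = c(Math, Science),
  names_to  = "Subject",
  values_to = "Score"
)
print(long_marks)

# Long to wide: one column per Subject again
wide_marks <- pivot_wider(
  long_marks,
  names_from  = Subject,
  values_from = Score
)
print(wide_marks)
```

Note that pivot_longer() orders its output row by row (both of Alice's subjects, then Bob's), whereas gather() stacks the output column by column, so the two results contain the same rows in a different order.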
Conclusion
This tutorial introduced the dplyr and tidyr packages for data manipulation in R. These packages provide powerful tools to clean, transform, and organize data efficiently.