Packages: dplyr and tidyr in R Programming


1. Overview of dplyr

The dplyr package provides a small, consistent set of functions (often called verbs) for manipulating data frames in R. Each verb takes a data frame as its first argument and returns a new data frame, leaving the original unchanged.

Installing and Loading dplyr

    # Install dplyr (needed only once)
    install.packages("dplyr")
    
    # Load dplyr
    library(dplyr)
        

Using dplyr for Data Manipulation

The main functions in dplyr include:

  • filter(): Filter rows based on conditions
  • select(): Select specific columns
  • mutate(): Add or modify columns
  • arrange(): Sort rows
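  • summarize(): Collapse rows into summary values (summarise() is an equivalent spelling)

The example below applies each function to a small data frame:

    # Example data frame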
    data <- data.frame(
      Name = c("Alice", "Bob", "Charlie", "David"),
      Age = c(25, 30, 35, 40),
      Score = c(85, 90, 95, 80)
    )
    
    # Filtering rows where Age > 30
    filtered_data <- filter(data, Age > 30)
    print(filtered_data)
    
    # Selecting specific columns
    selected_data <- select(data, Name, Score)
    print(selected_data)
    
    # Adding a new column
    mutated_data <- mutate(data, Passed = Score > 80)
    print(mutated_data)
    
    # Sorting rows by Age in descending order (oldest first)
    sorted_data <- arrange(data, desc(Age))
    print(sorted_data)
    
    # Summarizing data
    summary_data <- summarize(data, Average_Score = mean(Score))
    print(summary_data)
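
The verbs above are usually chained rather than called one at a time. The following is a minimal sketch of that style using the %>% pipe that dplyr makes available; the data frame name students and the particular thresholds are illustrative choices, not part of the original example.

```r
library(dplyr)

# Same example data as above, under a hypothetical name (students)
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80)
)

# Each verb receives the result of the previous step as its first argument
result <- students %>%
  filter(Age > 25) %>%              # drop rows where Age is 25 or less
  mutate(Passed = Score > 80) %>%   # add a logical column
  arrange(desc(Score)) %>%          # highest score first
  select(Name, Score, Passed)       # keep only these three columns

print(result)
```

Written this way, the steps read top to bottom in the order they run, and no intermediate variables such as filtered_data or mutated_data are needed.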
        

2. Overview of tidyr

The tidyr package reshapes and cleans data frames so that they follow the tidy format, in which each variable is a column, each observation is a row, and each value is a cell.

Installing and Loading tidyr

    # Install tidyr (needed only once)
    install.packages("tidyr")
    
    # Load tidyr
    library(tidyr)
        

Using tidyr for Data Organization

The main functions in tidyr include:

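  • gather(): Convert wide data into long format (superseded by pivot_longer() since tidyr 1.0.0, but still available)
  • spread(): Convert long data into wide format (superseded by pivot_wider() since tidyr 1.0.0, but still available)
  • unite(): Combine multiple columns into one
  • separate(): Split one column into multiple columns

The example below applies each function to a small data frame:

    # Example data frame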
    data <- data.frame(
      Name = c("Alice", "Bob"),
      Math = c(85, 90),
      Science = c(95, 80)
    )
    
    # Converting wide data to long format
    long_data <- gather(data, Subject, Score, Math:Science)
    print(long_data)
    
    # Converting long data to wide format
    wide_data <- spread(long_data, Subject, Score)
    print(wide_data)
    
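    # Combining the Name and Math columns into one (the source columns are dropped)
    united_data <- unite(data, Name_Math, Name, Math, sep = "-")
    print(united_data)
    
    # Splitting the combined column back apart; convert = TRUE turns Math back into a number
    separated_data <- separate(united_data, Name_Math, into = c("Name", "Math"), sep = "-", convert = TRUE)
    print(separated_data)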
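
Since tidyr 1.0.0, pivot_longer() and pivot_wider() are the recommended replacements for gather() and spread(). The sketch below repeats the reshaping steps from the example above with the newer functions; the data frame name scores is an illustrative choice.

```r
library(tidyr)

# Same example data as above, under a hypothetical name (scores)
scores <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 90),
  Science = c(95, 80)
)

# Wide to long: one row per Name/Subject pair
long_scores <- pivot_longer(
  scores,
  cols = Math:Science,
  names_to = "Subject",
  values_to = "Score"
)
print(long_scores)

# Long to wide: one column per Subject again
wide_scores <- pivot_wider(
  long_scores,
  names_from = Subject,
  values_from = Score
)
print(wide_scores)
```

Both functions return tibbles rather than base data frames. pivot_longer() also orders its output row by row (both of Alice's subjects, then both of Bob's), whereas gather() stacks the columns one after another.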
        

Conclusion

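This tutorial introduced the dplyr and tidyr packages for data manipulation in R. dplyr handles row and column operations such as filtering, selecting, mutating, sorting, and summarizing, while tidyr reshapes data between wide and long layouts and combines or splits columns.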




