Filtering and Selecting in R Programming


1. Introduction

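Filtering and selecting are two of the most common operations in data manipulation: filtering keeps the rows that meet a condition, and selecting keeps the columns you need. In R, the filter() and select() functions from the dplyr package handle both tasks, with filter() taking logical conditions written in ordinary R syntax.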

2. Installing and Loading dplyr

    # Install dplyr (only needed once per R installation)
    install.packages("dplyr")
    
    # Load dplyr
    library(dplyr)
        

3. Example Data Frame

Let us start with an example data frame:

    data <- data.frame(
      Name = c("Alice", "Bob", "Charlie", "David"),
      Age = c(25, 30, 35, 40),
      Score = c(85, 90, 95, 80)
    )
    
    # Display the data frame
    print(data)
        

4. Filtering Rows with filter()

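The filter() function keeps the rows of a data frame for which every given condition evaluates to TRUE and drops the rest. Its first argument is the data frame; the remaining arguments are logical conditions written in terms of the column names.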

Filtering with a Single Condition

    # Filter rows where Age > 30
    filtered_data <- filter(data, Age > 30)
    print(filtered_data)
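
One detail worth illustrating is how filter() treats missing values: a row is kept only when the condition is TRUE, so a row where the condition evaluates to NA is dropped. The sketch below uses a hypothetical data frame named people (not part of the tutorial's example data) to show this, along with one way to keep such rows by testing for NA explicitly.

```r
library(dplyr)

# Hypothetical data frame with a missing Age, used only for illustration
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, NA, 35)
)

# Bob's row is dropped: NA > 30 evaluates to NA, not TRUE
over_30 <- filter(people, Age > 30)
print(over_30)

# To keep rows with a missing Age, test for NA explicitly
over_30_or_missing <- filter(people, Age > 30 | is.na(Age))
print(over_30_or_missing)
```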
        

Filtering with Multiple Conditions

    # Filter rows where Age > 30 and Score > 80
    # (conditions separated by commas are combined with AND)
    filtered_data <- filter(data, Age > 30, Score > 80)
    print(filtered_data)
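
To make the AND behaviour concrete, the following sketch compares the comma form with an explicit & on the same example data. The variable names with_commas and with_ampersand are chosen here for illustration only.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80)
)

# Comma-separated conditions: a row must satisfy all of them
with_commas <- filter(data, Age > 30, Score > 80)

# The same filter written with the logical AND operator
with_ampersand <- filter(data, Age > 30 & Score > 80)

# Both forms return the same rows
print(identical(with_commas, with_ampersand))
```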
        

Filtering with OR Condition

    # Filter rows where Age > 30 or Score < 90
    filtered_data <- filter(data, Age > 30 | Score < 90)
    print(filtered_data)
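
When several OR conditions test the same column against different values, base R's %in% operator is a common shorthand. The sketch below is an illustration rather than part of the original examples; it shows that the two forms select the same rows of the example data.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80)
)

# Two equality tests joined with OR
with_or <- filter(data, Name == "Alice" | Name == "David")

# The same selection using %in% against a vector of allowed values
with_in <- filter(data, Name %in% c("Alice", "David"))

print(identical(with_or, with_in))
```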
        

5. Selecting Columns with select()

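The select() function keeps only the columns you name, in the order you list them. Columns can be given by name, excluded with a minus sign, or matched by a pattern in their names.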

Selecting Specific Columns

    # Select the Name and Score columns
    selected_data <- select(data, Name, Score)
    print(selected_data)
        

Excluding Specific Columns

    # Exclude the Age column
    excluded_data <- select(data, -Age)
    print(excluded_data)
        

Selecting Columns by Pattern

    # Select columns that start with 'S'
    pattern_data <- select(data, starts_with("S"))
    print(pattern_data)
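
starts_with() is one of several selection helpers. As a brief sketch, the example below applies three other ways of choosing columns to the same data: contains(), where(), and a Name:Age range. It assumes dplyr 1.0.0 or later, which is needed for where() inside select().

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80)
)

# Columns whose names contain "a"; matching ignores case by default
contains_a <- select(data, contains("a"))
print(names(contains_a))

# Columns for which a predicate function returns TRUE
numeric_cols <- select(data, where(is.numeric))
print(names(numeric_cols))

# A range of adjacent columns, from Name through Age
name_to_age <- select(data, Name:Age)
print(names(name_to_age))
```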
        

6. Combining filter() and select()

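Because both functions take a data frame as their first argument and return a data frame, they can be chained with the pipe operator %>%, which is available once dplyr is loaded. The pipe passes the result on its left as the first argument of the function on its right.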

    # Filter rows where Age > 30 and select Name and Score columns
    result <- data %>%
      filter(Age > 30) %>%
      select(Name, Score)
    print(result)
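
To show what the pipe is doing, the sketch below writes the same operation as a nested function call and checks that both forms give the same result. The names piped and nested are used here for illustration only.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Score = c(85, 90, 95, 80)
)

# Piped form: read top to bottom
piped <- data %>%
  filter(Age > 30) %>%
  select(Name, Score)

# Nested form: the inner call runs first, so it reads inside out
nested <- select(filter(data, Age > 30), Name, Score)

print(identical(piped, nested))
```

The piped form tends to be easier to follow as more steps are added, since each step appears in the order it runs.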
        

7. Conclusion

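This tutorial demonstrated how to use filter() to keep rows that satisfy one or more logical conditions, how to use select() to keep, exclude, or pattern-match columns, and how to chain the two with the pipe operator. Together they cover most everyday row and column subsetting in R.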




