Creating and Manipulating DataFrames in Pandas


Pandas is a Python library for data manipulation and analysis. Its primary data structure is the DataFrame, a two-dimensional table with labeled rows and columns. This article shows how to create and manipulate DataFrames in Pandas, with examples.

Importing Pandas

Before using Pandas, you need to import the library:

    import pandas as pd
        

Creating DataFrames

You can create a DataFrame from various data sources such as dictionaries, lists, or CSV files.

Creating DataFrame from a Dictionary

    # Creating a DataFrame from a dictionary
    data = {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"]
    }
    df = pd.DataFrame(data)
    print(df)
        

Creating DataFrame from a List of Lists

    # Creating a DataFrame from a list of lists
    data = [
        ["Alice", 25, "New York"],
        ["Bob", 30, "Los Angeles"],
        ["Charlie", 35, "Chicago"]
    ]
    df = pd.DataFrame(data, columns=["Name", "Age", "City"])
    print(df)
        

Creating DataFrame from a CSV File

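    # Creating a DataFrame from a CSV file.
    # Assumes data.csv exists in the working directory and has a header row;
    # the later examples expect Name, Age, and City columns with three rows.
    df = pd.read_csv("data.csv")
    print(df)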
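
If no data.csv file is at hand, the same call can be tried against in-memory text. The following is a minimal sketch: the CSV content and the names csv_text and csv_df are illustrative assumptions, not part of the original example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file path or any file-like object,
# so a StringIO buffer works in place of a file on disk
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df)
print(csv_df.dtypes)  # shows the column types pandas inferred
```

Checking dtypes right after loading is a quick way to confirm that numeric columns were not read in as text.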
        

Basic DataFrame Operations

Once a DataFrame is created, you can perform various operations on it.

Accessing Columns

    # Accessing a single column
    print(df["Name"])

    # Accessing multiple columns
    print(df[["Name", "City"]])
        

Adding a New Column

    # Adding a new column
    df["Salary"] = [50000, 60000, 70000]
    print(df)
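
A new column can also be computed from an existing one, and a list assigned as a column has to match the number of rows. The sketch below illustrates both points; the SalaryK and Bad column names are hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 70000],
})

# Derive a new column from an existing one (salary in thousands)
df["SalaryK"] = df["Salary"] / 1000

# Assigning a list whose length differs from the row count raises ValueError
length_error_raised = False
try:
    df["Bad"] = [1, 2]  # only two values for three rows
except ValueError as exc:
    length_error_raised = True
    print("Could not add column:", exc)

print(df)
```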
        

Deleting a Column

    # Deleting a column
    df = df.drop("Salary", axis=1)
    print(df)
        

Accessing Rows

    # Accessing a single row by integer position (0-based)
    print(df.iloc[1])

    # Accessing multiple rows by position (the end position is excluded)
    print(df.iloc[0:2])
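
Rows can be selected by position with iloc or by index label with loc. With the default 0, 1, 2 index the two look alike, so this sketch uses a hypothetical string index ("a", "b", "c") and a DataFrame named people to make the difference visible.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],  # hypothetical labels for illustration
)

# iloc selects by integer position, loc selects by index label
second_by_position = people.iloc[1]
second_by_label = people.loc["b"]

# Position slices exclude the end; label slices include it
first_two = people.iloc[0:2]
a_to_b = people.loc["a":"b"]

print(second_by_position)
print(first_two)
print(a_to_b)
```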
        

Filtering Data

    # Filtering rows based on a condition
    filtered_df = df[df["Age"] > 25]
    print(filtered_df)
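
Several conditions can be combined in one filter. A small sketch, reusing the sample data from earlier: each condition goes in its own parentheses and is joined with & (and) or | (or) rather than Python's and/or keywords. The result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Rows where both conditions hold
over_25_not_chicago = df[(df["Age"] > 25) & (df["City"] != "Chicago")]

# Rows where at least one condition holds
under_30_or_chicago = df[(df["Age"] < 30) | (df["City"] == "Chicago")]

print(over_25_not_chicago)
print(under_30_or_chicago)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.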
        

Updating Data

    # Updating a value in the DataFrame
    df.loc[1, "City"] = "San Francisco"
    print(df)
        

DataFrame Aggregation and Statistics

You can perform aggregation and statistical operations on DataFrames.

Summary Statistics

    # Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
    print(df.describe())
        

GroupBy Operations

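    # Grouping data and calculating the mean age per city.
    # Selecting "Age" first avoids a TypeError in pandas 2.x, where mean()
    # no longer silently skips non-numeric columns such as "Name".
    grouped = df.groupby("City")["Age"].mean()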
    print(grouped)
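
Grouping is easier to see when a group holds more than one row. The sketch below assumes a slightly different data set in which two cities each appear twice (the fourth person, Dana, is added purely for illustration) and computes several aggregates at once with agg.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],  # "Dana" is a made-up extra row
    "Age": [25, 30, 35, 45],
    "City": ["New York", "Chicago", "Chicago", "New York"],
})

# Mean age per city
mean_age = df.groupby("City")["Age"].mean()

# Several aggregates per city in one call
summary = df.groupby("City")["Age"].agg(["count", "min", "max"])

print(mean_age)
print(summary)
```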
        

Sorting Data

    # Sorting by a column
    sorted_df = df.sort_values("Age")
    print(sorted_df)
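
sort_values sorts in ascending order by default and returns a new DataFrame. As a rough sketch of two common variations, the example below sorts in descending order and then by two columns; the sample data (including the extra row for Dana) is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],  # "Dana" is a made-up extra row
    "Age": [25, 30, 35, 45],
    "City": ["New York", "Chicago", "Chicago", "New York"],
})

# Oldest first
sorted_desc = df.sort_values("Age", ascending=False)

# City A-Z, then oldest first within each city; reset_index renumbers the rows
by_city_then_age = (
    df.sort_values(["City", "Age"], ascending=[True, False])
      .reset_index(drop=True)
)

print(sorted_desc)
print(by_city_then_age)
```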
        

Conclusion

DataFrames are a fundamental feature of Pandas, allowing you to store and manipulate structured data easily. By mastering DataFrame creation and manipulation, you can perform efficient data analysis and preprocessing in Python.




