Creating and Manipulating DataFrames in Pandas
Pandas is a Python library for data manipulation and analysis. One of its primary data structures is the DataFrame, which stores data as a two-dimensional table with labeled rows and columns. This article shows how to create and manipulate DataFrames in Pandas, with examples.
Importing Pandas
Before using Pandas, you need to import the library:
```python
import pandas as pd
```
Creating DataFrames
You can create a DataFrame from various data sources such as dictionaries, lists, or CSV files.
Creating DataFrame from a Dictionary
```python
# Creating a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)
print(df)
```
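To see how the dictionary maps onto the table, it can help to inspect the result. In this sketch each key becomes a column label, each list supplies that column's values in row order, and pandas assigns a default integer index starting at 0. The `shape`, `columns`, `index`, and `dtypes` attributes used here are standard DataFrame attributes.

```python
import pandas as pd

# Each dictionary key becomes a column; each list supplies that column's values in row order.
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)

# Inspect the structure pandas inferred.
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column labels, in dictionary order
print(list(df.index))    # default integer index starting at 0
print(df.dtypes)         # Age is inferred as an integer column
```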
Creating DataFrame from a List of Lists
```python
# Creating a DataFrame from a list of lists (one inner list per row)
data = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "Los Angeles"],
    ["Charlie", 35, "Chicago"],
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
```
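A list of dictionaries is another list-based input that is often convenient for row-oriented data. In this minimal sketch each dictionary is one row and its keys become column labels; a key missing from a row (here, Charlie's `City`, left out deliberately for illustration) is filled with `NaN`.

```python
import pandas as pd

# Each dictionary is one row; keys become column labels.
records = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35},  # no "City" key: filled with NaN
]
df_records = pd.DataFrame(records)
print(df_records)
```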
Creating DataFrame from a CSV File
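```python
# Creating a DataFrame from a CSV file ("data.csv" must exist in the working directory)
# A separate name keeps the df built above intact for the examples that follow.
csv_df = pd.read_csv("data.csv")
print(csv_df)
```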
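If you want to try `read_csv` without creating a file, note that it also accepts any file-like object. This sketch parses hypothetical CSV text through `io.StringIO`; the data mirrors the dictionary example and is for illustration only. Column types are inferred from the text, so `Age` comes back as an integer column.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (hypothetical data, for illustration only).
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts any file-like object, so io.StringIO lets us parse text without a file on disk.
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)
print(csv_df.dtypes)  # column types are inferred from the text
```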
Basic DataFrame Operations
Once a DataFrame is created, you can perform various operations on it. The examples below use the three-row df created from the dictionary above.
Accessing Columns
```python
# Accessing a single column
print(df["Name"])

# Accessing multiple columns
print(df[["Name", "City"]])
```
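The two forms above return different types, which matters for what you can do next: a single label gives a one-dimensional `Series`, while a list of labels (even a one-item list) gives a `DataFrame`. This short sketch makes the difference visible.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

name_col = df["Name"]            # single label -> Series
subset = df[["Name", "City"]]    # list of labels -> DataFrame

print(type(name_col).__name__)  # Series
print(type(subset).__name__)    # DataFrame
print(subset.shape)             # (3, 2)
```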
Adding a New Column
```python
# Adding a new column (the list needs one value per row)
df["Salary"] = [50000, 60000, 70000]
print(df)
```
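A new column does not have to come from a literal list. This sketch shows three common cases: a list with the wrong length is rejected with a `ValueError`, arithmetic on an existing column is applied element-wise, and a scalar is broadcast to every row. The `Age in 5 Years` and `Country` columns are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# A list must have exactly one value per row; a mismatched length raises ValueError.
try:
    df["Salary"] = [50000, 60000]  # only 2 values for 3 rows
except ValueError as err:
    print("ValueError:", err)

# Column arithmetic is applied element-wise, so new columns can be derived from existing ones.
df["Age in 5 Years"] = df["Age"] + 5

# A scalar is broadcast to every row.
df["Country"] = "USA"
print(df)
```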
Deleting a Column
```python
# Deleting a column (axis=1 means drop a column rather than a row)
df = df.drop("Salary", axis=1)
print(df)
```
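`drop` returns a new DataFrame and leaves the original untouched, which is why the example above reassigns `df`. This sketch shows that behavior, along with the `columns=` keyword, an equivalent spelling of `axis=1` that also accepts a list for dropping several columns at once.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# drop returns a new DataFrame; the original is unchanged unless you reassign it.
without_salary = df.drop(columns="Salary")  # equivalent to df.drop("Salary", axis=1)

print(list(df.columns))              # ['Name', 'Age', 'Salary']
print(list(without_salary.columns))  # ['Name', 'Age']

# Several columns can be dropped at once by passing a list.
names_only = df.drop(columns=["Age", "Salary"])
print(list(names_only.columns))      # ['Name']
```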
Accessing Rows
```python
# Accessing a single row by integer position (the second row)
print(df.iloc[1])

# Accessing multiple rows: positions 0 and 1 (the end of the slice is excluded)
print(df.iloc[0:2])
```
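`iloc` selects by integer position, while `loc` selects by index label. With the default 0, 1, 2 index the two look alike, so this sketch uses a hypothetical string index (`"a"`, `"b"`, `"c"`) to make the difference visible. It also shows that `iloc` slices exclude the end position but `loc` slices include the end label.

```python
import pandas as pd

# A non-default index makes the difference between position and label visible.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

print(df.iloc[1]["Name"])    # position 1 -> Bob
print(df.loc["b", "Name"])   # label "b" -> Bob

# iloc slices exclude the end position; loc slices include the end label.
print(len(df.iloc[0:2]))     # 2 rows: positions 0 and 1
print(len(df.loc["a":"c"]))  # 3 rows: labels a through c
```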
Filtering Data
```python
# Filtering rows based on a condition
filtered_df = df[df["Age"] > 25]
print(filtered_df)
```
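Conditions can be combined. In pandas you use `&` (and) and `|` (or) rather than Python's `and`/`or`, and each condition needs its own parentheses because of operator precedence. This sketch also shows `isin`, which keeps rows whose value appears in a given list.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Combine conditions with & (and) / | (or); each condition needs its own parentheses.
older_not_in_chicago = df[(df["Age"] > 25) & (df["City"] != "Chicago")]
print(older_not_in_chicago)  # Bob only

# isin tests membership in a list of values.
coastal = df[df["City"].isin(["New York", "Los Angeles"])]
print(coastal["Name"].tolist())  # ['Alice', 'Bob']
```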
Updating Data
```python
# Updating a single value by row label (1 is Bob's row) and column name
df.loc[1, "City"] = "San Francisco"
print(df)
```
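The row part of `loc` can also be a boolean mask, which updates every matching row in one assignment. This sketch sets a hypothetical `"Remote"` city for everyone aged 30 or older, then uses `.at`, a label-based accessor for a single cell, to change one value.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# loc takes [row selector, column label]; a boolean mask updates every matching row at once.
df.loc[df["Age"] >= 30, "City"] = "Remote"
print(df)

# For a single cell, .at is a scalar-only accessor by label.
df.at[0, "Age"] = 26
print(df.loc[0])
```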
DataFrame Aggregation and Statistics
You can perform aggregation and statistical operations on DataFrames.
Summary Statistics
```python
# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(df.describe())
```
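By default, `describe()` summarizes only the numeric columns, so with this data the result has a single `Age` column. The sketch below checks that, and shows that individual statistics are also available as methods on a column when you only need one number.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# By default, describe() summarizes only numeric columns (here, Age).
stats = df.describe()
print(stats)

# Individual statistics are also available as methods on a column.
print(df["Age"].mean())  # 30.0
print(df["Age"].max())   # 35
```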
GroupBy Operations
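```python
# Grouping data by city and calculating the mean age per city.
# Selecting "Age" first matters: since pandas 2.0, mean() raises a TypeError
# on text columns such as Name instead of silently dropping them.
grouped = df.groupby("City")["Age"].mean()
print(grouped)
```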
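In the three-row example every city is unique, so each group holds a single row. This sketch uses hypothetical data with repeated cities (the names Dana and Eve and the city Boston are made up for illustration) so that the aggregation actually combines rows. It also uses `agg` to compute several statistics per group in one call. Group keys are sorted by default.

```python
import pandas as pd

# Hypothetical data with repeated cities, so some groups have more than one row.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana", "Eve"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["New York", "Chicago", "New York", "Chicago", "Boston"],
})

# Select the numeric column before aggregating, so text columns like Name are not averaged.
mean_age = df.groupby("City")["Age"].mean()
print(mean_age)

# agg computes several statistics per group in one call.
summary = df.groupby("City")["Age"].agg(["count", "mean", "max"])
print(summary)
```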
Sorting Data
```python
# Sorting by a column (ascending by default)
sorted_df = df.sort_values("Age")
print(sorted_df)
```
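`sort_values` also accepts `ascending=False` for descending order, and a list of columns to break ties. In this sketch (hypothetical data with two people aged 25), rows are sorted by age from oldest to youngest, then alphabetically by name. Sorting keeps the original index labels; `reset_index(drop=True)` renumbers the rows if you need a clean 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [30, 25, 35, 25],
})

# ascending=False sorts from largest to smallest.
oldest_first = df.sort_values("Age", ascending=False)
print(oldest_first.iloc[0]["Name"])  # Charlie

# Sort by several columns: Age descending, then Name ascending to break ties.
by_age_then_name = df.sort_values(["Age", "Name"], ascending=[False, True])
print(by_age_then_name["Name"].tolist())  # ['Charlie', 'Alice', 'Bob', 'Dana']

# sort_values keeps the original index labels; reset_index renumbers rows 0..n-1.
print(by_age_then_name.reset_index(drop=True))
```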
Conclusion
DataFrames are a fundamental feature of Pandas, allowing you to store and manipulate structured data easily. By mastering DataFrame creation and manipulation, you can perform efficient data analysis and preprocessing in Python.