Creating and Manipulating DataFrames in Pandas
Pandas is a Python library for data manipulation and analysis. One of its primary data structures is the DataFrame, a two-dimensional table with labeled rows and columns. This article shows how to create and manipulate DataFrames in Pandas, with examples.
Importing Pandas
Before using Pandas, you need to import the library:
import pandas as pd
Creating DataFrames
You can create a DataFrame from various data sources such as dictionaries, lists, or CSV files.
Creating a DataFrame from a Dictionary
# Creating a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)
Creating a DataFrame from a List of Lists
# Creating a DataFrame from a list of lists
data = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "Los Angeles"],
    ["Charlie", 35, "Chicago"]
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
Creating a DataFrame from a CSV File
# Creating a DataFrame from a CSV file
# (data.csv must exist in the current working directory)
df = pd.read_csv("data.csv")
print(df)
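Since the example above depends on a file being present on disk, here is a minimal self-contained sketch of the same call. It assumes a small CSV with the Name, Age, and City columns used earlier; the csv_text content and the csv_df name are made up for illustration, and an in-memory buffer stands in for the file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as "data.csv"
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file path or any file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.shape)   # (number of rows, number of columns)
print(csv_df.dtypes)  # the inferred type of each column
```

The first line of the CSV is used as the column header by default, and column types are inferred from the values.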
Basic DataFrame Operations
Once a DataFrame is created, you can perform various operations on it.
Accessing Columns
# Accessing a single column
print(df["Name"])
# Accessing multiple columns
print(df[["Name", "City"]])
Adding a New Column
# Adding a new column (the list must have one value per row)
df["Salary"] = [50000, 60000, 70000]
print(df)
Deleting a Column
# Deleting a column (drop returns a new DataFrame)
df = df.drop(columns="Salary")
print(df)
Accessing Rows
# Accessing a single row by integer position (the second row)
print(df.iloc[1])
# Accessing multiple rows by position (the first two rows; the end of the slice is excluded)
print(df.iloc[0:2])
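The difference between position-based and label-based row access is easier to see when the index is not the default 0, 1, 2. The following is a small sketch under that assumption; the people DataFrame and its string labels "a", "b", "c" are hypothetical.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],  # hypothetical string labels
)

# iloc selects by integer position; the end of a slice is excluded
first_two = people.iloc[0:2]

# loc selects by index label; the end of a slice is included
a_to_b = people.loc["a":"b"]

# Both selections contain the rows labeled "a" and "b"
print(first_two)
print(a_to_b)
```

With the default integer index the two accessors often look interchangeable, which is why the distinction is worth checking on a labeled index.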
Filtering Data
# Filtering rows based on a condition
filtered_df = df[df["Age"] > 25]
print(filtered_df)
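Filters can also combine several conditions. A minimal sketch, assuming the same sample data as above (the people, older_in_chicago, and young_or_la names are illustrative): each condition goes in its own parentheses, and the conditions are joined with & (and) or | (or) rather than the Python keywords and / or.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Rows where both conditions hold
older_in_chicago = people[(people["Age"] > 25) & (people["City"] == "Chicago")]

# Rows where at least one condition holds
young_or_la = people[(people["Age"] < 30) | (people["City"] == "Los Angeles")]

print(older_in_chicago)
print(young_or_la)
```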
Updating Data
# Updating a single value, selected by row label and column name
df.loc[1, "City"] = "San Francisco"
print(df)
DataFrame Aggregation and Statistics
You can perform aggregation and statistical operations on DataFrames.
Summary Statistics
# Summary statistics
print(df.describe())
GroupBy Operations
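# Grouping data and calculating the mean of the numeric Age column
# (selecting the column first avoids an error on the text column Name in pandas 2.0 and later)
grouped = df.groupby("City")["Age"].mean()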
print(grouped)
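Grouping is more informative when a group contains more than one row. The sketch below assumes a slightly larger, made-up dataset in which two people share each city; the people, mean_age, and summary names are illustrative.

```python
import pandas as pd

# Hypothetical data with repeated cities so each group has two rows
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 45],
    "City": ["New York", "Chicago", "Chicago", "New York"],
})

# Mean age per city: a Series indexed by city name
mean_age = people.groupby("City")["Age"].mean()
print(mean_age)

# Several aggregations at once: a DataFrame with one column per function
summary = people.groupby("City")["Age"].agg(["min", "max", "count"])
print(summary)
```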
Sorting Data
# Sorting by a column
sorted_df = df.sort_values("Age")
print(sorted_df)
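sort_values sorts in ascending order by default. As a brief sketch on made-up data (the people DataFrame and the result names are hypothetical), the ascending parameter reverses the order, and a list of column names sorts by the first column and breaks ties with the next.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 30],
    "City": ["New York", "Los Angeles", "Chicago", "Chicago"],
})

# Oldest first
by_age_desc = people.sort_values("Age", ascending=False)

# City A to Z, then oldest first within each city
by_city_then_age = people.sort_values(["City", "Age"], ascending=[True, False])

print(by_age_desc)
print(by_city_then_age)
```

Both calls return a new DataFrame and leave people unchanged.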
Conclusion
DataFrames are a fundamental feature of Pandas, allowing you to store and manipulate structured data easily. By mastering DataFrame creation and manipulation, you can perform efficient data analysis and preprocessing in Python.