Working with CSV and Excel Files in Pandas


Pandas provides robust functionality for working with CSV and Excel files, making it easier to load, analyze, and manipulate tabular data. This article covers how to read, write, and manipulate data from CSV and Excel files using Pandas.

Importing Pandas

Before you can use Pandas, you need to import the library:

    import pandas as pd
        

Working with CSV Files

CSV (Comma Separated Values) files are commonly used for storing tabular data. You can easily read and write CSV files using Pandas.

Reading a CSV File

You can use the read_csv() function to load a CSV file into a DataFrame:

    # Reading a CSV file
    df = pd.read_csv("data.csv")
    print(df)
        

If the CSV file is located at a specific path or URL, you can specify the full path:

    # Reading a CSV file from a path
    df = pd.read_csv("path/to/your/data.csv")
    print(df)
        

Writing to a CSV File

To write a DataFrame to a CSV file, use the to_csv() function:

    # Writing DataFrame to a CSV file
    df.to_csv("output.csv", index=False)  # index=False to avoid writing row numbers
        

Handling CSV Files with Different Delimiters

CSV files may use different delimiters, such as semicolons or tabs. You can specify the delimiter using the sep parameter:

    # Reading a CSV with a different delimiter
    df = pd.read_csv("data.csv", sep=";")
    print(df)
        

Working with Excel Files

Pandas provides excellent support for reading and writing Excel files with the help of the openpyxl library for .xlsx files.

Reading an Excel File

Use the read_excel() function to load an Excel file into a DataFrame:

    # Reading an Excel file
    df = pd.read_excel("data.xlsx")
    print(df)
        

If the Excel file has multiple sheets, you can specify the sheet name:

    # Reading a specific sheet by name
    df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
    print(df)
        

You can also load all sheets into a dictionary of DataFrames:

    # Reading all sheets from an Excel file
    df_dict = pd.read_excel("data.xlsx", sheet_name=None)
    for sheet_name, df in df_dict.items():
        print(f"Data from {sheet_name}:")
        print(df)
        

Writing to an Excel File

To write a DataFrame to an Excel file, use the to_excel() function:

    # Writing DataFrame to an Excel file
    df.to_excel("output.xlsx", index=False)
        

Handling Multiple Sheets in Excel

You can write multiple DataFrames to different sheets in an Excel file:

    # Writing multiple DataFrames to different sheets
    with pd.ExcelWriter("output.xlsx") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
        df.to_excel(writer, sheet_name="Sheet2", index=False)
        

Conclusion

Working with CSV and Excel files in Pandas is straightforward, thanks to the powerful read_csv(), to_csv(), read_excel(), and to_excel() functions. These tools make it easy to load, analyze, and save tabular data, making Pandas an essential library for data manipulation and analysis in Python.





Advertisement