Working with Structured Arrays in NumPy

Structured arrays in NumPy let you define arrays whose elements are records made up of named fields, where each field can have a different data type. This is useful for heterogeneous data, such as a table or a record with multiple attributes.

1. Creating Structured Arrays

A structured array is created by specifying a dtype (data type) that describes every field. The simplest way to define the fields is a list of tuples, where each tuple contains the field name and its data type (for example 'U10', 'i4', or 'f4').

Example: Creating a Structured Array

    import numpy as np
    
    # Define the dtype for the structured array
    dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
    
    # Create a structured array with 3 records
    data = np.array([('Alice', 25, 5.5), ('Bob', 30, 5.8), ('Charlie', 35, 6.1)], dtype=dtype)
    
    # Display the structured array
    print(data)
        

In this example, the structured array contains three fields: name (a Unicode string of up to 10 characters; longer values are silently truncated), age (a 32-bit integer), and height (a 32-bit float). We then create an array with 3 records, each given as a tuple with one value per field.

Result

    [('Alice', 25,  5.5) ('Bob', 30,  5.8) ('Charlie', 35,  6.1)]
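
It can help to inspect the dtype after creating the array. The following is a small sketch using the same fields as above; the extra record ('Dana', 41, 5.9) and the variable name empty are made up for illustration.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = np.array([('Alice', 25, 5.5), ('Bob', 30, 5.8), ('Charlie', 35, 6.1)], dtype=dtype)

# The array is one-dimensional: one element per record
print(data.shape)           # (3,)

# Field names and the size of one record in bytes
# (10 Unicode characters * 4 bytes + 4 bytes + 4 bytes)
print(data.dtype.names)     # ('name', 'age', 'height')
print(data.dtype.itemsize)  # 48

# Pre-allocate an array of zeroed records and fill it later
empty = np.zeros(2, dtype=dtype)
empty[0] = ('Dana', 41, 5.9)   # hypothetical record, assigned as a tuple
print(empty)
```

Pre-allocating with np.zeros is one option when the records arrive one at a time; the unfilled record keeps an empty string and zeros.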
        

2. Accessing Fields in a Structured Array

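Once a structured array is created, you can access its fields by name. Fields are accessed by indexing the array with the field name as a string, similar to looking up a key in a dictionary. Attribute-style (dot) access is only available on record arrays (np.recarray), not on plain structured arrays.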

Example: Accessing Fields

    # Accessing the 'name' field
    names = data['name']
    print("Names:", names)
    
    # Accessing the 'age' field
    ages = data['age']
    print("Ages:", ages)
    
    # Accessing the 'height' field
    heights = data['height']
    print("Heights:", heights)
        

In this example, we access the name, age, and height fields from the structured array.

Result

    Names: ['Alice' 'Bob' 'Charlie']
    Ages: [25 30 35]
    Heights: [5.5 5.8 6.1]
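
Besides whole fields, you can also pick out a single record, several fields at once, or switch to attribute-style access. The sketch below assumes the same data as above; the variable names first, subset, and rec are chosen only for this illustration.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = np.array([('Alice', 25, 5.5), ('Bob', 30, 5.8), ('Charlie', 35, 6.1)], dtype=dtype)

# Index by position to get one record, then by name to get one value
first = data[0]
print(first['name'], first['age'])

# Index with a list of field names to keep only those fields
subset = data[['name', 'age']]
print(subset.dtype.names)   # ('name', 'age')

# View the same data as a record array to use dot notation
rec = data.view(np.recarray)
print(rec.age)
```

Plain indexing with field names is usually sufficient; the recarray view is mainly a convenience when attribute access reads better.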
        

3. Modifying Fields in a Structured Array

Structured arrays allow you to modify individual fields directly using the field names. This can be useful when you need to update specific data in your array.

Example: Modifying Fields

    # Updating the age of Bob to 32
    data['age'][1] = 32
    print("Updated data:", data)
        

In this example, we update the age of Bob (at index 1) to 32. The other records remain unchanged.

Result

    Updated data: [('Alice', 25,  5.5) ('Bob', 32,  5.8) ('Charlie', 35,  6.1)]
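
Single-value updates are not the only option. As a rough sketch, the example below updates a whole field at once and then replaces a complete record; the replacement values ('Robert', 33, 5.9) are invented for illustration.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = np.array([('Alice', 25, 5.5), ('Bob', 30, 5.8), ('Charlie', 35, 6.1)], dtype=dtype)

# data['age'] is a view into the array, so in-place arithmetic
# changes the original records
data['age'] += 1
print(data['age'])   # [26 31 36]

# Replace an entire record by assigning a tuple with one value per field
data[1] = ('Robert', 33, 5.9)
print(data[1])
```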
        

4. Adding New Fields to a Structured Array

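The dtype of an existing structured array is fixed, so fields cannot be added in place. Instead, the append_fields function from the numpy.lib.recfunctions module builds a new array that contains the original fields plus the new ones. The module must be imported explicitly.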

Example: Adding New Fields

    import numpy.lib.recfunctions as rfn
    
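    # Add a new field 'weight' with one value per record.
    # usemask=False returns a plain ndarray instead of a masked array (the default).
    data = rfn.append_fields(data, 'weight', [60.5, 72.3, 80.2], usemask=False)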
    
    # Displaying the updated structured array
    print("Updated data with new field 'weight':")
    print(data)
        

In this example, we use append_fields to add a new field weight to the structured array, supplying one value for each record. The function returns a new array, which we assign back to data.

Result

    Updated data with new field 'weight':
    [('Alice', 25,  5.5, 60.5) ('Bob', 32,  5.8, 72.3) ('Charlie', 35,  6.1, 80.2)]
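
To see roughly what adding a field involves, here is a manual sketch that does not use append_fields: it builds a wider dtype, allocates a new array, and copies each field across. This is an illustration of the idea, not a description of how append_fields is implemented; the name extended is hypothetical.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = np.array([('Alice', 25, 5.5), ('Bob', 32, 5.8), ('Charlie', 35, 6.1)], dtype=dtype)

# Build a new dtype: the existing fields plus a 64-bit float 'weight'
new_dtype = np.dtype(data.dtype.descr + [('weight', 'f8')])

# Allocate the new array and copy the existing fields one by one
extended = np.empty(data.shape, dtype=new_dtype)
for field in data.dtype.names:
    extended[field] = data[field]

# Fill in the new field
extended['weight'] = [60.5, 72.3, 80.2]

print(extended.dtype.names)   # ('name', 'age', 'height', 'weight')
print(extended)
```

Either way the original array is left unchanged, which is why the result has to be assigned to a variable.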
        

5. Filtering Structured Arrays

You can filter structured arrays based on field values. This is helpful when you need to select records that satisfy certain conditions.

Example: Filtering Data

    # Filter rows where age is greater than 30
    filtered_data = data[data['age'] > 30]
    
    # Displaying the filtered data
    print("Filtered data (age > 30):")
    print(filtered_data)
        

In this example, we filter the structured array to include only records where the age is greater than 30.

Result

    Filtered data (age > 30):
    [('Bob', 32,  5.8, 72.3) ('Charlie', 35,  6.1, 80.2)]
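
Conditions on several fields can be combined into one boolean mask. The sketch below uses the three original fields and arbitrary thresholds picked for illustration; note that each comparison needs its own parentheses because & binds more tightly than > and <.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = np.array([('Alice', 25, 5.5), ('Bob', 32, 5.8), ('Charlie', 35, 6.1)], dtype=dtype)

# Combine conditions element-wise with & (and), | (or), ~ (not)
mask = (data['age'] > 28) & (data['height'] < 6.0)
print(mask)            # [False  True False]

# Apply the mask to the whole array or to a single field
selected = data[mask]
print(selected['name'])
```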
        

6. Sorting Structured Arrays

You can sort structured arrays based on one or more fields. The np.sort function returns a sorted copy of the array, and its order parameter names the field (or list of fields) to sort by.

Example: Sorting Data by Age

    # Sorting the array by the 'age' field
    sorted_data = np.sort(data, order='age')
    
    # Displaying the sorted data
    print("Sorted data by age:")
    print(sorted_data)
        

In this example, we sort the structured array by the age field in ascending order.

Result

    Sorted data by age:
    [('Alice', 25,  5.5, 60.5) ('Bob', 32,  5.8, 72.3) ('Charlie', 35,  6.1, 80.2)]
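
Sorting by several fields, or in descending order, follows the same pattern. This sketch uses a small made-up dataset with a tie in age so that the second sort key matters; np.sort has no descending option, so the descending case goes through np.argsort on the negated field.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
# Illustrative data with two people of the same age
people = np.array([('Charlie', 30, 6.1), ('Bob', 25, 5.8), ('Alice', 30, 5.5)], dtype=dtype)

# Sort by age first, then by name to break ties
by_age_name = np.sort(people, order=['age', 'name'])
print(by_age_name['name'])    # ['Bob' 'Alice' 'Charlie']

# Descending by age: sort the negated ages and index with the result
idx = np.argsort(-people['age'], kind='stable')
by_age_desc = people[idx]
print(by_age_desc['age'])     # [30 30 25]
```

Using kind='stable' keeps records with equal ages in their original relative order.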
        

Conclusion

Structured arrays in NumPy are a practical tool for handling heterogeneous data. They let you store data in tabular form with a different data type for each field, and you can access, modify, filter, and sort records by field name, which makes them useful when working with record-oriented datasets.




