Pandas: How to Access a Column Using iterrows()
In Pandas, iterrows() is commonly used to iterate over the rows of a DataFrame as (index, Series) pairs. During iteration, you can access specific columns of the DataFrame by referencing them within the loop. In this article, we’ll show how to access a column in Pandas using iterrows() and provide examples for better understanding.
Accessing Columns with iterrows()
To access a specific column in each row during iteration, you can reference the column name from the row object, which is a Pandas Series. Here’s an example of how to access a column using iterrows().
Example: Accessing a Column Using iterrows()
Consider a DataFrame where we want to access the “Name” and “Age” columns using iterrows().
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Accessing columns using iterrows
for index, row in df.iterrows():
    print(f"Name: {row['Name']}, Age: {row['Age']}")
Output:
Name: John, Age: 25
Name: Alice, Age: 30
Name: Bob, Age: 22
In this example, the loop iterates over each row. In each iteration, row is a Series whose labels are the column names, so row['Name'] and row['Age'] return that row’s values for the “Name” and “Age” columns, which are then printed.
Why Use iterrows() to Access Columns?
iterrows() is helpful when you need to work through a DataFrame row by row and read several columns of each row, for example to print values or to pass them to another function. It is slow on large DataFrames because a new Series is built for every row, so vectorized operations are preferred whenever they can express the same logic. For small to medium-sized datasets, or for row-wise operations that do not vectorize, it remains quite useful.
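To make the trade-off concrete, here is a minimal sketch that builds the same list of labels twice, once with an iterrows() loop and once with column-wise operations. The label format and the variable names labels_loop and labels_vec are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

# Row by row: read two columns from each row Series
labels_loop = []
for index, row in df.iterrows():
    labels_loop.append(f"{row['Name']} ({row['Age']})")

# Vectorized: build the same strings from whole columns at once
labels_vec = (df['Name'] + ' (' + df['Age'].astype(str) + ')').tolist()

print(labels_loop)
print(labels_vec)
```

Both approaches should produce the same list. The vectorized form avoids creating a Series per row, which is where most of the cost of iterrows() comes from.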
Alternative: Using apply() for Column Access
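While iterrows() works fine for small DataFrames, it can be inefficient for larger ones. A more concise alternative is the apply() method with axis=1, which calls a function once per row. Note that this is still a row-by-row loop, not a vectorized operation, so any speed advantage over iterrows() is usually modest. Here’s how you can achieve the same result using apply():
# Using apply to access columns (axis=1 passes each row to the function as a Series)
# print() returns None, so the result of apply() is not used here
df.apply(lambda row: print(f"Name: {row['Name']}, Age: {row['Age']}"), axis=1)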
Output:
Name: John, Age: 25
Name: Alice, Age: 30
Name: Bob, Age: 22
In this example, apply() calls the lambda once for each row (axis=1) and accesses the “Name” and “Age” columns by label, just as in the iterrows() loop. The code is shorter, but the work is still done one row at a time.
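apply() is more naturally used to return a value per row than to print. The following sketch, which assumes a hypothetical new column named Label, collects the results into the DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

# The function receives each row as a Series and returns one value per row;
# apply() gathers those values into a new Series aligned with df's index
df['Label'] = df.apply(lambda row: f"{row['Name']} is {row['Age']}", axis=1)

print(df)
```

The returned Series shares the DataFrame’s index, so it can be assigned directly as a column.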
Summary
Using iterrows() in Pandas allows you to iterate over rows and access specific columns easily. Each row arrives as a Series, so you read a column value with row['column_name'], which makes it convenient for row-based operations. For better performance with large DataFrames, prefer vectorized column operations; apply() with axis=1 is a more concise way to write row-wise logic, but it still processes one row at a time.
Frequently Asked Questions — Accessing Columns using iterrows() in Pandas
How do I access a column while iterating with iterrows() in Pandas?
Inside the loop, each row is a Pandas Series. You can access a column using its label:
for index, row in df.iterrows():
    print(row['column_name'])
Can I modify column values inside an iterrows() loop?
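Yes, but it’s not efficient, and you must write to the DataFrame itself (for example with df.at), because row may be a copy and changes to it may not reach the DataFrame. Prefer vectorized operations where possible. If you do need a loop:
for i, row in df.iterrows():
    df.at[i, 'column_name'] = row['column_name'] * 2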
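As a rough comparison, the sketch below doubles a column both ways and checks that the results match. The column name price and the variables looped and vectorized are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'price': [10.0, 20.0, 30.0]})

# Loop version: write back through df.at, not through `row`
looped = df.copy()
for i, row in looped.iterrows():
    looped.at[i, 'price'] = row['price'] * 2

# Vectorized version: one operation on the whole column
vectorized = df.copy()
vectorized['price'] = vectorized['price'] * 2

print(looped.equals(vectorized))
```

The vectorized line replaces the entire loop and is the form to reach for first.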
What is returned by iterrows() in Pandas?
It returns an iterator that yields each index and row (as a Pandas Series): (index, row).
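A quick way to see this is to pull the first pair from the iterator with next(). The sample data and the index labels 'a' and 'b' are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['John', 'Alice'], 'Age': [25, 30]},
    index=['a', 'b'],
)

# Take only the first (index, row) pair from the iterator
first_index, first_row = next(df.iterrows())

print(first_index)               # index label of the first row
print(type(first_row).__name__)  # the row object is a Series
print(list(first_row.index))     # its labels are the column names
print(first_row.name)            # its name is the row's index label
```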
How do I access multiple columns using iterrows()?
Access each column by its name within the loop:
for index, row in df.iterrows():
    print(row['A'], row['B'])
How to access columns dynamically inside iterrows()?
You can use a variable for the column name:
col = 'price'
for i, row in df.iterrows():
    print(row[col])
Why is iterrows() slow for large DataFrames?
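Because it builds a new Series for every row, which adds overhead on each iteration. Since a Series has a single dtype, values from columns of different types may also be converted to a common type along the way. Use itertuples() or vectorized Pandas operations for faster performance.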
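Packing each row into a Series also means dtypes are not preserved across the row. The sketch below uses two hypothetical numeric columns, qty and ratio, to show an integer value coming back as a float from iterrows(), while itertuples() keeps it an integer.

```python
import pandas as pd

df = pd.DataFrame({'qty': [1, 2], 'ratio': [0.5, 1.5]})

# iterrows(): the row Series has one dtype, so the integer is upcast to float
_, row = next(df.iterrows())
print(df['qty'].dtype)  # integer dtype in the DataFrame
print(row.dtype)        # common dtype of the row Series
print(row['qty'])

# itertuples(): each field keeps the type of its own column
tup = next(df.itertuples(index=False))
print(tup.qty)
```

If exact types matter inside the loop, this is another reason to prefer itertuples() over iterrows().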
Can I access DataFrame columns directly without using iterrows()?
Yes, vectorized column operations are faster and more efficient:
df['new'] = df['A'] + df['B']
How to convert a row Series to a dictionary while using iterrows()?
Use row.to_dict() inside the loop:
for _, row in df.iterrows():
    print(row.to_dict())
How do I access the index while using iterrows()?
The index is returned as the first value in the loop:
for idx, row in df.iterrows():
    print(idx)
What is the best alternative to iterrows() for performance?
Use itertuples() or vectorized operations for faster iteration in large DataFrames:
for row in df.itertuples():
    print(row.column_name)
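A slightly fuller sketch of itertuples(), using the sample data from earlier in the article: by default the first field of each tuple is the index, exposed as Index, and getattr() can stand in for row[col] when the column name is held in a variable.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

# Each row is a namedtuple; columns are attributes, the index is row.Index
for row in df.itertuples():
    print(row.Index, row.Name, row.Age)

# Dynamic column access: look the attribute up by name.
# index=False leaves the index out of the tuples.
col = 'Age'
ages = [getattr(row, col) for row in df.itertuples(index=False)]
print(ages)
```

Attribute access only works for column names that are valid Python identifiers; pandas renames other columns to positional names in the tuples.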