How can I iterate over rows in a Pandas DataFrame?

When working with pandas, you will often need to process each row in a DataFrame. Although pandas is designed for fast, vectorized operations, row-wise iteration is still useful in particular scenarios. This post walks through the main ways to iterate over a DataFrame and when to use each of them.

Method 1: Using iterrows() – For smaller datasets

The iterrows() method lets you loop through a DataFrame row by row, yielding each row as an (index, Series) pair. Although it is simple to use, it is slow on larger datasets because it builds a new Series for every row. It is best suited to small datasets that need quick operations.

Let us now understand this with the help of an example:

Imagine you are grading a small list of students based on their marks.

a. Code

import pandas as pd
test_data = {'Name': ['Eva', 'Bobby', 'Charles'], 'Score': [85, 62, 90]}
df = pd.DataFrame(test_data)
for index, row in df.iterrows():
    test_grade = 'A' if row['Score'] >= 80 else 'B'
    print(f"{row['Name']} scored {row['Score']} and got grade {test_grade}.")

b. Output

Eva scored 85 and got grade A.  
Bobby scored 62 and got grade B.  
Charles scored 90 and got grade A.
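
To make the (index, Series) pair described above more concrete, the following sketch inspects a single item yielded by iterrows(). The column names Count and Price are hypothetical sample data, not part of the grading example. It also illustrates a side effect worth knowing: because each row is packed into one Series, a row that mixes integer and float columns comes back with a single float dtype.

```python
import pandas as pd

# Hypothetical sample data: one integer column and one float column.
df = pd.DataFrame({'Count': [1, 2], 'Price': [1.5, 2.5]})

# iterrows() yields (index, Series) pairs, one per row; take the first pair.
first_index, first_row = next(df.iterrows())

row_type = type(first_row).__name__    # class name of the row object
column_dtype = str(df['Count'].dtype)  # dtype of the original integer column
row_dtype = str(first_row.dtype)       # dtype after the row is packed into one Series

print(first_index, row_type)
print(column_dtype, row_dtype)
```

If the original column dtypes matter for your calculation, this is one reason to prefer itertuples() or column-wise operations.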

Method 2: Using itertuples() – For larger datasets

itertuples() returns each row as a namedtuple, which makes it faster and more memory-efficient than iterrows(). It is best used on larger datasets where performance matters.

For example, suppose you need to calculate the annual salary of your employees (a small sample stands in for a larger dataset here):

a. Code

test_data = {'Employee': ['Harry', 'Hermione', 'Ron'], 'Monthly Salary': [3000, 4000, 3500]}
df = pd.DataFrame(test_data)
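for row in df.itertuples():
    # 'Monthly Salary' is not a valid Python identifier, so itertuples() exposes it
    # under the positional name _2 (Index is position 0, Employee is position 1).
    annual_salary = row._2 * 12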
    print(f"{row.Employee} earns {annual_salary} annually.")

b. Output

Harry earns 36000 annually.  
Hermione earns 48000 annually.  
Ron earns 42000 annually.
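
The row._2 attribute in the example above can be surprising, so here is a small sketch of where it comes from and one way to avoid it. Renaming the column to monthly_salary is an assumption made for illustration, not something the original example does.

```python
import pandas as pd

test_data = {'Employee': ['Harry', 'Hermione', 'Ron'], 'Monthly Salary': [3000, 4000, 3500]}
df = pd.DataFrame(test_data)

# Inspect the field names of the first namedtuple. 'Monthly Salary' contains a
# space, so it cannot be an attribute name and gets a positional name instead.
first = next(df.itertuples())
fields = first._fields

# Assumed alternative: rename the column to a valid identifier first, and drop
# the index with index=False since it is not needed here.
renamed = df.rename(columns={'Monthly Salary': 'monthly_salary'})
annual = {row.Employee: row.monthly_salary * 12 for row in renamed.itertuples(index=False)}

print(fields)
print(annual)
```

Renaming up front keeps the loop body readable and avoids depending on column position.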

Method 3: Using apply() – For complex row-wise transformations

apply() lets you apply a function along an axis of the DataFrame: to each column by default, or to each row with axis=1. It is a concise way to express row-wise calculations, but it is not vectorized: with axis=1 the function is still called once per row. It is most useful for complex logic that combines several columns and is hard to write as plain column arithmetic.

For example, suppose you want to calculate the Body Mass Index (BMI) for a group of people.

a. Code

test_data = {'Name': ['Eva', 'Bobby'], 'Weight (kg)': [70, 85], 'Height (m)': [1.75, 1.80]}
df = pd.DataFrame(test_data)

df['BMI'] = df.apply(lambda row: row['Weight (kg)'] / (row['Height (m)'] ** 2), axis=1)
print(df)

b. Output

    Name  Weight (kg)  Height (m)        BMI
0    Eva           70        1.75  22.857143
1  Bobby           85        1.80  26.234568
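
For a simple formula like BMI, the same result can usually be computed with column arithmetic instead of apply(). The sketch below is one possible vectorized version, shown next to the apply() version so the two can be compared. The variable names bmi_apply and bmi_vectorized are introduced here for illustration.

```python
import pandas as pd

test_data = {'Name': ['Eva', 'Bobby'], 'Weight (kg)': [70, 85], 'Height (m)': [1.75, 1.80]}
df = pd.DataFrame(test_data)

# Row-wise version: the lambda is called once per row.
bmi_apply = df.apply(lambda row: row['Weight (kg)'] / (row['Height (m)'] ** 2), axis=1)

# Column-wise (vectorized) version: arithmetic on whole columns at once.
bmi_vectorized = df['Weight (kg)'] / df['Height (m)'] ** 2

df['BMI'] = bmi_vectorized
print(df)
```

On large DataFrames the column-wise form tends to be considerably faster, so apply() is best kept for logic that cannot be expressed this way.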

Method 4: Index-based Iteration (iloc[] or loc[]) – For specific rows

iloc[] and loc[] give you precise indexing when you want to process or update specific rows in a DataFrame. They are useful when you need control over which rows you access or modify, or when you want to apply conditional updates.

For example, suppose you want to flag transactions above a certain amount in a financial dataset.

a. Code

test_data = {'Transaction ID': [101, 102, 103], 'Amount': [500, 1500, 750]}
df = pd.DataFrame(test_data)

# Iterate over the index labels, since loc[] looks rows up by label.
for i in df.index:
    if df.loc[i, 'Amount'] > 1000:
        df.loc[i, 'Flag'] = 'High'
    else:
        df.loc[i, 'Flag'] = 'Normal'
print(df)

b. Output

   Transaction ID  Amount    Flag
0             101     500  Normal
1             102    1500    High
2             103     750  Normal
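
The conditional update above can also be written without an explicit loop. The following is a minimal sketch that uses numpy.where for the flag and loc[] with a boolean mask to select the flagged rows. The high_ids variable is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

test_data = {'Transaction ID': [101, 102, 103], 'Amount': [500, 1500, 750]}
df = pd.DataFrame(test_data)

# np.where picks 'High' where the condition is True and 'Normal' elsewhere.
df['Flag'] = np.where(df['Amount'] > 1000, 'High', 'Normal')

# loc[] also accepts a boolean mask, which selects only the matching rows.
high_ids = df.loc[df['Amount'] > 1000, 'Transaction ID'].tolist()

print(df)
print(high_ids)
```

The loop version remains handy when each row needs different, stateful handling; for a single threshold like this one, the mask-based form is shorter and scales better.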

Which Method: When to Use

Method          Best For
iterrows()      Small datasets and quick exploratory tasks.
itertuples()    Larger datasets that need better performance.
apply()         Complex row-wise transformations that are hard to express as column operations.
iloc[]/loc[]    Precise control over particular rows with conditional logic.

Conclusion

In conclusion, while there are multiple ways to iterate over rows in a pandas DataFrame, the right choice depends on your task's complexity and dataset size. For small datasets or custom logic, you can use iterrows() or apply(). For better performance on larger datasets, use vectorized operations or itertuples(). If you want to learn more about data manipulation using pandas, check out our Data Science Course using Python.

Methods to Iterate Over Rows in a Pandas DataFrame – FAQs

What is the best way to iterate over the rows of a pandas DataFrame?

For smaller datasets, you can use iterrows(). For larger datasets, or any dataset where performance is critical, you can use itertuples().

How do you iterate over multiple rows in pandas?

If you want to iterate over multiple rows in pandas, you can slice with iloc[] or loc[] and iterate over that subset of rows. Note that loc[] slices by label and includes both endpoints, so loc[0:5] covers labels 0 through 5. Code:

for _, row in df.loc[0:5].iterrows():
    print(row)

What is the alternative to loops in pandas?

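Vectorized operations on whole columns are the main alternative and are usually much faster than explicit loops. Methods such as apply() or transform() are a more concise alternative when the logic does not fit plain column arithmetic.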

Is itertuples() faster than iterrows()?

Yes, itertuples() is faster because it does not convert every row into a Series object.
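
If you want to check the difference on your own machine, a rough timing sketch like the one below can help. The DataFrame contents, the helper function names, and the number of repetitions are all assumptions chosen for illustration, and the measured times will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical data: two integer columns with 2,000 rows each.
df = pd.DataFrame({'a': range(2000), 'b': range(2000)})

def sum_with_iterrows(frame):
    """Add up a + b for every row using iterrows()."""
    total = 0
    for _, row in frame.iterrows():
        total += row['a'] + row['b']
    return total

def sum_with_itertuples(frame):
    """Add up a + b for every row using itertuples()."""
    total = 0
    for row in frame.itertuples(index=False):
        total += row.a + row.b
    return total

# Both approaches should produce the same total.
iterrows_total = sum_with_iterrows(df)
itertuples_total = sum_with_itertuples(df)

# Time three runs of each approach.
iterrows_seconds = timeit.timeit(lambda: sum_with_iterrows(df), number=3)
itertuples_seconds = timeit.timeit(lambda: sum_with_itertuples(df), number=3)

print(f"iterrows():   {iterrows_seconds:.4f} s")
print(f"itertuples(): {itertuples_seconds:.4f} s")
```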

About the Author

Senior Consultant Analytics & Data Science

Sahil Mattoo, a Senior Software Engineer at Eli Lilly and Company, is an accomplished professional with 14 years of experience in languages such as Java, Python, and JavaScript. Sahil has a strong foundation in system architecture, database management, and API integration. 
