# 5 Best Ways to Iterate Over Rows in a Pandas DataFrame : Emily Rosemary Collins

**by: Emily Rosemary Collins**

*Blog post content copied from Be on the Right Side of Change*

**Problem Formulation:** When working with data in Python, a common task is iterating over rows in a pandas DataFrame to perform an operation on each row. For example, you may have a DataFrame containing stock prices and want to calculate the daily return for each stock. You need efficient ways to loop through rows to compute the desired result. This post covers five approaches to row iteration, including their syntax and best-use scenarios.

## Method 1: Using `iterrows()`

Iterating through a DataFrame can be done using `iterrows()`, which returns an iterator yielding index and row data as pairs. This method is straightforward and useful for iterating while considering the index.

Here’s an example:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Using iterrows to iterate
for index, row in df.iterrows():
    print(f'Index: {index}, A: {row["A"]}, B: {row["B"]}')
```

Output:

```
Index: 0, A: 1, B: 4
Index: 1, A: 2, B: 5
Index: 2, A: 3, B: 6
```

This code snippet creates a pandas DataFrame and iterates over each row using `iterrows()`. The loop prints the index along with the values in columns ‘A’ and ‘B’ for each row. It’s a convenient method for row-wise operations where the index plays an important role.
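One caveat worth knowing with `iterrows()`: each row comes back as a Series, and a Series holds a single dtype, so values from rows that mix integer and float columns can be upcast. The sketch below illustrates this with a made-up frame; the name `mixed` and its `count` and `price` columns are assumptions for the example, not part of the original post.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
mixed = pd.DataFrame({"count": [1, 2], "price": [1.5, 2.5]})

# In the DataFrame itself, "count" keeps its integer dtype
print(mixed["count"].dtype)

# Grab the first (index, row) pair that iterrows() yields
first_index, first_row = next(mixed.iterrows())

# The row is a single Series, so the integer has been upcast to float
print(first_row.dtype)
print(type(first_row["count"]).__name__)
```

If the original dtypes matter for your per-row logic, `itertuples()` is usually the safer choice because it does not pack each row into one Series.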

## Method 2: Using `itertuples()`

The `itertuples()` method for DataFrames is faster than `iterrows()` and returns a namedtuple for each row, which makes it more memory efficient and typically better for performance.

Here’s an example:

```python
# Using itertuples to iterate (reuses df from Method 1)
for row in df.itertuples():
    print(f'Index: {row.Index}, A: {row.A}, B: {row.B}')
```

Output:

```
Index: 0, A: 1, B: 4
Index: 1, A: 2, B: 5
Index: 2, A: 3, B: 6
```

In this snippet, `itertuples()` is used to iterate over the DataFrame rows as namedtuples. This method improves readability and performance, especially in large DataFrames.
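`itertuples()` also accepts `index` and `name` parameters that control what each row looks like. The following is a minimal sketch of both; the name `"Point"` is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# name=None yields plain tuples; index=False leaves the index out of each tuple
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)

# A custom name changes the class name of the namedtuple (default is "Pandas")
named_rows = list(df.itertuples(name="Point"))
print(named_rows[0])
```

Plain tuples are handy when you only need positional unpacking, for example `for a, b in df.itertuples(index=False, name=None)`.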

## Method 3: Using Vectorization with `pandas` Series and DataFrame Methods

Vectorization means applying an operation to complete arrays, such as whole Series or DataFrame columns, instead of to individual elements. It is usually the most efficient way to work with pandas and should be your first choice before considering iteration.
Here’s an example:

```python
# Vectorized operation (reuses df from Method 1)
df['C'] = df['A'] + df['B']
print(df)
```

Output:

```
   A  B  C
0  1  4  5
1  2  5  7
2  3  6  9
```

This code uses vectorization to add columns ‘A’ and ‘B’ to create a new column ‘C’. It avoids explicit iteration and is usually the fastest method when performing calculations across rows.
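The daily-return task from the problem formulation can also be handled without a loop. The sketch below assumes a frame of closing prices with one column per stock; the tickers `AAA` and `BBB` and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical closing prices, one column per stock and one row per day
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 55.0, 55.0]}
)

# pct_change() computes (current - previous) / previous for every column at once;
# the first row has no previous value, so it is NaN
daily_returns = prices.pct_change()
print(daily_returns)
```

Because the calculation runs on whole columns, no explicit row iteration is needed here.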

## Method 4: Applying a Function with `apply()`

For more complex operations that may not be vectorizable, or when you want to use a custom function across rows, the `apply()` method can be a lifesaver.

Here’s an example:

```python
# Custom function to apply
def custom_operation(row):
    return row['A'] * row['B']

# Applying the function to each row (axis=1 passes rows, not columns)
df['D'] = df.apply(custom_operation, axis=1)
print(df)
```

Output:

```
   A  B  C   D
0  1  4  5   4
1  2  5  7  10
2  3  6  9  18
```

The custom function `custom_operation` is applied to each row thanks to `apply()`. The function multiplies elements in columns ‘A’ and ‘B’, storing the result in a new column ‘D’.
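When the row function needs extra inputs, `apply()` can forward them through its `args` parameter. Here is a small sketch; the function `label_row`, its threshold of 8, and the `label` column are hypothetical and exist only to show the mechanism.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def label_row(row, threshold):
    """Return 'high' when A * B exceeds the threshold (a made-up rule)."""
    return "high" if row["A"] * row["B"] > threshold else "low"

# args forwards extra positional arguments to label_row for every row
df["label"] = df.apply(label_row, axis=1, args=(8,))
print(df)
```

Keeping the threshold as a parameter rather than a hard-coded constant makes the same function reusable with different cut-offs.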

## Bonus One-Liner Method 5: Using List Comprehensions

While list comprehensions aren’t specifically part of pandas, they can be used to iterate over DataFrame rows quickly. They’re concise and can be written in a single line of code.

Here’s an example:

```python
# List comprehension to create a list of sums
column_sum = [row.A + row.B for row in df.itertuples()]
print(column_sum)
```

Output:

```
[5, 7, 9]
```

The list comprehension iterates over each row, accessed via `itertuples()`, and computes the sum of columns ‘A’ and ‘B’ for each row, storing the results in a list.
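A common variation, shown here as a sketch rather than as the original author's method, is to skip row objects entirely and `zip` the columns you need:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# zip pairs up the values of the two columns position by position
column_sum = [a + b for a, b in zip(df["A"], df["B"])]
print(column_sum)
```

This touches only the columns involved, which can matter on wide DataFrames where building a full tuple per row is wasted work.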

## Summary/Discussion

- **Method 1: `iterrows()`.** Easy to use. Suitable for operations where the index is significant. It’s less memory efficient and slower than the other methods.
- **Method 2: `itertuples()`.** Faster and more memory-efficient than `iterrows()`. Best for per-row operations where the index matters.
- **Method 3: Vectorization.** The most efficient way to operate across a DataFrame. Avoids explicit iteration. Use when possible.
- **Method 4: `apply()`.** Flexible for complex operations. Often faster than `iterrows()`, but with `axis=1` it still calls a Python function once per row, so it is typically slower than vectorization.
- **Bonus Method 5: List Comprehensions.** A Pythonic, readable approach. Quick for simple operations, but lacks direct access to DataFrame features.
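To check how these trade-offs play out on your own data, a rough timing harness like the one below can help. It is only a sketch: the 1,000-row frame `big` and the helper function names are made up, and the numbers depend on your machine, pandas version, and DataFrame shape.

```python
import timeit

import pandas as pd

# Hypothetical test frame; adjust the size to resemble your real data
big = pd.DataFrame({"A": range(1_000), "B": range(1_000)})

def with_iterrows():
    return [row["A"] + row["B"] for _, row in big.iterrows()]

def with_itertuples():
    return [row.A + row.B for row in big.itertuples()]

def with_apply():
    return big.apply(lambda row: row["A"] + row["B"], axis=1).tolist()

def with_vectorization():
    return (big["A"] + big["B"]).tolist()

# Time three runs of each approach and print the total seconds
for func in (with_iterrows, with_itertuples, with_apply, with_vectorization):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.4f} s")
```

All four functions compute the same sums, so any difference in the printed times comes from the iteration strategy alone.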

February 19, 2024 at 02:53AM
