Drop Rows in pandas DataFrame (by Index, Condition, NaN & Practical Examples)

When working with real-world datasets, it is often necessary to remove unwanted rows before performing analysis. Pandas provides multiple ways to drop rows from a DataFrame, such as removing rows by index, filtering rows using conditions, deleting rows with missing values, or eliminating duplicate entries.

Understanding these different methods helps you clean datasets efficiently and prepare them for accurate data analysis.


Quick Reference: Drop Rows in pandas

| Task | Method | Example |
| --- | --- | --- |
| Drop a single row by index | drop() | df.drop(2) |
| Drop multiple rows | drop(list) | df.drop([1,3,5]) |
| Drop rows by index position | df.index | df.drop(df.index[2]) |
| Drop rows using index list | df.index | df.drop(df.index[[1,3]]) |
| Drop rows using index range | range() | df.drop(range(1,4)) |
| Drop rows based on condition | Boolean filtering | df.drop(df[df["age"] < 30].index) |
| Drop rows matching column value | condition | df.drop(df[df["city"]=="Chicago"].index) |
| Drop rows using multiple conditions | logical operators | df.drop(df[(df["age"] < 30) & (df["city"]=="NY")].index) |
| Drop rows containing NaN values | dropna() | df.dropna() |
| Drop rows where specific column has NaN | subset | df.dropna(subset=["age"]) |
| Drop rows where all values are NaN | how="all" | df.dropna(how="all") |
| Drop duplicate rows | drop_duplicates() | df.drop_duplicates() |
| Drop duplicates based on column | subset | df.drop_duplicates(subset="name") |
| Drop rows and modify original DataFrame | inplace=True | df.drop(2, inplace=True) |
| Drop rows safely ignoring missing index | errors="ignore" | df.drop([100], errors="ignore") |
| Drop rows containing specific text | filtering | df.drop(df[df["name"]=="John"].index) |
| Drop rows after filtering | conditional filter | df = df[df["age"] > 25] |
| Drop rows using query syntax | query() | df.query("age > 30") |
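
Note that query() returns the rows that match its expression, so df.query("age > 30") keeps those rows rather than dropping them. A minimal sketch, using hypothetical sample data, of two ways to drop the matching rows instead:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35],
})

# query() returns the rows that satisfy the expression
matched = df.query("age > 30")

# Option 1: drop the matched rows by their index labels
dropped = df.drop(matched.index)

# Option 2: query for the opposite condition
dropped_alt = df.query("age <= 30")

print(dropped)
```

Both options give the same rows here. They can differ when the column contains NaN, because a NaN age satisfies neither "age > 30" nor "age <= 30".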

Drop a Single Row in pandas

Drop row using index label

The most common way to remove a row from a pandas DataFrame is by using the drop() method with the row's index label.

python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35]
})

df = df.drop(1)

print(df)

This removes the row whose index label is 1.

Drop multiple rows using list of indices

You can also drop multiple rows by passing a list of index labels.

python
df = df.drop([0, 2])

This removes rows with index 0 and 2 from the DataFrame.

Drop row using index position

If you want to drop a row by its position instead of its label, use df.index to look up the label stored at that position and pass it to drop().

python
df = df.drop(df.index[1])

Here, the row at position 1 is removed regardless of its index label.
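
The difference between label and position only becomes visible when the index is not the default 0, 1, 2. The sketch below assumes a hypothetical DataFrame with labels 10, 20, and 30 to show what df.index[1] resolves to:

```python
import pandas as pd

# Hypothetical sample data with non-default index labels
df = pd.DataFrame(
    {"name": ["John", "Anna", "Peter"], "age": [28, 22, 35]},
    index=[10, 20, 30],
)

# df.index[1] is the label stored at position 1 (here, 20);
# drop() then removes the row carrying that label
label_at_pos_1 = df.index[1]
result = df.drop(label_at_pos_1)

print(result)
```

Because drop() always works on labels, an index with duplicate labels would lose every row that shares the looked-up label, not just the one at that position.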


Drop Rows by Index

Drop row by index in pandas

You can drop rows directly by specifying their index label.

python
df.drop(3)

This removes the row where the index label equals 3.

Drop rows using df.index

You can use df.index to drop rows using their positional index.

python
df.drop(df.index[[1, 3]])

This removes rows at positions 1 and 3.

Drop rows using index range

To drop a consecutive run of integer index labels, you can pass range() to drop().

python
df.drop(range(1, 4))

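This removes the rows whose index labels are 1, 2, and 3 (the end of the range is excluded). These are labels, not positions, so drop() raises a KeyError if any of them is missing from the index.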
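
When the index labels are not 0, 1, 2, ..., a range of positions can be dropped by slicing df.index instead. A small sketch, assuming a hypothetical DataFrame labeled 10 through 50:

```python
import pandas as pd

# Hypothetical sample data with non-default index labels
df = pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=[10, 20, 30, 40, 50])

# range(1, 4) is interpreted as labels 1, 2, 3, which do not exist here
try:
    df.drop(range(1, 4))
    range_failed = False
except KeyError:
    range_failed = True

# Slicing df.index selects the labels at positions 1, 2, and 3
by_position = df.drop(df.index[1:4])

print(by_position)
```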

Reset index after dropping rows

After removing rows, the index may become non-sequential. You can reset it using reset_index().

python
df = df.reset_index(drop=True)

print(df)

This reassigns sequential index values starting from 0.
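
As a short illustration with hypothetical sample data, the index before and after the reset looks like this:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Maria"],
    "age": [28, 22, 35, 41],
})

dropped = df.drop([1, 2])                     # remaining labels: 0 and 3
renumbered = dropped.reset_index(drop=True)   # labels become 0 and 1

print(list(dropped.index))
print(list(renumbered.index))
```

Passing drop=True discards the old labels. Without it, reset_index() keeps them as a new column named "index".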


Drop Rows Based on Conditions

Drop rows where column value matches condition

You can remove rows based on a condition applied to a column.

python
df = df.drop(df[df["age"] < 30].index)

This drops all rows where the age column is less than 30.

Drop rows using multiple conditions

Multiple conditions can be applied using logical operators such as & (AND) or | (OR).

python
df = df.drop(df[(df["age"] < 30) & (df["name"] == "Anna")].index)

This removes rows where age is less than 30 and the name is "Anna".
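
The following sketch, built on hypothetical sample data, contrasts & and | on the same pair of conditions. Each condition is wrapped in parentheses because & and | bind more tightly than the comparison operators:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Anna"],
    "age": [28, 22, 35, 40],
})

both = (df["age"] < 30) & (df["name"] == "Anna")     # AND: both must hold
either = (df["age"] < 30) | (df["name"] == "Anna")   # OR: one is enough

dropped_and = df.drop(df[both].index)
dropped_or = df.drop(df[either].index)

print(dropped_and)
print(dropped_or)
```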

Drop rows where column value equals specific value

You can drop rows where a column matches a specific value.

python
df = df.drop(df[df["name"] == "John"].index)

This removes rows where the name column equals "John".

Drop rows using Boolean indexing

Another common approach is to filter rows you want to keep instead of dropping them explicitly.

python
df = df[df["age"] >= 30]

print(df)

This keeps only rows where age is greater than or equal to 30.
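
Dropping rows that match a condition and keeping rows that match the opposite condition are not always interchangeable when the column has missing values, because every comparison with NaN is False. A minimal sketch with hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; Anna's age is missing
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, np.nan, 35],
})

too_young = df["age"] < 30                 # NaN compares as False

via_drop = df.drop(df[too_young].index)    # removes John only
via_invert = df[~too_young]                # same rows as via_drop
via_keep = df[df["age"] >= 30]             # also loses Anna (NaN age)

print(via_invert)
print(via_keep)
```

Inverting the mask with ~ reproduces the drop() result, while rewriting the condition as >= also removes the row with the missing age.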


Drop Rows with Missing Values

Drop rows containing NaN values

In many datasets, missing values appear as NaN. You can remove rows containing missing values using the dropna() method.

python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, np.nan, 35]
})

df = df.dropna()

print(df)

This removes all rows that contain at least one NaN value.

Drop rows where specific column contains NaN

If you want to drop rows only when a specific column contains missing values, use the subset parameter.

python
df = df.dropna(subset=["age"])

This removes rows where the age column contains NaN.

Drop rows only if all values are NaN

In a DataFrame with several columns, you may want to remove only the rows in which every value is missing.

python
df = df.dropna(how="all")

This drops rows only if every column in that row contains NaN.

Keep rows with at least one non-null value

To ensure rows remain if at least one column contains data, you can use the thresh parameter.

python
df = df.dropna(thresh=1)

This keeps rows with at least one non-null value.
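
To compare the dropna() options side by side, the sketch below applies them to one hypothetical DataFrame in which each row has a different number of missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows have 3, 0, 1, and 2 non-null values
df = pd.DataFrame({
    "name": ["John", np.nan, "Peter", np.nan],
    "age": [28, np.nan, np.nan, 22],
    "city": ["NY", np.nan, np.nan, "LA"],
})

any_missing = df.dropna()             # default how="any": drop rows with any NaN
all_missing = df.dropna(how="all")    # drop only the fully empty row
at_least_one = df.dropna(thresh=1)    # keep rows with 1 or more non-null values
at_least_two = df.dropna(thresh=2)    # keep rows with 2 or more non-null values

print(at_least_two)
```

With thresh=1 the result matches how="all"; higher thresh values let you require a minimum amount of data per row.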


Drop Duplicate Rows

Drop duplicate rows in pandas

Duplicate rows may appear during data collection or when merging datasets. Pandas provides the drop_duplicates() method to remove them.

python
df = df.drop_duplicates()

This removes duplicate rows across the entire DataFrame.

Drop duplicates based on specific columns

Sometimes duplicates should only be evaluated based on certain columns.

python
df = df.drop_duplicates(subset=["name"])

This removes duplicate rows based only on the name column.

Keep first occurrence while removing duplicates

By default, pandas keeps the first occurrence of a duplicate row.

python
df = df.drop_duplicates(keep="first")

This removes later duplicate entries while preserving the first instance.

Remove all duplicate rows

If you want to remove all duplicate entries, including the first occurrence, use keep=False.

python
df = df.drop_duplicates(keep=False)

This removes every row that appears more than once.
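
The effect of the keep parameter is easiest to see on the surviving index labels. A minimal sketch, assuming hypothetical sample data with two duplicated rows:

```python
import pandas as pd

# Hypothetical sample data: John and Anna each appear twice
df = pd.DataFrame({
    "name": ["John", "Anna", "John", "Peter", "Anna"],
    "age": [28, 22, 28, 35, 22],
})

first = df.drop_duplicates(keep="first")   # keeps labels 0, 1, 3
last = df.drop_duplicates(keep="last")     # keeps labels 2, 3, 4
unique_only = df.drop_duplicates(keep=False)  # keeps label 3 only

print(list(first.index), list(last.index), list(unique_only.index))
```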


Drop Rows While Cleaning Data

Remove rows containing unwanted values

You may want to remove rows that contain specific unwanted values.

python
df = df[df["city"] != "Unknown"]

This removes rows where the city column contains "Unknown".

Drop rows containing empty strings

Datasets sometimes contain empty strings that should be treated as invalid values.

python
df = df[df["name"] != ""]

This removes rows where the name column is empty.
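
Values that contain only spaces are not equal to "" and survive the comparison above. One possible way to catch them, shown here on hypothetical sample data, is to strip whitespace before comparing:

```python
import pandas as pd

# Hypothetical sample data with an empty and a whitespace-only name
df = pd.DataFrame({"name": ["John", "", "   ", "Peter"]})

# str.strip() removes leading and trailing whitespace before the comparison
is_blank = df["name"].str.strip() == ""
cleaned = df[~is_blank]

print(cleaned)
```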

Remove rows containing specific text patterns

You can drop rows where a column contains specific text patterns using str.contains().

python
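df = df[~df["name"].str.contains("test", case=False, na=False)]

This removes rows where the name column contains the substring "test" in any letter case. Passing na=False treats missing names as non-matches, so those rows are kept and the ~ operator does not fail on NaN values.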
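
By default str.contains() treats the pattern as a regular expression and matches it anywhere in the string. The sketch below, using hypothetical sample data, compares a plain substring match with a whole-word match:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a missing name
df = pd.DataFrame({"name": ["John", "Test User", np.nan, "contest"]})

# Literal substring match: "contest" also contains "test"
has_substring = df["name"].str.contains("test", case=False, na=False, regex=False)

# Whole-word match using regular-expression word boundaries
has_word = df["name"].str.contains(r"\btest\b", case=False, na=False)

without_substring = df[~has_substring]
without_word = df[~has_word]

print(without_substring)
print(without_word)
```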

Drop rows after replacing invalid values

Sometimes values must be cleaned before dropping rows.

python
df["age"] = pd.to_numeric(df["age"], errors="coerce")

df = df.dropna(subset=["age"])

This converts invalid values to NaN and removes rows where the conversion failed.
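
A self-contained version of this pattern, with hypothetical sample data in which one age is not a number, might look like this:

```python
import pandas as pd

# Hypothetical sample data: ages stored as text, one of them invalid
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": ["28", "unknown", "35"],
})

# Values that cannot be parsed become NaN instead of raising an error
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Count the affected rows before removing them
n_invalid = int(df["age"].isna().sum())

df = df.dropna(subset=["age"])

print(n_invalid)
print(df)
```

Counting the NaN values first gives a quick check on how much data the cleanup discards. Rows that were already missing an age are removed along with the ones that failed to convert.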


Drop Rows Safely Using inplace

Drop rows using inplace parameter

The inplace parameter allows you to modify the original DataFrame directly instead of returning a new DataFrame.

python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35]
})

df.drop(1, inplace=True)

print(df)

This removes the row with index 1 and updates the original DataFrame.

Difference between inplace=True and returning new DataFrame

By default, pandas returns a new DataFrame after dropping rows.

python
df2 = df.drop(1)

Here, df remains unchanged while df2 contains the modified DataFrame.

Using inplace=True modifies the existing DataFrame.

python
df.drop(1, inplace=True)

In this case, the operation happens directly on df.
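
One consequence of modifying the object itself is that every variable referring to the same DataFrame sees the change. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35],
})

same_object = df          # a second name for the same DataFrame, not a copy
df2 = df.drop(1)          # new DataFrame; df still has three rows
rows_before = len(df)

df.drop(1, inplace=True)  # modifies the object that df and same_object share

print(rows_before, len(df), len(same_object), len(df2))
```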

When to avoid inplace operations

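Although inplace=True can be convenient, it is not always recommended. An in-place drop overwrites the only copy of the data, so a mistaken filter cannot be undone without reloading the dataset. Many data workflows benefit from keeping the original dataset unchanged.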

python
df_clean = df.drop(1)

This approach preserves the original DataFrame while creating a cleaned version.


Drop Rows Efficiently in Large DataFrames

Filtering vs dropping rows

For large datasets, filtering rows is often faster than repeatedly calling drop().

python
df = df[df["age"] >= 30]

Instead of dropping rows where age is less than 30, this approach keeps only rows that satisfy the condition.

Best method for large datasets

Boolean filtering is usually more efficient than multiple drop() operations.

python
df_filtered = df[df["city"] != "Unknown"]

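This removes the unwanted rows in a single operation. Boolean filtering still returns a new DataFrame, so the saving comes from avoiding the intermediate DataFrames that a chain of separate drop() calls would create.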

Avoid unnecessary DataFrame copies

Repeated row operations may slow down processing for large datasets.

python
df = df[df["age"] > 25]

Applying all conditions in a single step, rather than filtering the DataFrame once per condition, helps improve performance and reduces memory usage.
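
When several conditions apply, they can be combined into one Boolean mask and applied once. The sketch below uses hypothetical sample data and compares that with filtering step by step:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Maria"],
    "age": [28, 22, 35, 41],
    "city": ["NY", "Unknown", "Chicago", "Unknown"],
})

# Single step: build one mask, index the DataFrame once
keep = (df["age"] > 25) & (df["city"] != "Unknown")
df_filtered = df[keep]

# Step by step: each line creates an intermediate DataFrame
step = df[df["age"] > 25]
step = step[step["city"] != "Unknown"]

print(df_filtered)
```

Both approaches select the same rows; the single mask simply skips the intermediate result.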


Common Errors When Dropping Rows

KeyError when index does not exist

If you attempt to drop a row with an index that does not exist, pandas raises a KeyError.

python
df.drop(100)

If index 100 is not present, the operation will fail.

Dropping rows with incorrect axis

The drop() function can remove either rows or columns depending on the axis parameter.

python
df.drop(1, axis=0)

Here axis=0, which is the default, specifies rows. With axis=1, pandas would look for a column labeled 1 instead and raise a KeyError if no such column exists.
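
The index and columns keywords of drop() state the intent explicitly and can help avoid mixing up the axes. A short sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35],
})

rows_removed = df.drop(index=1)         # same as df.drop(1, axis=0)
cols_removed = df.drop(columns="age")   # same as df.drop("age", axis=1)

print(rows_removed)
print(cols_removed)
```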

Handling errors using errors="ignore"

To prevent errors when dropping rows that may not exist, use the errors="ignore" parameter.

python
df.drop([100, 101], errors="ignore")

If these indices are not present, pandas simply ignores them instead of raising an error.
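
The sketch below, based on hypothetical sample data, shows both behaviors for a list that mixes an existing label with a missing one:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35],
})

# Default behavior: one missing label is enough to raise KeyError
try:
    df.drop([1, 100])
    raised = False
except KeyError:
    raised = True

# errors="ignore": label 1 is dropped, label 100 is skipped silently
safe = df.drop([1, 100], errors="ignore")

print(raised)
print(safe)
```

Labels that do exist are still removed, so errors="ignore" can also hide a typo in a label. Use it when missing labels are genuinely expected.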


Frequently Asked Questions

1. How do I drop rows in pandas?

You can drop rows in pandas using the DataFrame.drop() method by specifying an index label or a list of index labels.

2. How do I drop rows with NaN values in pandas?

You can remove rows with missing values using df.dropna(), which drops rows containing NaN values.

3. How do I drop rows based on a condition in pandas?

You can drop rows using a condition with Boolean indexing such as df.drop(df[df['age'] < 30].index).

4. What does inplace=True do in pandas drop?

The inplace=True parameter modifies the original DataFrame directly instead of returning a new DataFrame.

5. How do I drop multiple rows in pandas?

You can drop multiple rows by passing a list of indices like df.drop([1, 3, 5]).

Summary

Dropping rows is an essential part of cleaning and preparing datasets in pandas. The drop() method allows you to remove rows by index, while conditional filtering helps remove rows based on column values. Pandas also provides specialized functions such as dropna() and drop_duplicates() to efficiently remove missing or duplicate data.

By understanding these different approaches, you can safely and efficiently clean your DataFrame while improving data quality for further analysis and visualization.


Deepak Prasad

R&D Engineer

Founder of GoLinuxCloud with over a decade of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive experience, he excels across development, DevOps, networking, and security, delivering robust and efficient solutions for diverse projects.