When working with real-world datasets, it is often necessary to remove unwanted rows before performing analysis. Pandas provides multiple ways to drop rows from a DataFrame, such as removing rows by index, filtering rows using conditions, deleting rows with missing values, or eliminating duplicate entries.
Understanding these different methods helps you clean datasets efficiently and prepare them for accurate data analysis.
Quick Reference: Drop Rows in pandas
| Task | Method | Example |
|---|---|---|
| Drop a single row by index | drop() | df.drop(2) |
| Drop multiple rows | drop(list) | df.drop([1,3,5]) |
| Drop rows by index position | df.index | df.drop(df.index[2]) |
| Drop rows using index list | df.index | df.drop(df.index[[1,3]]) |
| Drop rows using index range | range() | df.drop(range(1,4)) |
| Drop rows based on condition | Boolean filtering | df.drop(df[df["age"] < 30].index) |
| Drop rows matching column value | condition | df.drop(df[df["city"]=="Chicago"].index) |
| Drop rows using multiple conditions | logical operators | df.drop(df[(df["age"] < 30) & (df["city"]=="NY")].index) |
| Drop rows containing NaN values | dropna() | df.dropna() |
| Drop rows where specific column has NaN | subset | df.dropna(subset=["age"]) |
| Drop rows where all values are NaN | how="all" | df.dropna(how="all") |
| Drop duplicate rows | drop_duplicates() | df.drop_duplicates() |
| Drop duplicates based on column | subset | df.drop_duplicates(subset="name") |
| Drop rows and modify original DataFrame | inplace=True | df.drop(2, inplace=True) |
| Drop rows safely ignoring missing index | errors="ignore" | df.drop([100], errors="ignore") |
| Drop rows containing specific text | str.contains() | df[~df["name"].str.contains("test", case=False, na=False)] |
| Drop rows after filtering | conditional filter | df = df[df["age"] > 25] |
| Drop rows using query syntax | query() | df.query("age > 30") |
Drop a Single Row in pandas
Drop row using index label
The most common way to remove a row from a pandas DataFrame is by using the drop() method with the row's index label.
import pandas as pd
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35]
})
df = df.drop(1)
print(df)
This removes the row whose index label is 1.
Drop multiple rows using list of indices
You can also drop multiple rows by passing a list of index labels.
df = df.drop([0, 2])
This removes the rows with index labels 0 and 2 from the DataFrame.
Drop row using index position
If you want to drop a row using its position instead of label, you can use df.index.
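df = df.drop(df.index[1])
Here, df.index[1] looks up the label stored at position 1, and drop() removes that label. With a unique index this removes exactly the row at position 1; if the index contains duplicate labels, every row sharing that label is removed.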
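The difference between labels and positions only becomes visible when the index is not the default 0, 1, 2 sequence. The sketch below uses a hypothetical DataFrame with labels 10, 20, and 30 (an assumption for illustration, not data from the examples above) to show both styles side by side.

```python
import pandas as pd

# Hypothetical frame with a non-default index, so labels and positions differ.
df = pd.DataFrame(
    {"name": ["John", "Anna", "Peter"], "age": [28, 22, 35]},
    index=[10, 20, 30],
)

# Drop by label: removes the row labelled 20.
by_label = df.drop(20)

# Drop by position: df.index[0] resolves to the label 10, which drop() then removes.
by_position = df.drop(df.index[0])

print(by_label)
print(by_position)
```

Calling df.drop(0) on this frame would raise a KeyError, because 0 is a position here and not a label.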
Drop Rows by Index
Drop row by index in pandas
You can drop rows directly by specifying their index label.
df.drop(3)
This returns a new DataFrame without the row whose index label equals 3. Assign the result to a variable to keep it.
Drop rows using df.index
You can use df.index to drop rows using their positional index.
df.drop(df.index[[1, 3]])
This removes rows at positions 1 and 3.
Drop rows using index range
To drop a continuous range of rows, you can combine drop() with the range() function.
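df.drop(range(1, 4))
This removes rows with index labels 1, 2, and 3. range() supplies labels, not positions, so every label in the range must exist in the index or pandas raises a KeyError.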
Reset index after dropping rows
After removing rows, the index may become non-sequential. You can reset it using reset_index().
df = df.reset_index(drop=True)
print(df)
This reassigns sequential index values starting from 0.
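As a minimal sketch of how dropping a range and resetting the index fit together, the example below uses a hypothetical five-row DataFrame with a single value column (both are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data with the default index 0..4.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Drop the rows labelled 1, 2 and 3; the remaining labels are 0 and 4.
trimmed = df.drop(range(1, 4))

# drop=True discards the old labels instead of adding them as an "index" column.
renumbered = trimmed.reset_index(drop=True)

print(trimmed)
print(renumbered)
```

Without drop=True, reset_index() would keep the old labels in a new column named index, which is rarely wanted after a cleanup step.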
Drop Rows Based on Conditions
Drop rows where column value matches condition
You can remove rows based on a condition applied to a column.
df = df.drop(df[df["age"] < 30].index)
This drops all rows where the age column is less than 30.
Drop rows using multiple conditions
Multiple conditions can be applied using logical operators such as & (AND) or | (OR).
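df = df.drop(df[(df["age"] < 30) & (df["name"] == "Anna")].index)
This removes rows where age is less than 30 and the name is "Anna". Each condition must be wrapped in its own parentheses, because & and | bind more tightly than comparison operators.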
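To make the difference between & and | concrete, here is a small sketch on a hypothetical four-row DataFrame (the names and ages are assumptions for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Maria"],
    "age": [28, 22, 35, 41],
})

# AND: a row is dropped only if it satisfies both conditions (just Anna).
both = df.drop(df[(df["age"] < 30) & (df["name"] == "Anna")].index)

# OR: a row is dropped if it satisfies either condition (John, Anna and Peter).
either = df.drop(df[(df["age"] < 30) | (df["name"] == "Peter")].index)

print(both)
print(either)
```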
Drop rows where column value equals specific value
You can drop rows where a column matches a specific value.
df = df.drop(df[df["name"] == "John"].index)
This removes rows where the name column equals "John".
Drop rows using Boolean indexing
Another common approach is to filter rows you want to keep instead of dropping them explicitly.
df = df[df["age"] >= 30]
print(df)
This keeps only rows where age is greater than or equal to 30.
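Dropping the rows that match a condition and keeping the rows that do not match it should give the same result when the index labels are unique. The sketch below checks this on hypothetical data; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Maria"],
    "age": [28, 22, 35, 41],
})

# Boolean mask marking the rows to remove.
too_young = df["age"] < 30

dropped = df.drop(df[too_young].index)  # drop the matching labels
kept = df[~too_young]                   # keep everything that does not match

print(dropped.equals(kept))
```

The second form is shorter and does not depend on the index, which is one reason it is often preferred.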
Drop Rows with Missing Values
Drop rows containing NaN values
In many datasets, missing values appear as NaN. You can remove rows containing missing values using the dropna() method.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, np.nan, 35]
})
df = df.dropna()
print(df)
This removes all rows that contain at least one NaN value.
Drop rows where specific column contains NaN
If you want to drop rows only when a specific column contains missing values, use the subset parameter.
df = df.dropna(subset=["age"])
This removes rows where the age column contains NaN.
Drop rows only if all values are NaN
Sometimes rows contain multiple columns and you only want to remove rows where all values are missing.
df = df.dropna(how="all")
This drops rows only if every column in that row contains NaN.
Keep rows with at least one non-null value
To ensure rows remain if at least one column contains data, you can use the thresh parameter.
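df = df.dropna(thresh=1)
This keeps rows with at least one non-null value, which has the same effect as how="all". Larger thresh values require that many non-null values per row.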
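The dropna() variants are easiest to compare on one DataFrame that has a complete row, partially missing rows, and a fully missing row. The sketch below assumes a hypothetical three-column frame; the city column is added purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", np.nan, "Peter"],
    "age": [28, np.nan, np.nan, 35],
    "city": ["NY", "Chicago", np.nan, np.nan],
})

any_missing = df.dropna()                # keeps only fully populated rows
age_missing = df.dropna(subset=["age"])  # only looks at the age column
all_missing = df.dropna(how="all")       # drops rows where every value is NaN
at_least_one = df.dropna(thresh=1)       # requires at least 1 non-null value

print(any_missing.index.tolist())
print(age_missing.index.tolist())
print(all_missing.index.tolist())
print(at_least_one.index.tolist())
```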
Drop Duplicate Rows
Drop duplicate rows in pandas
Duplicate rows may occur during data collection or merging datasets. Pandas provides the drop_duplicates() function to remove them.
df = df.drop_duplicates()
This removes duplicate rows across the entire DataFrame.
Drop duplicates based on specific columns
Sometimes duplicates should only be evaluated based on certain columns.
df = df.drop_duplicates(subset=["name"])
This removes duplicate rows based only on the name column.
Keep first occurrence while removing duplicates
By default, pandas keeps the first occurrence of a duplicate row.
df = df.drop_duplicates(keep="first")
This removes later duplicate entries while preserving the first instance.
Remove all duplicate rows
If you want to remove all duplicate entries, including the first occurrence, use keep=False.
df = df.drop_duplicates(keep=False)
This removes every row that appears more than once.
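The following sketch compares the drop_duplicates() options on one hypothetical DataFrame in which rows 0 and 2 are identical and the name "John" appears three times (the data is an assumption for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "John", "John"],
    "city": ["NY", "Chicago", "NY", "Boston"],
})

# Whole-row comparison: row 2 repeats row 0, so only row 2 is removed.
full_rows = df.drop_duplicates()

# Compare on name only: the first "John" is kept, later ones are removed.
by_name = df.drop_duplicates(subset=["name"])

# keep=False removes every member of a duplicate group, including the first.
no_repeats = df.drop_duplicates(keep=False)

print(full_rows.index.tolist())
print(by_name.index.tolist())
print(no_repeats.index.tolist())
```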
Drop Rows While Cleaning Data
Remove rows containing unwanted values
You may want to remove rows that contain specific unwanted values.
df = df[df["city"] != "Unknown"]
This removes rows where the city column contains "Unknown".
Drop rows containing empty strings
Datasets sometimes contain empty strings that should be treated as invalid values.
df = df[df["name"] != ""]
This removes rows where the name column is an empty string.
Remove rows containing specific text patterns
You can drop rows where a column contains specific text patterns using str.contains().
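df = df[~df["name"].str.contains("test", case=False, na=False)]
This removes rows where the name column contains the substring "test", ignoring case. Passing na=False treats missing names as non-matches, so the mask stays Boolean and can be negated with ~.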
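Pattern-based filtering is sensitive to missing values in the text column. The sketch below assumes a hypothetical name column that includes a NaN entry and passes na=False so that missing names count as "no match".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["John", "Test User", np.nan, "anna_TEST"]})

# case=False makes the match case-insensitive; na=False counts missing names as "no match".
is_test = df["name"].str.contains("test", case=False, na=False)

# Negate the mask to keep the rows that do NOT match the pattern.
cleaned = df[~is_test]

print(cleaned)
```

With this setup the row with the missing name is kept. If it should be removed as well, a dropna(subset=["name"]) step can be added.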
Drop rows after replacing invalid values
Sometimes values must be cleaned before dropping rows.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df = df.dropna(subset=["age"])
This converts invalid values to NaN and removes rows where the conversion failed.
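As a minimal end-to-end sketch of this pattern, the example below starts from a hypothetical age column stored as text, where one entry ("unknown") cannot be parsed as a number.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": ["28", "unknown", "35"],  # ages stored as strings, one invalid
})

# errors="coerce" turns values that cannot be parsed into NaN instead of raising.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Rows whose age failed to convert are now NaN and can be dropped.
cleaned = df.dropna(subset=["age"])

print(cleaned)
```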
Drop Rows Safely Using inplace
Drop rows using inplace parameter
The inplace parameter allows you to modify the original DataFrame directly instead of returning a new DataFrame.
import pandas as pd
df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35]
})
df.drop(1, inplace=True)
print(df)
This removes the row with index 1 and updates the original DataFrame.
Difference between inplace=True and returning new DataFrame
By default, pandas returns a new DataFrame after dropping rows.
df2 = df.drop(1)
Here, df remains unchanged while df2 contains the modified DataFrame.
Using inplace=True modifies the existing DataFrame.
df.drop(1, inplace=True)
In this case, the operation happens directly on df and the call returns None, so the result should not be assigned back to df.
When to avoid inplace operations
Although inplace=True can be convenient, it is not always recommended. Many data workflows benefit from keeping the original dataset unchanged.
df_clean = df.drop(1)
This approach preserves the original DataFrame while creating a cleaned version.
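The two styles can be compared directly. In the sketch below, the names original, cleaned, working, and returned are illustrative only, and the data is the same three-row example used earlier.

```python
import pandas as pd

original = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35],
})

# Default behaviour: a new DataFrame is returned and original is untouched.
cleaned = original.drop(1)

# inplace=True: the frame itself is modified and the call returns None.
working = original.copy()
returned = working.drop(1, inplace=True)

print(len(original), len(cleaned), len(working))
print(returned)
```

A common mistake is writing df = df.drop(1, inplace=True), which replaces df with None.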
Drop Rows Efficiently in Large DataFrames
Filtering vs dropping rows
For large datasets, filtering rows is often faster than repeatedly calling drop().
df = df[df["age"] >= 30]
Instead of dropping rows where age is less than 30, this approach keeps only rows that satisfy the condition.
Best method for large datasets
Boolean filtering is usually more efficient than multiple drop() operations.
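df_filtered = df[df["city"] != "Unknown"]
This removes unwanted rows in a single selection. Boolean filtering still returns a new DataFrame, but it avoids first building a list of index labels and then looking them up again, as drop() does.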
Avoid unnecessary DataFrame copies
Repeated row operations may slow down processing for large datasets.
df = df[df["age"] > 25]
When several conditions apply, combining them into one mask filters the data once instead of creating an intermediate DataFrame for each step.
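The idea of applying conditions in one step can be sketched as follows. The data and column names are hypothetical, and the example only demonstrates that both approaches select the same rows; how much time or memory the single-step version saves depends on the dataset and is best measured on your own data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Maria"],
    "age": [28, 22, 35, 41],
    "city": ["NY", "Unknown", "Chicago", "Unknown"],
})

# Step by step: each line builds an intermediate DataFrame.
step = df[df["age"] > 25]
step = step[step["city"] != "Unknown"]

# Single step: build one combined mask and filter once.
mask = (df["age"] > 25) & (df["city"] != "Unknown")
combined = df[mask]

print(combined)
```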
Common Errors When Dropping Rows
KeyError when index does not exist
If you attempt to drop a row with an index that does not exist, pandas raises a KeyError.
df.drop(100)
If the label 100 is not present in the index, pandas raises a KeyError and no rows are removed.
Dropping rows with incorrect axis
The drop() function can remove either rows or columns depending on the axis parameter.
df.drop(1, axis=0)
Here axis=0 specifies rows. Using axis=1 would attempt to drop a column instead.
Handling errors using errors="ignore"
To prevent errors when dropping rows that may not exist, use the errors="ignore" parameter.
df.drop([100, 101], errors="ignore")
If these indices are not present, pandas simply ignores them instead of raising an error.
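Both behaviours can be seen in one short sketch. It assumes the three-row example DataFrame used earlier; the raised flag is an illustrative helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Anna", "Peter"],
    "age": [28, 22, 35],
})

# Dropping a label that does not exist raises KeyError by default.
raised = False
try:
    df.drop(100)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors="ignore" skips missing labels but still drops the ones that exist.
result = df.drop([1, 100], errors="ignore")

print(result)
```

Because errors="ignore" also hides typos in labels, it is safest when missing labels are genuinely expected.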
Frequently Asked Questions
1. How do I drop rows in pandas?
You can drop rows in pandas using the DataFrame.drop() function by specifying the index label or a list of indices.
2. How do I drop rows with NaN values in pandas?
You can remove rows with missing values using df.dropna(), which drops rows containing NaN values.
3. How do I drop rows based on a condition in pandas?
You can drop rows using a condition with Boolean indexing such as df.drop(df[df['age'] < 30].index).
4. What does inplace=True do in pandas drop?
The inplace=True parameter modifies the original DataFrame directly instead of returning a new DataFrame.
5. How do I drop multiple rows in pandas?
You can drop multiple rows by passing a list of indices like df.drop([1, 3, 5]).
Summary
Dropping rows is an essential part of cleaning and preparing datasets in pandas. The drop() method allows you to remove rows by index, while conditional filtering helps remove rows based on column values. Pandas also provides specialized functions such as dropna() and drop_duplicates() to efficiently remove missing or duplicate data.
By understanding these different approaches, you can safely and efficiently clean your DataFrame while improving data quality for further analysis and visualization.



