Boolean Indexing in Python


Updated on Oct 3, 2023 11:50 IST

In this article, we will learn the concept of Boolean indexing and the methods for performing it in pandas and NumPy. Later in the blog, we will also discuss how to filter data using Boolean indexing.


When you’re performing data analysis using Python, a common operation is filtering the data. It allows you to extract relevant patterns and insights from the data. One way to filter data is through Boolean vectors. The process of doing this is commonly known as Boolean indexing.

In this article, we will learn how Boolean indexing is performed in Python using the pandas and NumPy packages.

What is Boolean Indexing?

Boolean indexing is used to filter data by selecting subsets of the data from a given pandas DataFrame. The subsets are chosen based on the actual values of the data in the DataFrame and not on their row/column labels.

In Boolean indexing, we filter the values by using a Boolean vector, that is, a sequence of True/False values with one entry per element or row. Let's look at the different methods through which we perform Boolean indexing:

Methods for Boolean Indexing in Pandas

In pandas, Boolean indexing can be performed on DataFrames in two ways:

  • .loc[ ]
  • .iloc[ ]

Before that, we first need to create a DataFrame whose index contains Boolean values, that is, labels that are either True or False.

 
# Import pandas
import pandas as pd
# Create a dictionary of column data (named data to avoid shadowing the built-in dict)
data = {'name': ["Rachel", "Monica", "Joey", "Phoebe"],
        'job': ["Doctor", "Chef", "Actor", "Singer"],
        'Age': [28, 28, 30, 31]}
# Create a DataFrame whose index labels are Boolean values
df = pd.DataFrame(data, index=[False, True, True, False])
print(df)

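Output: the DataFrame is printed with the columns name, job, and Age, and its four rows are labeled False, True, True, and False.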

Now that we have created a DataFrame with a Boolean index, let's see how we can access it using the two methods.

Method 1 – Boolean Indexing using .loc[ ]

To access a pandas DataFrame with a Boolean index using .loc[ ], we pass the Boolean label (True or False) to the .loc[ ] indexer. It returns every row whose index label matches, as shown below:

 
# Select every row whose index label is True
print(df.loc[True])

Output: only the two rows labeled True are returned, those for Monica (Chef, 28) and Joey (Actor, 30).
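The example above passes a single label to .loc[ ]. In everyday pandas code, Boolean indexing with .loc[ ] more often means passing a Boolean mask with one True/False value per row, which keeps the rows marked True whatever the index labels are. A minimal sketch, reusing the same data with the default integer index; the names people, mask, and over_28 are illustrative:

```python
import pandas as pd

# Same data as above, but with the default integer index (0, 1, 2, 3)
people = pd.DataFrame({'name': ["Rachel", "Monica", "Joey", "Phoebe"],
                       'job': ["Doctor", "Chef", "Actor", "Singer"],
                       'Age': [28, 28, 30, 31]})

# One Boolean per row: True where Age is greater than 28
mask = people['Age'] > 28

# .loc[] keeps the rows where the mask is True; the second argument picks columns
over_28 = people.loc[mask, ['name', 'Age']]
print(over_28)
```

Here the mask is built from the data itself, so the rows for Joey and Phoebe are kept without the index having to contain Boolean labels.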

Method 2 – Boolean Indexing using .iloc[ ]

When accessing the DataFrame through .iloc[ ], we need to keep in mind that .iloc[ ] selects rows by integer position, not by label, so a single Boolean label such as True or False cannot be used as the key.

Let’s understand this through the following example:

Code 1:

 
# Try to select rows with .iloc[] using a Boolean label (this fails)
print(df.iloc[False])

Output: a TypeError, because False is not an integer position.

As expected, .iloc[ ] rejects a key that is not an integer. So instead, we pass the integer position of the row we want, as shown below:

Code 2:

 
# Select the row at integer position 2 (the third row)
print(df.iloc[2])

Output: the third row (Joey, Actor, 30), returned as a pandas Series because a single position was selected.
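.iloc[ ] does accept Boolean input when it is a plain array of True/False values, one per row, rather than a single label. One caveat: pandas generally raises an error when a Boolean Series (which carries its own index) is passed to .iloc[ ], so the sketch below converts the mask to a NumPy array with .to_numpy() first. The Age > 28 condition is just an illustrative choice:

```python
import pandas as pd

# Recreate the DataFrame with the Boolean index used above
df = pd.DataFrame({'name': ["Rachel", "Monica", "Joey", "Phoebe"],
                   'job': ["Doctor", "Chef", "Actor", "Singer"],
                   'Age': [28, 28, 30, 31]},
                  index=[False, True, True, False])

# Build a row mask and drop its index so .iloc[] sees only positions
mask = (df['Age'] > 28).to_numpy()   # array([False, False,  True,  True])

# .iloc[] keeps the rows at the positions where the mask is True
print(df.iloc[mask])
```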

Boolean Indexing Using NumPy

Boolean indexing in NumPy uses a Boolean array to select elements from an array that meet a certain condition. The Boolean array is a binary mask that indicates whether each element in the array should be selected or not.

For example, you can create a Boolean array that has a True value for elements in the original array that are greater than a certain value and False values for elements that are less than or equal to that value.

Creating a Boolean Mask

 
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = data > 5

The mask array will have the same shape as the original array, but its values will be either True or False depending on whether the elements in the original array satisfy the condition or not. In this case, the mask array will have False values for elements 1 to 5 and True values for elements 6 to 10.
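Because the mask has the same shape as the array, the same idea carries over to multi-dimensional arrays. One detail worth knowing: indexing a 2-D array with a 2-D mask returns a flat 1-D array of the selected elements, in row-major order, since the selected elements no longer form a rectangle. A small sketch; the array grid is illustrative:

```python
import numpy as np

# A 3 x 4 array holding the numbers 1 to 12
grid = np.arange(1, 13).reshape(3, 4)

# The mask has the same shape as grid: (3, 4)
mask = grid > 8

# Selecting with a 2-D mask returns a 1-D array of the matching elements
selected = grid[mask]
print(mask.shape, selected)   # (3, 4) [ 9 10 11 12]
```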

Using the Boolean Mask to Select Data

Once you have created the Boolean mask, you can use it to select elements from the original array that meet the condition. You can do this by indexing the original array with the Boolean mask:

 
filtered_data = data[mask]

The filtered_data array will only contain the elements from the original array that satisfy the condition. In this case, the filtered_data array will contain elements 6 to 10.
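Two related behaviours are easy to miss. First, the filtered array is a new copy, so changing it does not change data. Second, the same mask can appear on the left-hand side of an assignment to modify only the selected elements of the original array in place. A short sketch; the values 100 and 0 are arbitrary:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = data > 5

# Boolean indexing returns a copy: editing filtered_data leaves data untouched
filtered_data = data[mask]
filtered_data[0] = 100
print(data)            # [ 1  2  3  4  5  6  7  8  9 10]

# Assigning through the mask changes the original array in place
data[mask] = 0
print(data)            # [1 2 3 4 5 0 0 0 0 0]
```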

Combining Conditions

You can also combine multiple conditions to create a more complex mask that selects elements based on multiple criteria. For example, you can select elements that are greater than 5 and less than or equal to 8:

 
mask = (data > 5) & (data <= 8)
filtered_data = data[mask]

In this case, the filtered_data array will contain elements 6, 7, and 8.
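The parentheses around each comparison are required: & binds more tightly than > and <=, so data > 5 & data <= 8 is not read as two separate comparisons. Python's and/or keywords cannot be used here either, because they try to reduce a whole array to a single True/False value and raise a ValueError. The element-wise operators are & (and), | (or), and ~ (not). A sketch of each, on the same data:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# | keeps elements that satisfy either condition
outside = data[(data < 3) | (data > 8)]
print(outside)       # [ 1  2  9 10]

# ~ inverts a mask: here, elements that are NOT greater than 5
not_big = data[~(data > 5)]
print(not_big)       # [1 2 3 4 5]

# np.logical_and is the function form of &
between = data[np.logical_and(data > 5, data <= 8)]
print(between)       # [6 7 8]

# The keyword 'and' does not work element-wise on arrays
try:
    (data > 5) and (data <= 8)
except ValueError as err:
    print("ValueError:", err)
```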


Filtering Data Using Boolean Indexing

Using NumPy

Boolean indexing can be used to filter data in Python by creating a Boolean mask, as discussed above, that marks the elements meeting a certain condition, and then using that mask to select them.

Here’s an example to illustrate the use of Boolean Indexing to filter data in a NumPy array:

 
import numpy as np
# Create an array of numbers
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Define the filtering condition; the comparison already returns a Boolean array, which serves as the mask
mask = data > 5
# Use the boolean mask to select only those elements in the data that satisfy the condition
filtered_data = data[mask]
print("Original data:", data)
print("Boolean mask:", mask)
print("Filtered data:", filtered_data)

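Output:

Original data: [ 1  2  3  4  5  6  7  8  9 10]
Boolean mask: [False False False False False  True  True  True  True  True]
Filtered data: [ 6  7  8  9 10]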

In this example, the condition data>5 creates a Boolean mask with True values corresponding to the elements in the data array that are greater than 5. The filtered data is then obtained by indexing the original data array with this Boolean mask.
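A common variation is to build the mask from one array and apply it to another array of the same length, for example keeping the names that go with scores above a threshold. A sketch with made-up arrays names and scores and an arbitrary threshold of 60:

```python
import numpy as np

# Two parallel arrays: names[i] belongs with scores[i] (illustrative data)
names = np.array(["Ana", "Ben", "Cara", "Dev"])
scores = np.array([72, 45, 88, 60])

# The mask is computed from scores...
passed = scores >= 60

# ...and applied to names, which works because both arrays have the same length
print(names[passed])             # ['Ana' 'Cara' 'Dev']
print(np.count_nonzero(passed))  # 3
```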

Using Pandas

You can use Boolean indexing with pandas DataFrames to filter rows, keeping only the rows that meet certain conditions.

Here’s an example to illustrate the use of Boolean Indexing to filter data in a pandas DataFrame:

 
import pandas as pd
# create a sample DataFrame
data = {'name': ['John', 'Jane', 'Jim', 'Joan'],
'age': [32, 28, 35, 40],
'city': ['New York', 'London', 'Paris', 'Berlin']}
df = pd.DataFrame(data)
# create a boolean mask to select rows where the age is greater than 30
mask = df['age'] > 30
# use the boolean mask to filter the DataFrame
filtered_df = df[mask]

The filtered_df DataFrame will contain only the rows where the age is greater than 30: in this case, the rows for John, Jim, and Joan.
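The mask can also be inverted with ~ to keep the complementary rows, or combined with .loc[ ] to return only selected columns of the matching rows. A sketch that rebuilds the sample DataFrame from above; the names younger and older_names are illustrative:

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Jim', 'Joan'],
        'age': [32, 28, 35, 40],
        'city': ['New York', 'London', 'Paris', 'Berlin']}
df = pd.DataFrame(data)
mask = df['age'] > 30

# ~ inverts the mask: rows where age is NOT greater than 30
younger = df[~mask]
print(younger)

# .loc[] with a mask and a column name returns only that column for the matching rows
older_names = df.loc[mask, 'name']
print(older_names.tolist())   # ['John', 'Jim', 'Joan']
```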

You can also combine multiple conditions to create a more complex filter. For example, you can select rows where the age is greater than 30 and the city is ‘Paris’ or ‘Berlin’:

 
mask = (df['age'] > 30) & ((df['city'] == 'Paris') | (df['city'] == 'Berlin'))
filtered_df = df[mask]

In this case, the filtered_df DataFrame will contain the rows for Jim (35, Paris) and Joan (40, Berlin).
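When one column has to match any of several values, Series.isin() builds the same mask more compactly than chaining == comparisons with |. A sketch, again rebuilding the sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Jane', 'Jim', 'Joan'],
                   'age': [32, 28, 35, 40],
                   'city': ['New York', 'London', 'Paris', 'Berlin']})

# Equivalent to (df['city'] == 'Paris') | (df['city'] == 'Berlin')
in_cities = df['city'].isin(['Paris', 'Berlin'])

mask = (df['age'] > 30) & in_cities
filtered_df = df[mask]
print(filtered_df['name'].tolist())   # ['Jim', 'Joan']
```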

Endnotes

We hope this article helped you understand Boolean indexing in Python. It's an essential technique for data analysis and scientific computing, and it's widely used in many areas, including machine learning, image processing, and data visualization. Whether you're working with small or large datasets, Boolean indexing is a simple and efficient way to manipulate and analyze your data in Python.

Contributed By: Prerna Singh

