Difference between loc and iloc in Pandas

Updated on Oct 3, 2023 12:04 IST

loc[ ] and iloc[ ] are used for convenient data selection and filtering in Pandas. This article covers the differences between them.


The Pandas library for Python is widely used by data scientists and data analysts for data manipulation. It provides many methods and functions that help you manage and analyze data efficiently. loc and iloc are used to slice data, that is, to create subsets of a Pandas DataFrame. If you are new to Python, one of the first questions you are likely to ask is: what is the difference between loc and iloc in Pandas?

Let’s learn how they differ from each other.

Difference between loc and iloc in Pandas data frame

Let’s summarize the difference between the loc and iloc in Pandas in a table:

loc in Pandas | iloc in Pandas
Label-based data selector | Integer position-based data selector
Labels can be of any type (integers, strings, dates) | Positions must be 0-based integers
A label slice follows the order of the index; slicing with a label that is absent from an unsorted index raises a KeyError | Position slices work regardless of how the index is ordered
The end label is included during slicing | The end position is excluded during slicing
Accepts a Boolean Series or list in conditions | Accepts only a Boolean list or array in conditions


What is loc Method?

loc[ ] is a label-based indexer used for selecting data as well as updating it. We pass the name (label) of the row or column that we wish to select.

 
Syntax: loc[row_label, column_label]

Let’s understand this through an example. Let’s create a sample DataFrame using Pandas:

 
#Importing Pandas Library
import pandas as pd
#Creating a Sample DataFrame
df = pd.DataFrame({
'id': [ 101, 102, 103, 104, 105, 106, 107],
'age': [ 20, 22, 23, 21, 22, 21, 25],
'group': [ 'A', 'B', 'C', 'C', 'B', 'A', 'A'],
'city': [ 'Tier1', 'Tier2', 'Tier2', 'Tier3', 'Tier1', 'Tier2', 'Tier1'],
'gender': [ 'M', 'F', 'F', 'M', 'M', 'M', 'F'],
'degree': [ 'econ', 'econ', 'maths', 'finance', 'history', 'science', 'marketing']
})
df

We have created a sample student dataset comprising 6 columns – ‘id’, ‘age’, ‘group’, ‘city’, ‘gender’, and ‘degree’. As you can see, it contains both numerical and categorical variables.

Firstly, let’s set the ‘id’ column as the index using set_index():

 
df = df.set_index('id')
df.head()

This will help us understand the difference between loc[ ] and iloc[ ] better. 

Operations using loc method

Selecting a row using loc[ ]

Let’s select a row using loc[ ]:

 
#Selecting a row with label
df.loc[102]

Once you set the ‘id’ column as the index, its values become the labels. So, selecting label 102 will display the record for that row.
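As a small illustration of label-based selection (a sketch on a trimmed-down copy of the sample data, not part of the original notebook), the snippet below shows that a single label returns a Series, a list of labels returns a DataFrame, and a position such as 0 is not a valid label once ‘id’ is the index:

```python
import pandas as pd

# Trimmed-down version of the article's sample data (assumption: fewer rows/columns)
df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'age': [20, 22, 23, 21],
    'group': ['A', 'B', 'C', 'C'],
}).set_index('id')

row = df.loc[102]      # single label -> Series keyed by column name
rows = df.loc[[102]]   # list of labels -> DataFrame with one row

try:
    df.loc[0]          # 0 is a position, not a label in this index
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(type(row).__name__, type(rows).__name__, missing_label_raises)
```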

Slicing using loc[ ]

Let’s use loc[ ] to perform slicing:

 
#Slicing using loc[]
df.loc[101:103]

Slicing simply means selecting a range of values. Here, we have selected and displayed all records between labels 101 and 103 (end label included).
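To make the endpoint behavior concrete, here is a minimal sketch on a small frame built only for this comparison. The label slice 101:103 and the position slice 0:3 return the same three rows, because loc[ ] includes the end label while iloc[ ] stops before the end position:

```python
import pandas as pd

# Small illustrative frame (assumption: only the 'age' column is kept)
df = pd.DataFrame(
    {'age': [20, 22, 23, 21, 22]},
    index=pd.Index([101, 102, 103, 104, 105], name='id'),
)

by_label = df.loc[101:103]    # labels 101, 102, 103 (end label included)
by_position = df.iloc[0:3]    # positions 0, 1, 2 (end position excluded)
shorter = df.iloc[0:2]        # positions 0, 1 only

print(by_label.index.tolist())     # [101, 102, 103]
print(by_position.index.tolist())  # [101, 102, 103]
print(shorter.index.tolist())      # [101, 102]
```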

Filtering rows using loc[ ]

Let’s set a condition to filter rows:

 
#Selecting all rows with a given condition
df.loc[df.age >= 22]

As you can see, all records where age is greater than or equal to 22 are displayed.

How about we set multiple conditions to filter the rows?

 
#Selecting rows with multiple conditions
df.loc[(df.age >= 22) & (df.city == 'Tier1')]

Here we’ve displayed records where age is greater than or equal to 22 and the city is tier 1.
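The same pattern extends to other Boolean operators. The following sketch rebuilds part of the sample data so that it runs on its own, and uses | for OR and ~ for NOT. Each condition needs its own parentheses because & and | bind more tightly than comparison operators in Python:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104, 105, 106, 107],
    'age': [20, 22, 23, 21, 22, 21, 25],
    'group': ['A', 'B', 'C', 'C', 'B', 'A', 'A'],
    'city': ['Tier1', 'Tier2', 'Tier2', 'Tier3', 'Tier1', 'Tier2', 'Tier1'],
}).set_index('id')

# AND: both conditions must hold
both = df.loc[(df['age'] >= 22) & (df['city'] == 'Tier1')]

# OR: at least one condition must hold
either = df.loc[(df['age'] >= 23) | (df['city'] == 'Tier3')]

# NOT: invert a condition
not_group_a = df.loc[~(df['group'] == 'A')]

print(both.index.tolist())         # [105, 107]
print(either.index.tolist())       # [103, 104, 107]
print(not_group_a.index.tolist())  # [102, 103, 104, 105]
```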

Filtering columns using loc[ ]

Let’s filter rows with a condition and select specific columns at the same time:

 
#Selecting columns with a given condition
df.loc[(df.gender == 'M'), ['group', 'degree']]

Here we’ve displayed two columns, ‘group’ and ‘degree’, for the rows where the gender is male.

Updating columns using loc[ ]

Let’s set a condition to update columns:

 
#Updating a column with a given condition
df.loc[(df.gender == 'M'), ['group']] = 'A'
df

If the gender of an individual is male, then their group would be updated to ‘A’. 

We can also update multiple columns by setting a condition:

 
#Updating multiple columns with a given condition
df.loc[(df.gender == 'F'), ['group', 'city']] = ['B','Tier2']
df

So, if the gender of an individual is female, then their group and city would be updated to ‘B’ and ‘Tier2’ respectively. 
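The two updates can be checked on a copy, as in the sketch below. It uses a trimmed-down frame and a hypothetical variable name, updated, so that the original data stays available for comparison:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'group': ['A', 'B', 'C', 'C'],
    'city': ['Tier1', 'Tier2', 'Tier2', 'Tier3'],
    'gender': ['M', 'F', 'F', 'M'],
}).set_index('id')

updated = df.copy()  # keep the original frame untouched for comparison

# A scalar is broadcast to every selected cell
updated.loc[updated['gender'] == 'M', 'group'] = 'A'

# A list with one value per column is applied to each selected row
updated.loc[updated['gender'] == 'F', ['group', 'city']] = ['B', 'Tier2']

print(updated['group'].tolist())  # ['A', 'B', 'B', 'A']
print(updated['city'].tolist())   # ['Tier1', 'Tier2', 'Tier2', 'Tier3']
print(df['group'].tolist())       # ['A', 'B', 'C', 'C'] (original unchanged)
```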

What is iloc Method?

iloc[ ] is an integer position-based indexer used for data selection. In this case, we pass the 0-based integer positions of the rows or columns that we wish to select.

 
Syntax: iloc[row_position, column_position]

For the given dataset, row positions run from 0 to 6 and column positions from 0 to 4 (‘age’, ‘group’, ‘city’, ‘gender’, ‘degree’), because ‘id’ is now the index rather than a column.
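If you need to translate between labels and positions, the index objects can do the lookup. The sketch below, on a trimmed-down copy of the sample data, uses Index.get_loc to find the position of a row label and a column name, and then checks that iloc[ ] and loc[ ] reach the same cell:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104, 105],
    'age': [20, 22, 23, 21, 22],
    'group': ['A', 'B', 'C', 'C', 'B'],
    'city': ['Tier1', 'Tier2', 'Tier2', 'Tier3', 'Tier1'],
}).set_index('id')

row_pos = df.index.get_loc(103)        # position of row label 103
col_pos = df.columns.get_loc('city')   # position of column 'city'

print(row_pos, col_pos)                # 2 2
print(df.iloc[row_pos, col_pos])       # Tier2
print(df.loc[103, 'city'])             # Tier2
```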

Operations using iloc method

Selecting a row using iloc

Let’s select a row using iloc:

 
#Selecting rows with index
df.iloc[[2,4]]

Since we’ve used iloc[ ], 2 and 4 refer to 0-based positions, so the third and fifth rows (labels 103 and 105) are displayed, regardless of the index labels.
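A quick way to confirm which rows a set of positions refers to is to look at the labels of the result. This sketch rebuilds part of the sample data and also shows that negative positions count from the end:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104, 105, 106, 107],
    'age': [20, 22, 23, 21, 22, 21, 25],
}).set_index('id')

picked = df.iloc[[2, 4]]   # third and fifth rows (0-based positions 2 and 4)
last = df.iloc[-1]         # negative positions count from the end

print(picked.index.tolist())  # [103, 105]
print(int(last.name))         # 107
```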

Selecting rows and columns using iloc

Now let’s see how to select rows and columns using iloc:

 
#Selecting rows with particular index and particular columns
df.iloc[[0,4],[1,3]]

[0,4] refers to row positions 0 and 4, and [1,3] refers to column positions 1 and 3 (‘group’ and ‘gender’).

Slicing using iloc

Let’s use iloc to select a range of rows:

 
#Selecting range of rows
df.iloc[1:5]

Here, we have selected and displayed the rows at positions 1 through 4 (position 5, the endpoint, is not included).

We can also select a range of rows and columns:

 
#Selecting range of rows and columns
df.iloc[1:3,2:4]

Here, we have displayed:

  • Rows at positions 1 and 2 (end position 3 excluded).
  • Columns at positions 2 and 3, that is ‘city’ and ‘gender’ (end position 4 excluded).
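The shape of a position slice follows directly from the half-open ranges. The sketch below rebuilds the sample data and checks the 1:3, 2:4 selection. It also shows that a slice running past the end of the frame is simply truncated rather than raising an error:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104, 105, 106, 107],
    'age': [20, 22, 23, 21, 22, 21, 25],
    'group': ['A', 'B', 'C', 'C', 'B', 'A', 'A'],
    'city': ['Tier1', 'Tier2', 'Tier2', 'Tier3', 'Tier1', 'Tier2', 'Tier1'],
    'gender': ['M', 'F', 'F', 'M', 'M', 'M', 'F'],
    'degree': ['econ', 'econ', 'maths', 'finance', 'history', 'science', 'marketing'],
}).set_index('id')

block = df.iloc[1:3, 2:4]   # rows 1-2, columns 2-3
tail = df.iloc[5:100]       # slice end beyond the last row is truncated

print(block.shape)             # (2, 2)
print(block.index.tolist())    # [102, 103]
print(block.columns.tolist())  # ['city', 'gender']
print(tail.index.tolist())     # [106, 107]
```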

Comparisons between loc and iloc

loc and iloc with callable

loc[ ] accepts a callable as an indexer. The callable takes one argument (the DataFrame being indexed) and returns a value that is valid for indexing. For demonstration:

  • Selecting columns using callable
 
#Selecting columns using callable
df.loc[:, lambda df: ['gender', 'degree']]
  • Filtering rows using callable
 
#Filtering data using callable
df.loc[lambda df: df.age > 24, :]
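Callables are most useful in method chains, where the intermediate DataFrame has no name yet. In the sketch below, age_next_year is a hypothetical column added only for illustration, and the lambda receives the frame produced by assign():

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104, 105, 106, 107],
    'age': [20, 22, 23, 21, 22, 21, 25],
}).set_index('id')

result = (
    df.assign(age_next_year=df['age'] + 1)    # hypothetical derived column
      .loc[lambda d: d['age_next_year'] > 24,  # d is the frame returned by assign()
           ['age', 'age_next_year']]
)

print(result.index.tolist())              # [107]
print(result['age_next_year'].tolist())   # [26]
```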

iloc[ ] also accepts a callable as an indexer. However, iloc[ ] does not accept a Boolean Series, so we use list() to convert the result of the condition into a Boolean list:

  • Filtering rows using callable
 
#Filtering data using callable function
df.iloc[lambda df: list(df.age > 24), :]
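The sketch below shows the restriction directly, on a trimmed-down copy of the sample data. Passing the Boolean Series to iloc[ ] is expected to be rejected in current pandas releases (the exact exception type depends on the index, so both likely types are caught here), while a plain Boolean array from to_numpy() works and matches the loc[ ] result:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [101, 102, 103, 104, 105, 106, 107],
    'age': [20, 22, 23, 21, 22, 21, 25],
}).set_index('id')

mask = df['age'] > 24   # Boolean Series aligned on the 'id' labels

try:
    df.iloc[mask]                     # Boolean Series: expected to be rejected
    series_mask_accepted = True
except (NotImplementedError, ValueError):
    series_mask_accepted = False

by_array = df.iloc[mask.to_numpy()]   # plain Boolean array: accepted
by_label = df.loc[mask]               # loc takes the Series directly

print(series_mask_accepted)
print(by_array.index.tolist())        # [107]
```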

loc and iloc are interchangeable when the labels of the DataFrame are 0-based integers

For demonstration, let’s create a new DataFrame from the above example with 0-based integers as labels:

 
#Create a DataFrame with 0-based integers as headers and index labels
df.to_csv('data.csv')
data = pd.read_csv('data.csv', header=None)
data

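When header is set to None, Pandas generates 0-based integer values as column labels, and the original header row is read as the first row of data. The rows also get the default 0-based integer index.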

Now, loc[ ] can accept integer values as labels:

 
data.loc[3, 5]

Here, the integer values 3 and 5 are interpreted as labels. Because these labels coincide with the positions, loc[ ] and iloc[ ] return the same element in this case (slices still differ in whether the endpoint is included):

 
data.loc[3, 5] == data.iloc[3, 5]
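The equivalence only holds for single elements and only while labels and positions line up. The following sketch, on a small frame with the default integer index (built here without the CSV round trip), shows two cases where the results diverge: slicing, and selection after rows have been filtered out:

```python
import pandas as pd

data = pd.DataFrame({'age': [20, 22, 23, 21, 22]})   # default labels 0..4

# Single element: label 3 and position 3 point at the same cell
same_scalar = data.loc[3, 'age'] == data.iloc[3, 0]

# Slices differ: loc includes the end label, iloc excludes the end position
n_loc = len(data.loc[1:3])    # labels 1, 2, 3
n_iloc = len(data.iloc[1:3])  # positions 1, 2

# After filtering, labels no longer match positions
filtered = data[data['age'] > 21]          # keeps labels 1, 2, 4
by_label = filtered.loc[4, 'age']          # label 4
by_position = filtered.iloc[2, 0]          # third remaining row

print(same_scalar, n_loc, n_iloc)          # True 3 2
print(filtered.index.tolist())             # [1, 2, 4]
print(by_label, by_position)               # 22 22
```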

Endnotes

Pandas is a powerful data processing tool for the Python programming language. It provides a rich set of functions to process and manipulate data for analysis. We hope this article on the difference between loc and iloc in Pandas gave you useful insight into how to use loc[ ] and iloc[ ] for data selection.

