Sorting Data using Pandas

Updated on Oct 3, 2023 12:05 IST

In this article, we will discuss how to sort data in Pandas. Sorting makes it easier to comprehend and analyze data.

Pandas DataFrames are tabular data structures similar to an Excel or CSV file: they store data in rows and columns. Sorting is a common Excel operation that orders data in ascending or descending order, and it is just as useful to Data Scientists working with DataFrames.

Today, we are going to see how you can perform sorting using Pandas, the popular Python library mainly used for data pre-processing purposes such as data cleaning, manipulation, and transformation.

For our purpose today, we are going to use a Kaggle dataset that contains data on bank customers. Let’s load this dataset into a Pandas DataFrame as shown below:

#Importing Pandas Library
import pandas as pd
 
#Loading the dataset
df = pd.read_csv("Churn Modeling.csv")
df.head()

Use info() to get information about the dataset:

df.info()

The output of info() lists all 14 columns along with their data types.

Let’s check if our data contains any null values:

df.isna().any()
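As a rough illustration of how to read this check, the sketch below runs it on a small made-up DataFrame (the values are hypothetical and are not taken from the Kaggle dataset). isna().any() returns one Boolean per column, and isna().sum() counts the missing values in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only loosely modeled on the bank-customer columns
toy = pd.DataFrame({
    "CreditScore": [619, 608, 502],
    "Balance": [0.0, np.nan, 159660.80],
})

# One Boolean per column: True if the column has at least one missing value
has_nulls = toy.isna().any()

# Number of missing values in each column
null_counts = toy.isna().sum()

print(has_nulls)
print(null_counts)
```

A column that reports False here has no missing values, so sorting on it needs no special handling.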

Now that it’s established that there are no null values in our dataset, let’s start with understanding how sorting works in Pandas. The DataFrame.sort_values() method is used to sort a DataFrame by the values in one or more of its columns (or rows).

Let’s see how:

Sorting by a Column

Let’s sort our DataFrame by the ‘Balance’ column, as shown:

balance = df.sort_values(by = 'Balance')
balance.head(10)

The output shows that the lowest balance is zero.

By default, sort_values() sorts in ascending order. To sort in descending order, pass ascending=False:

balance = df.sort_values(by = 'Balance', ascending=False)
balance.head(10)
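Since the dataset file may not be at hand, here is a minimal, self-contained sketch of the same two calls on a made-up DataFrame. The 'Customer' column and all values are hypothetical and exist only for illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for the bank-customer dataset
toy = pd.DataFrame({
    "Customer": ["A", "B", "C", "D"],
    "Balance": [83807.86, 0.0, 159660.80, 125510.82],
})

# Default: ascending order
ascending = toy.sort_values(by="Balance")

# Descending order
descending = toy.sort_values(by="Balance", ascending=False)

print(ascending["Customer"].tolist())   # ['B', 'A', 'D', 'C']
print(descending["Customer"].tolist())  # ['C', 'D', 'A', 'B']
```

Note that each row keeps its original index label after sorting, so the index of the result is no longer in order.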

Sorting by Multiple Columns

We can also sort our DataFrame by more than one column at a time:

df.sort_values(by=['Geography','CreditScore']).head(10)

Sorting by Multiple Columns with Different Sort Orders

When we are sorting by multiple columns, we can pass different sort orders for each of them:

df.sort_values(by=['Geography','CreditScore'],
        ascending=[False, True]).head(10)

As you can see above, the DataFrame is sorted first on the ‘Geography’ column in descending order. Rows that share the same ‘Geography’ value are then sorted on the ‘CreditScore’ column in ascending order.
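To make the precedence of the sort keys easier to see, the following sketch applies both variants to a few made-up rows (hypothetical values, not from the dataset). The first column in by is the primary key, and later columns only break ties:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "France", "Germany", "Spain"],
    "CreditScore": [619, 608, 502, 699, 850],
})

# Both columns ascending (the default)
both_asc = toy.sort_values(by=["Geography", "CreditScore"])

# Geography descending, CreditScore ascending within each Geography
mixed = toy.sort_values(by=["Geography", "CreditScore"],
                        ascending=[False, True])

print(both_asc)
print(mixed)
```

The ascending list must have one entry per column named in by, in the same order.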

Handling Missing Values in Sorting

If your data contains null values, you can set the na_position parameter to 'first' or 'last' to put NaNs at the beginning or at the end of the sorted result. The default is 'last'.

Our DataFrame today does not have any null values, but the following code snippet shows how missing values would be handled if there were any:

#NaN placed first
df.sort_values(by='Balance', na_position='first')
 
#NaN placed in the end
df.sort_values(by='Balance', na_position='last')
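Because the dataset has no missing values, the effect of na_position is easier to see on a small made-up column that does contain a NaN. This is only a sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing balance (index 1)
toy = pd.DataFrame({"Balance": [125510.82, np.nan, 0.0, 83807.86]})

nan_first = toy.sort_values(by="Balance", na_position="first")
nan_last = toy.sort_values(by="Balance", na_position="last")

# na_position is independent of the sort direction:
# with the default 'last', the NaN row stays at the end even when descending
desc = toy.sort_values(by="Balance", ascending=False)

print(nan_first.index.tolist())  # [1, 2, 3, 0]
print(nan_last.index.tolist())   # [2, 3, 0, 1]
print(desc.index.tolist())       # [0, 3, 2, 1]
```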

Sorting by Index

You can also sort your DataFrame by its index. Let’s see how this is done by using the balance DataFrame we created above:

balance.sort_index()

Alternatively, you can sort the index in descending order by passing the ascending=False argument to the sort_index() method above.
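One practical use of sort_index() is to restore the original row order after sorting by values. A minimal sketch on made-up data (hypothetical values):

```python
import pandas as pd

# Hypothetical toy data with the default 0..n-1 index
toy = pd.DataFrame({"Balance": [83807.86, 0.0, 159660.80]})

# Sorting by value shuffles the index labels
balance = toy.sort_values(by="Balance")

# Sorting by index puts the rows back in their original order
restored = balance.sort_index()

# Descending index order
reversed_index = balance.sort_index(ascending=False)

print(balance.index.tolist())         # [1, 0, 2]
print(restored.index.tolist())        # [0, 1, 2]
print(reversed_index.index.tolist())  # [2, 1, 0]
```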

Ignoring the Index While Sorting

When sorting the DataFrame, the original index labels can also be discarded entirely.

This can be done by passing the ignore_index=True argument to the sort_values() method.

The sorted result is then relabeled from 0 to n-1, where n refers to the number of observations:

df.sort_values(by='CreditScore', ascending=False, ignore_index=True).head()
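The difference is easiest to see side by side. The sketch below uses made-up credit scores (hypothetical values) and compares the index of the result with and without ignore_index=True:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"CreditScore": [619, 850, 502]})

# Original index labels travel with their rows
kept = toy.sort_values(by="CreditScore", ascending=False)

# Index is replaced by a fresh 0..n-1 range
relabeled = toy.sort_values(by="CreditScore", ascending=False,
                            ignore_index=True)

print(kept.index.tolist())       # [1, 0, 2]
print(relabeled.index.tolist())  # [0, 1, 2]
```

Both results hold the rows in the same order; only the index labels differ.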

Sorting by Column Names

The sort_index() method can also be used to reorder the columns of the DataFrame by their names, instead of sorting the rows.

For this, we need to set the axis parameter to 1:

df.sort_index(axis=1).head(10)
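As a small sketch, here is the same call on a one-row made-up DataFrame (hypothetical values) whose columns start out of alphabetical order:

```python
import pandas as pd

# Hypothetical toy data with columns in arbitrary order
toy = pd.DataFrame({
    "Geography": ["France"],
    "Age": [42],
    "CreditScore": [619],
})

# axis=1 sorts the column labels; the row order is untouched
by_name = toy.sort_index(axis=1)

print(by_name.columns.tolist())  # ['Age', 'CreditScore', 'Geography']
```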

Choosing a Sorting Algorithm

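The sort_values() method offers a kind parameter that selects the sorting algorithm. It accepts the following values:

quicksort (default)
mergesort
heapsort
stable

Of these, only mergesort and stable are stable sorts, meaning that rows with equal values keep their original relative order.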

df.sort_values(by='Age', kind='heapsort')

Note that this option is only applied when sorting on a single column or label.
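The choice of algorithm mostly matters when the sort column contains ties. The sketch below, on made-up data with a hypothetical 'Customer' column, uses kind='mergesort' to show what a stable sort guarantees: rows with the same 'Age' stay in their original relative order.

```python
import pandas as pd

# Hypothetical toy data with tied ages
toy = pd.DataFrame({
    "Customer": ["A", "B", "C", "D"],
    "Age": [42, 41, 42, 41],
})

# mergesort is stable: among equal ages, B stays before D and A before C
stable_sorted = toy.sort_values(by="Age", kind="mergesort")

print(stable_sorted["Customer"].tolist())  # ['B', 'D', 'A', 'C']
```

With quicksort or heapsort the values in 'Age' end up in the same order, but the relative order of tied rows is not guaranteed.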

Sorting in-place

The sort_values() method offers an inplace parameter. When inplace=False (the default), the sort is performed on a copy of the DataFrame, which is then returned. The original DataFrame remains unchanged.

When inplace=True, the DataFrame is modified in place. This means that the method returns None and the existing DataFrame gets updated:

df.sort_values(by='Age', inplace=True)
df.head()
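A common pitfall is assigning the result of an in-place sort back to a variable. The following sketch, on made-up ages (hypothetical values), contrasts the two modes:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"Age": [42, 41, 39]})

# inplace=False (default): a sorted copy is returned, toy is untouched
sorted_copy = toy.sort_values(by="Age")
ages_after_copy_sort = toy["Age"].tolist()     # still [42, 41, 39]

# inplace=True: toy itself is reordered and the call returns None
result = toy.sort_values(by="Age", inplace=True)
ages_after_inplace_sort = toy["Age"].tolist()  # [39, 41, 42]

print(result)  # None
```

Writing toy = toy.sort_values(by="Age", inplace=True) would therefore replace the DataFrame with None.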

Other Sorting Methods

nlargest() method:

This method returns the n largest values in descending order, where n is supplied by the user.

df['Balance'].nlargest(10)

The output displays the 10 largest values of the ‘Balance’ column.

nsmallest() method:

This method returns the n smallest values in ascending order, where n is supplied by the user.

df['Balance'].nsmallest(10)

The output displays the 10 smallest values of the ‘Balance’ column, which are all zero.
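Both methods can be tried on a handful of made-up balances, as in the sketch below (the 'Customer' column and all values are hypothetical). It also shows the DataFrame versions, which take the column name as a second argument and return whole rows rather than a single column:

```python
import pandas as pd

# Hypothetical toy data with two zero balances
toy = pd.DataFrame({
    "Customer": ["A", "B", "C", "D", "E"],
    "Balance": [83807.86, 0.0, 159660.80, 0.0, 125510.82],
})

# Series versions: return only the 'Balance' values
top2 = toy["Balance"].nlargest(2)      # largest first
bottom2 = toy["Balance"].nsmallest(2)  # smallest first

# DataFrame version: returns the full rows
top2_rows = toy.nlargest(2, "Balance")

print(top2.tolist())                    # [159660.8, 125510.82]
print(bottom2.index.tolist())           # [1, 3]
print(top2_rows["Customer"].tolist())   # ['C', 'E']
```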

Conclusion

Sorting is an essential step in data analysis. In this article, we looked at how sorting is done using the sort_values() and sort_index() methods along with their parameters, and at the nlargest() and nsmallest() shortcuts. Pandas is a very powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski
