Sorting Data using Pandas

Updated on Oct 3, 2023 12:05 IST

In this article, we will discuss how to do sorting in Pandas. Sorting makes it easier to comprehend and analyze data.

Pandas DataFrames are tabular data structures similar to an Excel or CSV file: they store data in rows and columns. Sorting, a common Excel operation, arranges that data in ascending or descending order.

Ordered data is easier to scan for extremes, outliers, and patterns, which makes sorting a routine step for Data Scientists.

Today, we are going to see how you can perform sorting using Pandas, the popular Python library mainly used for data pre-processing purposes such as data cleaning, manipulation, and transformation.

For our purpose today, we are going to use a Kaggle dataset that contains data on bank customers. Let’s load this dataset into a Pandas DataFrame as shown below:

#Importing Pandas Library
import pandas as pd
 
#Loading the dataset
df = pd.read_csv("Churn Modeling.csv")
df.head()
2022_03_dfdata_sorting-in-pandas.jpg

Use info() to get information about the dataset:

df.info()
2022_03_df_info_sorting-in-pandas.jpg

We can see all 14 columns listed above, along with their data types.

Let’s check if our data contains any null values:

df.isna().any()
2022_03_isna_sorting.jpg
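
As a minimal sketch of what this check returns, the snippet below runs it on a small made-up DataFrame. The values are illustrative assumptions, not rows from the Kaggle dataset; only the column names are borrowed from it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one deliberately missing Balance value
toy = pd.DataFrame({
    "CreditScore": [619, 608, 502],
    "Balance": [0.0, np.nan, 159660.8],
})

has_nulls = toy.isna().any()    # one True/False flag per column
null_counts = toy.isna().sum()  # number of missing values per column

print(has_nulls)
print(null_counts)
```

Here isna().any() flags only the Balance column, and isna().sum() reports one missing value in it.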

Now that we have established that there are no null values in our dataset, let’s look at how sorting works in Pandas. The DataFrame.sort_values() method sorts a DataFrame by the values in one or more of its columns (or rows, when axis=1).

Let’s see how:

Sorting by a Column

Let’s sort our DataFrame by the Balance column, as shown:

balance = df.sort_values(by = 'Balance')
balance.head(10)
2022_03_df_sort_values_sorting.jpg

From the output above, we can infer that the lowest balance is zero.

By default, sort_values() sorts in ascending order. To sort in descending order, pass ascending=False:

balance = df.sort_values(by = 'Balance', ascending=False)
balance.head(10)
2022_03_sort_values_sorting-in-pandas.jpg
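
To try both sort directions without downloading the dataset, here is a small self-contained sketch. The DataFrame and its values are made up for illustration; only the column names follow the article.

```python
import pandas as pd

# Hypothetical toy data (values are illustrative, not from the Kaggle file)
toy = pd.DataFrame({
    "CustomerId": [1, 2, 3, 4],
    "Balance": [1500.0, 0.0, 820.5, 0.0],
})

ascending_sorted = toy.sort_values(by="Balance")                    # default: ascending=True
descending_sorted = toy.sort_values(by="Balance", ascending=False)  # largest first

print(ascending_sorted["Balance"].tolist())   # [0.0, 0.0, 820.5, 1500.0]
print(descending_sorted["Balance"].tolist())  # [1500.0, 820.5, 0.0, 0.0]
```

Both calls return new DataFrames, so toy itself keeps its original row order.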

Sorting by Multiple Columns

We can also sort our DataFrame by more than one column at a time:

df.sort_values(by=['Geography','CreditScore']).head(10)
2022_03_sorting-by-multiple-columns.jpg

Sorting by Multiple Columns with Different Sort Orders

When we are sorting by multiple columns, we can pass different sort orders for each of them:

df.sort_values(by=['Geography','CreditScore'],
        ascending=[False, True]).head(10)
2022_03_sorting-by-multiple-sort-type.jpg

As you can see above, the DataFrame is sorted first on the ‘Geography’ column in descending order and then, within each geography, on the ‘CreditScore’ column in ascending order.
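
The same idea on a tiny made-up DataFrame makes the two-level ordering easier to follow. The rows below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data reusing two column names from the article
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "France", "Germany", "Spain"],
    "CreditScore": [700, 650, 600, 720, 580],
})

# Geography descending (Spain, Germany, France); CreditScore ascending within each group
result = toy.sort_values(by=["Geography", "CreditScore"], ascending=[False, True])

print(result)
```

The ascending list must have one entry per column in by, in the same order.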

Handling Missing Values in Sorting

If your data contains null values, you can set the na_position parameter to 'first' or 'last' to place NaNs at the beginning or at the end of the sorted result. The default is 'last', regardless of the sort direction.

Our DataFrame does not have any null values, but if it did, the following snippet shows how they could be handled:

#NaN placed first
df.sort_values(by='Balance', na_position='first')
 
#NaN placed in the end
df.sort_values(by='Balance', na_position='last')
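
Since the article's dataset has no missing values, the sketch below uses a made-up column with two NaNs so that the effect of na_position is visible. The data is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Balance column with two missing values
toy = pd.DataFrame({"Balance": [250.0, np.nan, 100.0, np.nan, 0.0]})

nan_first = toy.sort_values(by="Balance", na_position="first")
nan_last = toy.sort_values(by="Balance", na_position="last")    # same as the default
desc_default = toy.sort_values(by="Balance", ascending=False)   # NaNs still go last

print(nan_first["Balance"].tolist())
print(nan_last["Balance"].tolist())
print(desc_default["Balance"].tolist())
```

Note that na_position is independent of ascending: reversing the sort direction does not move the NaNs to the front.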

Sorting by Index

You can also sort your DataFrame by its index. Let’s see how this is done by using the balance DataFrame we created above:

balance.sort_index()
2022_03_sorting-by-index.jpg

Alternatively, you can sort the index in descending order by passing ascending=False to the sort_index() method above.
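
A short sketch of both directions, using a made-up three-row DataFrame (an assumption for illustration) that is first shuffled by a value sort:

```python
import pandas as pd

# Hypothetical toy data with the default index 0, 1, 2
toy = pd.DataFrame({"Balance": [300.0, 100.0, 200.0]})

by_balance = toy.sort_values(by="Balance")          # index order becomes 1, 2, 0
restored = by_balance.sort_index()                  # back to 0, 1, 2
reversed_idx = by_balance.sort_index(ascending=False)  # 2, 1, 0

print(by_balance.index.tolist())
print(restored.index.tolist())
print(reversed_idx.index.tolist())
```

Sorting by index is a convenient way to undo a value sort, as long as the original index labels were kept.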

Ignore the index while sorting

By default, each row keeps its original index label after sorting, so the index of the sorted DataFrame looks shuffled.

To discard the old labels, pass ignore_index=True to the sort_values() function.

The sorted rows are then relabeled from 0 to n-1, where n refers to the number of observations:

df.sort_values(by='CreditScore', ascending=False, ignore_index=True).head()
2022_03_alternative_sorting-by-index.jpg
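
The difference is easiest to see side by side. The sketch below uses made-up credit scores (an assumption for illustration) and assumes pandas 1.0 or later, where ignore_index was introduced.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"CreditScore": [600, 850, 720]})

kept = toy.sort_values(by="CreditScore", ascending=False)
relabeled = toy.sort_values(by="CreditScore", ascending=False, ignore_index=True)

print(kept.index.tolist())       # [1, 2, 0]  original labels travel with the rows
print(relabeled.index.tolist())  # [0, 1, 2]  fresh labels after sorting
```

Both results contain the same rows in the same order; only the index labels differ.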

Sorting by Column Names

The sort_index() method can also reorder the DataFrame’s columns alphabetically by their names, instead of sorting the rows by their index labels.

For this, we need to set the axis parameter to 1:

df.sort_index(axis=1).head(10)
2022_03_sorting-by-column-names.jpg
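
As a small illustration, the single-row DataFrame below is made up (only the column names follow the article) and shows the column order before and after:

```python
import pandas as pd

# Hypothetical one-row DataFrame; the columns are deliberately out of alphabetical order
toy = pd.DataFrame({"Geography": ["France"], "Age": [42], "Balance": [0.0]})

cols_ascending = toy.sort_index(axis=1)
cols_descending = toy.sort_index(axis=1, ascending=False)

print(list(cols_ascending.columns))   # ['Age', 'Balance', 'Geography']
print(list(cols_descending.columns))  # ['Geography', 'Balance', 'Age']
```

Only the column order changes; the rows and their values are untouched.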

Sorting through a Sorting Algorithm

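The sort_values() method offers a kind parameter that selects the underlying sorting algorithm. Its options include:

  • quicksort (default)
  • mergesort
  • heapsort

Of these, only mergesort is stable, meaning that rows with equal values keep their original relative order.

df.sort_values(by='Age', kind='heapsort')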
2022_03_sorting-through-sorting-algorithm.jpg

Note that this option is only applied when sorting on a single column or label.
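
The practical reason to choose an algorithm is usually stability. The sketch below uses mergesort on a made-up DataFrame with tied ages (the Name column and all values are assumptions for illustration) to show that tied rows keep their original order.

```python
import pandas as pd

# Hypothetical toy data with duplicate ages
toy = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Age": [30, 25, 30, 25],
})

# mergesort is stable: among equal ages, "b" stays before "d" and "a" before "c"
stable_sorted = toy.sort_values(by="Age", kind="mergesort")

print(stable_sorted["Name"].tolist())  # ['b', 'd', 'a', 'c']
```

With quicksort or heapsort the ages would still be ordered correctly, but the relative order of tied rows is not guaranteed.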

Sorting in-place

The sort_values() method offers an inplace parameter. When inplace = False, the operations take place on a copy of the DataFrame, which is then returned. The original DataFrame remains unchanged.

When inplace = True, modifications are done in-place. This means that nothing is returned, and the existing DataFrame gets updated:

df.sort_values(by='Age', inplace=True)
df.head()
2022_03_sorting_in_place.jpg
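
The two modes can be compared on a made-up DataFrame (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"Age": [42, 31, 55]})

# inplace=False (the default): a sorted copy is returned, toy is unchanged
sorted_copy = toy.sort_values(by="Age")
ages_before = toy["Age"].tolist()   # [42, 31, 55]

# inplace=True: toy itself is reordered and the call returns None
returned = toy.sort_values(by="Age", inplace=True)
ages_after = toy["Age"].tolist()    # [31, 42, 55]

print(ages_before, ages_after, returned)
```

Because the in-place call returns None, writing df = df.sort_values(..., inplace=True) would replace the DataFrame with None, so the result of an in-place sort should not be assigned.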

Other Sorting Methods

nlargest() method:

This method returns the n largest values in descending order, where n is supplied by the user.

df['Balance'].nlargest(10)
2022_03_nlargest.jpg

The output displays the 10 largest values of the ‘Balance’ column.

nsmallest() method:

This method returns the n smallest values in ascending order, where n is supplied by the user.

df['Balance'].nsmallest(10)
2022_03_nsmallest.jpg

The output displays the 10 smallest values of the ‘Balance’ column, which are all zero.
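
Both methods can be sketched on a small made-up DataFrame (the values are assumptions for illustration). The last line uses the DataFrame version of nlargest(), which takes the column to rank by and returns whole rows rather than a single column.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "CustomerId": [1, 2, 3, 4, 5],
    "Balance": [0.0, 1500.0, 820.5, 0.0, 99.9],
})

top2 = toy["Balance"].nlargest(2)      # largest first
bottom2 = toy["Balance"].nsmallest(2)  # smallest first
top2_rows = toy.nlargest(2, "Balance") # full rows of the two largest balances

print(top2.tolist())                     # [1500.0, 820.5]
print(bottom2.tolist())                  # [0.0, 0.0]
print(top2_rows["CustomerId"].tolist())  # [2, 3]
```

For a handful of extreme values, these methods express the intent more directly than sort_values() followed by head().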

Conclusion

Sorting is an essential step during your data analysis. In this article, we looked at how sorting is done using the sort_values() as well as the sort_index() methods along with their parameters. Pandas is a very powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis.
