# Sorting Data using Pandas

In this article, we will discuss how to sort data in Pandas.

Pandas DataFrames are tabular data structures, similar to an Excel or CSV file, that store data in rows and columns. Sorting is a common Excel operation that arranges data in ascending or descending order.

Sorting makes it easier to comprehend and analyze data, which is very useful for Data Scientists.

Today, we are going to see how you can perform sorting using Pandas, the popular Python library mainly used for data pre-processing purposes such as data cleaning, manipulation, and transformation.

**Table of Contents**

- Sorting by a Column
- Sorting by Multiple Columns
- Handling Missing Values in Sorting
- Sorting by Index
- Sorting by Column Names
- Sorting through a Sorting Algorithm
- Sorting in-place
- Other Sorting Methods

For our purpose today, we are going to use a Kaggle dataset that contains data on bank customers. Let’s load it into a Pandas DataFrame as shown below:

    # Importing the Pandas library
    import pandas as pd

    # Loading the dataset
    df = pd.read_csv("Churn Modeling.csv")
    df.head()

Use **info()** to get information about the dataset:

df.info()

We can see all 14 columns listed above, along with their data types.

Let’s check if our data contains any null values:

df.isna().any()

Now that it’s established that there are no null values in our dataset, let’s look at how sorting works in Pandas. The **DataFrame.sort_values()** method is used to sort a DataFrame by its column or row values.

Let’s see how:

**Sorting by a Column**

Let’s sort our DataFrame by the *‘Balance’* column, as shown:

    balance = df.sort_values(by='Balance')
    balance.head(10)

From the output above, we can infer that the lowest balance is zero.

By default, sort_values() sorts in ascending order. To sort in descending order instead, pass ascending=False:

    balance = df.sort_values(by='Balance', ascending=False)
    balance.head(10)
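To make the two sort orders easy to verify by eye, here is a minimal sketch on a tiny hand-made DataFrame. The `accounts` frame and its values are hypothetical and are not taken from the Kaggle dataset. Only the column names mirror the ones used in this article.

```python
import pandas as pd

# Hypothetical sample data (not from the Kaggle file).
accounts = pd.DataFrame({
    "CustomerId": [101, 102, 103, 104],
    "Balance": [2500.0, 0.0, 830.5, 12000.0],
})

# Ascending order is the default.
lowest_first = accounts.sort_values(by="Balance")

# ascending=False puts the largest balance on top.
highest_first = accounts.sort_values(by="Balance", ascending=False)

print(lowest_first["CustomerId"].tolist())   # [102, 103, 101, 104]
print(highest_first["CustomerId"].tolist())  # [104, 101, 103, 102]
```

Note that each row keeps its original index label after sorting, so the index of `lowest_first` is no longer in order.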

**Sorting by Multiple Columns**

We can also sort our DataFrame by more than one column at a time:

df.sort_values(by=['Geography','CreditScore']).head(10)

**Sorting by multiple columns with different sort orders**

When we are sorting by multiple columns, we can pass different sort orders for each of them:

df.sort_values(by=['Geography','CreditScore'], ascending=[False, True]).head(10)

As you can see above, the DataFrame is sorted first on the *‘Geography’* column in descending order and then, within each geography, on the *‘CreditScore’* column in ascending order.
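The interaction between the two sort keys can be checked on a small example. The sketch below uses a hypothetical five-row `customers` frame (values invented for illustration) and the same `by` and `ascending` arguments as above.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the article.
customers = pd.DataFrame({
    "Geography": ["France", "Spain", "France", "Germany", "Spain"],
    "CreditScore": [700, 650, 580, 720, 610],
})

# Geography is the primary key (descending);
# CreditScore breaks ties within each geography (ascending).
mixed = customers.sort_values(
    by=["Geography", "CreditScore"],
    ascending=[False, True],
)

print(mixed)
```

The rows come out as Spain (610, 650), Germany (720), France (580, 700). The order of the names in `by` decides which column takes priority, and the `ascending` list is matched to it position by position.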

**Handling Missing Values in Sorting**

If your data contains null values, you can set the *na_position* parameter to 'first' or 'last' to place NaNs at the beginning or at the end of the sorted result. The default is 'last'.

Our DataFrame today does not have any null values, but the following code snippet shows how missing values would be handled if there were any:

    # NaN placed first
    df.sort_values(by='Balance', na_position='first')

    # NaN placed at the end
    df.sort_values(by='Balance', na_position='last')
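Since the dataset used here has no missing values, the effect of *na_position* is easier to see on a small frame that does. The `balances` frame below is a hypothetical example with two NaNs added on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing balances.
balances = pd.DataFrame({"Balance": [500.0, np.nan, 120.0, np.nan, 0.0]})

# NaNs are moved to the top; the remaining values are sorted ascending.
nan_first = balances.sort_values(by="Balance", na_position="first")

# NaNs are moved to the bottom (this is also the default behaviour).
nan_last = balances.sort_values(by="Balance", na_position="last")

print(nan_first)
print(nan_last)
```

In both cases the non-missing values are ordered as 0.0, 120.0, 500.0. Only the placement of the NaN rows changes.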

**Sorting by Index**

You can also sort your DataFrame by its index. Let’s see how this is done by using the *balance* DataFrame we created above:

balance.sort_index()

Alternatively, you can sort the index in descending order by passing ascending=False to the sort_index() method above.
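As a small illustration, the sketch below first sorts a hypothetical `scores` frame by value, which shuffles the index labels, and then uses sort_index() to put the rows back in index order, both ascending and descending.

```python
import pandas as pd

# Hypothetical sample data with the default index 0, 1, 2.
scores = pd.DataFrame({"CreditScore": [700, 580, 650]})

# Sorting by value leaves the index labels out of order: 1, 2, 0.
by_score = scores.sort_values(by="CreditScore")

# sort_index() orders the rows by index label again: 0, 1, 2.
restored = by_score.sort_index()

# ascending=False orders the index labels from largest to smallest: 2, 1, 0.
reversed_index = by_score.sort_index(ascending=False)

print(restored)
print(reversed_index)
```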

**Ignoring the index while sorting**

When sorting a DataFrame, the original index labels can also be discarded entirely.

This is done by passing ignore_index=True to the **sort_values()** method.

The index of the result is then relabeled from *0* to *n-1*, where *n* is the number of observations:

df.sort_values(by='CreditScore', ascending=False, ignore_index=True).head()
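The difference is clearest when the same sort is run with and without the argument. This is a minimal sketch on a hypothetical `scores` frame. It assumes pandas 1.0 or later, where ignore_index was introduced.

```python
import pandas as pd

# Hypothetical sample data.
scores = pd.DataFrame({"CreditScore": [700, 580, 650]})

# Without ignore_index, rows carry their original labels: 0, 2, 1.
kept = scores.sort_values(by="CreditScore", ascending=False)

# With ignore_index=True, the result is relabeled 0, 1, 2.
relabeled = scores.sort_values(
    by="CreditScore", ascending=False, ignore_index=True
)

print(kept.index.tolist())       # [0, 2, 1]
print(relabeled.index.tolist())  # [0, 1, 2]
```

The values are in the same order in both results. Only the index labels differ.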

**Sorting by Column Names**

The **sort_index()** method can also be used to reorder the columns of a DataFrame alphabetically by their names, instead of reordering the rows.

For this, we need to set the axis parameter to 1:

df.sort_index(axis=1).head(10)
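A one-row frame is enough to see what happens to the column order. The `sample` frame below is hypothetical, with its columns deliberately listed out of alphabetical order.

```python
import pandas as pd

# Hypothetical single-row frame with unordered column names.
sample = pd.DataFrame({
    "Geography": ["France"],
    "Age": [42],
    "CreditScore": [619],
    "Balance": [0.0],
})

# axis=1 sorts the column labels; the row data is untouched.
by_name = sample.sort_index(axis=1)

# ascending=False reverses the column order.
by_name_desc = sample.sort_index(axis=1, ascending=False)

print(list(by_name.columns))       # ['Age', 'Balance', 'CreditScore', 'Geography']
print(list(by_name_desc.columns))  # ['Geography', 'CreditScore', 'Balance', 'Age']
```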

**Sorting through a Sorting Algorithm**

The **sort_values()** method offers a *kind* parameter that can take three algorithms:

- quicksort *(default)*
- mergesort
- heapsort

df.sort_values(by='Age', kind='heapsort')

Note that this option is only applied when sorting on a single column or label.
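One practical reason to choose an algorithm is stability. Mergesort is a stable sort, so rows with equal keys keep their original relative order, whereas quicksort and heapsort do not guarantee this. The sketch below shows the stable behaviour on a hypothetical `people` frame with repeated ages.

```python
import pandas as pd

# Hypothetical sample data with tied ages.
people = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dina", "Eli"],
    "Age": [35, 28, 35, 28, 35],
})

# mergesort is stable: within each age, names stay in their input order.
stable = people.sort_values(by="Age", kind="mergesort")

print(stable["Name"].tolist())  # ['Ben', 'Dina', 'Asha', 'Chen', 'Eli']
```

With the other algorithms the ages would still come out sorted, but the order of names within a tied age is not guaranteed.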

**Sorting in-place**

The **sort_values()** method offers an *inplace* parameter. When inplace=False (the default), the sort is performed on a copy of the DataFrame, which is then returned. The original DataFrame remains unchanged.

When inplace = True, modifications are done in-place. This means that nothing is returned, and the existing DataFrame gets updated:

    df.sort_values(by='Age', inplace=True)
    df.head()
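A common mistake is assigning the result of an in-place sort back to a variable, which leaves that variable holding None. The following sketch, on a hypothetical `ages` frame, contrasts the two modes.

```python
import pandas as pd

# Hypothetical sample data.
ages = pd.DataFrame({"Age": [42, 25, 31]})

# Default (inplace=False): a sorted copy is returned, `ages` is untouched.
sorted_copy = ages.sort_values(by="Age")
order_after_copy = ages["Age"].tolist()   # still [42, 25, 31]

# inplace=True: `ages` itself is reordered and the call returns None.
result = ages.sort_values(by="Age", inplace=True)
order_after_inplace = ages["Age"].tolist()  # now [25, 31, 42]

print(order_after_copy, order_after_inplace, result)
```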

**Other Sorting Methods**

**nlargest() method:**

This method returns the *n* largest values in descending order, where *n* is supplied by the user.

df['Balance'].nlargest(10)

The output displays the 10 largest values of the *‘Balance’* column.

**nsmallest() method:**

This method returns the *n* smallest values in ascending order, where *n* is supplied by the user.

df['Balance'].nsmallest(10)

The output displays the 10 smallest values of the *‘Balance’* column, which are all zero.
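Both methods can be tried on a short Series to confirm the order in which values are returned. The `balance` Series below is a hypothetical example, not the dataset column.

```python
import pandas as pd

# Hypothetical sample balances, including two zeros.
balance = pd.Series([2500.0, 0.0, 830.5, 12000.0, 0.0], name="Balance")

# The two largest values, largest first.
top2 = balance.nlargest(2)

# The two smallest values, smallest first.
bottom2 = balance.nsmallest(2)

print(top2.tolist())     # [12000.0, 2500.0]
print(bottom2.tolist())  # [0.0, 0.0]
```

The values match what `balance.sort_values(ascending=False).head(2)` and `balance.sort_values().head(2)` would give, without having to write out the full sort.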

**Conclusion**

Sorting is an essential step during your data analysis. In this article, we looked at how sorting is done using the sort_values() as well as the sort_index() methods along with their parameters. Pandas is a very powerful data processing tool and provides a rich set of functions to process and manipulate data for analysis.
