How to Read and Write Files Using Pandas

4 mins read2.1K Views Comment

Call 8585951111Got Doubts?

Updated on Jan 24, 2023 13:00 IST

In this tutorial, we are going to see how to read data and files using Pandas.

Pandas is a very popular Python library that offers a set of functions and data structures that aid in data analysis more efficiently. The Pandas package is mainly used for data pre-processing purposes such as data cleaning, manipulation, and transformation. Hence, it is a very handy tool for data scientists and analysts. Let’s find out how to read and write files using pandas.

We will cover the following sections:

Data Structures in Pandas
Writing a File Using Pandas
Reading a File Using Pandas
Importing a CSV File into the DataFrame
Endnotes

Data Structures in Pandas

There are two main types of Data Structures in Pandas –

Pandas Series: 1D labeled homogeneous array, size-immutable
Pandas DataFrame: 2D labeled tabular structure, size-mutable

Mutability refers to the tendency to change. When we say a value is mutable, it means that it can be changed.

DataFrame is a widely used and one of the most important data structures. It stores data as stored in Excel with rows and columns.

Let’s see how we create a DataFrame using Pandas, shall we?

Importing Pandas Library
Creating a Pandas DataFrame

Importing Pandas Library

#Importing Pandas Library
import pandas as pd
Copy code

Creating a Pandas DataFrame

#Creating a Sample DataFrame
data = pd.DataFrame({
    'id': [ 1, 2, 3, 4, 5, 6, 7],
    'age': [ 27, 32, 23, 41, 37, 31, 49],
    'gender': [ 'M', 'F', 'F', 'M', 'M', 'M', 'F'],
    'occupation': [ 'Salesman', 'Doctor', 'Manager', 'Teacher', 'Mechanic', 'Lawyer', 'Nurse']
})
 
data
Copy code

Recommended online courses

Best-suited Machine Learning courses for you

Learn Machine Learning with these high-rated online courses

MCA in Machine Learning & Artificial Intelligence (ML & AI) (Online MCA)

TCS ionDegree

Total Fees

₹2.75 L

Duration

2 years

Writing a File Using Pandas

Save the DataFrame we created above as a CSV file using pandas .to_csv() function, as shown:

Stay updated with the latest blogs on online courses and skills

Enter Mobile Number

#Writing to CSV file
data.to_csv('data.csv')
Copy code

We can also save the DataFrame as an Excel file using pandas .to_excel() function, as shown:

#Writing to Excel file
data.to_excel('data2.xlsx')
Copy code

Save the DataFrame we created above as a Text file using the same function that we use for CSV files:

#Writing to Text file
data.to_csv('data3.txt')
Copy code

There are various other file formats that you can write your data to. For example:

.to_json()
.to_html()
.to_sql()

Take note that this isn’t an exhaustive list. There are more formats you can write to, but they are out of the scope of this article.

Reading a File Using Pandas

Once your data is saved in a file, you can load it whenever required using pandas .read_csv() function:

#Reading the CSV file
df = pd.read_csv('data.csv')
df
Copy code

You can use this function to load your Text file as well.

But to load your Excel file, you will use the pandas .read_excel() function:

#Reading the Excel file
df2 = pd.read_excel('data2.xlsx')
df2
Copy code

There are various other file formats that you can import your data from. For example:

.read_json()
.read_html()
.read_sql()

Take note that this isn’t an exhaustive list either. There are more formats you can read from, but they are out of the scope of this article.

For now, we are going to focus on the most common data format you are going to work with – the CSV format.

File path

We have seen above how you can read a CSV file into a DataFrame. You can specify the path where your file is stored when you’re using the .read_csv() function:

#Specifying the file path
df.read_csv('C:/Users/abc/Desktop/file_name.csv')
Copy code

Header

You can use the header parameter to specify the row that will be used as column names for your DataFrame. By default, header=0 which means that the first row is considered as the header.

When the header is specified as None, there will be no header:

df = pd.read_csv('data.csv', header=None)
df
Copy code

Column delimiter

You can use the sep parameter to specify a custom delimiter for the CSV input. By default, it is a comma.

#Use tab to separate values
df = pd.read_csv('data.csv', header=None, sep='\t')
df
Copy code

Difference between loc and iloc in Pandas

In Pandas, loc is used to access rows and columns by labels or boolean arrays, while iloc is used to access rows and columns by integer positions. Both functions facilitate...read more

Read Later

Data Cleaning Using Pandas

Data preparation involves data collection and data cleaning. When working with multiple sources of data, there are instances where the collected data could be incorrect, mislabeled, or even duplicated. This...read more

Read Later

Index column

The index_col parameter allows you to set the columns that can be used as the index of the DataFrame. By default, it is set to None. It can be set as a column name or column index:

df = pd.read_csv('data.csv', index_col='gender')
df
Copy code

Select columns

The usecols parameter specifies which columns to import to the DataFrame. It can be set as a list of column names or column indices:

df = pd.read_csv('data.csv', usecols=['age','gender'])
df
Copy code

Omit rows

The skiprows parameter specifies the number of rows to skip at the start of the file:

df = pd.read_csv('data.csv', skiprows=2)
df
Copy code

The skipfooter parameter specifies the number of rows to skip at the end of the file:

df = pd.read_csv('data.csv', skipfooter=2)
df
Copy code

The nrows parameter specifies the number of rows to read:

df = pd.read_csv('data.csv', nrows=3)
df
Copy code

Missing values

The na_values parameter specifies which values are to be considered as NaN. It can be set as a list of strings:

df = pd.read_csv('data.csv', na_values=['Doctor', 23])
df
Copy code

Alter values

The converters parameter converts the column values according to the defined functions:

#A function that reduces each value by 3
f = lambda x: int(x) – 3
 
df = pd.read_csv('data.csv', converters={'age':f})
df
Copy code

Compress and decompress files

You can compress the DataFrame when writing to a file. The .read_csv() function decompresses the file before reading it into the DataFrame.

The compression parameter decompresses the file and can be set to any of the following methods:

{'infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd', None, dict}

infer is the default value.

#Compress the DataFrame while writing to the CSV file
df.to_csv('data.csv.zip')
 
#Decompress the file while reading
df = pd.read_csv('data.csv.zip', compression='zip')
df
Copy code

Compressing the files particular helps when you’re dealing with big data problems.

Endnotes

Pandas is a very powerful data processing tool for the Python programming language. It provides a rich set of functions to import and process various types of file formats from multiple data sources. The Pandas library is specifically useful for data scientists working with data cleaning and analysis. Hope this article on read and write files using pandas was helpful to you.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski Read Full Bio