Techno Blender
Digitally Yours.

How To Load Data From Text File into Pandas



Loading data stored in text files into pandas DataFrames with Python

Photo by Bruce Hong on Unsplash

Pandas has become the de facto Python package that lets users perform data transformation and analysis in memory. In many cases, this data initially resides in external sources, such as text files. Through its powerful API, pandas allows users to load data from such sources with various methods.

In today’s article we will demonstrate how to use some of these methods to load data from text files into pandas DataFrames. Additionally, we will discuss how to deal with delimiters and column names (aka headers).

First, let’s create an example text file called employees.txt that we will be using in today’s short tutorial in order to demonstrate a few concepts. Note that the fields are separated by a single space character and the first line corresponds to the headers.

name surname dob department
George Brown 12/02/1993 Engineering
Andrew Black 15/04/1975 HR
Maria Green 12/02/1989 Engineering
Helen Fox 21/10/2000 Marketing
Joe Xiu 10/11/1998 Engineering
Ben Simsons 01/12/1987 Engineering
Jess Middleton 12/12/1997 Marketing
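If you would rather create the example file from Python than by hand, the following is a minimal sketch. It assumes the file should be written to the current working directory with UTF-8 encoding; the variable names are only illustrative.

```python
from pathlib import Path

# Contents of the example file: fields separated by a single space,
# with the column names on the first line.
EMPLOYEES = """\
name surname dob department
George Brown 12/02/1993 Engineering
Andrew Black 15/04/1975 HR
Maria Green 12/02/1989 Engineering
Helen Fox 21/10/2000 Marketing
Joe Xiu 10/11/1998 Engineering
Ben Simsons 01/12/1987 Engineering
Jess Middleton 12/12/1997 Marketing
"""

# Assumption: the file lives in the current working directory.
path = Path("employees.txt")
path.write_text(EMPLOYEES, encoding="utf-8")

# Quick sanity check: one header line plus seven records.
lines = path.read_text(encoding="utf-8").splitlines()
print(len(lines))  # 8
```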

Using read_csv()

A comma-separated values (CSV) file is in fact a text file that uses commas as delimiters to separate the values of each record. It therefore makes sense to use the pandas.read_csv() function to load data from a text file, even if the file itself does not have a .csv extension.

To read our text file into a pandas DataFrame, all we need to pass to read_csv() is the filename, the separator/delimiter (in our case a single space) and the index of the row containing the column names (here the first row, so header=0).

import pandas as pd

df = pd.read_csv('employees.txt', sep=' ', header=0)
print(df)

     name    surname         dob   department
0  George      Brown  12/02/1993  Engineering
1  Andrew      Black  15/04/1975           HR
2   Maria      Green  12/02/1989  Engineering
3   Helen        Fox  21/10/2000    Marketing
4     Joe        Xiu  10/11/1998  Engineering
5     Ben    Simsons  01/12/1987  Engineering
6    Jess  Middleton  12/12/1997    Marketing
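The header argument matters most when a file has no header line at all. The sketch below assumes a hypothetical variant of employees.txt without the first line and reads it from an in-memory string, so it does not depend on any file on disk. Passing header=None tells pandas that the first line is data, and names supplies the column labels.

```python
import io

import pandas as pd

# Assumed variant of the example data: same records, but no header line.
raw = """\
George Brown 12/02/1993 Engineering
Andrew Black 15/04/1975 HR
"""

# header=None: treat the first line as data rather than column names.
# names=...: provide the column labels explicitly.
df = pd.read_csv(
    io.StringIO(raw),
    sep=" ",
    header=None,
    names=["name", "surname", "dob", "department"],
)

print(df.shape)  # (2, 4)
print(list(df.columns))
```

Without header=None, pandas would use the first record (George Brown) as the column names and silently drop it from the data.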

Note that if the file you want to load into a pandas DataFrame uses a different separator, such as a comma (,), colon (:) or tab (\t), all you need to do is specify that character in the sep (or delimiter) argument when calling read_csv().
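As a rough illustration of the point about separators, the sketch below parses the same two records written with a comma, a colon and a tab. The sample strings are made up for this example and are read from memory through io.StringIO. The last call goes slightly beyond the article: it uses the regular expression \s+ as the separator, which is handy when columns are padded with a variable number of spaces.

```python
import io

import pandas as pd

# Hypothetical samples: identical records, different delimiters.
samples = {
    ",": "name,department\nGeorge,Engineering\nHelen,Marketing\n",
    ":": "name:department\nGeorge:Engineering\nHelen:Marketing\n",
    "\t": "name\tdepartment\nGeorge\tEngineering\nHelen\tMarketing\n",
}

# Pass each delimiter through the sep argument.
frames = {
    sep: pd.read_csv(io.StringIO(text), sep=sep)
    for sep, text in samples.items()
}

for sep, frame in frames.items():
    print(repr(sep), frame.shape)

# Columns padded with several spaces: a regex separator matches any run of whitespace.
aligned = "name     department\nGeorge   Engineering\nHelen    Marketing\n"
padded = pd.read_csv(io.StringIO(aligned), sep=r"\s+")
print(padded.equals(frames[","]))
```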

Using read_table()

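Alternatively, you can use the pandas.read_table() function, which reads a general delimited file into a pandas DataFrame. It accepts the same arguments as read_csv(), but its default separator is a tab (\t) rather than a comma, so we still need to pass sep=' ' for our file.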

import pandas as pd

df = pd.read_table('employees.txt', sep=' ', header=0)
print(df)

     name    surname         dob   department
0  George      Brown  12/02/1993  Engineering
1  Andrew      Black  15/04/1975           HR
2   Maria      Green  12/02/1989  Engineering
3   Helen        Fox  21/10/2000    Marketing
4     Joe        Xiu  10/11/1998  Engineering
5     Ben    Simsons  01/12/1987  Engineering
6    Jess  Middleton  12/12/1997    Marketing
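To see how the two functions relate, here is a small sketch that parses the same in-memory text with both and compares the results. The shortened sample and the variable names are assumptions made for illustration. It also shows that read_table() needs no sep argument when the fields are tab-separated, since a tab is its default separator.

```python
import io

import pandas as pd

# Shortened copy of the example data, separated by single spaces.
raw = (
    "name surname dob department\n"
    "George Brown 12/02/1993 Engineering\n"
    "Andrew Black 15/04/1975 HR\n"
)

# With an explicit separator, both functions produce the same DataFrame.
via_csv = pd.read_csv(io.StringIO(raw), sep=" ", header=0)
via_table = pd.read_table(io.StringIO(raw), sep=" ", header=0)
print(via_csv.equals(via_table))  # True

# The same data with tabs: read_table() parses it using its default separator.
tabbed = raw.replace(" ", "\t")
via_default = pd.read_table(io.StringIO(tabbed))
print(via_default.equals(via_csv))  # True
```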

Final Thoughts

One of the most common ways to create a pandas DataFrame is by loading data stored in external sources, such as text or CSV files. In today’s short tutorial we went through a step-by-step process for constructing a pandas DataFrame from a text file in which every field is separated by a specific character (a tab, a space or any other delimiter).

