
Reshaping a Pandas Dataframe: Long-to-Wide and Vice Versa | by Sharone Li | May, 2022



Pivot/Unpivot Pandas Dataframes with These Two Simple Methods

Image by Pixabay (Modified by Author)

Reshaping a pandas dataframe is one of the most common data wrangling tasks in data analysis. It is also referred to as pivoting/unpivoting (or, loosely, transposing) a table: converting it from long to wide format or from wide to long format. So what is a long data format vs. a wide data format, and how do we reshape a dataframe from long to wide and vice versa?

Let’s take a look at a simple example below. The example shows the average price of 5 food categories across U.S. cities from Jan. 2020 to Apr. 2022.

Image by Author

The dataframe on the left has a long format. The ‘Series ID’ and ‘Item’ columns represent the food category. The ‘Year Month’ is a single column that has all the months from Jan. 2020 to Apr. 2022, and the ‘Avg. Price ($)’ has a value corresponding to each month in the ‘Year Month’ column.

Notice how the dataframe on the left is structured in a long format: each food category (‘Item’) has multiple repeating rows, each of which represents a specific year/month and the average food price for that year/month. Though we only have 5 food categories, we have a total of 139 rows, which gives the dataframe its ‘long’ shape.

In contrast, the dataframe on the right-hand side has a wide format, much like a spreadsheet. In this format, each row represents a unique food category. We pivot the ‘Year Month’ column in the left dataframe so that each month gets its own column, which gives the right dataframe its ‘wide’ shape. The values of the ‘Year Month’ column in the left table become the column names in the right table, and each of those columns holds the ‘Avg. Price ($)’ for that month.

Now that we understand what a long vs. wide data format is, let’s see how we can toggle between the two formats easily in Pandas. We’ll use the sample dataset shown above as an example. You can download the sample dataset here. Let’s first read the raw CSV file into a Pandas dataframe and do some light massaging on the data:

Image by Author
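The loading step might look roughly like the sketch below. The column names ‘Year’ and ‘Month’, the series IDs, and the prices are all made-up placeholders (the real CSV layout is not shown in the text), and an in-memory string stands in for the downloaded file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the raw CSV file; the real file's columns
# and values may differ. Prices and IDs here are invented for illustration.
raw_csv = io.StringIO(
    "Series ID,Item,Year,Month,Avg. Price ($)\n"
    "S1,Bread,2020,Jan,1.35\n"
    "S1,Bread,2020,Feb,1.37\n"
    "S2,Eggs,2020,Jan,1.46\n"
    "S2,Eggs,2020,Feb,1.45\n"
)

df = pd.read_csv(raw_csv)

# Light massaging: combine year and month into one 'Year Month' label,
# then keep only the four columns used in the rest of the tutorial.
df["Year Month"] = df["Year"].astype(str) + " " + df["Month"]
df = df[["Series ID", "Item", "Year Month", "Avg. Price ($)"]]

print(df)
```

With a real file, `raw_csv` would simply be replaced by the path to the downloaded CSV.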

Reshape from Long to Wide:

As explained in the previous section, this dataframe has a long format. To reshape the dataframe from long to wide, we can use the pd.pivot() function.

pd.pivot(df, index=, columns=, values=)

columns: Column to use to make new frame’s columns (e.g., ‘Year Month’).

values: Column(s) to use for populating new frame’s values (e.g., ‘Avg. Price ($)’).

index: Column(s) to use to make new frame’s index (e.g., ‘Series ID’ and ‘Item’). If not specified, the existing index is used.

Image by Author
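As a minimal sketch of the call described above, here is pd.pivot() applied to a tiny made-up long-format frame. The series IDs and prices are placeholders, not BLS figures, and passing a list to `index` assumes pandas 1.1 or later.

```python
import pandas as pd

# Tiny made-up long-format frame (placeholder IDs and prices).
df_long = pd.DataFrame(
    {
        "Series ID": ["S1", "S1", "S2", "S2"],
        "Item": ["Bread", "Bread", "Eggs", "Eggs"],
        "Year Month": ["2020 Jan", "2020 Feb", "2020 Jan", "2020 Feb"],
        "Avg. Price ($)": [1.35, 1.37, 1.46, 1.45],
    }
)

# Long to wide: one row per (Series ID, Item), one column per month.
df_wide = pd.pivot(
    df_long,
    index=["Series ID", "Item"],
    columns="Year Month",
    values="Avg. Price ($)",
).reset_index()  # turn the index levels back into ordinary columns

# Drop the leftover 'Year Month' label on the column axis.
df_wide.columns.name = None

# The month labels are plain strings, so select them explicitly
# to get chronological rather than alphabetical order.
df_wide = df_wide[["Series ID", "Item", "2020 Jan", "2020 Feb"]]

print(df_wide)
```

Note that pd.pivot() raises a ValueError if any index/columns combination appears more than once in the long data; in that case pd.pivot_table(), which aggregates duplicates, may be the better fit.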

Reshape from Wide to Long:

Now how do we unpivot the wide-format data back to the long format? To reshape a dataframe from wide to long, we can use the pd.melt() function.

pd.melt(df, id_vars=, value_vars=, var_name=, value_name=, ignore_index=)

id_vars: Column(s) to use as identifier variables

value_vars: Column(s) to unpivot. In our example, it would be the list of year/month columns (‘2020 Jan’, ‘2020 Feb’, ‘2020 Mar’, etc.)

var_name: Name to use for the ‘variable’ column

value_name: Name to use for the ‘value’ column

ignore_index: If True (the default), the original index is ignored. If False, the original index is retained.

Image by Author
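The parameters above can be sketched on a tiny made-up wide-format frame as follows. Again, the series IDs and prices are placeholders rather than BLS figures, and the `ignore_index` argument assumes pandas 1.1 or later.

```python
import pandas as pd

# Tiny made-up wide-format frame (placeholder IDs and prices).
df_wide = pd.DataFrame(
    {
        "Series ID": ["S1", "S2"],
        "Item": ["Bread", "Eggs"],
        "2020 Jan": [1.35, 1.46],
        "2020 Feb": [1.37, 1.45],
    }
)

# Wide to long: every month column becomes a row per food category.
df_long = pd.melt(
    df_wide,
    id_vars=["Series ID", "Item"],
    value_vars=["2020 Jan", "2020 Feb"],
    var_name="Year Month",
    value_name="Avg. Price ($)",
    ignore_index=True,
)

print(df_long)
```

The result is stacked one month column at a time (all ‘2020 Jan’ rows, then all ‘2020 Feb’ rows), so a sort on the identifier columns may be needed if you want each item’s rows grouped together. If value_vars is omitted, pd.melt() unpivots every column not listed in id_vars.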

To summarize, if you need to reshape a Pandas dataframe from long to wide, use pd.pivot(). If you need to reshape a Pandas dataframe from wide to long, use pd.melt(). Thanks for reading and I hope you find this short pandas tutorial helpful!

Data Source:

U.S. BUREAU OF LABOR STATISTICS: SURVEY: Consumer Price Index — Average Price Data (https://www.bls.gov/data/). This is a public, open dataset that can be retrieved using the BLS public data API. The BLS Public Data API gives the public access to raw economic data from all BLS programs. No license is needed.



