Pandas Exercise for Data Scientists — Part 1 | by Avi Chawla | Jun, 2022

By Jessie Hobb On Jun 8, 2022

A set of challenging Pandas Questions

Data Scientists have long relied on the Pandas library to do amazing things with tabular data. It is undoubtedly a go-to tool for tabular data handling, manipulation, and processing.

Therefore, to scale your expertise, challenge your existing knowledge, and introduce you to numerous Pandas functions that are popular among Data Scientists, I am presenting Part 1 of the Pandas Exercise. The objective is to strengthen your logical muscle and to help you internalize data manipulation with one of the best Python packages for data analysis.

Find the notebook with all questions for this quiz here: GitHub.

Table of Contents:

1. Sort DataFrame based on another list
2. Insert a column at a specific location in a DataFrame
3. Select columns based on the column’s Data Type
4. Count the number of Non-NaN cells for each column
5. Split DataFrame into equal parts
6. Reverse DataFrame row-wise or column-wise
7. Rearrange columns of a DataFrame
8. Get alternate rows of a DataFrame
9. Insert a row at an arbitrary position
10. Apply function to every cell of DataFrame

As an exercise, I recommend you attempt the questions yourself first and then look at the solution I have provided.

Note that the solutions I have provided here may not be the only way to solve the problem. You may come up with something different and still be correct. However, if that happens, do drop a comment, and I’ll be interested to know your approach.

Let’s begin!

Prompt: You are given a DataFrame. Additionally, you also have a list that contains all the unique values of a particular column of the DataFrame. Sort the DataFrame such that the values in the column appear in the same order as they do in the given list.

Input and Expected Output:

Solution:

The idea here is to generate a Series from the given list: each index label is a value from the list, and the corresponding entry is that value's position in the list. We can then map the DataFrame column through this Series and pass the result to the sort_values() method as the sort key.
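A minimal sketch of this idea follows. The sample data, the column names, and the variable names (`sort_list`, `position`) are assumptions made for illustration, since the original input is not reproduced here. It relies on the `key` argument of `sort_values()`, which needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample data; the article's original input is not available.
df = pd.DataFrame({"col1": ["A", "B", "C", "D", "E"], "col2": [5, 4, 3, 2, 1]})
sort_list = ["C", "D", "B", "E", "A"]

# Series whose index holds the list values and whose entries hold their positions.
position = pd.Series(range(len(sort_list)), index=sort_list)

# The key callable receives the column as a Series; map every value to its position.
sorted_df = df.sort_values("col1", key=lambda col: col.map(position))
print(sorted_df)
```

The original row labels are preserved, so chain `.reset_index(drop=True)` if a fresh index is preferred.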

P.S. We can also solve this using merge. Do let me know in the comments if you can figure that out.

Prompt: Assume that you again have a DataFrame similar to the one used above. Additionally, you are given a list whose size is the same as the number of rows in the DataFrame. The task is to insert the given list as a new column at a given position of the DataFrame.

Input and Expected Output:

Solution:

Here, we can use the insert() method and pass the position, column_name, and the values as arguments as shown below:
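As a rough illustration, assuming a small made-up DataFrame and a hypothetical new column called `col3`, the call could look like this:

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})
new_values = [10.5, 20.5, 30.5]

# insert(loc, column, value) modifies the DataFrame in place and returns None.
# loc=1 places the new column second, between col1 and col2.
df.insert(1, "col3", new_values)
print(df)
```

Because `insert()` works in place, assigning its return value back to `df` would replace the DataFrame with `None`.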

Prompt: We are all familiar with row-based filtering, aren’t we? Well, let’s try something else. Your task is to select all the columns of a DataFrame whose entries are of a given data type.

Input and Expected Output:

Solution:

Here, we can use the select_dtypes() method and pass the data type we want to keep through its include argument (or the data types we want to drop through exclude).
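The sketch below assumes a made-up DataFrame with one text column and two numeric columns. It uses the generic `"number"` selector so that it does not depend on platform-specific integer widths.

```python
import pandas as pd

# Hypothetical sample data with mixed column types.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "count": [1, 2, 3],
    "score": [1.5, 2.5, 3.5],
})

# Keep only the numeric columns (integers and floats).
numeric_df = df.select_dtypes(include="number")

# Keep everything except the numeric columns.
non_numeric_df = df.select_dtypes(exclude="number")

print(numeric_df.columns.tolist())
print(non_numeric_df.columns.tolist())
```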

Prompt: Next, given a DataFrame (with NaN values in one or more columns), you need to print the number of Non-NaN cells for each column.

Input and Expected Output:

Solution:

Here, we can use the count() method, which returns the number of non-NaN cells in each column.
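For instance, with a small made-up DataFrame that has missing values in two of its columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with NaN values in col1 and col2.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0, 4.0],
    "col2": [np.nan, np.nan, 7.0, 8.0],
    "col3": ["A", "B", "C", "D"],
})

# count() returns a Series indexed by column name with the non-NaN counts.
non_nan_counts = df.count()
print(non_nan_counts)
```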

Prompt: Given a DataFrame, your task is to split the DataFrame into a given number of equal parts.

Input and Expected Output:

Solution:

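Here, we will use NumPy’s split() function and pass the number of parts as an argument. Note that split() raises an error when the rows cannot be divided into equal parts; NumPy’s array_split() accepts uneven divisions.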
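Here is one way this could look, using made-up data. As an assumption of this sketch, `np.split()` is applied to the row positions rather than to the DataFrame itself, so the result does not depend on how NumPy treats a DataFrame argument. Each chunk of positions is then used with `iloc`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with six rows.
df = pd.DataFrame({"col1": range(6), "col2": list("ABCDEF")})
n_parts = 3

# Split the row positions 0..5 into n_parts equal chunks.
# np.split raises ValueError if len(df) is not divisible by n_parts.
position_chunks = np.split(np.arange(len(df)), n_parts)

# Select each chunk of rows by position.
parts = [df.iloc[chunk] for chunk in position_chunks]

for part in parts:
    print(part, end="\n\n")
```

Swapping `np.split` for `np.array_split` in the same line would tolerate a row count that is not a multiple of `n_parts`.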

Prompt: Next, consider that you have a DataFrame similar to the one we used above. Your task is to flip the entire DataFrame row-wise or column-wise.

Input and Expected Output:

Solution:

We can use loc (or iloc) with the reversing slice “::-1”: applied to the row axis it flips the rows, and applied to the column axis it flips the columns.
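A short sketch with made-up data, using `iloc` for both directions:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["A", "B", "C"]})

# Reverse the order of the rows (last row first).
rows_reversed = df.iloc[::-1]

# Reverse the order of the columns (last column first), keeping all rows.
cols_reversed = df.iloc[:, ::-1]

print(rows_reversed)
print(cols_reversed)
```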

Prompt: In this exercise, you are given a DataFrame. Additionally, you have a list that specifies the order in which the columns should appear in the DataFrame. Given the list and the DataFrame, print the columns in the order specified in the list.

Input and Expected Output:

Solution:

Similar to above, we can select all the rows and pass the list as the column selector. Use loc when the list holds column names, or iloc when it holds integer column positions.
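The following sketch assumes the list contains column names, which is why it uses `loc`. The data and the `col_order` name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"], "col3": [0.1, 0.2]})
col_order = ["col3", "col1", "col2"]

# Select every row and the columns in the requested order.
reordered = df.loc[:, col_order]
print(reordered)
```

The shorter `df[col_order]` gives the same result when the list holds column names.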

Prompt: Next, given a DataFrame, you need to print every alternate row starting from the first row of the DataFrame.

Input and Expected Output:

Solution:

This solution is also similar to the two above. Here, while defining the slicing part, we can specify the step of slicing as 2, which is shown below:
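With a made-up five-row DataFrame, the slice with a step of 2 could be written as:

```python
import pandas as pd

# Hypothetical sample data with five rows.
df = pd.DataFrame({"col1": [10, 20, 30, 40, 50], "col2": list("ABCDE")})

# Start at the first row and take every second row.
alternate_rows = df.iloc[::2]
print(alternate_rows)
```

Using `df.iloc[1::2]` instead would start from the second row.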

Prompt: Similar to earlier tasks, you are given the same DataFrame. Your task is to insert a given list as a new row at a specific position of the DataFrame and then reset the index.

Input and Expected Output:

Solution:

Given an insert position, first assign the new row to an index right between the given index and the one before that. This is what the assignment statement will do. Next, we sort the DataFrame on the index. Finally, we reassign the indexes to eliminate float-based index values.
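The three steps can be sketched as follows. The data, the `insert_at` position, and the `new_row` values are assumptions for illustration, and the sketch presumes the DataFrame starts with the default integer index.

```python
import pandas as pd

# Hypothetical sample data with the default 0..n-1 index.
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})
new_row = ["X", 99]
insert_at = 1  # the new row should end up at this position

# Step 1: assign the row to a fractional label between insert_at - 1 and insert_at.
df.loc[insert_at - 0.5] = new_row

# Step 2: sort on the index so the fractional label falls into place.
# Step 3: reset the index to get rid of the float labels.
df = df.sort_index().reset_index(drop=True)
print(df)
```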

Prompt: Lastly, you need to apply a given function to the entire DataFrame. The given DataFrame consists of just integer values. The task is to increase each entry by 1 through a function.

Input and Expected Output:

Solution:

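Instead of using the apply() method, which passes a whole row or column to the function, here we shall use the applymap() method, which calls the function on every cell. Note that pandas 2.1 deprecated applymap() in favor of the equivalent DataFrame.map() method.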
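A small sketch with made-up integer data follows. The `add_one` function and the version check are assumptions of this example: it picks `DataFrame.map()` when the installed pandas provides it and falls back to `applymap()` otherwise.

```python
import pandas as pd

# Hypothetical sample data containing only integers.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

def add_one(value):
    """Return the cell value increased by 1."""
    return value + 1

# Prefer the newer element-wise method name when it exists on DataFrame.
if hasattr(pd.DataFrame, "map"):
    result = df.map(add_one)
else:
    result = df.applymap(add_one)

print(result)
```

For simple arithmetic like this, the vectorized expression `df + 1` gives the same result and is usually faster; the element-wise method is mainly useful for functions that cannot be vectorized.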
