
Pandas and Python Tricks for Data Science and Data Analysis — Part 5 | by Zoumana Keita | Apr, 2023



Photo by Andrew Neel on Unsplash

A couple of days ago, I shared some Python and Pandas tricks to help Data Analysts and Data Scientists quickly learn new valuable concepts that they might not be aware of. This is also part of the collection of tricks I share daily on LinkedIn.

Combine SQL statements and Pandas

My gut feeling is that more than 80% of Data Scientists use Pandas in their daily Data Science activities.

I believe this is because Pandas is part of the wider Python ecosystem, which makes it accessible to many people.

𝙒𝙝𝙖𝙩 𝙖𝙗𝙤𝙪𝙩 𝙎𝙌𝙇?
Even though not everyone uses it daily (not every company necessarily has a SQL database), SQL’s performance is undeniable. It is also human-readable, which makes it easy to understand even for non-technical people.

❓What if we could find a way to 𝙘𝙤𝙢𝙗𝙞𝙣𝙚 𝙩𝙝𝙚 𝙗𝙚𝙣𝙚𝙛𝙞𝙩𝙨 𝙤𝙛 𝙗𝙤𝙩𝙝 𝙋𝙖𝙣𝙙𝙖𝙨 𝙖𝙣𝙙 𝙎𝙌𝙇 statements?

✅ Here is where 𝗽𝗮𝗻𝗱𝗮𝘀𝗾𝗹 comes in handy 🎉🎉🎉

Below is an illustration 💡 Also you can watch the full video here.
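To make the idea concrete, here is a minimal sketch that runs a SQL query against a DataFrame. It deliberately does not import pandasql. Instead, it uses Python's standard sqlite3 module together with pandas' to_sql() and read_sql_query(), which is roughly the idea pandasql packages up (pandasql uses SQLite by default). The students table, its columns, and the query are made-up sample data, not taken from the original illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, for illustration only
students = pd.DataFrame({
    "name": ["Adja", "Fanta", "Moussa", "Awa"],
    "field": ["Statistics", "Statistics", "Physics", "Physics"],
    "grade": [15.5, 17.0, 12.0, 14.0],
})

query = """
    SELECT field, COUNT(*) AS n_students, AVG(grade) AS avg_grade
    FROM students
    GROUP BY field
    ORDER BY field
"""

# Copy the DataFrame into an in-memory SQLite database, then query it with SQL
conn = sqlite3.connect(":memory:")
try:
    students.to_sql("students", conn, index=False)
    result = pd.read_sql_query(query, conn)
finally:
    conn.close()

print(result)

# With pandasql installed, the equivalent call would look roughly like:
#   from pandasql import sqldf
#   result = sqldf(query, locals())
```

The result is a regular DataFrame, so you can keep working on it with Pandas after the SQL step.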

Update data of a given dataframe with another dataframe

There are multiple ways of replacing missing values 🧩 in Pandas, from simple imputation to more advanced methods.

But … 🚨

Sometimes, you just want to replace them using non-NA values from another DataFrame.

✅ This can be achieved using the built-in update() method of a Pandas DataFrame.

It aligns both DataFrames on their index and columns before performing the update.

General syntax ⚙️ below:

𝗳𝗶𝗿𝘀𝘁_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲.𝘂𝗽𝗱𝗮𝘁𝗲(𝘀𝗲𝗰𝗼𝗻𝗱_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲)

✨ By default, values in 𝗳𝗶𝗿𝘀𝘁_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲 are replaced wherever 𝘀𝗲𝗰𝗼𝗻𝗱_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲 has a non-missing value at the same index and column. The update happens in place, and the method returns None.

✨ 𝗼𝘃𝗲𝗿𝘄𝗿𝗶𝘁𝗲=𝗧𝗿𝘂𝗲 is the default: existing values in 𝗳𝗶𝗿𝘀𝘁_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲 are overwritten with the non-missing values from 𝘀𝗲𝗰𝗼𝗻𝗱_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲. With 𝗼𝘃𝗲𝗿𝘄𝗿𝗶𝘁𝗲=𝗙𝗮𝗹𝘀𝗲, only the missing values of 𝗳𝗶𝗿𝘀𝘁_𝗱𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲 are replaced.

Here is an illustration 💡
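As a small sketch of the two behaviours, the snippet below applies update() twice to copies of the same DataFrame. The variable names and values are invented for the example, and copies are used because update() modifies the DataFrame in place.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing age in the first DataFrame
first = pd.DataFrame({
    "name": ["A", "B", "C"],
    "age": [23.0, np.nan, 30.0],
})
second = pd.DataFrame({"age": [25.0, 28.0, np.nan]})

# overwrite=False: only the missing values of the first DataFrame are filled
filled = first.copy()
filled.update(second, overwrite=False)

# Default (overwrite=True): every non-missing value of the second DataFrame wins
overwritten = first.copy()
overwritten.update(second)

print(filled["age"].tolist())       # [23.0, 28.0, 30.0]
print(overwritten["age"].tolist())  # [25.0, 28.0, 30.0]
```

Note that the last row keeps 30.0 in both cases, because the second DataFrame has a missing value there and missing values are never copied over.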

From unstructured to structured data

Data preprocessing is full of challenges 🔥

Imagine you have this data with candidates’ information in the following format:

‘𝗔𝗱𝗷𝗮 𝗞𝗼𝗻𝗲: 𝗵𝗮𝘀 𝗠𝗮𝘀𝘁𝗲𝗿 𝗶𝗻 𝗦𝘁𝗮𝘁𝗶𝘀𝘁𝗶𝗰𝘀 𝗮𝗻𝗱 𝗶𝘀 𝟮𝟯 𝘆𝗲𝗮𝗿𝘀 𝗼𝗹𝗱’

‘𝗙𝗮𝗻𝘁𝗮 𝗧𝗿𝗮𝗼𝗿𝗲: 𝗵𝗮𝘀 𝗣𝗵𝗗 𝗶𝗻 𝗦𝘁𝗮𝘁𝗶𝘀𝘁𝗶𝗰𝘀 𝗮𝗻𝗱 𝗶𝘀 𝟯𝟬 𝘆𝗲𝗮𝗿𝘀 𝗼𝗹𝗱’

Then, your task is to generate a table with the following information per candidate for further analysis:

✨ The first and last name

✨ The degree and field of study

✨ The Age

🚨 Performing such a task can be daunting 🤯

✅ This is where the 𝘀𝘁𝗿.𝗲𝘅𝘁𝗿𝗮𝗰𝘁() function in Pandas can help!

It is a powerful text-processing function for extracting structured information from unstructured textual data.

Below is an illustration 💡
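Here is one possible sketch for the two candidate strings above. The regular expression and the column names (first_name, last_name, degree, field, age) are my own assumptions about how to split the text. With named groups, str.extract() uses the group names as column names.

```python
import pandas as pd

candidates = pd.DataFrame({
    "info": [
        "Adja Kone: has Master in Statistics and is 23 years old",
        "Fanta Traore: has PhD in Statistics and is 30 years old",
    ]
})

# One named group per piece of information to extract
pattern = (
    r"(?P<first_name>\w+) (?P<last_name>\w+): has "
    r"(?P<degree>\w+) in (?P<field>\w+) and is (?P<age>\d+) years old"
)

structured = candidates["info"].str.extract(pattern)

# Extracted values are strings, so convert the age to an integer
structured["age"] = structured["age"].astype(int)

print(structured)
```

This pattern assumes one-word first names, last names, degrees, and fields. Rows that do not match the pattern would come back as missing values, so real data would likely need a more tolerant expression.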

Perform multiple aggregations with the agg() function

Sometimes you want to apply several aggregation functions, such as 𝘀𝘂𝗺, 𝗮𝘃𝗲𝗿𝗮𝗴𝗲, and 𝗰𝗼𝘂𝗻𝘁, to one or multiple columns.

✅ You can combine 𝗴𝗿𝗼𝘂𝗽𝗯𝘆() 𝗮𝗻𝗱 𝗮𝗴𝗴() 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 from Pandas in one line of code.

Here is a Scenario 🎬 👇🏽

Let’s imagine a students dataset containing information about:

✨ Students’ areas of study

✨ Their grades

✨ The graduation years and the age of each student.

And, you have been requested to compute the following information per area of study and year:

→ The number of students

→ The average grade

→ The average age

Below is an image illustration 💡 for solving the scenario.
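The scenario can be sketched as follows. The column names (area, year, grade, age) and the values are hypothetical, and the example uses Pandas' named aggregation syntax, where each output column is defined as a (source column, function) pair.

```python
import pandas as pd

# Hypothetical students data, for illustration only
students = pd.DataFrame({
    "area": ["Stats", "Stats", "Stats", "CS", "CS"],
    "year": [2022, 2022, 2023, 2022, 2022],
    "grade": [14.0, 16.0, 12.0, 15.0, 17.0],
    "age": [23, 25, 30, 22, 24],
})

# One groupby() + agg() call computes all three statistics per area and year
summary = (
    students.groupby(["area", "year"])
    .agg(
        n_students=("grade", "count"),
        avg_grade=("grade", "mean"),
        avg_age=("age", "mean"),
    )
    .reset_index()
)

print(summary)
```

Naming the output columns directly in agg() avoids the multi-level column headers you get when passing a dictionary of lists.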

Select observations between two specified times

When working with time series data, you might want to select observations between two specified times for further analysis.

✅ This can be quickly achieved using the 𝗯𝗲𝘁𝘄𝗲𝗲𝗻_𝘁𝗶𝗺𝗲() function.

Below is an illustration 💡
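A minimal sketch, assuming a DataFrame indexed by timestamps (between_time() needs a DatetimeIndex). The temperature readings are made up. The selection looks only at the time of day, so matching rows from every date are returned, and both boundaries are included by default.

```python
import pandas as pd

# Hypothetical sensor readings spread over two days
readings = pd.DataFrame(
    {"temperature": [18.2, 19.0, 21.5, 23.1, 22.4, 20.0]},
    index=pd.to_datetime([
        "2023-04-01 08:00",
        "2023-04-01 10:30",
        "2023-04-01 12:00",
        "2023-04-01 14:15",
        "2023-04-02 09:45",
        "2023-04-02 16:00",
    ]),
)

# Keep only the observations recorded between 09:00 and 12:00, on any day
morning = readings.between_time("09:00", "12:00")

print(morning)
```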

Check if all elements meet a certain condition

❌ The combination of 𝗳𝗼𝗿 loops and 𝗶𝗳 statements is not always the most elegant way when writing Python code.

For instance, let’s say that you want to check if all the elements of an iterable meet a certain condition.

Two possibilities may arise:

1️⃣ Either use a for loop and an if statement.

OR

2️⃣ Use the all() built-in function

Below is an illustration 💡
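Both options can be sketched side by side. The function names, the grades, and the passing threshold of 10 are invented for the example.

```python
grades = [12, 15, 9, 18]


def all_passed_loop(values, threshold=10):
    """Option 1: a for loop combined with an if statement."""
    for value in values:
        if value < threshold:
            return False
    return True


def all_passed(values, threshold=10):
    """Option 2: the all() built-in with a generator expression."""
    return all(value >= threshold for value in values)


print(all_passed_loop(grades))  # False, because 9 is below the threshold
print(all_passed(grades))       # False
print(all_passed([12, 15, 18]))  # True
```

Like the loop, all() stops at the first element that fails the condition. Keep in mind that all() returns True for an empty iterable.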

Check if any element meets a certain condition

Similarly to the previous case, you might want to check whether at least one element of an iterable meets a certain condition.

✅ In that case, use the any() built-in function, which is more elegant than a for loop with an if statement.

The illustration is similar to the above image.
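Here is the same kind of comparison for any(), again with made-up names and values (a list of temperatures and an alert limit of 30.0).

```python
temperatures = [18.2, 19.0, 31.5, 23.1]


def has_heat_alert_loop(values, limit=30.0):
    """A for loop combined with an if statement."""
    for value in values:
        if value > limit:
            return True
    return False


def has_heat_alert(values, limit=30.0):
    """The any() built-in with a generator expression."""
    return any(value > limit for value in values)


print(has_heat_alert_loop(temperatures))  # True, because 31.5 exceeds the limit
print(has_heat_alert(temperatures))       # True
print(has_heat_alert([18.2, 19.0]))       # False
```

any() stops as soon as it finds one matching element, and it returns False for an empty iterable.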

Avoid nested for loops

Writing nested 𝗳𝗼𝗿 loops is almost inevitable when your program becomes bigger and more complicated.

❌ This can also make your code difficult to read and maintain.

✅ A better alternative is often the 𝗽𝗿𝗼𝗱𝘂𝗰𝘁() function from the standard-library itertools module.

Below is an illustration 💡
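As a short sketch, the snippet below builds every combination of three lists, first with nested loops and then with itertools.product(). The lists (sizes, colors, materials) are hypothetical sample data.

```python
from itertools import product

sizes = ["S", "M"]
colors = ["red", "blue"]
materials = ["cotton", "wool"]

# Nested version: one level of indentation per list
nested = []
for size in sizes:
    for color in colors:
        for material in materials:
            nested.append((size, color, material))

# Flat version: product() yields the same tuples in the same order
flat = list(product(sizes, colors, materials))

print(nested == flat)  # True
print(len(flat))       # 8
```

product() only replaces loops that iterate over independent iterables. If the inner loop depends on the value of the outer one, nested loops are still needed.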

Automatically handle index in a list

Imagine you have to access elements in a list and their indexes at the same time.

One way of doing it is to manually handle the indexes in a for loop.

✅ Instead, you can use the 𝗲𝗻𝘂𝗺𝗲𝗿𝗮𝘁𝗲() built-in function.

This has two main benefits that I can think of.

✨ First, it automatically handles the index variable.

✨ Second, it makes the code more readable.

Below is an illustration 💡
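A minimal sketch comparing the manual counter with enumerate(). The list of names is sample data, and the optional start argument is shown because it is handy when you want the numbering to begin at 1.

```python
candidates = ["Adja", "Fanta", "Moussa"]

# Manual version: the index variable has to be created and incremented by hand
manual = []
index = 0
for name in candidates:
    manual.append(f"{index}: {name}")
    index += 1

# enumerate() version: the index is produced together with each element
automatic = [f"{index}: {name}" for index, name in enumerate(candidates)]

# The start argument changes the first index value
ranked = [f"{rank}. {name}" for rank, name in enumerate(candidates, start=1)]

print(manual == automatic)  # True
print(ranked)               # ['1. Adja', '2. Fanta', '3. Moussa']
```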

Thank you for reading! 🎉 🍾

I hope you found this list of Python and Pandas tricks helpful! Keep an eye on this series, because more tricks will be added on a daily basis.

Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.

Would you like to buy me a coffee ☕️? → Here you go!

Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!

Before you leave, find the previous parts of this series below:

Pandas & Python Tricks for Data Science & Data Analysis — Part 1

Pandas & Python Tricks for Data Science & Data Analysis — Part 2

Pandas & Python Tricks for Data Science & Data Analysis — Part 3

Pandas & Python Tricks for Data Science & Data Analysis — Part 4




