Techno Blender
Digitally Yours.

Pandas & Python Tricks for Data Science & Data Analysis — Part 4 | by Zoumana Keita | Mar, 2023

This is the fourth part of my Pandas & Python Tricks series.

Photo by Andrew Neel on Unsplash

A couple of days ago, I shared some Python and Pandas tricks to help Data Analysts and Data Scientists quickly learn valuable new concepts they might not be aware of. These tricks are also part of the collection I share daily on LinkedIn.

Change column data types

Wrong data format is a common challenge when dealing with real-world 🌏 data.

For instance, you might have a numerical value that is stored as a string such as “34” instead of 34.

✅ Using the astype() function, you can easily convert data from one type to another (e.g., string to numeric).

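A minimal sketch of astype(), assuming a small hypothetical DataFrame whose numeric values were read in as strings (the column names "age" and "score" are illustrative only):

```python
import pandas as pd

# Hypothetical data: numeric values stored as strings, e.g. after reading raw text
df = pd.DataFrame({"age": ["34", "27", "45"], "score": ["8.5", "9.1", "7.3"]})
print(df.dtypes)  # both columns hold strings, not numbers

# Convert a single column...
df["age"] = df["age"].astype(int)
# ...or several columns at once with a {column: dtype} mapping
df = df.astype({"score": float})

print(df.dtypes)
print(df["age"].sum())  # 106, a numeric sum rather than string concatenation
```

If a column may contain entries that are not valid numbers, astype() raises a ValueError; pd.to_numeric(..., errors="coerce") turns such entries into NaN instead.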

Infer the correct types of columns

We have seen that the astype() function works well for changing the type of a column. However, this task becomes repetitive when many columns are involved.

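Instead of iterating through the columns one by one, we can use the infer_objects() function, which inspects each object-dtype column and converts it to a more specific type when the values allow it (for example, an object column holding only Python integers becomes int64). Note that this is a soft conversion: strings such as “34” remain strings, so astype() or pd.to_numeric() is still needed for those.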

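A minimal sketch, assuming a hypothetical DataFrame that was built with dtype=object so every column starts out as object. It also shows that strings which merely look like numbers are left alone:

```python
import pandas as pd

# Hypothetical DataFrame where every column is forced to the generic object dtype
df = pd.DataFrame(
    {"id": [1, 2, 3], "price": [9.99, 4.5, 12.0], "name": ["a", "b", "c"]},
    dtype=object,
)
print(df.dtypes)  # id, price and name are all object

# infer_objects() re-inspects each object column and soft-converts it when possible
converted = df.infer_objects()
print(converted.dtypes)  # id -> int64, price -> float64, name is still non-numeric

# Numeric-looking strings are NOT parsed: this stays a non-numeric column
looks_numeric = pd.Series(["34", "27"], dtype=object).infer_objects()
print(looks_numeric.dtype)
```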

Check if two DataFrames are equal

Two columns with the same name may not contain the same values, and two rows with the same index may not be identical.

To know if two DataFrames are equal, you need to go deeper 💡 and check whether they have the same shape and the same elements.

This is where the Pandas equals() function comes in handy.

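✅ It returns True if the two DataFrames have the same shape, the same elements, and the same column dtypes (NaN values in the same positions are considered equal).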

❌ It returns False if they are not equal.

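A minimal sketch with three hypothetical DataFrames. It contrasts equals() with element-wise ==, which treats NaN as unequal to itself, and shows that a dtype difference alone makes equals() return False:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3.0, np.nan]})
df2 = pd.DataFrame({"a": [1, 2], "b": [3.0, np.nan]})
# Same values as df1, but column "a" is float64 instead of int64
df3 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]})

print(df1.equals(df2))  # True: same shape, elements and dtypes; matching NaNs count as equal
print(df1.equals(df3))  # False: the dtypes of column "a" differ

# Element-wise comparison treats NaN != NaN, so this reports False
print((df1 == df2).all().all())
```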

Make your Python output more human-readable

Sometimes it is necessary to go beyond Python's default output to make it easier for humans to read 👩🏻‍💼👩🏽‍💼👨🏻‍💼👨🏽‍💼.

✅ This can be achieved using the humanize library.

The full video tutorial is available here for more examples.

Convert natural language to numerical values

Natural language 🗣️ is everywhere 🌐, even in our DataFrames.

This is not a bad thing in itself, because it is the perfect 👍💯 type of data for natural language processing tasks.

However, its limitations 👎🚫 become obvious when you try to perform numerical computations.

🛠️✅ To tackle this issue, you can use the numerize() function from the Python library numerizer.

✨ It converts natural language expressions of numbers into their actual numerical values.


Combine multiple lists

Using the + sign is probably the most common approach to combine 🔗 lists.
However, typing the + sign over and over quickly becomes tedious when you have to deal with many lists.

✅ Instead, you can combine the add() function from the operator module with the reduce() function from the functools module.

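A minimal sketch with hypothetical lists. reduce() applies operator.add (the function behind +) cumulatively from left to right; itertools.chain is shown as a plainly named alternative, not part of the original trick:

```python
from functools import reduce
from itertools import chain
from operator import add

# Hypothetical lists to combine
list_a = [1, 2]
list_b = [3, 4]
list_c = [5]
list_d = [6, 7]
lists = [list_a, list_b, list_c, list_d]

# Equivalent to ((list_a + list_b) + list_c) + list_d
combined = reduce(add, lists)
print(combined)  # [1, 2, 3, 4, 5, 6, 7]

# Alternative that avoids building a new intermediate list at every step
flat = list(chain.from_iterable(lists))
print(flat == combined)  # True
```

Because each addition inside reduce() creates a new list, the chain-based version scales better when there are many long lists.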

Zip iterables of different sizes

If you have been using the zip() function, you might be aware of its limitation with iterables of different sizes: it stops as soon as the shortest iterable is exhausted and silently drops the remaining items, which can lead to information loss.

🛠️✅ You can tackle this issue with zip()’s cousin: the zip_longest() function from the itertools module.

Instead of dropping the remaining items, zip_longest() pads the shorter iterables with None.

That’s already useful, but it gets even better with the fillvalue parameter, which replaces None with a more meaningful value.

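A minimal sketch with two hypothetical lists of different lengths, comparing zip(), zip_longest() with its default padding, and zip_longest() with fillvalue:

```python
from itertools import zip_longest

# Hypothetical data: one score is missing
names = ["Alice", "Bob", "Charlie"]
scores = [90, 85]

print(list(zip(names, scores)))
# [('Alice', 90), ('Bob', 85)]  "Charlie" is silently dropped

print(list(zip_longest(names, scores)))
# [('Alice', 90), ('Bob', 85), ('Charlie', None)]

print(list(zip_longest(names, scores, fillvalue=0)))
# [('Alice', 90), ('Bob', 85), ('Charlie', 0)]
```

If mismatched lengths signal a bug rather than missing data, Python 3.10+ also offers zip(..., strict=True), which raises a ValueError instead of truncating.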

Thank you for reading! 🎉 🍾

I hope you found this list of Python and Pandas tricks helpful! Keep an eye on this series, because I will keep adding new tricks on a daily basis.

Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.

Would you like to buy me a coffee ☕️? → Here you go!

Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!

Before you leave, find the previous three parts of this series below:

Pandas & Python Tricks for Data Science & Data Analysis — Part 1

Pandas & Python Tricks for Data Science & Data Analysis — Part 2

Pandas & Python Tricks for Data Science & Data Analysis — Part 3




Denial of responsibility! Techno Blender is an automatic aggregator of all the world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.
