Pandas & Python Tricks for Data Science & Data Analysis — Part 4 | by Zoumana Keita | Mar, 2023
This is the fourth part of my Pandas & Python Tricks series
A couple of days ago, I shared some Python and Pandas tricks to help Data Analysts and Data Scientists quickly learn new valuable concepts that they might not be aware of. This is also part of the collection of tricks I share daily on LinkedIn.
Change column data types
Wrong data format is a common challenge when dealing with real-world 🌏 data.
For instance, you might have a numerical value that is stored as a string such as “34” instead of 34.
✅ Using the astype() function, you can easily convert data from one type to another (e.g. string to numerical).
Below is an illustration 💡
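As a minimal sketch of the idea, the snippet below converts a string column to integers with astype(). The DataFrame and its column names (age, name) are made up for illustration and are not taken from the original post.

```python
import pandas as pd

# Hypothetical data: "age" was read in as strings instead of numbers.
df = pd.DataFrame({"age": ["34", "27", "45"], "name": ["Ana", "Ben", "Cleo"]})

# Convert the string column to a 64-bit integer column.
df["age"] = df["age"].astype("int64")

# Numerical operations now behave as expected.
total_age = df["age"].sum()
print(df.dtypes)
print(total_age)
```

Keep in mind that astype() raises an error if a value cannot be converted (for instance "thirty-four"), so it works best on columns that are already clean.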
Infer the correct types of columns
We have seen that the astype() function works well for changing the type of a column. However, this task can become repetitive when multiple columns are involved.
Instead of going through the columns one by one, we can use the infer_objects() function, which inspects each object-typed column and converts it to a more specific type when its values allow it. Note that this is a soft conversion: it does not parse strings such as “34” into numbers.
Below is an illustration
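Here is a small sketch of infer_objects() in action, assuming a DataFrame whose numeric values ended up stored with the generic object type. The column names (quantity, price) are hypothetical.

```python
import pandas as pd

# Hypothetical data: numbers stored in columns with the generic "object" dtype.
df = pd.DataFrame(
    {"quantity": [1, 2, 3], "price": [9.5, 12.0, 7.25]},
    dtype="object",
)
print(df.dtypes)  # both columns are reported as object

# Let pandas pick a more specific dtype for every object column at once.
converted = df.infer_objects()
print(converted.dtypes)
```

The call returns a new DataFrame and leaves the original untouched, so the result needs to be assigned to a variable.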
Check if two DataFrames are equal
Two columns with the same name may not contain the same values, and two rows with the same index may not be identical.
To know if two DataFrames are equal, you need to go deeper 💡 to check if they have the same shape and same elements.
This is where the Pandas equals() function comes in handy.
✅ It returns True if the two DataFrames are equal.
❌ It returns False if they are not equal.
Below is an illustration 🚀
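The comparison can be sketched as follows. The three small DataFrames are invented for the example; the last check also shows that equals() is sensitive to column types, not only to the values.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df3 = pd.DataFrame({"a": [1, 2, 4], "b": ["x", "y", "z"]})

# Same shape and same elements.
same = df1.equals(df2)            # True

# Same shape, but one element differs.
different = df1.equals(df3)       # False

# Same values, but column "a" is float instead of int.
df4 = df1.astype({"a": "float64"})
different_dtype = df1.equals(df4)  # False

print(same, different, different_dtype)
```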
Make your Python output more human-readable
Sometimes it is necessary to go beyond the default output provided by Python to make it more understandable by humans.
✅ This can be achieved using the humanize library.
The full video tutorial is available here for more examples.
Convert natural language to numerical values
Natural language 🗣️ is everywhere 🌐, even in our DataFrames.
This is not a bad thing in itself, because it is the perfect 👍💯 type of data when performing natural language processing tasks.
However, its limitations 👎🚫 become obvious when trying to perform numerical computation.
🛠️✅ To tackle this issue, you can use the numerize() function from the Python library numerizer.
✨ It converts natural language expressions of numbers into their actual numerical values.
Below is an illustration 🚀
Combine multiple lists
Using the + sign is probably the most common approach to combine 🔗 lists.
However, typing the + sign over and over quickly becomes tedious when you have to deal with multiple lists.
✅ Instead, you can use the add and reduce functions, from the operator and functools modules respectively.
Below is an illustration 🚀
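A minimal sketch of this approach is shown below, using made-up lists. reduce() applies add() to the first two lists, then to that result and the third list, and so on.

```python
from functools import reduce
from operator import add

# Hypothetical lists to combine.
lists = [[1, 2], [3, 4], [5], [6, 7, 8]]

# Equivalent to lists[0] + lists[1] + lists[2] + lists[3].
combined = reduce(add, lists)
print(combined)  # [1, 2, 3, 4, 5, 6, 7, 8]

# Passing an empty list as the initial value avoids an error
# when there is nothing to combine.
combined_empty = reduce(add, [], [])
print(combined_empty)  # []
```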
Zip Iterables of different sizes
If you have been using the zip() function, then you might be aware of this limitation: with iterables of different sizes, it stops at the shortest one and silently drops the remaining items, which can lead to information loss.
🛠️✅ You can tackle this issue with the zip() function’s cousin: the zip_longest() function from the itertools module.
Instead of ignoring the remaining items, it pairs them with None.
That’s good, but it gets even better with the fillvalue parameter, which replaces None with a more meaningful value.
Below is an illustration 🚀
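The difference between the two functions can be sketched with two made-up lists of unequal length. The names and scores are purely illustrative.

```python
from itertools import zip_longest

names = ["Ana", "Ben", "Cleo"]
scores = [90, 85]

# zip() stops at the shortest iterable, so "Cleo" is dropped.
short = list(zip(names, scores))
print(short)  # [('Ana', 90), ('Ben', 85)]

# zip_longest() keeps every item and pads the gaps with None.
padded = list(zip_longest(names, scores))
print(padded)  # [('Ana', 90), ('Ben', 85), ('Cleo', None)]

# fillvalue replaces None with a value of your choice.
filled = list(zip_longest(names, scores, fillvalue=0))
print(filled)  # [('Ana', 90), ('Ben', 85), ('Cleo', 0)]
```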
Thank you for reading! 🎉 🍾
I hope you found this list of Python and Pandas tricks helpful! Keep an eye on this page, because the content will be updated with more tricks on a daily basis.
Also, if you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.
Would you like to buy me a coffee ☕️? → Here you go!
Feel free to follow me on Medium, Twitter, and YouTube, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!
Before you leave, find the previous three parts of this series below:
Pandas & Python Tricks for Data Science & Data Analysis — Part 1
Pandas & Python Tricks for Data Science & Data Analysis — Part 2
Pandas & Python Tricks for Data Science & Data Analysis — Part 3