Techno Blender
Digitally Yours.
Browsing Tag: Bex

Git For the Modern Data Scientist: 9 Git Concepts You Can’t Ignore | by Bex T. | May, 2023

3. Staging area

By talking about commits, we have gotten ahead of ourselves. Before closing the cap of the commit capsule, you have to make sure its contents are right.

This involves telling Git exactly which changes, from which files, you want to commit. New changes often come from several files, and you may want to commit only some of them and leave the rest for future commits.

This is where we lift the curtain and reveal the staging area (pun intended):

Image by me.

The staging area is changed after the…
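To make the idea concrete, here is a toy Python model of the staging area, written purely for illustration. `ToyRepo` and its `edit`, `add`, and `commit` methods are hypothetical names, and real Git stores its objects very differently. The model only shows the key point: a commit captures what was staged, not everything that changed. In real Git, the equivalent steps would be `git add train.py` followed by `git commit`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical toy model of Git's working directory, staging area, and history."""

    working: dict[str, str] = field(default_factory=dict)   # path -> current contents
    staged: dict[str, str] = field(default_factory=dict)    # path -> contents chosen for the next commit
    commits: list[dict[str, str]] = field(default_factory=list)  # each commit is a full snapshot

    def edit(self, path: str, contents: str) -> None:
        """Change a file in the working directory; nothing is staged yet."""
        self.working[path] = contents

    def add(self, path: str) -> None:
        """Like `git add <path>`: copy the file's current contents into the staging area."""
        self.staged[path] = self.working[path]

    def commit(self) -> dict[str, str]:
        """Like `git commit`: snapshot the previous commit plus whatever is staged."""
        snapshot = dict(self.commits[-1]) if self.commits else {}
        snapshot.update(self.staged)
        self.commits.append(snapshot)
        self.staged.clear()
        return snapshot


repo = ToyRepo()
repo.edit("train.py", "model v2")
repo.edit("notes.md", "draft")
repo.add("train.py")        # stage only one of the two changed files
snapshot = repo.commit()
print(snapshot)             # {'train.py': 'model v2'}
```

The unstaged `notes.md` change is still sitting in `repo.working`, waiting for a future commit.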

A Proven Method to Remember Data Science Concepts For as Long as You Need | by Bex T. | Apr, 2023

And tools to put the method into practice in the age of AI

Image by me. Via my good pal, Midjourney

The problem with self-learning data science

Every time I want to install a library with Anaconda, the -c part of the command keeps moving around. So, like most people, I google it, sometimes 3-4 times a day:

```bash
conda install -c conda-forge library_name
```

Sounds familiar?

This little example signals a fundamental flaw in the way most of us learn data science and machine learning today: data science knowledge is cheaper than air, so we…

6 Underdog Data Science Libraries That Deserve Much More Attention | by Bex T. | Apr, 2023

Time to go out of the shadows

Image by me via Midjourney.

While the big guys (Pandas, Scikit-learn, NumPy, Matplotlib, TensorFlow, etc.) hog all your attention, it is easy to miss some down-to-earth yet incredible libraries.

They may not be GitHub rock stars or taught in expensive Coursera specializations, but thousands of open-source developers pour their blood and sweat into writing them. From the shadows, they quietly fill the gaps left by popular libraries.

The purpose of this article is to shine a light on some of…

Goodbye os.path: 15 Pathlib Tricks to Quickly Master The File System in Python | by Bex T. | Apr, 2023

No headaches and unreadable code from os.path

A robot pal. Via Midjourney

Pathlib may be my favorite library (after Sklearn, obviously). And given there are over 130 thousand libraries, that's saying something. Pathlib helps me turn code like this, written in os.path:

```python
import os

dir_path = "/home/user/documents"

# Find all text files inside a directory
files = [
    os.path.join(dir_path, f)
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")
]
```

into this:

```python
from pathlib import Path

dir_path = Path("/home/user/documents")

# Find all text files inside a directory
files = list(dir_path.glob("*.txt"))
```

Pathlib came out in Python 3.4 as a replacement for the…
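One way to convince yourself that the os.path and pathlib approaches find the same files is to run both against a throwaway directory. This is a sketch: the helper names `find_txt_os` and `find_txt_pathlib`, the temporary directory, and the file names are all made up for the demo. Results are sorted because neither `os.listdir` nor `Path.glob` guarantees an order.

```python
import os
import tempfile
from pathlib import Path


def find_txt_os(dir_path: str) -> list[str]:
    """os.path version: the .txt files directly inside dir_path."""
    return sorted(
        os.path.join(dir_path, f)
        for f in os.listdir(dir_path)
        if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")
    )


def find_txt_pathlib(dir_path: Path) -> list[str]:
    """pathlib version: the same search, written with Path.glob."""
    return sorted(str(p) for p in dir_path.glob("*.txt") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # The / operator joins path segments, standing in for os.path.join
    for name in ["a.txt", "b.txt", "c.csv"]:
        (root / name).write_text("demo")
    os_result = find_txt_os(tmp)
    pathlib_result = find_txt_pathlib(root)

print([os.path.basename(p) for p in pathlib_result])  # ['a.txt', 'b.txt']
print(os_result == pathlib_result)                    # True
```

The pathlib version reads almost like the comment above it, which is much of the appeal.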

7 Easy Steps To Switch From Pandas to Lightning Fast Polars And Never Return | by Bex T. | Apr, 2023

A cheat sheet of the most common Pandas operations translated into Polars

Image by author via Midjourney

Time for goodbyes!

Pandas can do anything. Virtually anything. But (and this is an I-wish-a-million-times-it-was-any-other-way but) it lacks speed. Pandas just can't keep up with the pace at which the size and complexity of today's datasets are growing.

Pandas author Wes McKinney states that when he wrote Pandas, he had this rule of thumb in mind for his library:

Have 5 to 10 times as much RAM as the size of your…
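The rule of thumb boils down to simple arithmetic. The sketch below assumes the multiplier applies to the size of the dataset you load (the wording McKinney used in his "10 Things I Hate About pandas" post); `ram_needed_gb` is a hypothetical helper, not part of Pandas or Polars.

```python
def ram_needed_gb(dataset_gb: float, low: float = 5.0, high: float = 10.0) -> tuple[float, float]:
    """Return the (low, high) RAM range in GB suggested by the 5-10x rule of thumb.

    Assumption: the multiplier is applied to the dataset's size in GB.
    """
    return dataset_gb * low, dataset_gb * high


for size in [0.5, 2, 8]:
    lo_gb, hi_gb = ram_needed_gb(size)
    print(f"{size} GB dataset -> {lo_gb:g}-{hi_gb:g} GB of RAM")
```

By this estimate, an 8 GB file would call for 40 to 80 GB of RAM, more than most laptops have.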

Measuring The Speed of New Pandas 2.0 Against Polars and Datatable — Still Not Good Enough | by Bex T. | Mar, 2023

Even though the new PyArrow backend for Pandas brings exciting features, it is still disappointing in terms of speed.

Image by author via Midjourney

People have been complaining about Pandas' speed ever since they tried reading their first gigabyte-sized dataset with read_csv and realized they had to wait for (gasp) five seconds. And yes, I was one of those complainers.

Five seconds might not sound like a lot, but when loading the dataset alone takes that long, it usually means subsequent operations will take…
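If you want to try this kind of comparison on your own machine, a minimal timing sketch could look like the one below. It assumes pandas 2.0 or later (where `read_csv` accepts `dtype_backend`) and, for the PyArrow branch, that the optional pyarrow package is installed; otherwise only the default NumPy-backed read is timed. The synthetic data and its size are arbitrary, and a single run like this is far from a rigorous benchmark.

```python
import importlib.util
import io
import time

import numpy as np
import pandas as pd

# Build a small synthetic CSV in memory (size chosen only for a quick demo).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50_000, 5)), columns=list("abcde"))
csv_bytes = df.to_csv(index=False).encode("utf-8")


def time_read(**kwargs) -> float:
    """Return the seconds pd.read_csv takes to parse csv_bytes with the given options."""
    start = time.perf_counter()
    pd.read_csv(io.BytesIO(csv_bytes), **kwargs)
    return time.perf_counter() - start


timings = {"numpy (default)": time_read()}

# dtype_backend needs pandas >= 2.0; engine="pyarrow" needs pyarrow installed.
pandas_major = int(pd.__version__.split(".")[0])
if pandas_major >= 2 and importlib.util.find_spec("pyarrow") is not None:
    timings["pyarrow"] = time_read(engine="pyarrow", dtype_backend="pyarrow")

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Results depend heavily on hardware, file size, and column types, so repeat runs before drawing conclusions.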

Create Stunning Fractal Art with Python: A Tutorial For Beginners And Hardcore Math Lovers | by Bex T. | Mar, 2023

With a single line of code or even less

Introduction

The phrase "I've never seen anything more beautiful" should only be used for fractals. Sure, there is the Mona Lisa, The Starry Night, and The Birth of Venus (all of which have been ruined by AI-generated art, by the way), but I don't think any artist or human could create anything as royally amazing as fractals.

On the left, we have the iconic fractal, the Mandelbrot set, discovered in 1979 when no Python or graphing software was available.

GIF by the author using Fraqtive, an…
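For readers curious about the math behind the picture: a complex number c belongs to the Mandelbrot set if repeatedly applying z = z**2 + c, starting from z = 0, never blows up. A common numerical test is the escape-time check below. It is a sketch that uses the standard escape radius of 2 (once |z| exceeds 2 the sequence is guaranteed to diverge) and an arbitrary cap of 100 iterations, so points very close to the boundary can be misclassified; `in_mandelbrot` is a hypothetical helper name.

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Escape-time test: iterate z = z**2 + c from z = 0.

    If |z| ever exceeds 2, c is definitely outside the set. If z survives
    max_iter steps, c is treated as inside (an approximation).
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True


# Crude ASCII rendering: rows are imaginary parts, columns are real parts.
for im in [1.0, 0.5, 0.0, -0.5, -1.0]:
    row = ""
    for step in range(41):
        re = -2.0 + step * 0.0625  # real parts from -2.0 to 0.5
        row += "#" if in_mandelbrot(complex(re, im)) else "."
    print(row)
```

For example, c = -1 cycles between 0 and -1 forever and is inside the set, while c = 1 runs off through 1, 2, 5, ... and escapes.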

What to Do With Outliers Once You Find Them | by Bex T. | Feb, 2023

Image by Ralf Kunze from Pixabay

Outlier detection is only part of the story. The real challenge is figuring out what to do with these anomalies. It's all too easy to just brush outliers aside, but there are a lot of nuances and factors to consider.

Blindly removing them can have unwanted consequences. While finding one type of outlier might suggest a more serious problem with the data, detecting another type may not be a problem at all. Therefore, it is important not to make hasty decisions, as they might…
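As one illustration of an alternative to blind removal, the sketch below flags points outside Tukey's 1.5 × IQR fences and caps (winsorizes) them instead of dropping them, so no rows are lost and the original values remain available for inspection. This is a common technique, not necessarily one the article recommends; the 1.5 multiplier, the helper name `flag_and_cap`, and the sample data are assumptions for the demo.

```python
import numpy as np


def flag_and_cap(values: np.ndarray, k: float = 1.5) -> tuple[np.ndarray, np.ndarray]:
    """Flag values outside the IQR fences and return a capped copy.

    Returns (is_outlier mask, capped values). The input array is not modified.
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (values < lower) | (values > upper)
    capped = np.clip(values, lower, upper)
    return is_outlier, capped


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
mask, capped = flag_and_cap(data)
print(data[mask])   # [100.]
print(capped[-1])   # 14.5
```

Keeping the mask around lets you investigate why a point is extreme before deciding whether capping, transforming, or removing it is appropriate.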

5 Best Python Synthetic Data Generators And How to Use Them When You Lack Data | by Bex T. | Jan, 2023

Let's get even more data

Photo by Maxim Berg

In 2021, 2.5 quintillion bytes (2.5 million terabytes) of data were produced daily. Today, it is even more. But apparently, that's not enough, because the Python ecosystem has many libraries for producing synthetic data. Maybe some of them were created just for the sake of being able to generate synthetic data, but most have beneficial applications such as:

- Machine learning: when real-world data is unavailable or difficult to obtain for model training
- Data privacy and security:…
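As a taste of the machine-learning use case, here is a sketch using Scikit-learn's `make_classification`, which generates a labeled dataset with a chosen number of informative features. This is just one generator and not necessarily one of the five libraries the article covers; all parameter values below are arbitrary.

```python
from sklearn.datasets import make_classification

# 1,000 synthetic samples with 10 features: 3 informative, 2 redundant
# (linear combinations of the informative ones), the rest pure noise.
X, y = make_classification(
    n_samples=1_000,
    n_features=10,
    n_informative=3,
    n_redundant=2,
    n_classes=2,
    random_state=42,  # fix the seed so the data is reproducible
)
print(X.shape, y.shape)  # (1000, 10) (1000,)
```

Because you control how many features actually carry signal, data like this is handy for testing models and feature selectors when real data is scarce.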

3-Step Feature Selection Guide in Sklearn to Supercharge Your Models | by Bex T. | Oct, 2022

Develop a robust feature selection workflow for any supervised problem

Learn how to face one of the biggest challenges of machine learning with the best of Sklearn's feature selectors.

Photo by Steve Johnson

Introduction

Today, it is common for datasets to have hundreds if not thousands of features. On the surface, this might seem like a good thing, since more features give more information about each sample. But more often than not, these additional features don't provide much value and introduce complexity.

The biggest challenge of…
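To give a flavor of Sklearn's selectors in code, the sketch below chains two of them in a pipeline: `VarianceThreshold` drops constant features, then `SelectKBest` with an ANOVA F-test keeps the k highest-scoring ones before a model is fit. This is an illustrative combination, not necessarily the three steps the article describes; the synthetic data, the added constant column, and k=5 are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic data: 20 features, only 5 informative, plus one constant column.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X = np.hstack([X, np.ones((X.shape[0], 1))])  # a useless, zero-variance feature

pipe = make_pipeline(
    VarianceThreshold(threshold=0.0),        # first: remove zero-variance features
    SelectKBest(score_func=f_classif, k=5),  # then: keep the 5 best by ANOVA F-score
    LogisticRegression(max_iter=1_000),
)
pipe.fit(X, y)

selected = pipe.named_steps["selectkbest"].get_support()
print(X.shape[1], "->", selected.sum(), "features")  # 21 -> 5 features
```

Putting the selectors inside the pipeline means they are refit on each training fold during cross-validation, which avoids leaking information from the validation data.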