Techno Blender
Digitally Yours.
Browsing Tag: Barba

Python Concurrency — concurrent.futures | by Diego Barba | Sep, 2022

Part 3 of the Python Concurrency series. Interface simplicity brought to multi-threading and multi-processing. After successive headaches dealing with multi-threaded and multi-process code, a dream begins to take shape: is there a simpler way to do this? Is there a way to hide the creation of threads, processes, queues, and pipes? Is there a way to offload the computation elsewhere and get the result back? It turns out that there is such a way. concurrent.futures implements a simple,…
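As a minimal sketch of the "offload the computation and get the result back" idea, the snippet below uses the standard-library ThreadPoolExecutor. The function `square` is a hypothetical stand-in for real work, and this is not necessarily the example the article itself uses.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Hypothetical stand-in for a slower computation."""
    return x * x


# The executor creates and manages the worker threads for us.
with ThreadPoolExecutor(max_workers=2) as executor:
    # Offload a single call; submit() returns a Future immediately.
    future = executor.submit(square, 7)
    # result() blocks until the computation has finished.
    single_result = future.result()
    # map() runs the calls concurrently and yields results in input order.
    mapped_results = list(executor.map(square, [1, 2, 3]))
```

No thread, queue, or pipe is created by hand here; the pool hides them behind `submit()` and `map()`.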

Python Concurrency — Multiprocessing | by Diego Barba | Aug, 2022

Part 2 of the Python Concurrency series. The multiprocessing module enables us to perform genuinely parallel tasks. Yet there are many things to be aware of. Multi-processing advertises multiple core usage. Is it the answer to many Python users’ prayers? Is it a way to finally bypass the GIL? Well, for starters, it is not that new; it is actually quite old now. The multiprocessing module was introduced in Python 2.6 back in 2008. Indeed, we can submit different tasks to different OS processes…

Async for Data Scientists — Don’t Block the Event Loop | by Diego Barba | Jul, 2022

CPU-hungry tasks or non-async I/O libraries may block the event loop of your program. Learn how to avoid this in Python. Asynchronous programming has become the standard paradigm for API design and most services. The scope of a data scientist’s skill set has also evolved. Today it is not enough to create good models or visualizations; in most cases, deploying them through an API or another service is also necessary. If you haven’t been dealing with async programming in your deployments, the odds are you will…
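One common way to keep a blocking call off the event loop is to hand it to an executor with `loop.run_in_executor`. The sketch below assumes a hypothetical `blocking_io` function that stands in for a non-async library call; the article may take a different approach.

```python
import asyncio
import time


def blocking_io(seconds: float) -> str:
    """Hypothetical blocking call, e.g. a non-async I/O library."""
    time.sleep(seconds)
    return "done"


async def main() -> str:
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default thread pool executor, so the
    # event loop stays free to run other coroutines while we wait.
    return await loop.run_in_executor(None, blocking_io, 0.05)


result = asyncio.run(main())
```

For CPU-hungry work, a process pool can be passed in place of `None`, since threads alone do not get around the GIL.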

NumPy ufuncs — The Magic Behind Vectorized Functions | by Diego Barba | Jul, 2022

Learn about NumPy universal functions (ufuncs) and how to create them. Code your own vectorized functions. Have you ever wondered about the origin of NumPy’s magical performance? Under the hood, NumPy powers many of the data scientist’s daily drivers, such as pandas, among an extensive list. Of course, you’d be right to think about optimized arrays written in C and Fortran. Half right, at least. The other half is not the arrays but NumPy’s functions themselves. NumPy…

Python Collections Module: The Forgotten Data Containers | by Diego Barba | Jun, 2022

If you are not using the container datatypes from the collections module, you should. In the learning journey of a programming language, it is not uncommon to develop our own hacks and tricks that allow us to implement specific tasks. As data scientists, we might end up with little recipes of our own making that enable us to manipulate data in particular ways. We tend to cling to these recipes in our treasured notebooks. Sometimes these recipes use tools that are not the best suited for the task at hand. Still,…

Decorator Tricks for Data Scientists | by Diego Barba | Jun, 2022

If you are not using Python decorators yet, you should. Pure syntactic sugar. I remember the first time I saw an “@” sign on top of a function in Python code. I felt compelled to research what this weird syntax was. It marked a before and after, that is for sure. The “@” sign on top of the function applies a decorator, a function that takes the function it decorates as its argument. You can spend years as a data scientist and not use decorators. Or maybe you have used them but have not learned how to code your own. This story…
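To make the "function of the function" idea concrete, here is a minimal sketch of a hand-written decorator. The names `shout` and `greet` are hypothetical and chosen only for illustration.

```python
import functools


def shout(func):
    """Decorator: wrap func so that its string result is upper-cased."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()

    return wrapper


@shout  # equivalent to: greet = shout(greet)
def greet(name: str) -> str:
    return f"hello, {name}"
```

Calling `greet("ada")` now runs `wrapper`, which calls the original function and post-processes its return value.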

Execution Times in Python. Measure the execution time of your code… | by Diego Barba | Jun, 2022

Measure the execution time of your code in Python the right way. Measuring code execution time is a crucial endeavor, whether for algorithm selection in a data science project or optimizing for speed in software development. Whatever the case, measuring time in Python has its subtleties, and we had better get it right. In this story, we will go through many ways to measure time in Python. Our aim: to time the execution of a block of code. You may be as baffled as I am to learn that there are not many pythonic ways of…
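As a baseline for timing a block of code, the standard library offers `time.perf_counter`, a monotonic high-resolution clock. This is only a minimal sketch with a placeholder workload, not one of the approaches the article necessarily recommends.

```python
import time

start = time.perf_counter()
# Placeholder workload: replace with the block of code to be timed.
total = sum(range(100_000))
elapsed = time.perf_counter() - start  # elapsed wall-clock time in seconds

print(f"block took {elapsed:.6f} s")
```

A single measurement like this is noisy; repeating the block several times and comparing runs gives a more trustworthy figure.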

Pydantic or dataclasses? Why not both? Convert Between Them | by Diego Barba | Jun, 2022

Use dataclasses in your FastAPI projects, and speed up your pydantic model operations. Python dataclasses are fantastic. Pydantic is fantastic. It is a tough choice if indeed we are confronted with choosing one or the other. I would say that comparing these two great modules is like comparing pears with apples: similar in some regards, but different overall. Pydantic’s arena is data parsing and sanitization, while dataclasses is a fast and memory-efficient (especially using slots, Python 3.10+)…

Cointegration Popular Methods [1/2]: The Engle-Granger Approach | by Diego Barba | Jun, 2022

Simple cointegration methods in Python. The Engle-Granger approach, the most intuitive method. The mathematics and concepts of most cointegration methods are not always very straightforward. Often, complex mathematical tools obscure the intent and steps of the methods. Such complexities also deter some readers from further exploration of the subject. The Engle-Granger approach to cointegration does not suffer from this. It may not be the most reliable method, nor the most stable, but it is simple and intuitive.…

Python Interfaces: Why should a Data Scientist Care? | by Diego Barba | Jun, 2022

Class interfaces, abstraction layers, inheritance: isn’t that a software developer problem? Why should you, as a data scientist, care? Interfaces make almost all of our favorite data science libraries possible. That is a good enough reason, at least for me, to care. But let us go deep into the subject. In the context of the present story, interfaces are an Object-Oriented (OO) concept to define other objects’ properties and behavior. Interfaces are handy when we are to design a piece of software that: depends…