
From Jupyter to Kubernetes: Refactoring and Deploying Notebooks Using Open-Source Tools | by Eduardo Blancas | Jun, 2022

A step-by-step guide to going from a messy notebook to a pipeline running in Kubernetes

Photo by Myriam Jessier on Unsplash

Notebooks are great for rapid iterations and prototyping but quickly get messy. After working on a notebook, my code becomes difficult to manage and unsuitable for deployment. In production, code organization is essential for maintainability (it’s much easier to improve and debug organized code than a long, messy notebook).

In this post, I’ll describe how you can use our open-source tools to cover the entire life cycle of a Data Science project: from a messy notebook to code running in production. Let’s get started!

The first step is to clean up our notebook with automated tools; then, we’ll automatically refactor our monolithic notebook into a modular pipeline with soorgeon; after that, we’ll test that our pipeline runs; and, finally, we’ll deploy our pipeline to Kubernetes. The main benefit of this workflow is that all steps are fully automated, so we can return to Jupyter, iterate (or fix bugs), and deploy again effortlessly.

Image by author.

The interactivity of notebooks makes it simple to try out new ideas, but it also yields messy code. While exploring data, we often rush to write code without considering readability. Luckily, tools like isort and black let us re-format our code to improve readability. Unfortunately, these tools only work with .py files; however, soorgeon enables us to run them on notebook files (.ipynb):
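
A minimal sketch of the commands (assuming soorgeon is installed from PyPI and the notebook is named nb.ipynb, the same name used in the project listing later in this post):

# install soorgeon
pip install soorgeon

# apply the auto-formatters mentioned above (black and isort) to the notebook
soorgeon clean nb.ipynb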

Note: if you need an example notebook to try these commands, any notebook organized into sections with Markdown headings will work.

Check out the image at the beginning of this section: I introduced some extra whitespace in the notebook on the left. After applying soorgeon clean (notebook on the right), the extra whitespace is gone. Now we can focus on writing code and let soorgeon clean take care of auto-formatting!

Developing an analysis in a single notebook is convenient: we can move sections around and edit them easily. However, it has drawbacks: it’s hard to collaborate on and hard to test. Organizing our analysis into multiple files lets us define clear boundaries so several people can work on the project without getting in each other’s way.

The process of going from a single notebook to a modular pipeline is time-consuming and error-prone; fortunately, soorgeon can do the heavy lifting for us:
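
Here is the refactoring command in its simplest form (a sketch; nb.ipynb is the same notebook name used throughout this example):

# split the notebook into one task per Markdown heading
soorgeon refactor nb.ipynb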

Upon refactoring, we’ll see a bunch of new files:

Image by author.

soorgeon turns our notebook into a modularized project automatically! It generates a README.md with basic instructions and a requirements.txt (extracting package names from the import statements). Furthermore, it creates a tasks/ directory with one .ipynb file per section of the original notebook, where sections are delimited by Markdown headings; soorgeon refactor also figures out which sections depend on which ones.

If you prefer to export .py files, you can pass the --file-format option:

soorgeon refactor nb.ipynb --file-format py

The tasks/ directory will have .py files this time:

├── README.md
├── nb.ipynb
├── pipeline.yaml
├── requirements.txt
└── tasks
    ├── clean.py
    ├── linear-regression.py
    ├── load.py
    ├── random-forest-regressor.py
    └── train-test-split.py

soorgeon uses Markdown headings to determine how many tasks to generate; in our case, there are five. Then, soorgeon analyzes the code to resolve the dependencies among sections and adds the necessary code to pass outputs from each task to the ones that need them.

For example, our “Train test split” section creates the variables X, y, X_train, X_test, y_train, and y_test; the last four are used by the “Linear regression” section:

Image by author.

By analyzing each section’s inputs and outputs, soorgeon determines that the “Linear regression” section depends on the “Train test split” section. The “Random Forest Regressor” section also depends on “Train test split” since it uses the same variables. With this information, soorgeon builds the dependency graph.

Now it’s time to ensure that our modular pipeline runs correctly. To do so, we’ll use the second package in our toolbox: ploomber. Ploomber allows us to develop and execute our pipelines locally.
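
A minimal sketch of the commands (ploomber is on PyPI; the requirements.txt file was generated during refactoring):

# install ploomber and the pipeline's dependencies
pip install ploomber
pip install -r requirements.txt

# run all tasks in the pipeline
ploomber build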

ploomber build runs each task and prints a summary of the execution.

ploomber offers a lot of tools to manage our pipeline; for example, we can generate a plot:
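
The plotting command looks like this (depending on your ploomber version, rendering the diagram may require an extra dependency such as pygraphviz):

# generate a diagram of the pipeline's dependency graph
ploomber plot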

Image by author.

We can see the dependency graph: there are three serial tasks (load, clean, and train-test-split), followed by two independent tasks (linear-regression and random-forest-regressor). The advantage of modularizing our work is that team members can work independently, we can test tasks in isolation, and we can run independent tasks in parallel. With ploomber, we can keep developing the pipeline in Jupyter until we’re ready to deploy!

To keep things simple, you may deploy your Ploomber pipeline with cron and run ploomber build on a schedule. However, in some cases, you may want to leverage existing infrastructure. We’ve got you covered! With soopervisor, you can export your pipeline to Airflow, AWS Batch, Kubernetes, SLURM, or Kubeflow.

soopervisor add adds some files to our project, such as a preconfigured Dockerfile (which we can modify if needed). soopervisor export then takes our existing pipeline and exports it to Argo Workflows so we can run it on Kubernetes.
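
A sketch of both commands for the Kubernetes/Argo case (the target name training is arbitrary; choose whatever name fits your project):

# create an export target configured for Argo Workflows,
# including a preconfigured Dockerfile
soopervisor add training --backend argo-workflows

# package the pipeline and generate the Argo Workflows spec
soopervisor export training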

By changing the --backend argument in the soopervisor add command, you can switch to any of the other supported platforms. Alternatively, you may sign up for the free cloud service, which allows you to run your notebooks in the cloud with a single command.
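
For example, to target Airflow instead of Argo Workflows (backend names follow the soopervisor documentation; the target name train-airflow is hypothetical):

# create a target that exports the pipeline to Airflow
soopervisor add train-airflow --backend airflow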

Notebook cleaning and refactoring are time-consuming and error-prone, and we are developing tools to make the process a breeze. In this blog post, we went from a monolithic notebook to a modular pipeline running in production, all in an automated way using open-source tools. Please let us know what features you’d like to see. Join our community and share your thoughts!

