Split Jupyter Notebooks into Separate Parts | by Dhia Hmila | Medium


Is Your Notebook Too Large?

Written by: Amal Hasni & Dhia Hmila

AI-Generated with Stable Diffusion

IPython Notebooks are really useful if you want to test a new idea in a quick and dirty way or to explore data for the first time. Either way, notebooks tend to get large and bulky quickly, and they usually need cleaning and refactoring once you’re done exploring, at least if you want to share them with your manager, a coworker, or your future self.

One surprisingly common operation is splitting a notebook into multiple sub-notebooks, usually based on headings. To do this in Jupyter itself, you’d have to duplicate the notebook several times and, in each copy, delete the cells that don’t belong there.

What if there were a faster, more systematic way to do this? Well, that is what you’ll learn in this article, using nbmanips, a Python package and CLI that I created to manage notebooks easily.

Table of Contents:
· Installing nbmanips
· 1 - Splitting notebooks
· 2 - Splitting a Notebook based on the Table of Contents
· 3 - Splitting notebooks using a Python script
· Bonus: Concatenate multiple notebooks

Installing nbmanips is very simple if you use pip:

pip install nbmanips

Make sure the CLI works and that you have at least version 1.3.0 installed, by running the following command:
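
The installed version can also be checked from Python using only the standard library. This is a minimal sketch, assuming the distribution name is nbmanips (as in the pip command above); the helper names parse_version and has_minimum_version are made up for illustration, and pre-release suffixes are ignored for simplicity.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string such as '1.3.0' into a comparable tuple (1, 3, 0)."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, so '0rc1' -> 0
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad short versions such as '2' to (2, 0, 0)
        numbers.append(0)
    return tuple(numbers)


def has_minimum_version(package="nbmanips", minimum="1.3.0"):
    """Return True if `package` is installed at version `minimum` or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False  # the package is not installed at all
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("nbmanips >= 1.3.0:", has_minimum_version())
```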

You can try the library on your own notebook files, but in case you need sample notebooks, here’s a great Git repository with over 30 machine-learning notebooks.

Once you’ve installed nbmanips, you can use its CLI to split a notebook easily. It’s up to you to tell the package where to perform the split. Say you need a split at every Markdown cell that contains a title (an h1 HTML tag); all you need to do is specify that in the command as follows:

nb select has_html_tag h1 | nb split -s nb.ipynb
  • The first part of the command (nb select has_html_tag h1) tells nbmanips which cells to split on.
  • The second part (nb split -s nb.ipynb) splits the notebook based on the piped selection. The -s flag tells nbmanips to use the selection instead of a cell index.
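
To make concrete which cells such a selection picks out, here is a small library-free sketch that works on the plain JSON structure of a notebook. It is not how nbmanips implements has_html_tag (the selector name suggests it inspects rendered HTML); as a simplifying assumption, this version only looks for Markdown lines that start with "# ", so a "# comment" inside a fenced code block in a Markdown cell would be a false positive. The function names are hypothetical.

```python
def source_text(cell):
    """Return a cell's source as one string (it may be a str or a list of str)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def has_h1(cell):
    """True for a Markdown cell with a line that starts a level 1 heading."""
    if cell.get("cell_type") != "markdown":
        return False
    return any(line.startswith("# ") for line in source_text(cell).splitlines())


def select_h1_indexes(cells):
    """Indexes of the cells a 'level 1 heading' selection would keep."""
    return [index for index, cell in enumerate(cells) if has_h1(cell)]


cells = [
    {"cell_type": "markdown", "source": "# Loading the data"},
    {"cell_type": "code", "source": "import pandas as pd"},
    {"cell_type": "markdown", "source": ["## Details\n", "Some notes"]},
    {"cell_type": "code", "source": "# not a heading, just a comment"},
    {"cell_type": "markdown", "source": "# Training"},
]
print(select_h1_indexes(cells))  # [0, 4]
```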

In this example, the selection picks Markdown cells that contain a level 1 heading, but you can customize it to your liking. For example, you can split on level 2 headings as well:

nb select has_html_tag h1,h2 | nb split -s nb.ipynb

If you want to learn about other selectors or other use cases, feel free to check this other article:

By default, the resulting notebooks are named nb-%d.ipynb, but you can customize that with the --output/-o option:

nb select has_html_tag h1 | nb split -s nb.ipynb -o 'custom-name-%d.ipynb'
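
As a rough illustration of how such a name template expands, the sketch below assumes the %d placeholder follows Python's printf-style formatting. The output_names helper is hypothetical, and the starting number (0 here) is an assumption; nbmanips may number its output files differently.

```python
def output_names(template, count, start=0):
    """Expand a '%d' template into `count` file names, numbered from `start`."""
    return [template % number for number in range(start, start + count)]


print(output_names("custom-name-%d.ipynb", 3))
# ['custom-name-0.ipynb', 'custom-name-1.ipynb', 'custom-name-2.ipynb']
```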

A simpler way to split a notebook is to use cell indexes directly. This can be helpful if you want to split at a specific title or code cell, for example:

nb split nb.ipynb 5,9
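
One way to picture an index-based split is with plain list slicing. This sketch assumes that each index marks the first cell of a new part and that indexes outside the notebook (or equal to 0) are ignored; both are assumptions for illustration, not confirmed nbmanips behavior, and split_at is a hypothetical helper.

```python
def split_at(cells, indexes):
    """Split `cells` into consecutive parts, starting a new part at each index."""
    # Keep only indexes that actually fall inside the list, sorted and unique.
    bounds = sorted({i for i in indexes if 0 < i < len(cells)})
    starts = [0] + bounds
    ends = bounds + [len(cells)]
    return [cells[start:end] for start, end in zip(starts, ends)]


cells = [f"cell {n}" for n in range(12)]
parts = split_at(cells, [5, 9])
print([len(part) for part in parts])  # [5, 4, 3]
```

With 12 cells and the indexes 5 and 9, this gives three parts: cells 0 to 4, cells 5 to 8, and cells 9 to 11.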

The downside is that finding the cell index can be tedious in a large Notebook. Thankfully, there are easier ways to find the index.

For example, you can display the table of contents with the following command:

nb toc nb.ipynb
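
For intuition about how a table of contents helps find cell indexes, here is a library-free sketch that lists every Markdown heading together with the index of the cell it appears in. It is not the implementation behind nb toc; the table_of_contents function is hypothetical and only recognizes "#"-style headings.

```python
import re

# One to six '#' characters, whitespace, then the heading title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def table_of_contents(cells):
    """Return (cell index, heading level, title) for each Markdown heading."""
    entries = []
    for index, cell in enumerate(cells):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = HEADING.match(line)
            if match:
                entries.append((index, len(match.group(1)), match.group(2)))
    return entries


cells = [
    {"cell_type": "markdown", "source": "# Intro"},
    {"cell_type": "code", "source": "x = 1"},
    {"cell_type": "markdown", "source": "## Load data\nSome text"},
    {"cell_type": "code", "source": "y = 2"},
    {"cell_type": "markdown", "source": "# Model"},
]
for index, level, title in table_of_contents(cells):
    print(f"{'  ' * (level - 1)}{title} [cell {index}]")
```

The indexes printed next to the level 1 titles are the ones you would pass to an index-based split.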

Another, less obvious example: say you want to find the index of a cell that contains an import statement and is among the last 10 cells of the notebook:

nbmanips is a Python package, which means you can also use it inside a Python script. This is useful if you want to do something more complex or automate processing for a batch of files.

Before you start any processing, you have to read the notebook:

Now that you have the notebook, you can split it using a selection, as in the first example:

Or, as in the previous example, using the table of contents:

You can concatenate multiple notebooks using the following command:

nb cat nb1.ipynb nb2.ipynb -o result.ipynb
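
For intuition about what concatenation involves, here is a library-free sketch that merges .ipynb files as plain JSON. It is not the nbmanips implementation: the concatenate function is hypothetical, it keeps the metadata and format version of the first notebook as a simplifying assumption, and it does not deduplicate cell ids.

```python
import json
from pathlib import Path


def concatenate(paths, output):
    """Write one notebook containing the cells of all `paths`, in order."""
    if not paths:
        raise ValueError("at least one notebook path is required")
    notebooks = [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]
    # Shallow copy: metadata, nbformat and nbformat_minor come from the first file.
    merged = dict(notebooks[0])
    merged["cells"] = [cell for nb in notebooks for cell in nb.get("cells", [])]
    Path(output).write_text(json.dumps(merged, indent=1), encoding="utf-8")
    return merged
```

Calling concatenate(["nb1.ipynb", "nb2.ipynb"], "result.ipynb") mirrors the file names used in the command above.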

Or if you’re using a Python script:

nbmanips tries to be a Swiss Army knife for Jupyter Notebooks, so you can easily split, merge, and explore notebooks without having to think about it.

I think it’s a nice tool to have in your pocket. It won’t necessarily be useful every day, but when you need it, you’ll be thankful to have it.

Another use case you might have is concatenating multiple notebooks. You can check out our other article, which not only shows how to do that but also goes into detail about the structure of a Jupyter Notebook file, in case you are interested:

If you have questions, don’t hesitate to leave them in the response section and we’ll be more than happy to answer.

Thank you for sticking around this far. Stay safe, and we will see you in our next article! 😊
