
Why I Use Weights & Biases for My Machine Learning Ph.D. Research
by Pascal Janetzky, September 2022



Experiment tracking, hyperparameter optimization, private hosting

Photo by freddie marriage on Unsplash

The number of tools that make Machine Learning easier is growing. At all stages of a typical project, we can utilize dedicated software to gain insights. We can monitor our training pipeline, watch the deployment processes, automate the re-training, and track our experiments. Especially the last category, experiment tracking, has gained attention, with ever more dedicated tools being introduced. Neptune.AI has collected an overview of 15 (!) different ML experiment management tools.

One of the tools covered in Neptune’s overview is Weights & Biases, often abbreviated as W&B. If I am not mistaken, I’ve been using this tool since it was still at release version 0.9. After beginning my ML Ph.D. studies, I’ve come to appreciate the benefits W&B offers even more, and I’ve used it in almost all projects I’ve been part of. The software is mature, compatible with anything Python, and helps me comprehend and compare my experiments’ results. In the following, I list the primary reasons why this tool is valuable for any ML researcher.

Experiment tracking across different frameworks

This is easily the number one reason I value the Weights & Biases framework so highly: it allows users to track their experiments regardless of the Machine Learning framework used. The native integration into all major ML frameworks (TF, PyTorch, sklearn, etc.) means that we can collect the results from Jupyter Notebooks, TensorFlow scripts, PyTorch scripts, Sklearn-based code, and any other library in a single place. The usefulness of this feature is hard to overstate: we have a single space where we collect all logs and results! We no longer have to manually track where we stored the plain-text logs for experiment A and whether this place is different from that of experiment B. Very convenient — and simple.

The integration of the wandb tracking tool is straightforward. We can quickly update pre-existing scripts by adding a couple of lines of code at appropriate places. To showcase this process, take a look at the script below. It shows a (simple) experiment before logging with Weights & Biases:
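A minimal sketch of such a plain training script, assuming a TensorFlow/Keras model on the MNIST dataset for illustration:

```python
# A simple experiment *before* Weights & Biases logging
# (illustrative setup: TensorFlow/Keras on MNIST).
import tensorflow as tf

# Load and normalize the data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build a small fully connected model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train and evaluate; the results only live in the console output
model.fit(x_train, y_train, epochs=5, batch_size=32)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```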

Upgrading this script to support logging with Weights & Biases is straightforward. After adding an import statement for the wandb package (the Weights & Biases Python library), we log in to our (free!) account at the top of the script. Then, around the fit and evaluate calls, we utilize the integrated callback to log the training progress and the wandb.log statement to log the test results manually:
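A sketch of the upgraded script under the same assumptions; the project name and configuration values are placeholders:

```python
# The same experiment, now logged to Weights & Biases
# (hypothetical project name and config values).
import tensorflow as tf
import wandb
from wandb.keras import WandbCallback

wandb.login()  # log in to the (free) W&B account

# Start a run; the config dict is stored as the run's hyperparameters
wandb.init(project="mnist-demo",
           config={"epochs": 5, "batch_size": 32})

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The integrated callback streams the training progress to W&B ...
model.fit(x_train, y_train,
          epochs=wandb.config.epochs,
          batch_size=wandb.config.batch_size,
          callbacks=[WandbCallback()])

# ... and wandb.log records the test results manually
test_loss, test_acc = model.evaluate(x_test, y_test)
wandb.log({"test_loss": test_loss, "test_accuracy": test_acc})
wandb.finish()
```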

Beyond tracking the raw experiment results, such as the test scores, W&B also captures metadata (i.e., data about how the data came to be). This point leads me to the following reason.

Metadata tracking & querying

Beyond tracking the actual data — metrics, predictions, even gradients — the Weights & Biases library also tracks the compute unit used, the experiment’s duration, and the starting command. At first glance, this information seems like a nice extra, but nothing truly useful. However, we can query the W&B servers for it through Python code.

With a short script, we can collect statistics about our experiments, such as the number of compute hours we used, or rank individual runs based on user-defined criteria. The ability to automatically query for this data allows me to gather statistics about my compute usage. Based on the compute usage, I can derive the amount of money my experiments would have cost. Here, a single hour might cost less than a dollar, but a single hour of computation is hardly sufficient. Especially when I optimize hyperparameters (see further down the list), I quickly amass multiple hundreds of hours. And this is only for a single research idea. Thus, you reading my blog posts helps me pay any upcoming computational bills. Thanks!
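As a rough illustration, such a query script could look like the sketch below; the entity, project, and metric names as well as the hourly rate are assumptions:

```python
# Querying logged runs via the W&B public API
# ("my-entity/my-project" and "test_accuracy" are hypothetical names).
import wandb

api = wandb.Api()
runs = api.runs("my-entity/my-project")

# Sum up the runtimes (W&B stores a run's duration in seconds
# under "_runtime" in the run summary)
total_hours = sum(run.summary.get("_runtime", 0) for run in runs) / 3600
print(f"Total compute: {total_hours:.1f} h")

# Estimate what the experiments would have cost at an assumed hourly rate
hourly_rate = 0.80  # assumed cost per compute hour in dollars
print(f"Estimated cost: ${total_hours * hourly_rate:.2f}")

# Rank runs by a user-defined criterion, e.g. test accuracy
ranked = sorted(runs,
                key=lambda r: r.summary.get("test_accuracy", 0),
                reverse=True)
for run in ranked[:5]:
    print(run.name, run.summary.get("test_accuracy"))
```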

Transparency, reproducibility, and comprehensibility

In the end, we want to get research published and models deployed. Especially for the former, having a transparent view of the whole experiment allows others to comprehend and validate that what you did was correct. Anybody can type in some good-looking numbers — say, an accuracy of 98% — but such a claim is only trustworthy when hard results back it up.

By default, the experiments logged to Weights & Biases are private, meaning only you and your team members get to see them. However, the experiments can be made public for increased transparency, especially when submitting work for publication. With this setting, anybody with the link can access your results and validate them themselves.

Even if we only log the data for ourselves, we benefit from the transparency that Weights & Biases provides. We can see the parameters (batch size, epochs, learning rate, etc.) we passed to our script, visualize the results they produced, and gain insights from this at any (later) point. Best of all, we could even reproduce the results using the information we stored!

Interactive and powerful GUI

While W&B can be interacted with through its API, I more often use the browser-based interface. It lets me navigate through all my experiments with ease. For increased clarity, I specify an experiment’s group and job type when initializing Weights & Biases. The former allows me to group experiments in the UI; the latter further divides the runs into more refined categories. To give you an example of what it would look like, see the screenshot below:

On the left, we see many (over 10k, in fact!) experiments listed. Had I not ordered them into meaningful groups, I would have lost the overview in no time! Because of the grouping feature, we can see that the runs are placed into subcategories. Further, within each group, we can add more layers of granularity by separating the runs into different job types. In practice, these will most often be “train” or “evaluate.” In the screenshot above, I used the job_type argument when starting a script to differentiate between the actual training runs and helper experiments. Had I run these helpers as part of the training, I’d have tremendously slowed it down. However, as part of a separate script, this additional information — niceties, actually — can be gathered a posteriori and linked to finished experiments.
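For reference, a sketch of how the group and job_type arguments are passed when starting a run; the project and group names are hypothetical:

```python
import wandb

# In the training script: assign the run to a group and mark it as "train"
run = wandb.init(project="my-project",
                 group="resnet-baseline",   # hypothetical experiment group
                 job_type="train")
# ... training code ...
run.finish()

# In a separate helper script: same group, different job type, so the
# a-posteriori results show up next to the finished training runs
run = wandb.init(project="my-project",
                 group="resnet-baseline",
                 job_type="evaluate")
# ... evaluation / helper code ...
run.finish()
```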

Notably, the interactive GUI makes it easy to organize experiments based on many criteria. This feature lets us derive more insights: we can see which batch sizes work, find an appropriate number of training epochs, or see if augmentations improve the metrics. Beyond what I have covered here, Weights & Biases offers many more insight-boosting features. To learn more, I recommend going through the example notebooks provided by the people behind this ML research tool to gain hands-on experience.

It’s free

This argument is short: W&B is free to use until you hit a fair usage quota. At the time of writing, the size of your data may not exceed 100 GB, but that’s plenty of capacity. Beyond this limit, you can buy additional storage for little money.

Easy collaboration through (free) teams

Weights & Biases offers teams: multiple users collaborating on a shared project. For academic research and open-source groups alike, teams can be created free of charge. Depending on your subscription level, the number of collaborators and the number of distinct teams you can create varies. Common to all is that team members can log data to a single project, making collaboration across multiple experiments a breeze.

Self-hosting available

For users working with sensitive data that must not leave their own infrastructure, there is the option to self-host the W&B application. As of writing this, there are three ways of hosting the software. The first is using Google Cloud or Amazon Web Services, which runs a dedicated W&B instance in the cloud of your choice. The second option is similar but uses Weights & Biases’ own cloud infrastructure.

Finally, the third option, hosting on your own hardware, best suits sensitive data and total data control. Still, it comes with the overhead of having to provide the underlying resources and setting up a running W&B instance. Most critically, the storage capacity must be monitored: regular per-user storage quotas are around 100 GB, but heavy usage, such as when working on large projects (GPT-style, Neo, etc.), can exceed terabytes and might knock at the petabyte mark. So set up a scalable storage system well in advance.

To summarize, the self-hosting route is well suited for (academic research) teams that work with sensitive data or require total control over the logged data. Thus, in the typical case where a research group works with a business’s data (and utilizes ML tools), ML operators should consider private hosting.

Built-in hyperparameter optimization

Beyond providing countless ways to log all data types, Weights & Biases also ships the tools to conduct parameter searches. Just think about it for a moment: there is no need to set up an additional service (i.e., the search coordinator). Instead, one can let the W&B application handle all the nasty stuff. Having done a k-fold cross-validated hyperparameter search, I can assure you that Weights & Biases is well suited even for such extreme cases. If you have already set up Optuna, then keep using it. But if you have been using Weights & Biases for logging purposes so far and are looking to expand, then give the provided optimization tools a try.
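As a rough sketch of how such a search can be set up with W&B sweeps (the search space, project name, and the body of train() are illustrative assumptions):

```python
# A minimal W&B sweep: define the search space, then let an agent run trials
import wandb

sweep_config = {
    "method": "bayes",  # alternatives: "grid", "random"
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    # Each trial starts a run whose config is filled in by the sweep
    with wandb.init():
        config = wandb.config
        # ... build the model with config.learning_rate and config.batch_size,
        # train it, and compute the validation accuracy ...
        val_accuracy = 0.0  # placeholder for the actual result
        wandb.log({"val_accuracy": val_accuracy})

sweep_id = wandb.sweep(sweep_config, project="my-project")  # hypothetical project
wandb.agent(sweep_id, function=train, count=20)             # run 20 trials
```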

As parameter optimization is a frequent practice — I’d wager that it might even be standard practice in many ML research directions — having one single tool that combines both logging and parameter studies is a blessing. Moreover, as the experiments are already logged via W&B, it’s natural to also use their optimization framework.

