
Avoid Using “pip freeze” — Use “pipreqs” instead | by Zoumana Keita | Nov, 2022



Your project dependencies are important — manage them efficiently

Image by Ankhesenamun on Unsplash

Package management is a best practice in software development because it facilitates the automation of software delivery.

Nowadays, most data scientists and machine learning engineers adopt this practice to automate their pipelines. The practice itself is sound, but the approach most practitioners take to it is not always efficient: the use of pip freeze.

In this conceptual article, you will see that there is a better option than pip freeze.

Imagine that you are working on a project that requires 5 dependencies: dep1, dep2, dep3, dep4, and dep5. When it is time to generate the dependencies file, most people's first reflex is to run the following command:

pip freeze > requirements.txt

But how can this be an issue?

Most libraries depend on other libraries, which are installed automatically along with them. Below is an illustration.

# Install the transformers library
pip install transformers

The installation of the transformers library generates the following message:

Successfully installed huggingface-hub-0.10.1 tokenizers-0.13.1 transformers-4.23.1

This means that two additional libraries, huggingface-hub-0.10.1 and tokenizers-0.13.1, have been installed along with the transformers library, and those two libraries will automatically be included in the requirements.txt file.

Illustration of pip freeze command (Image by Author)
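To see where such extra packages come from, you can ask an installed library which dependencies it declares. The snippet below is a minimal sketch using the standard-library importlib.metadata module; the helper names requirement_names and declared_dependencies are made up for this illustration, and the output depends on what is installed in your environment.

```python
import re
from importlib import metadata


def requirement_names(requirement_strings):
    """Extract bare project names from strings such as 'tokenizers>=0.11.1'."""
    names = []
    for req in requirement_strings:
        # A requirement string starts with the project name; version
        # specifiers, extras, and environment markers come after it.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            names.append(match.group(1))
    return names


def declared_dependencies(dist_name):
    """Names a distribution declares as dependencies ([] if not installed)."""
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return requirement_names(requires)


if __name__ == "__main__":
    # Prints [] when transformers is not installed in this environment.
    print(declared_dependencies("transformers"))
```

Note that the declared list also contains optional (extras) dependencies, so it can be longer than what pip actually installed.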

But I still do not see an issue!
Don’t worry, we are getting there…

Now, imagine that the transformers library is upgraded and requires different libraries: new_lib-1.2.3 and another_new_lib-0.2.1. This means that the previous dependencies, huggingface-hub-0.10.1 and tokenizers-0.13.1, are no longer relevant, right? At least not for the project ❌.

Here is the problem 👇🏽

Generating the new version of the requirements.txt file will still include the old dependencies in addition to the new ones, because the upgrade does not uninstall them. If you want to keep only the relevant dependencies, you will have to remove the old ones manually. Now imagine dealing with a project that requires 20, 30, or 50 libraries! That can quickly become a headache 🤯.

pip freeze keeps packages from the previous install that are no longer needed (Image by Author)
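The leftover problem can be made concrete with a small simulation. This is only a sketch under stated assumptions: the package names come from the hypothetical upgrade scenario above, the dependency graph is written by hand, and reachable and leftover_packages are illustrative helpers, not part of pip.

```python
def reachable(roots, requires):
    """Return every package reachable from `roots` via `requires` edges."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(requires.get(pkg, ()))
    return seen


def leftover_packages(installed, roots, requires):
    """Installed packages that nothing in the project depends on anymore."""
    return sorted(set(installed) - reachable(roots, requires))


# Everything that sits in the environment after the hypothetical upgrade.
installed = [
    "transformers",
    "huggingface-hub",   # pulled in by the old version
    "tokenizers",        # pulled in by the old version
    "new_lib",
    "another_new_lib",
]
# What the upgraded transformers is assumed to require in this scenario.
requires_after_upgrade = {"transformers": ["new_lib", "another_new_lib"]}

print(leftover_packages(installed, ["transformers"], requires_after_upgrade))
# → ['huggingface-hub', 'tokenizers']
```

pip freeze would list all five packages, including the two leftovers that the project no longer needs.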

In short, pip freeze is not smart enough to manage dependencies efficiently, and here are a few reasons:

pip freeze is only for “pip install”: it only reports what it finds in the environment that pip itself runs in. This means that packages installed by a different tool such as Poetry or conda into another environment won't be included in the final requirements.txt file, and packages they install into the same environment may be listed with a local file path instead of a pinned version.

pip freeze only considers libraries installed using the pip command (Image by Author)

pip freeze does not account for dependency versioning conflicts: a project lifecycle is iterative, and hence could require completely new or upgraded versions of existing libraries. pip freeze saves all packages in the environment, including those that are no longer relevant to the project.

pip freeze grabs everything: if you are not using a virtual environment, pip freeze generates a requirements file containing all the libraries in the environment, including those beyond the scope of your project.
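The last point is easy to check for yourself. The sketch below uses only the standard library to tell whether the interpreter runs inside a venv-style virtual environment and to list every distribution it can see, in the same name==version shape that pip freeze prints. The function names are illustrative, and the listing only approximates pip freeze (for example, it does not reproduce pip's handling of editable installs).

```python
import sys
from importlib import metadata


def in_virtualenv():
    """True when running inside a venv/virtualenv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_pins():
    """Every distribution visible to this interpreter as 'name==version'."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    if not in_virtualenv():
        print("Not in a virtual environment: the list below is system-wide.")
    for pin in installed_pins():
        print(pin)
```

Run outside a virtual environment, the list typically contains many packages that have nothing to do with the project at hand.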

So, what can I do to solve these issues?
Good question!

The answer is by using pipreqs 🎉

pipreqs starts by scanning all the Python files (.py) in your project, then generates the requirements.txt file based on the import statements in each of them. Because it works from what the code actually imports rather than from what the environment contains, it avoids the issues faced when using pip freeze.

pipreqs scanning process (Image by Author)
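The idea of import-based scanning can be sketched in a few lines with the standard-library ast module. This is a simplified illustration of the approach, not pipreqs' actual implementation, and top_level_imports is a hypothetical helper name.

```python
import ast


def top_level_imports(source):
    """Return the top-level module names imported by Python source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # 'import os.path' depends on the top-level package 'os'.
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import (from . import x), which
            # refers to the project itself, not to an installed package.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


example = """
import os.path
import numpy as np
from transformers import pipeline
from . import utils
"""
print(sorted(top_level_imports(example)))
# → ['numpy', 'os', 'transformers']
```

A real tool still has to drop standard-library modules such as os and map import names to the names packages are published under (for example, the sklearn import belongs to the scikit-learn package).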

The installation is straightforward with the following pip command.

pip install pipreqs

Once you have installed the library, you just need to provide the root location of your project and run this command to generate the requirements.txt file of the project.

pipreqs /<your_project_root_path>/

Sometimes you might want to update an existing requirements file. In this case, you need to use the --force option to force the regeneration of the file.

pipreqs --force /<your_project_root_path>/

Imagine that you want to ignore the libraries used by the Python files in a specific subfolder. This can be achieved by using the --ignore option, followed by the subfolder that needs to be ignored.

pipreqs /<your_project_root_path>/ --ignore /<your_project_root_path>/folder_to_ignore/
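If you prefer to trigger the commands above from a Python script, for instance in an automation step, a thin wrapper could look like the sketch below. The helper names pipreqs_command and run_pipreqs are hypothetical; only the --force and --ignore options shown earlier are used, and several folders are passed to --ignore as one comma-separated value.

```python
import shutil
import subprocess


def pipreqs_command(project_root, force=False, ignore=()):
    """Build the pipreqs command line as a list of arguments."""
    cmd = ["pipreqs"]
    if force:
        cmd.append("--force")  # overwrite an existing requirements.txt
    if ignore:
        # Folders to skip are joined into a single comma-separated value.
        cmd += ["--ignore", ",".join(ignore)]
    cmd.append(project_root)
    return cmd


def run_pipreqs(project_root, **options):
    """Run pipreqs on a project, failing loudly if it is not installed."""
    if shutil.which("pipreqs") is None:
        raise RuntimeError("pipreqs not found; run `pip install pipreqs` first")
    subprocess.run(pipreqs_command(project_root, **options), check=True)


print(pipreqs_command("my_project", force=True, ignore=["my_project/tests"]))
# → ['pipreqs', '--force', '--ignore', 'my_project/tests', 'my_project']
```

Here my_project is a placeholder path; building the argument list separately from running it keeps the command easy to inspect before anything is written to disk.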

Congratulations! 🎉🍾 You have just learned about a new and efficient way to manage your project dependencies.

If you like reading my stories and wish to support my writing, consider becoming a Medium member. With a $5-a-month commitment, you unlock unlimited access to stories on Medium.

Feel free to follow me on Medium, Twitter, or say Hi on LinkedIn. It is always a pleasure to discuss AI, ML, Data Science, NLP, and MLOps stuff!




