Python Virtual Environments | Python Package Management


Photo by Billy Huynh on Unsplash

Have you ever started a data science project to find that you don’t have the proper dependencies installed? It’s more than frustrating to get to code block 2 and discover that your Pandas version isn’t compatible with Pandas-Profiling or that Graphviz won’t import into your notebook. Previously, Manuel Treffer published an article related to this issue, focusing on python dependency conflicts and how to manage them. The author listed some common problems that arise and ways to solve them, but I wanted to focus on another option that can solve even more problems — managing virtual environments.

As a beginner, you might approach Python coding like this: install all the necessary packages for all of your data science projects in the global environment. There are some benefits to this, one being simplicity. However, the drawbacks include dependency conflicts and, more importantly, global environment corruption. Have you ever experienced a time when your package “could no longer be found”? If so, you’ve probably run into global environment corruption, and it’s time to find a new approach.

Python documentation defines a virtual environment as:

“a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.”

Basically, a virtual environment is something you can create to isolate your data science projects.

So when should you create a virtual environment? This seems like a fairly straightforward question, and it is. There is one over-arching answer: you’re working on multiple Python-based projects.

For instance, if you’re working on an NLP project one day and a CNN the next, there’s no need to install all the necessary packages for BOTH projects in one environment. Rather, you can create an environment for NLP and an environment for your CNN.
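As a minimal sketch, that per-project isolation can be as simple as one venv per project (the directory names here are purely illustrative):

```shell
# One isolated environment per project (directory names are illustrative)
python3 -m venv "$HOME/envs/nlp-project"
python3 -m venv "$HOME/envs/cnn-project"

# Activate whichever project you're working on today (macOS/Linux)
source "$HOME/envs/nlp-project/bin/activate"

# Anything pip installs now lands only in nlp-project's site-packages
```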

Athreya Anand summed this up very nicely in his article, Why you should use a virtual environment for EVERY python project. If you read this article, what you’ll see is that the main reason for creating virtual environments is to improve efficiency.

Your computer is like a public library. Every time you install packages in your global environment, you’re adding more and more books. In the end, this leads to disorganization and conflicting material. Think about it — if you owned a library, you would break it up into sections. You would also check the books regularly to make sure the content is up-to-date. The same goes for your computer.

When you’re working on an important data science project, the last thing you want to waste time on is rectifying package dependencies. It decreases productivity, and frankly, it really grinds my gears. As a newer data scientist, it took me a while to learn about virtual environments, but I can assure you that this is a core skill in a data scientist’s toolkit. Therefore, I want to walk through how virtual environments work.

There are a few options for creating and managing virtual environments, which we will cover below:

  1. Venv

This provides a lightweight option for creating virtual environments. To start, you’ll want to run the following in the command line:

python3 -m venv /path/to/new/virtual/environment

This command creates a directory with the name of your new virtual environment with a few different items:

  • a pyvenv.cfg file, whose home key points to the Python installation the environment was created from
  • a bin subdirectory (Scripts on Windows) containing the interpreter and the activation scripts
  • a lib/pythonX.Y/site-packages subdirectory (on Windows, this is Lib\site-packages), where the environment’s packages are installed

Once these items are set up, note that creating the environment does not activate it. To activate it, run source /path/to/new/virtual/environment/bin/activate (on Windows, run the activate script in Scripts). Then you can start working on your project.
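If you prefer to script this setup, the standard-library venv module exposes the same machinery from Python. A minimal sketch, using a throwaway temporary directory:

```python
import os
import tempfile
import venv

# Create a throwaway environment (equivalent to `python3 -m venv <dir>`).
env_dir = os.path.join(tempfile.mkdtemp(), "demo-env")
venv.create(env_dir, with_pip=False)  # with_pip=False skips the pip bootstrap, for speed

# The items listed above now exist on disk:
bin_dir = "Scripts" if os.name == "nt" else "bin"
print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))  # pyvenv.cfg with the home key
print(os.path.isdir(os.path.join(env_dir, bin_dir)))        # bin (or Scripts on Windows)
```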

Some of the benefits of venv:

  • You’re not tied to a single version of Python across projects; each environment can use its own
  • Your code is reproducible. If someone wants to run your script, they just need to create a venv with the same requirements.
  • The global environment doesn’t get bombarded with Python packages that you don’t always need

The main drawback? A venv takes up some space…but that’s really it. If you can spare a few hundred megabytes, then this shouldn’t be an issue.

  2. Conda

If you typically use Anaconda Navigator or conda, then you might be more apt to use conda environments. This is also fairly simple to accomplish.

There are a few options for creating a conda environment:

  • Run conda create --name myenv in the command line
  • Use an environment.yml file
  • Use Anaconda Navigator’s UI to create your environment

This is a great option for new coders who want to use a point-and-click interface for creating environments. If you’re looking for more information, check out the Anaconda Navigator documentation.
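If you go the environment.yml route, a minimal file might look like the sketch below (the environment name and the pinned Python version are purely illustrative). You would then build it with conda env create -f environment.yml:

```yaml
# environment.yml (illustrative example)
name: myenv
channels:
  - defaults
dependencies:
  - python=3.8
  - pandas
  - pip
```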

  3. Docker

For those who are a little more advanced when it comes to Python programming, you might want to check out Docker, specifically Docker containers.

According to Docker, a container is

“a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.”

This means that there are three core benefits of docker:

  • Code isolation: software is isolated from the global environment and works according to a standard
  • Independence: Docker containers ensure that someone else can run your code on another platform
  • Portability: Docker doesn’t require a separate guest OS per application (unlike a virtual machine), which improves server efficiency and cuts costs

If you’re packaging a Python application in a Docker image, the following can be used as a template for creating and activating a virtual environment. Note that a Dockerfile should contain a single CMD instruction (only the last one takes effect):

FROM python:[VERSION]

RUN pip install virtualenv
RUN virtualenv /ve

ENV PATH="/ve/bin:$PATH"

RUN /ve/bin/[PIP INSTALL PACKAGES]

CMD ["/ve/bin/python", "yourcode.py"]

Now that you have the basics for creating and activating virtual environments (via venv, conda, and/or docker) the next step is practicing this process for a new python project.

An example of a project that might be helpful for practice is one that uses pandas-profiling. This is because pandas-profiling has certain dependencies based on which version of python, numpy, pandas, etc. you are using. In this quick example, I’ll go over how to use pandas-profiling 2.9, which requires me to create a venv with python 3.8.15 instead of my usual 3.10.9.

I started out by opening up a dataset I found on Data.gov (crime data 2020 to present). I ran the following code in my global environment in VS Code:

import pandas as pd
df=pd.read_csv('crime_data.csv')
df.head()

As you can see, this worked as expected, with the notebook running my usual interpreter (Python 3.10.9).
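You can also confirm which interpreter a notebook or script is actually using directly from code:

```python
import sys

# The version of the interpreter the active environment provides (e.g. 3.10.9)
print(sys.version.split()[0])

# The path of the interpreter itself, which lives inside the venv when one is active
print(sys.executable)
```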

When I tried to import pandas-profiling, however, I ran into a problem. This is the final error message:

ImportError: cannot import name 'ABCIndexClass' from 'pandas.core.dtypes.generic'

If you do a quick Google search for this error, you’ll see it’s all too common when working with pandas-profiling, due to changes in the names of its dependencies. To solve the problem, it’s best to work in a virtual environment where you can install specific Python and package versions.

I did this directly in VS Code with the Ctrl+Shift+P shortcut, where I selected Create Environment. Next, you can choose between venv or conda.

I created a conda environment named pandas-profiling2 with Python 3.8.15.

Once this was set up, I ran the code in my pandas-profiling notebook and voila! The Pandas Profiling Report generated without any errors.

1. How do you activate your virtual environment?

To activate your Python virtual environment (named env here), run the following command on macOS/Linux (on Windows, run env\Scripts\activate instead):

source env/bin/activate

2. How do you deactivate a virtual environment?

Run the following command in the command line:

deactivate

3. What is a requirements file?

A requirements file is a list of all your project’s dependencies. To generate your requirements file, run the following command:

pip freeze > requirements.txt

After you create your requirements file, anyone (including future you) can recreate the environment by running pip install -r requirements.txt, without needing to manually install the individual packages necessary for the project.
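Under the hood, pip freeze is essentially listing each installed distribution and its version, which you can approximate from the standard library (Python 3.8+). This sketch skips pip freeze’s extra handling of editable installs:

```python
from importlib import metadata

# Build a pip-freeze-style "name==version" listing of installed packages.
lines = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
    if dist.metadata["Name"]  # skip any malformed metadata entries
)
print("\n".join(lines[:5]))  # show the first few entries
```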


