Anaconda is a leading open data science platform powered by Python. It provides a collection of over 720 open source packages and is both a package manager and a virtual environment manager. More details on Anaconda: https://docs.continuum.io/anaconda/. The sections below first introduce several important concepts and then describe the Anaconda modules on Ada and Terra.
Important Concepts
A package is a collection of programs; numpy and tensorflow, for example, are packages. Over 150 packages are installed automatically with Anaconda. Over 250 additional open source packages can be installed from the Anaconda repository with the 'conda install' command, and thousands of other packages are available from Anaconda Cloud.
A virtual environment is a named collection of packages. For example, a virtual environment named 'test_environment' could consist of python 3.5, basemap 1.0.7, and shapely 1.5.16. If your projects need different collections of software, you can create one virtual environment per project, which avoids version conflicts between projects. The 'conda' command creates and manages virtual environments in Anaconda. Besides 'conda', the 'pip' command can also install Python packages into a virtual environment and gives access to more Python packages. However, 'pip' does not resolve package dependencies well, while 'conda' does a much better job.
Versions available on Ada/Terra
The most up to date listing of available versions on the cluster you are using can be found with:
module avail Anaconda # or "ml avail Anaconda" if you get tired of typing "module"
This shows all available versions of the modules listed below (plus some of the myAnaconda modules described on the Python page).
Anaconda Modules on Ada
Run 'module spider Anaconda' to list the Anaconda modules and their exact version strings. They include Anaconda/2-5.0.1 and an Anaconda/3 module (shown as Anaconda/3-VERSION on this page). Anaconda/2-5.0.1 provides Python 2.7, while Anaconda/3-VERSION provides Python 3.6. Anaconda/3-VERSION is recommended for users who need Python 3, and Anaconda/2-5.0.1 for users who need Python 2.
Anaconda Modules on Terra
Run 'module spider Anaconda' to list the Anaconda modules and their exact version strings. They include Anaconda/2-4.3.1 and an Anaconda/3 module (shown as Anaconda/3-VERSION on this page). Anaconda/2-4.3.1 provides Python 2.7, while Anaconda/3-VERSION provides Python 3.6. Anaconda/3-VERSION is recommended for users without legacy issues.
Managing Anaconda Virtual Environments
Information about environment management can be found on this page.
Anaconda/3-VERSION is recommended on both Terra and Ada.
List Anaconda Virtual Environments
You can list all shared virtual environments and your own private virtual environments with the command:
conda info --env
We provide many shared environments for specific tasks. For example, the tensorflow-gpu and keras-gpu environments can be useful for machine learning applications.
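If a script needs to know whether a particular environment is available, it can filter that listing. The sketch below is hypothetical (env_exists is not an HPRC-provided command) and assumes the Anaconda module is already loaded; it calls 'conda env list', which prints the same listing as 'conda info --envs'.

```shell
# Hypothetical helper (not an HPRC tool): succeed if an environment named $1
# appears in the environment listing. Assumes the Anaconda module is loaded.
env_exists() {
  # 'conda env list' prints the same table as 'conda info --envs':
  # comment lines start with '#', and the first column is the environment
  # name (or the full path for environments that have no name).
  conda env list | awk '!/^#/ && NF { print $1 }' | grep -qx -- "$1"
}

# Usage:
#   env_exists myenv && echo "myenv is available"
```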
Virtual Environment Types
- Shared Virtual Environments: The command 'conda create -n virtual_environment_name python=x.x' creates a virtual environment named 'virtual_environment_name' (use your own name) in Anaconda. On our clusters, users do not have write permission to the location of the Anaconda root environment, so they cannot use this command to create a shared virtual environment with the cluster's Anaconda modules. Instead, we can create a shared virtual environment for you. All virtual environments created this way are accessible to all users.
- Note: The available shared virtual environments (currently 76) can be identified from the files in /sw/local/etc/Anaconda/venvs/. For example, on Terra with Anaconda/3-VERSION loaded, you should be able to activate the environment "tensorflow-gpu-1.4.1" as needed.
- Private Virtual Environments: You can create a private virtual environment with the command 'conda create -n virtual_environment_name package_to_install', where package_to_install is optional. A private virtual environment is accessible only to the user who created it and is located in $SCRATCH/.conda/envs. NOTE: private virtual environments work only with Anaconda/3-4.4.0 and later versions (e.g., Anaconda/3-VERSION), and with Anaconda/2-5.0.1 on Ada.
Create a Private Anaconda Virtual Environment
Change to your scratch directory and run the following commands in order to create your own virtual environment. NOTE: Do not create an environment in your home directory; you will exceed your home directory file limit.
[NetID@cluster NetID]$ cd $SCRATCH                        # Make scratch your current directory
[NetID@cluster NetID]$ module load Anaconda/3-VERSION     # Load Anaconda module (exact version from 'module spider Anaconda')
[NetID@cluster NetID]$ conda create --name myenv          # Create environment
The "conda info --env" command will now also list your private environment.
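Because the Anaconda module is set up to place private environments under $SCRATCH/.conda/envs, you can confirm where a new environment will land before creating it. This is a minimal sketch under two assumptions: check_envs_dir is a hypothetical helper name, and conda uses the first writable entry of its envs_dirs setting, so checking only the first entry is a heuristic rather than a guarantee.

```shell
# Hypothetical guard: report where conda will put new named environments and
# fail unless that directory is under $SCRATCH. conda uses the first
# *writable* entry of envs_dirs, so checking only the first entry is a heuristic.
check_envs_dir() {
  if [ -z "$SCRATCH" ]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  # 'conda config --show envs_dirs' prints "envs_dirs:" followed by "  - /path" lines.
  local first
  first=$(conda config --show envs_dirs | awk '$1 == "-" { print $2; exit }')
  case "$first" in
    "${SCRATCH%/}"/*)
      echo "New environments will go to $first"
      ;;
    *)
      echo "Warning: first envs_dirs entry '$first' is not under \$SCRATCH" >&2
      return 1
      ;;
  esac
}

# Usage, after loading the Anaconda module:
#   check_envs_dir && conda create --name myenv
```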
Access an Anaconda Virtual Environment
To activate a virtual environment, first load the Anaconda module and then follow these steps:
[NetID@cluster NetID]$ module load Anaconda/3-VERSION     # Load Anaconda module
[NetID@cluster NetID]$ source activate myenv              # Activate environment
(myenv) [NetID@cluster NetID]$ python myprogram.py        # Run your programs/commands (the activated environment name appears on the left of the prompt)
(myenv) [NetID@cluster NetID]$ source deactivate          # Deactivate environment
[NetID@cluster NetID]$                                    # Prompt returns to normal
Normally you load the Anaconda module and any other modules your virtual environment needs, activate the virtual environment with source activate, and then run programs or commands that use the packages in the activated environment. When the programs or commands finish, deactivate the environment with source deactivate. This last step is optional if you do not need to restore your environment variables (such as PATH). In summary:
- module load Anaconda/xxx
- module load any_other_module_needed
- source activate your_virtual_environment_name
- run your programs/commands
- source deactivate
Note: if a virtual environment does not appear in the output of 'conda info --env', you need to give its full path in the source activate command, for example: source activate /scratch/user/uncommon/test.
Check Packages in an Anaconda Virtual Environment
To list the packages in an Anaconda environment, run the following on the command line:
[NetID@cluster NetID]$ module load Anaconda/3-VERSION     # Load Anaconda module
[NetID@cluster NetID]$ source activate myenv              # Activate environment
(myenv) [NetID@cluster NetID]$ conda list                 # List installed packages, e.g. to see if numpy is installed
If you run "conda list" without activating an environment, it shows the packages in the root environment.
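You can also query a single environment without activating it. This is a minimal sketch assuming the Anaconda module is loaded; has_pkg is a hypothetical helper name, while 'conda list -n ENV REGEX' is standard conda usage that lists only the matching packages of the named environment.

```shell
# Hypothetical helper: succeed if package $2 is installed in environment $1.
# 'conda list -n ENV REGEX' lists the matching packages of ENV without
# activating it; lines beginning with '#' are headers.
has_pkg() {
  conda list -n "$1" "^$2\$" | grep -v '^#' | grep -q "^$2[[:space:]]"
}

# Usage:
#   has_pkg myenv numpy && echo "numpy is installed in myenv"
```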
Install/Uninstall Packages in an Anaconda Virtual Environment
NOTE: Users can only install or uninstall packages in their private environments; they do not have permission to install or uninstall packages in the root and shared environments.
To install or uninstall packages in a private environment, first activate it. For example, the following steps install and uninstall the numpy package in the "myenv" private environment:
[NetID@cluster NetID]$ module load Anaconda/3-VERSION     # Load Anaconda module
[NetID@cluster NetID]$ source activate myenv              # Activate environment
(myenv) [NetID@cluster NetID]$ conda install numpy        # Install the numpy package
(myenv) [NetID@cluster NetID]$ conda list                 # Check the installed packages
(myenv) [NetID@cluster NetID]$ conda uninstall numpy      # Uninstall the numpy package
If you see the following error after installing a software package in Anaconda:
This system lists a couple of UTF-8 supporting locales that you can pick from. The following suitable locales were discovered: aa_DJ.utf8, aa_ER.utf8, aa_ET.utf8, af_ZA.utf8, am_ET.utf8, an_ES.utf8, ar_AE.utf8, ar_BH.utf8, ar_DZ.utf8, ar_EG.utf8, ar_IN.utf8, ar_IQ.utf8, ar_JO.utf8, ar_KW.utf8, ar_LB.utf8, ar_LY.utf8, ar_MA.utf8, ar_OM.utf8, ar_QA.utf8, ar_SA.utf8, ar_SD.utf8, ar_SY.utf8, ar_TN.utf8, ar_YE.ut
Then copy the activate_utf.sh file to your conda environment, replacing USERNAME and ENVIRONMENTNAME with your NetID and environment name:
Ada:   cp /sw/hprc/Anaconda/activate_utf.sh /scratch/user/USERNAME/.conda/envs/ENVIRONMENTNAME/etc/conda/activate.d/
Terra: cp /sw/hprc/sw/Anaconda/activate_utf.sh /scratch/user/USERNAME/.conda/envs/ENVIRONMENTNAME/etc/conda/activate.d/
If that doesn't work, run the following commands after activating your environment:
export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
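Conda sources the .sh scripts in an environment's etc/conda/activate.d directory each time the environment is activated, which is why the copy step above targets that directory. As an alternative sketch, the hypothetical helper below writes the three export lines above into such a script so they apply on every activation. It assumes a private environment under $SCRATCH/.conda/envs, and the script name utf8_locale.sh is arbitrary.

```shell
# Hypothetical helper: make the UTF-8 locale settings permanent for one
# private environment by adding a script to its activate.d directory.
add_utf8_hook() {
  if [ -z "$SCRATCH" ]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  local envdir="$SCRATCH/.conda/envs/$1"
  if [ ! -d "$envdir" ]; then
    echo "No such environment: $envdir" >&2
    return 1
  fi
  # activate.d may not exist yet in a new environment.
  mkdir -p "$envdir/etc/conda/activate.d"
  cat > "$envdir/etc/conda/activate.d/utf8_locale.sh" <<'EOF'
export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
EOF
}

# Usage (then deactivate and re-activate the environment):
#   add_utf8_hook myenv
```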
Clean and Remove a Virtual Environment
Anaconda downloads software packages before installing them, and those downloaded files count against your disk quota. You can run 'conda clean --all' after your software packages are installed. To completely remove your private virtual environment myenv when you no longer need it, run "conda remove --name myenv --all".
JupyterLab
You can use the default JupyterLab environment listed on the JupyterLab portal app, or you can create your own JupyterLab conda environment with either Anaconda or Miniconda for use on the HPRC portal. In either case, you must use one of the Anaconda versions listed on the JupyterLab HPRC portal web page.
Make sure you have enough available file quota (about 30,000 files), since conda creates thousands of files.
An Anaconda install of JupyterLab creates about the same number of files as a Miniconda3 install.
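To see roughly how many files conda has created, you can count the entries under its directory. This is a rough sketch: count_entries is a hypothetical helper, it counts files, directories and links alike (file quotas typically count all of them), and the default directory $SCRATCH/.conda assumes the Anaconda module's default layout. Any quota command provided on the cluster should be treated as the authoritative number.

```shell
# Hypothetical helper: count every entry (files, directories, links) under a
# directory tree, defaulting to $SCRATCH/.conda where the Anaconda module
# keeps environments and downloaded packages.
count_entries() {
  local dir="${1:-$SCRATCH/.conda}"
  if [ ! -d "$dir" ]; then
    echo "No such directory: $dir" >&2
    return 1
  fi
  # wc -l pads its output with spaces on some systems; strip them.
  find "$dir" | wc -l | tr -d ' '
}

# Usage:
#   count_entries                                           # all of $SCRATCH/.conda
#   count_entries "$SCRATCH/.conda/envs/jupyterlab_1.2.2"   # one environment
```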
To create an Anaconda conda environment called jupyterlab_1.2.2, run the following on the command line:
module purge
module load Anaconda/3-VERSION
conda create -n jupyterlab_1.2.2
After your jupyterlab_1.2.2 environment is created, conda prints instructions on how to activate and deactivate it:
#
# To activate this environment, use:
# > source activate jupyterlab_1.2.2
#
# To deactivate an active environment, use:
# > source deactivate
#
Then install jupyterlab (specifying a version if needed) and add packages to your jupyterlab_1.2.2 environment:
source activate jupyterlab_1.2.2
conda install -c conda-forge jupyterlab=1.2.2
conda install -c conda-forge package-name
To remove the downloaded package tarballs after the packages are installed:
conda clean -t
Installing JupyterLab v1.2.2 via Miniconda3 installs Python v3.6.7, while installing it via Anaconda installs Python 3.8.0.
With JupyterLab v1.2.0, Anaconda/3-VERSION and Miniconda3/4.7.10 both use Python v3.6.7, but JupyterLab v1.2.2 installs Python 3.8.0 in Anaconda. For now, Anaconda is therefore the better choice for JupyterLab if you want v1.2.2 instead of v1.2.0.
To create a Miniconda conda environment called jupyterlab_1.2.0, run the following on the command line:
module purge
module load Miniconda3/4.7.10
conda create -p /scratch/user/your_netid/.conda/envs/jupyterlab_1.2.0 jupyterlab=1.2.0
After your jupyterlab_1.2.0 environment is created, conda prints instructions on how to activate and deactivate it:
#
# To activate this environment, use
#
#     $ conda activate /scratch/user/your_netid/.conda/envs/jupyterlab_1.2.0
#
# To deactivate an active environment, use
#
#     $ conda deactivate
You can add packages to your Miniconda3 environment using either Anaconda/3-VERSION or Miniconda3/4.7.10, both of which use Python v3.6.7.
When using the Miniconda3 module, create your environment with a full path in your $SCRATCH directory, because by default Miniconda installs environments in your $HOME/.conda directory, which causes you to reach your $HOME file quota. You must then give that full path when activating the environment. With the Anaconda module you only need to specify the environment name, since Anaconda is configured to install environments in your $SCRATCH/.conda directory.
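To avoid typing (or mistyping) the long path each time, you can build it from $SCRATCH. A minimal sketch: scratch_env_prefix is a hypothetical helper name, and it assumes $SCRATCH points to your scratch directory (/scratch/user/your_netid), as the commands on this page do.

```shell
# Hypothetical helper: print a full environment path under $SCRATCH, for use
# with the Miniconda3 module, which otherwise defaults to $HOME/.conda.
scratch_env_prefix() {
  if [ -z "$SCRATCH" ]; then
    echo "SCRATCH is not set" >&2
    return 1
  fi
  printf '%s/.conda/envs/%s\n' "${SCRATCH%/}" "$1"
}

# Usage with the Miniconda3/4.7.10 module:
#   conda create -p "$(scratch_env_prefix jupyterlab_1.2.0)" jupyterlab=1.2.0
#   conda activate "$(scratch_env_prefix jupyterlab_1.2.0)"
```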
In this example, JupyterLab should be run with the portal JupyterLab app. To use your Miniconda3/4.7.10 environment in the JupyterLab portal app, select the Anaconda/3-VERSION module on the portal app page and enter the full path of your Miniconda3/4.7.10 environment in the "JupyterLab Environment to be activated" box.
Jupyter Notebook
You can create your own Jupyter Notebook environment using either a Python module or an Anaconda module for use on the HPRC portal, but you must use one of the module versions listed on the Jupyter Notebook HPRC portal web page.
Your custom Notebook environment must be created on the command line for later use on the Jupyter Notebook portal app.
Make sure you have enough available file quota (about 10,000 files), since conda and pip create thousands of files.
This table can help you decide when to use a Python module and when to use an Anaconda module for installing python packages.
| ||Python module||Anaconda module|
|Example module||module load Python/3.6.6-intel-2018b||module load Anaconda/3-VERSION|
|When to use||When only Python packages are required||When C, C++ or R modules are required for installing a software package with an extensive dependency list (example: qiime2)|
|Python version||Only the same version as the loaded module||Can install any version of Python 3 within Anaconda|
|Env location||A virtual environment can be saved in any directory; it is up to you to remember where your environments are saved||Manages environments in a centralized location ($SCRATCH/.conda/envs)|
|Env activation||Must provide the full or relative path when activating. Command line example: source /scratch/user/netid/my_envs/env_name/bin/activate||Only need to provide the environment name when activating. Command line example: source activate env_name|
|Available packages||PyPI||Anaconda Cloud (includes bioconda) and PyPI|
|Package install command||pip||conda (for Anaconda packages); pip (for PyPI packages)|
|Installation type||Wheel or source||Precompiled binaries (using 'conda install pkg_name'); wheel or source (using 'pip install pkg_name')|
|Software speed||Some packages, such as the non-GPU TensorFlow modules, are much faster than Anaconda binaries when configured correctly, because they are compiled from source and can take advantage of CPU features. The GPU TensorFlow modules and Anaconda environments perform about the same.||Precompiled binaries may be slower for some software packages|
|Dependency checks||Yes, but not completely (see link below)||Yes|
|File usage||Each virtual environment downloads its own packages||Multiple conda environments share a common directory of downloaded packages, so a package already installed in one conda environment does not have to be downloaded again for a new one (unless you ran 'conda clean -t')|
|Remove install cache||pip cache purge (pip >= 20.1b1)||conda clean -t (removes downloaded tar packages from the shared pkgs directory)|
|Delete virtual environment||rm -rf env_name_directory||conda env remove --name env_name|
|Possible issues||Not all dependencies are resolved globally when installing multiple packages (see link below)||Installing package dependencies from multiple channels (defaults vs conda-forge) may cause conflicts|
Note: you must activate the Python virtualenv or Anaconda environment before installing packages with 'pip install' or 'conda install'.
A Python module can be used to create a virtual environment to be used in the portal Jupyter Notebook app when all you need is Python packages.
You can use a default Python virtual environment in the Jupyter Notebook portal app by leaving the "Optional Environment to be activated" field blank.
To create a Python virtual environment called my_notebook-python-3.6.6-intel-2018b (you can name it whatever you like), run the following on the command line. You can save your virtual environments in any $SCRATCH directory you want; this example uses /scratch/user/mynetid/pip_envs, but you can use another directory name instead of pip_envs.
It is good practice to name your environment so that the name shows which Python version it contains; that way you know which module to load.
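The naming convention can be automated so the name always matches the module you load. This is a sketch only: venv_name is a hypothetical helper, and its lower-casing and '/'-to-'-' substitution simply reproduce the example name used in this section.

```shell
# Hypothetical helper: build "PREFIX-MODULE" with the module name lower-cased
# and '/' replaced by '-', e.g.
#   venv_name my_notebook Python/3.6.6-intel-2018b
#   -> my_notebook-python-3.6.6-intel-2018b
venv_name() {
  printf '%s-%s\n' "$1" "$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr '/' '-')"
}

# Usage:
#   module purge
#   module load Python/3.6.6-intel-2018b
#   virtualenv "/scratch/user/mynetid/pip_envs/$(venv_name my_notebook Python/3.6.6-intel-2018b)"
```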
The next three lines will create your virtual environment.
module purge
module load Python/3.6.6-intel-2018b
virtualenv /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b
Then activate the virtual environment by sourcing the activate script inside it (using its full path), and install Python packages:
source /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b/bin/activate
pip install notebook
pip install optional-python-package-name
You can use your Python/3.6.6-intel-2018b virtual environment in the Jupyter Notebook portal app by selecting the Python/3.6.6-intel-2018b module on the portal app page and entering the full path to the activate script of your virtual environment in the "Optional Conda Environment to be activated" box. The activate script is in the bin directory of your virtualenv; for the example above, enter the same full path used in the source command.
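Before pasting the path into the portal, you can check that it really points to an activate script. A hypothetical helper, sketched under the assumption that the virtualenv was created as shown above and therefore contains bin/activate:

```shell
# Hypothetical helper: print the value for the portal's environment box,
# i.e. the full path to bin/activate inside a virtualenv, after checking
# that the file exists.
portal_activate_path() {
  local venv="${1%/}"
  if [ -f "$venv/bin/activate" ]; then
    printf '%s/bin/activate\n' "$venv"
  else
    echo "No bin/activate under $venv; is this a virtualenv?" >&2
    return 1
  fi
}

# Usage:
#   portal_activate_path /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b
```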
Loading additional Lmod modules
On Terra, you can install the jupyterlmod package in your Python virtual environment. It lets you load, from within the Terra Jupyter Notebook portal app, additional system modules that you need or that you used when creating the virtual environment.
To add this feature to your existing Terra virtual environment, do the following on the command line prior to launching Jupyter Notebook on the portal (you can use Python/3.6.6-foss-2018b if the additional module(s) you need are not available with the intel-2018b toolchain):
module purge
module load Python/3.6.6-intel-2018b
source /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b/bin/activate
pip install jupyter jupyterlmod
Then launch the Terra Jupyter Notebook portal app using your optional environment and click the 'Softwares' tab in your notebook and search for system modules.
Select one or more modules that match the toolchain and Python version you used to create your virtual environment, then press Enter to load them.
The 'Loaded Modules' list will update in a few seconds to reflect the additional module(s) loaded.
You can save your loaded modules with the 'collection' button on the right side of the notebook's 'Softwares' page, so that you only need to select the collection instead of searching for modules each time you use your Python virtual environment.
Anaconda differs from Python's virtualenv in that you can install other types of software, such as R and R packages, in your environment. Anaconda also manages the installation path and installs environments in your $SCRATCH/.conda directory, so you don't have to create a directory before creating an environment. To create an Anaconda conda environment called my_notebook (you can name it whatever you like), run the following on the command line:
module purge
module load Anaconda/3-VERSION
conda create -n my_notebook
After your my_notebook environment is created, conda prints instructions on how to activate and deactivate it:
#
# To activate this environment, use:
# > source activate my_notebook
#
# To deactivate an active environment, use:
# > source deactivate
#
Then install notebook, and optionally add other packages to your my_notebook environment:
source activate my_notebook
conda install -c conda-forge notebook
conda install -c conda-forge optional-package-name
You can use your Anaconda/3-VERSION environment in the Jupyter Notebook portal app by selecting the Anaconda/3-VERSION module on the portal app page and entering just the name (without the full path) of your environment in the "Optional Conda Environment to be activated" box. In the example above, the value to enter is: my_notebook
Using Conda
Besides creating virtual environments as discussed above, the 'conda' command can list, clone, remove and share virtual environments. More details can be found at https://conda.io/docs/using/envs.html. The conda cheat sheet may also be helpful: https://conda.io/docs/_downloads/conda-cheatsheet.pdf
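Sharing or backing up an environment usually means exporting its package list to a YAML file that can be rebuilt later. A minimal sketch assuming the Anaconda module is loaded: save_env is a hypothetical helper, while 'conda env export -n', 'conda env create -f' and 'conda create --clone' are standard conda commands.

```shell
# Hypothetical helper: export environment $1 to $1.yml in the current directory.
save_env() {
  conda env export -n "$1" > "$1.yml" || return 1
  echo "Wrote $1.yml"
}

# Related standard conda commands:
#   conda env create -f myenv.yml                   # recreate from the exported file
#   conda create --name myenv_copy --clone myenv    # clone an existing environment
```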