- 1 Anaconda
- 1.1 Important Concepts
- 1.2 Versions available on Ada/Terra
- 1.3 Managing virtual environments
- 1.4 Using Conda
Anaconda is a leading open data science platform powered by Python. It provides a collection of over 720 open source packages and also serves as a package and virtual environment manager. More details on Anaconda: https://docs.continuum.io/anaconda/. Several important concepts are introduced first, followed by the Anaconda modules available on Ada and Terra.
Important Concepts
A package is a collection of programs; for example, numpy and tensorflow are packages. Over 150 packages are installed automatically with Anaconda, and over 250 additional open source packages can be installed from the Anaconda repository with the 'conda install' command. Thousands of other packages are available from Anaconda Cloud.
A virtual environment is a named collection of packages. For example, a virtual environment named 'test_environment' might contain python 3.5, basemap 1.0.7, and shapely 1.5.16. A user may create one virtual environment per project when projects need different collections of software, which avoids version conflicts between projects. The 'conda' command creates and manages virtual environments in Anaconda. The 'pip' command can also install Python packages into a virtual environment and gives access to more Python packages, but pip does not resolve package dependencies as well as conda does.
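The advice above (prefer conda, and use pip only when conda cannot provide a package) can be sketched as a small shell function. This is an illustrative helper, not an HPRC tool: the name install_into_env is hypothetical, and it assumes an Anaconda module is already loaded and that the target environment contains pip.

```shell
# Hypothetical helper: install one package into an existing environment,
# trying conda first (better dependency resolution) and pip only as a fallback.
# Assumes an Anaconda module is loaded and the environment includes pip.
install_into_env() {
    local env_name="$1" package="$2" status
    if conda install --yes --name "$env_name" "$package"; then
        return 0
    fi
    echo "conda could not install $package; trying pip in $env_name" >&2
    # pip installs into whichever environment is active, so activate it first.
    source activate "$env_name" || return 1
    pip install "$package"
    status=$?
    source deactivate
    return "$status"
}
```

For example, install_into_env test_environment shapely tries conda and only falls back to pip if conda fails.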
Versions available on Ada/Terra
The most up to date listing of available versions on the cluster you are using can be found with:
module avail Anaconda # or "ml avail Anaconda" if you get tired of typing "module"
This lists all available versions, including those described below (plus some of the myAnaconda modules described on the Python page).
Anaconda Modules on Ada
Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-5.0.1 and Anaconda/3-22.214.171.124. Module Anaconda/2-5.0.1 is for Python 2.7, while Anaconda/3-126.96.36.199 is for Python 3.6. Anaconda/3-188.8.131.52 is recommended for users who need Python 3, and Anaconda/2-5.0.1 for users who need Python 2.
Anaconda Modules on Terra
Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-4.3.1 and Anaconda/3-184.108.40.206. Module Anaconda/2-4.3.1 is for Python 2.7, while Anaconda/3-220.127.116.11 is for Python 3.6. Anaconda/3-18.104.22.168 is recommended for users who do not depend on legacy Python 2 code.
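Rather than hard-coding a version string, a script can pick an Anaconda module by its Python major version, using the Anaconda/2-... and Anaconda/3-... naming shown above. The sketch below assumes the Lmod module system (whose terse 'module -t avail' listing is printed on stderr) and treats the highest-numbered matching module as the one to load; that choice is an assumption and may not match the module recommended above. The function name load_anaconda is hypothetical.

```shell
# Hypothetical helper: load the highest-numbered Anaconda/<major>-* module
# (major = 2 or 3) and print the Python version it provides.
# Assumes Lmod, which prints module listings on stderr.
load_anaconda() {
    local major="$1" mod
    # Keep only matching module names, drop any "(default)"-style marker,
    # and take the highest version.
    mod=$(module -t avail Anaconda 2>&1 | grep "^Anaconda/${major}-" \
          | sed 's/(.*)$//' | sort -V | tail -n 1)
    if [ -z "$mod" ]; then
        echo "no Anaconda/${major}-* module found" >&2
        return 1
    fi
    echo "loading $mod"
    module load "$mod" || return 1
    # Python 2 reports its version on stderr, so merge it into stdout.
    python --version 2>&1
}
```

Call it directly (load_anaconda 3) rather than inside $(...): a command substitution runs in a subshell, so the loaded module would not persist in your shell.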
Managing virtual environments
Create a Virtual Environment
Anaconda/3-22.214.171.124 is recommended on both Terra and Ada.
- Shared Virtual Environments: The command 'conda create -n virtual_environment_name python=x.x anaconda' creates a virtual environment named 'virtual_environment_name' (replace this with your own name) in the Anaconda root environment. On our clusters, users do not have write permission to the Anaconda root environment, so they cannot use this command to create a virtual environment with the Anaconda modules. Instead, we can create a shared virtual environment for you. Note that all virtual environments created this way are accessible to all users.
- Note: The available shared virtual environments (currently 76) can be identified from the files in /sw/local/etc/Anaconda/venvs/. For example, on Terra with Anaconda/3-126.96.36.199 loaded, you should be able to activate the virtual environment "tensorflow-gpu-1.4.1" as needed.
- Private Virtual Environments: A user can create a private virtual environment with the command 'conda create -n virtual_environment_name package_to_install', where package_to_install is optional. Such a virtual environment is accessible only to the user who creates it and is located in $SCRATCH/.conda/envs. NOTE: private virtual environments work only with Anaconda/3-4.4.0 and later versions (e.g., 188.8.131.52.1), and with Anaconda/2-5.0.1 on Ada.
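A minimal sketch of creating a private environment and checking that it ended up under $SCRATCH/.conda/envs, as described above. It assumes a suitable Anaconda module (Anaconda/3-4.4.0 or later, or Anaconda/2-5.0.1 on Ada) is already loaded; create_private_env is a hypothetical name, and every argument after the environment name is passed to 'conda create' unchanged.

```shell
# Hypothetical helper: create a private environment named $1 with the
# package specs given in the remaining arguments, then confirm its location.
create_private_env() {
    local name="$1"
    shift
    conda create --yes --name "$name" "$@" || return 1
    if [ -d "$SCRATCH/.conda/envs/$name" ]; then
        echo "created $SCRATCH/.conda/envs/$name"
    else
        echo "warning: $name is not under $SCRATCH/.conda/envs" >&2
    fi
}
```

For example, create_private_env test_environment python=3.5 basemap=1.0.7 shapely=1.5.16 requests the example environment described earlier; pins that old may no longer be resolvable from current channels.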
List Virtual Environments
You can list all shared virtual environments and your own private virtual environments with the command:
conda info --env
Access a Virtual Environment
Normally you load the Anaconda module and any other modules your virtual environment needs, activate the virtual environment with 'source activate', and then run programs or commands that use the packages in the activated environment. When they finish, deactivate the environment with 'source deactivate'. This last step is only needed if you want to restore your PATH settings in the current shell. The steps are summarized below:
- module load Anaconda/xxx
- module load any_other_module_needed
- source activate your_virtual_environment_name
- run your programs/commands
- source deactivate
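The five steps above can be wrapped in a small function, which is convenient in batch job scripts. This is a sketch: run_in_env is a hypothetical name, the caller supplies the module, environment and command, and any other modules (step 2) should be loaded before calling it.

```shell
# Hypothetical helper: run a command inside a conda environment.
# Usage: run_in_env <anaconda_module> <env_name_or_path> <command> [args...]
run_in_env() {
    local anaconda_module="$1" env_name="$2" status
    shift 2
    module load "$anaconda_module" || return 1   # step 1 (load step 2 modules beforehand)
    source activate "$env_name" || return 1      # step 3
    "$@"                                         # step 4: your program or command
    status=$?
    source deactivate                            # step 5 (optional)
    return "$status"
}
```

For example, run_in_env Anaconda/2-5.0.1 test_environment python -c 'import shapely' returns the exit status of the Python command, so a job script can still detect failures.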
Note: if a virtual environment does not appear in the output of 'conda info --env', you must give its full path in the source activate command, for example: source activate /scratch/user/uncommon/test.
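The rule in the note above can be encoded in a helper that accepts either a name or an absolute path. This is a sketch under assumptions: activate_env is hypothetical, and it relies on 'conda info --envs' printing each environment's name in the first column (or only the path for unnamed environments), with comment lines starting with '#'.

```shell
# Hypothetical helper: activate by name only if conda lists that name;
# otherwise require an absolute path, as the note above describes.
activate_env() {
    local target="$1"
    case "$target" in
        /*)
            # Full path: works even when conda does not list the environment.
            source activate "$target"
            ;;
        *)
            # Bare name: only works if 'conda info --envs' lists it.
            if conda info --envs | awk '!/^#/ {print $1}' | grep -qx -- "$target"; then
                source activate "$target"
            else
                echo "'$target' is not listed by 'conda info --envs'; pass its full path instead" >&2
                return 1
            fi
            ;;
    esac
}
```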
Clean and Remove a Virtual Environment
Conda downloads package files before installing them, and those downloaded files count against your disk quota. You may run 'conda clean --all' after your packages are installed to remove them. To completely remove your private virtual environment myenv when you no longer need it, run the command "conda remove --name myenv --all".
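The clean-up commands above can be combined as follows. This is only a sketch: remove_private_env and conda_disk_usage are hypothetical helpers, --yes skips conda's confirmation prompts, and the du check measures only $SCRATCH/.conda, where private environments are stored; it does not account for files conda may keep elsewhere.

```shell
# Hypothetical helper: report how much space $SCRATCH/.conda is using.
conda_disk_usage() {
    du -sh "$SCRATCH/.conda"
}

# Hypothetical helper: delete a private environment, then let conda remove
# cached package files it no longer needs.
remove_private_env() {
    local name="$1"
    conda remove --yes --name "$name" --all || return 1
    conda clean --all --yes
}
```

Running conda_disk_usage before and after remove_private_env myenv shows how much space was freed under $SCRATCH/.conda.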
Besides creating virtual environments as discussed above, the 'conda' command can list, clone, remove and share virtual environments. More details can be found at https://conda.io/docs/using/envs.html. The conda cheat sheet may also be helpful: https://conda.io/docs/_downloads/conda-cheatsheet.pdf
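Cloning and sharing can be sketched with two standard conda subcommands, 'conda create --clone' and 'conda env export'. The function names clone_env and export_env are hypothetical, and both assume an Anaconda module is loaded; the exported YAML file is what another user would pass to 'conda env create --file'.

```shell
# Hypothetical helper: copy environment $1 to a new environment named $2.
clone_env() {
    conda create --yes --name "$2" --clone "$1"
}

# Hypothetical helper: write environment $1's specification to file $2.
# Someone else can then rebuild it with: conda env create --file <file>
export_env() {
    conda env export --name "$1" > "$2"
}
```

For example, clone_env myenv myenv_copy keeps a copy before an experimental upgrade, and export_env myenv myenv.yml produces a file you can share with collaborators.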