SW:Anaconda

From TAMU HPRC
Revision as of 09:28, 13 March 2018 by Whomps (talk | contribs) (Versions available on Ada/Terra)

Anaconda

Anaconda is a leading open data science platform powered by Python. It provides a collection of over 720 open source packages and also serves as a package and virtual environment manager. More details on Anaconda: https://docs.continuum.io/anaconda/. Several important concepts are introduced first, followed by the Anaconda modules available on Ada and Terra.

Important Concepts

Package

A package is a collection of programs. For example, numpy and tensorflow are packages. Over 150 packages are installed automatically with Anaconda. Over 250 additional open source packages can be installed from the Anaconda repository with the 'conda install' command, and thousands more are available from Anaconda Cloud.

Virtual Environment

A virtual environment is a named collection of packages. For example, a virtual environment named 'test_environment' might contain python 3.5, basemap 1.0.7, and shapely 1.5.16. A user may create one virtual environment per project if each project needs a different collection of software. Virtual environments therefore avoid version conflicts between a user's projects. The 'conda' command is used to create and manage virtual environments in Anaconda. Besides 'conda', the 'pip' command can also install Python packages into a virtual environment, which gives access to more Python packages. However, 'pip' does not resolve package dependencies as well as 'conda' does.
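As a rough illustration of combining the two installers, the sketch below tries 'conda' first and falls back to 'pip' only when 'conda' cannot install the package. The helper name install_pkg is hypothetical (it is not part of conda or pip), and the sketch assumes an Anaconda module is loaded and the target virtual environment is already activated.

```shell
# Sketch only: install_pkg is a hypothetical helper, not a conda or pip command.
# Assumes an Anaconda module is loaded and a virtual environment is activated.
install_pkg() {
    pkg="$1"
    # Try conda first so that conda resolves the dependencies.
    if conda install --yes "$pkg"; then
        echo "installed $pkg with conda"
    else
        # conda could not install it (for example, the package is not in
        # the configured channels), so fall back to pip.
        pip install "$pkg" && echo "installed $pkg with pip"
    fi
}

# Example usage (package name taken from the text above):
#   install_pkg shapely
```

Trying 'conda' first keeps as many packages as possible under conda's dependency resolution, which is the reason the text prefers it over 'pip'.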

Versions available on Ada/Terra

The most up to date listing of available versions on the cluster you are using can be found with:

module avail Anaconda 
# or "ml avail Anaconda" if you get tired of typing "module"

This shows all available versions of the modules described below (plus some of the myAnaconda modules described on the Python page).


Anaconda Modules on Ada

Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-4.0.0 and Anaconda/3-5.0.0.1. Module Anaconda/2-4.0.0 is for Python 2.7, while Anaconda/3-5.0.0.1 is for Python 3.6. Anaconda/3-5.0.0.1 is recommended for users without legacy issues.

Anaconda Modules on Terra

Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-4.3.1 and Anaconda/3-5.0.0.1. Module Anaconda/2-4.3.1 is for Python 2.7, while Anaconda/3-5.0.0.1 is for Python 3.6. Anaconda/3-5.0.0.1 is recommended for users without legacy issues.
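A minimal sketch of loading one of these modules and checking which Python it provides is shown below. The helper name load_anaconda is hypothetical. It defaults to the recommended module and accepts another module name as an optional argument.

```shell
# Sketch only: load_anaconda is a hypothetical helper name.
# Loads an Anaconda module (default: the recommended Anaconda/3-5.0.0.1)
# and prints the version of the Python interpreter now on the PATH.
load_anaconda() {
    module load "${1:-Anaconda/3-5.0.0.1}" || return 1
    python --version
}

# Example usage:
#   load_anaconda                     # recommended module, Python 3.6
#   load_anaconda Anaconda/2-4.0.0    # Python 2.7 module on Ada
```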

Managing virtual environments

Using Anaconda

Anaconda/3-5.0.0.1 is recommended on both Terra and Ada.

Create a Virtual Environment
  • Shared Virtual Environments: The command 'conda create -n virtual_environment_name python=x.x anaconda' creates a virtual environment named 'virtual_environment_name' (replace this with a name of your choice) in the Anaconda root environment. On our clusters, users do not have write permission to the location of the Anaconda root environment, so they cannot use this command to create a virtual environment with the Anaconda modules on our clusters. Instead, we may create a virtual environment for the user(s). Note that all virtual environments created by this command are accessible to all users.
  • Private Virtual Environment: A user can create a private virtual environment with the command 'conda create -n virtual_environment_name package_to_install', where package_to_install is optional. Such a virtual environment is accessible only to the user who creates it and is located under $SCRATCH/.conda/envs. NOTE: private virtual environments work only with Anaconda/3-4.4.0 and later versions (e.g., Anaconda/3-5.0.0.1), and with Anaconda/2-4.0.0 on Ada.
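The private case can be sketched as follows, assuming the recommended Anaconda/3-5.0.0.1 module. The helper name create_private_env is hypothetical, and the environment and package names in the usage line are only the illustrative ones from the Virtual Environment section.

```shell
# Sketch only: create_private_env is a hypothetical helper name.
# First argument: environment name. Remaining arguments: optional packages.
create_private_env() {
    env_name="$1"
    shift
    module load Anaconda/3-5.0.0.1 || return 1
    # -n names the environment; --yes skips the interactive confirmation.
    conda create --yes -n "$env_name" "$@"
}

# Example usage:
#   create_private_env test_environment python=3.5 basemap shapely
```

Running 'conda info --env' afterwards is a quick way to confirm that the new environment is listed.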

List Virtual Environments

A user may list all shared virtual environments and their own private virtual environments with the command:

  conda info --env

Access a Virtual Environment

Normally, a user loads the Anaconda module and any other modules the virtual environment needs, activates the virtual environment with 'source activate', and then runs programs or commands that use the packages in the activated environment. After the programs or commands finish, the user should deactivate the virtual environment with 'source deactivate'. This last step is not necessary if you do not need to clean up your PATH environment. The steps to access a virtual environment are summarized below.

  1. module load Anaconda/xxx
  2. module load any_other_module_needed
  3. source activate your_virtual_environment_name
  4. run your programs/commands
  5. source deactivate

Note: if a virtual environment does not appear in the output of 'conda info --env', you need to give its full path in the source activate command. For example: source activate /scratch/user/uncommon/test
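The five steps above can be combined into one small wrapper, sketched below under the assumption that the recommended Anaconda/3-5.0.0.1 module is used and no other modules are needed. The helper name run_in_env is hypothetical. It runs a single command inside the environment and returns that command's exit status.

```shell
# Sketch only: run_in_env is a hypothetical helper name.
# First argument: environment name or full path. Remaining arguments:
# the command to run inside the activated environment.
run_in_env() {
    env_name="$1"
    shift
    module load Anaconda/3-5.0.0.1 || return 1
    source activate "$env_name" || return 1
    "$@"                 # run the program or command
    status=$?            # remember its exit status
    source deactivate    # restore the PATH
    return "$status"
}

# Example usage (my_script.py is a placeholder):
#   run_in_env test_environment python my_script.py
#   run_in_env /scratch/user/uncommon/test python my_script.py
```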

Using Conda

Besides creating a virtual environment as discussed above, the 'conda' command can list, clone, remove, and share virtual environments. More details can be found at https://conda.io/docs/using/envs.html. The conda cheatsheet may also be helpful: https://conda.io/docs/_downloads/conda-cheatsheet.pdf
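As a brief sketch of these operations, the hypothetical helper clone_and_export below clones an existing environment and writes the clone's package list to a YAML file that can be shared with another user. The environment names are placeholders, and an Anaconda module is assumed to be loaded.

```shell
# Sketch only: clone_and_export is a hypothetical helper name.
# Clones environment $1 into a new environment $2, then exports the
# clone's package list to $2.yml in the current directory.
clone_and_export() {
    src="$1"
    dst="$2"
    conda create --yes --name "$dst" --clone "$src" || return 1
    conda env export --name "$dst" > "$dst.yml"
}

# Example usage:
#   clone_and_export test_environment test_copy
#
# Related commands:
#   conda env list                     # list environments
#   conda remove --name NAME --all     # remove an environment
#   conda env create -f NAME.yml       # recreate one from an exported file
```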

Adding "channels"