SW:Anaconda

From TAMU HPRC
Revision as of 14:37, 9 July 2018 by Narendra5 (talk | contribs) (Managing Anaconda Virtual Environments)

Anaconda

Anaconda is a leading open data science platform powered by Python. It provides a collection of over 720 open source packages and serves as both a package manager and a virtual environment manager. More details on Anaconda are available at https://docs.continuum.io/anaconda/. This page first introduces several important concepts and then describes the Anaconda modules on Ada and Terra.

Important Concepts

Package

A package is a collection of programs. For example, numpy and tensorflow are packages. Over 150 packages are installed automatically with Anaconda. Over 250 additional open source packages can be installed from the Anaconda repository with the 'conda install' command, and thousands of other packages are available from Anaconda Cloud.

Virtual Environment

A virtual environment is a named collection of packages. For example, a virtual environment named 'test_environment' might contain python 3.5, basemap 1.0.7, and shapely 1.5.16. A user may create one virtual environment per project if each project needs a different collection of software. Virtual environments therefore avoid version conflicts between a user's projects. The 'conda' command is used to create and manage virtual environments in Anaconda. Besides 'conda', the 'pip' command can also be used to install Python packages into a virtual environment, which gives access to more Python packages. However, 'pip' does not resolve package dependencies as well as 'conda' does.

Versions available on Ada/Terra

The most up to date listing of available versions on the cluster you are using can be found with:

module avail Anaconda 
# or "ml avail Anaconda" if you get tired of typing "module"

This will show all available versions of the modules described below (plus some of the myAnaconda modules described on the Python page).


Anaconda Modules on Ada

Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-5.0.1 and Anaconda/3-5.0.0.1. Module Anaconda/2-5.0.1 is for Python 2.7, while Anaconda/3-5.0.0.1 is for Python 3.6. Anaconda/3-5.0.0.1 is recommended for users who need Python 3, while Anaconda/2-5.0.1 is recommended for users who need Python 2.

Anaconda Modules on Terra

Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-4.3.1 and Anaconda/3-5.0.0.1. Module Anaconda/2-4.3.1 is for Python 2.7, while Anaconda/3-5.0.0.1 is for Python 3.6. Anaconda/3-5.0.0.1 is recommended for users without legacy issues.

Managing Anaconda Virtual Environments

Information about environment management can be found on this page.

   https://conda.io/docs/user-guide/tasks/manage-environments.html

Using Anaconda

Anaconda/3-5.0.0.1 is recommended on both Terra and Ada.

List Anaconda Virtual Environments

You may list all shared virtual environments and your own private virtual environments with the command:

  conda info --envs

We have many shared environments for specific tasks. For example, the tensorflow-gpu and keras-gpu environments can be useful for machine learning applications.

Virtual Environment Types

  • Shared Virtual Environments: The command 'conda create -n virtual_environment_name python=x.x' creates a virtual environment named 'virtual_environment_name' (replace this with a name of your choice) in Anaconda. On our clusters, users do not have write permission to the location of the Anaconda root environment, so users cannot use this command to create a shared virtual environment with the Anaconda modules. Instead, we may create a shared virtual environment for users. Note that all virtual environments created this way are accessible to all users.
    • Note: A list of available virtual environments (currently 76) can be discerned from the files in /sw/local/etc/Anaconda/venvs/. For example, on Terra with Anaconda/3-5.0.0.1, you should be able to activate the virtual environment "tensorflow-gpu-1.4.1" as needed.
  • Private Virtual Environment: A user can create a private virtual environment with the command 'conda create -n virtual_environment_name package_to_install', where package_to_install is optional. Such a virtual environment is accessible only to the user who creates it. Private virtual environments are located at $SCRATCH/.conda/envs. NOTE: private virtual environments work only with Anaconda/3-4.4.0 and later versions (e.g., 3-5.0.0.1), and with Anaconda/2-5.0.1 on Ada.
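
To look for a shared environment by name, a small helper along the following lines may be convenient. This is only a sketch: the function name find_shared_envs is hypothetical, and it assumes that the entry names in /sw/local/etc/Anaconda/venvs/ reflect the environment names.

```shell
# find_shared_envs PATTERN [DIR]
# List entries in DIR (default: the shared venvs directory mentioned above)
# whose names contain PATTERN, ignoring case.
# Assumption: each entry name in that directory reflects an environment name.
find_shared_envs() {
    pattern="$1"
    dir="${2:-/sw/local/etc/Anaconda/venvs}"
    ls "$dir" | grep -i -- "$pattern"
}
```

For instance, 'find_shared_envs tensorflow' would print any entries whose names contain "tensorflow"; a matching name can then be passed to 'source activate'.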

Create a Private Anaconda Virtual Environment

Make your scratch directory the current directory and run the following commands in order to create your own virtual environment:

   [NetID@cluster NetID]$ cd $SCRATCH                          # Make scratch your current directory
   [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1       # Load Anaconda module
   [NetID@cluster NetID]$ conda create --name myenv            # Create environment 

The "conda info --envs" command will now also show your private environment.
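
The same steps can be wrapped in a small function if you create environments often. The sketch below is illustrative only: create_private_env is a hypothetical name, the package specs are whatever you pass in, and --yes is added so that conda does not stop to ask for confirmation.

```shell
# create_private_env NAME [PACKAGE...]
# Hypothetical helper: create a private environment called NAME and,
# optionally, install the listed packages into it.
create_private_env() {
    env_name="$1"
    shift
    cd "$SCRATCH" || return 1            # make scratch the current directory
    module load Anaconda/3-5.0.0.1       # load the Anaconda module
    conda create --yes --name "$env_name" "$@"
}
```

For example, 'create_private_env myenv python=3.6 numpy' would create myenv with Python 3.6 and numpy (both are placeholder choices). Note that the function changes your current directory to $SCRATCH.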

Access an Anaconda Virtual Environment

To use a virtual environment, first load the Anaconda module and then activate the environment:

    [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1          # Load Anaconda module
    [NetID@cluster NetID]$ source activate myenv                   # Activate environment 
    (myenv) [NetID@cluster NetID]$ python myprogram.py        # Run your programs/commands (activated environment name will show on left of command line)
    (myenv) [NetID@cluster NetID]$ source deactivate          # Deactivate environment                 
    [NetID@cluster NetID]$                                    # Command line changes to normal

Normally, you load the Anaconda module and any other modules your virtual environment needs, source activate the virtual environment, and then run programs or commands that use the packages in the activated virtual environment. After the programs or commands finish, you should source deactivate the virtual environment. The last step, 'source deactivate', is not strictly necessary if you do not need to clean up your path environment. The steps for accessing a virtual environment are summarized below.

  1. module load Anaconda/xxx
  2. module load any_other_module_needed
  3. source activate your_virtual_environment_name
  4. run your programs/commands
  5. source deactivate

Note: if a virtual environment does not appear in the output of 'conda info --envs', you need to give its full path in the source activate command. For example: source activate /scratch/user/uncommon/test.
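
The numbered steps above can also be collected into one function, for example for use in a job script. This is a minimal sketch, not an HPRC-provided tool: run_in_env is a hypothetical name, and any additional modules (step 2) would need to be loaded where indicated.

```shell
# run_in_env ENV COMMAND [ARG...]
# Hypothetical helper: run COMMAND inside the environment ENV.
# ENV may be an environment name or a full path to an environment.
run_in_env() {
    env_name="$1"
    shift
    module load Anaconda/3-5.0.0.1   # step 1: load the Anaconda module
    # step 2: load any other modules your environment needs here
    source activate "$env_name"      # step 3: activate the environment
    "$@"                             # step 4: run the program or command
    rc=$?                            # remember the command's exit status
    source deactivate                # step 5: deactivate the environment
    return "$rc"
}
```

For example, 'run_in_env myenv python myprogram.py' runs the program in myenv and returns the program's exit status.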

Check Packages in an Anaconda Virtual Environment

To list the packages in an Anaconda environment, run the following commands:

   [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1          # Load Anaconda module
   [NetID@cluster NetID]$ source activate myenv                   # Activate environment
   (myenv) [NetID@cluster NetID]$ conda list                      # Conda list command to check packages

If you run "conda list" without activating an environment, it shows the packages in the root environment.
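
The 'conda list' command also accepts '--name' to inspect an environment without activating it. Building on that, the sketch below checks whether a particular package is present; has_package is a hypothetical helper name, and it assumes the Anaconda module is already loaded.

```shell
# has_package ENV PACKAGE
# Hypothetical helper: succeed (exit status 0) if PACKAGE is listed in ENV.
# The first column of "conda list" output is the package name.
has_package() {
    conda list --name "$1" | awk '!/^#/ { print $1 }' | grep -qxF -- "$2"
}
```

For example: has_package myenv numpy && echo "numpy is installed in myenv"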

Install/Uninstall Packages in an Anaconda Virtual Environment

Clean and Remove a Virtual Environment

Anaconda downloads packages to disk before installing them, and those downloaded packages consume your disk quota. You may run 'conda clean --all' after your software packages are installed to free this space. To completely remove your private virtual environment myenv when you no longer need it, run the command 'conda remove --name myenv --all'.
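
Before deciding what to remove, it can help to see how much space each private environment occupies. The following sketch uses the standard 'du' command; env_usage is a hypothetical name, and it assumes your private environments are under $SCRATCH/.conda/envs as described earlier on this page.

```shell
# env_usage
# Hypothetical helper: print the disk space used by each private environment.
# Assumes private environments live under $SCRATCH/.conda/envs.
env_usage() {
    du -sh "$SCRATCH"/.conda/envs/*
}
```

Running 'env_usage' prints one line per environment with its total size.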

Using Conda

Besides creating a virtual environment as discussed above, the 'conda' command can list, clone, remove, and share virtual environments. More details can be found at https://conda.io/docs/using/envs.html. You may also find the conda cheat sheet helpful: https://conda.io/docs/_downloads/conda-cheatsheet.pdf
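
As a rough illustration of cloning and sharing, the sketch below wraps the relevant conda subcommands in three small functions. The function names and the file name used in the example are hypothetical, and the Anaconda module is assumed to be loaded already.

```shell
# clone_env SRC DEST: copy environment SRC to a new environment DEST
clone_env() {
    conda create --yes --name "$2" --clone "$1"
}

# export_env ENV FILE: write ENV's package specification to FILE (YAML)
export_env() {
    conda env export --name "$1" > "$2"
}

# import_env FILE NAME: recreate an environment called NAME from FILE
import_env() {
    conda env create --file "$1" --name "$2"
}
```

For example, 'export_env myenv environment.yml' writes a file that another user could pass to 'import_env environment.yml myenv' to build a matching private environment.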

Adding "channels"