SW:Anaconda

From TAMU HPRC

Anaconda

Anaconda is an open data science platform powered by Python. It provides a collection of over 720 open source packages, and it is both a package manager and a virtual environment manager. More details on Anaconda: https://docs.continuum.io/anaconda/. Several important concepts are discussed first, followed by the Anaconda modules on Ada and Terra.

Important Concepts

Package

A package is a collection of programs. For example, numpy and tensorflow are packages. Over 150 packages are installed automatically with Anaconda. Over 250 additional open source packages can be installed from the Anaconda repository with the 'conda install' command. Moreover, thousands of other packages are available from Anaconda Cloud.

Virtual Environment

A virtual environment is a named collection of packages. For example, a virtual environment named 'test_environment' might be a collection of python 3.5, basemap 1.0.7, and shapely 1.5.16. A user may create one virtual environment per project if each project needs a different collection of software. Virtual environments therefore avoid version conflicts between a user's projects. The 'conda' command is used to create and manage virtual environments in Anaconda. Besides 'conda', the 'pip' command can also install python packages into a virtual environment, which gives access to more python packages. However, 'pip' does not resolve package dependencies as well as 'conda' does.

Versions available on Ada/Terra

The most up to date listing of available versions on the cluster you are using can be found with:

module avail Anaconda 
# or "ml avail Anaconda" if you get tired of typing "module"

This will show all available versions of the modules below (plus some of the myAnaconda modules described on the Python page).

Anaconda Modules on Ada

Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-5.0.1 and Anaconda/3-5.0.0.1. Module Anaconda/2-5.0.1 is for Python 2.7, while Anaconda/3-5.0.0.1 is for Python 3.6. Anaconda/3-5.0.0.1 is recommended for users who need Python 3, while Anaconda/2-5.0.1 is recommended for users who need Python 2.

Anaconda Modules on Terra

Run the command 'module spider Anaconda' to list the Anaconda modules, which include Anaconda/2-4.3.1 and Anaconda/3-5.0.0.1. Module Anaconda/2-4.3.1 is for Python 2.7, while Anaconda/3-5.0.0.1 is for Python 3.6. Anaconda/3-5.0.0.1 is recommended for users without legacy issues.

Managing Anaconda Virtual Environments

Information about environment management can be found on this page.

   https://conda.io/docs/user-guide/tasks/manage-environments.html

Using Anaconda

Anaconda/3-5.0.0.1 is recommended on both Terra and Ada.

List Anaconda Virtual Environments

You can list all shared virtual environments and your own private virtual environments using the command:

  conda info --env

We have many shared environments for specific tasks. For example, the tensorflow-gpu and keras-gpu environments can be useful for machine learning applications.

Virtual Environment Types
  • Shared Virtual Environments The command 'conda create -n virtual_environment_name python=x.x' creates a virtual environment named 'virtual_environment_name' (choose your own name) in Anaconda. On our clusters, users do not have write permission to the location of the Anaconda root environment, so users cannot use this command to create a shared virtual environment with the Anaconda modules on our clusters. Instead, we may create a virtual environment for user(s). Note that all virtual environments created this way are accessible to all users.
    • Note: A list of available virtual environments (currently 76) can be found from the files in /sw/local/etc/Anaconda/venvs/. For example, on Terra, using Anaconda/3-5.0.0.1, one should be able to activate the virtual environment "tensorflow-gpu-1.4.1" as needed.
  • Private Virtual Environment A user can create a private virtual environment using the command 'conda create -n virtual_environment_name package_to_install', where package_to_install is optional. Such a virtual environment is only accessible to the user who creates it. The private virtual environment is located at $SCRATCH/.conda/envs. NOTE: private virtual environments work only with Anaconda/3-4.4.0 and later versions (e.g., 3-5.0.0.1), and with Anaconda/2-5.0.1 on Ada.

Create a Private Anaconda Virtual Environment

Make your scratch directory the current directory and run the commands below in order to create your own virtual environment. NOTE: Do not create an environment in your home directory; you will exceed your home directory file limit.

   [NetID@cluster NetID]$ cd $SCRATCH                          # Make scratch your current directory
   [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1       # Load Anaconda module
   [NetID@cluster NetID]$ conda create --name myenv            # Create environment 

The "conda info --env" command will now also show your private environment.
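The warning about the home directory can be turned into a small guard. The following is a minimal sketch, not an HPRC-provided tool: path_outside_home is a hypothetical helper name, and it only compares the given path against $HOME.

```shell
#!/bin/bash
# Sketch: check that an environment path does not live under $HOME.
# path_outside_home is a hypothetical helper name, not part of conda.
path_outside_home() {
    # $1 = absolute path where the environment would be stored
    case "$1" in
        "$HOME"|"$HOME"/*) return 1 ;;   # inside $HOME: counts against the home file limit
        *) return 0 ;;                   # anywhere else, e.g. under $SCRATCH
    esac
}

# Usage on the cluster (after loading an Anaconda module):
#   path_outside_home "$SCRATCH/.conda/envs/myenv" && conda create --name myenv
```

The check is purely textual, so it does not follow symbolic links; it is meant as a reminder rather than an enforcement mechanism.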

Access an Anaconda Virtual Environment

To activate a virtual environment, first load Anaconda and then follow these steps:

    [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1          # Load Anaconda module
    [NetID@cluster NetID]$ source activate myenv                   # Activate environment 
    (myenv) [NetID@cluster NetID]$ python myprogram.py             # Run your programs/commands (activated environment name will show on left of command line)
    (myenv) [NetID@cluster NetID]$ source deactivate               # Deactivate environment                 
    [NetID@cluster NetID]$                                         # Command line changes to normal

Normally, you load the Anaconda module and any other modules needed for your virtual environment, source activate the virtual environment, and then run a program or commands that use the packages in the activated virtual environment. After the program or commands finish, you should source deactivate the virtual environment. The last step, 'source deactivate', is not necessary if you do not need to clean your path environment. The steps to access a virtual environment are summarized below.

  1. module load Anaconda/xxx
  2. module load any_other_module_needed
  3. source activate your_virtual_environment_name
  4. run your programs/commands
  5. source deactivate

Note: if you have a virtual environment not in the output of 'conda info --env', then you need the full path of the virtual environment in the source activate command. For example: source activate /scratch/user/uncommon/test.
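The five steps above can be wrapped in one shell function. This is a minimal sketch under stated assumptions: run_in_env is a hypothetical helper name, the module version is simply the one recommended on this page, and the function relies on bash because it uses 'source'.

```shell
#!/bin/bash
# Sketch of the five steps as one function. run_in_env is a hypothetical
# helper name; adjust the module version to the one you use.
run_in_env() {
    local env_name="$1"   # environment name, or full path if it is not listed
    shift                 # remaining arguments: the command to run
    local status=0

    module load Anaconda/3-5.0.0.1 || return 1   # 1. load Anaconda
    # 2. load any other modules your environment needs here
    source activate "$env_name" || return 1      # 3. activate the environment
    "$@" || status=$?                            # 4. run the command, keep its exit status
    set --                                       # hide the command's arguments from the sourced script
    source deactivate                            # 5. deactivate the environment
    return "$status"
}

# Usage on the cluster:
#   run_in_env myenv python myprogram.py
#   run_in_env /scratch/user/uncommon/test python myprogram.py
```

Returning the command's own exit status lets a job script detect failures even though the environment is deactivated afterwards.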

Check Packages in an Anaconda Virtual Environment

To check the list of packages in an Anaconda environment, follow these steps on the command line.

   [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1          # Load Anaconda module
   [NetID@cluster NetID]$ source activate myenv                   # Activate environment
   (myenv) [NetID@cluster NetID]$ conda list                      # Conda list command to check installed packages; for example to see if numpy is installed

If you run the "conda list" command without activating an environment, it will show the packages in the root environment.
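To check for a single package such as numpy, the listing can be filtered. The sketch below assumes the usual 'conda list' layout (header lines starting with '#', then one package per line with the name in the first column and the version in the second); pkg_version is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the version of one package from a 'conda list' listing.
# pkg_version is a hypothetical helper name; the listing is read from stdin.
pkg_version() {
    # $1 = exact package name; exits non-zero if the package is not listed
    awk -v pkg="$1" '
        !/^#/ && $1 == pkg { print $2; found = 1 }   # skip headers, match the name column
        END { exit !found }
    '
}

# Usage on the cluster (inside an activated environment):
#   conda list | pkg_version numpy || echo "numpy is not installed"
```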

Install/Uninstall Packages in an Anaconda Virtual Environment

NOTE: Users can only install/uninstall packages in their private environment. Users don't have access to install/uninstall packages in root and shared environments.

To install/uninstall packages in a private environment, you first need to activate it. For example, the next few steps show how to install and uninstall the numpy package in the "myenv" private environment.

  [NetID@cluster NetID]$ module load Anaconda/3-5.0.0.1          # Load Anaconda module
  [NetID@cluster NetID]$ source activate myenv                   # Activate environment
  (myenv) [NetID@cluster NetID]$ conda install numpy             # Command to install numpy package
  (myenv) [NetID@cluster NetID]$ conda list                      # Conda list command to check packages
  (myenv) [NetID@cluster NetID]$ conda uninstall numpy           # Command to uninstall numpy package

If you see the following error after installing a software package in Anaconda:

This system lists a couple of UTF-8 supporting locales that
you can pick from.  The following suitable locales were
discovered: aa_DJ.utf8, aa_ER.utf8, aa_ET.utf8, af_ZA.utf8, am_ET.utf8, an_ES.utf8, ar_AE.utf8, ar_BH.utf8, ar_DZ.utf8, ar_EG.utf8, ar_IN.utf8, ar_IQ.utf8, ar_JO.utf8, ar_KW.utf8, ar_LB.utf8, ar_LY.utf8, ar_MA.utf8, ar_OM.utf8, ar_QA.utf8, ar_SA.utf8, ar_SD.utf8, ar_SY.utf8, ar_TN.utf8, ar_YE.ut

Then copy the activate_utf.sh file to your conda environment, substituting USERNAME and ENVIRONMENTNAME with your NetID and environment name:

Ada:
cp /sw/hprc/Anaconda/activate_utf.sh /scratch/user/USERNAME/.conda/envs/ENVIRONMENTNAME/etc/conda/activate.d/

Terra:
cp /sw/hprc/sw/Anaconda/activate_utf.sh /scratch/user/USERNAME/.conda/envs/ENVIRONMENTNAME/etc/conda/activate.d/

Or if that doesn't work, run the following commands after activating your environment:

export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
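The copy step can be sketched as a small function that builds the destination from the NetID and environment name. Both function names are hypothetical, and the 'mkdir -p' is an added precaution in case the activate.d directory does not exist yet in your environment.

```shell
#!/bin/bash
# Sketch: copy activate_utf.sh into a private environment.
# activate_d_dir and install_utf_fix are hypothetical helper names.
activate_d_dir() {
    # $1 = NetID, $2 = environment name
    printf '/scratch/user/%s/.conda/envs/%s/etc/conda/activate.d\n' "$1" "$2"
}

install_utf_fix() {
    # $1 = path to activate_utf.sh on the cluster, $2 = NetID, $3 = environment name
    local dest
    dest="$(activate_d_dir "$2" "$3")"
    mkdir -p "$dest" &&     # create activate.d in case it does not exist yet
        cp "$1" "$dest/"
}

# Usage on Terra (replace your_netid and myenv with your own values):
#   install_utf_fix /sw/hprc/sw/Anaconda/activate_utf.sh your_netid myenv
```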

Clean and Remove a Virtual Environment

Anaconda downloads package files before software packages are installed, and those downloads consume your disk quota. You may run 'conda clean --all' after your software packages are installed. To completely remove your private virtual environment myenv when you no longer need it, run the command "conda remove --name myenv --all".

JupyterLab

You can use the default JupyterLab environment listed on the JupyterLab portal app, or you can create your own JupyterLab conda environment using either Anaconda or Miniconda for use on the HPRC portal. In either case, you must use one of the Anaconda versions listed on the JupyterLab HPRC portal webpage.

Note that you will need enough available file quota (~30,000 files), since conda creates thousands of files.

An Anaconda install of JupyterLab creates about the same number of files as Miniconda3.
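To see how many files an existing environment contributes toward the file quota, the entries under its directory can be counted. This is only a rough sketch: count_entries is a hypothetical helper name, and the cluster's own quota accounting may count differently.

```shell
#!/bin/bash
# Sketch: count the entries (files, directories, links) under a directory.
# count_entries is a hypothetical helper name.
count_entries() {
    # $1 = directory to scan; the directory itself is included in the count,
    # and names containing newlines would be counted more than once
    local n
    n="$(find "$1" | wc -l)"
    echo $((n))        # arithmetic expansion strips any padding printed by wc
}

# Usage on the cluster:
#   count_entries "$SCRATCH/.conda"
```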

Anaconda

To create an Anaconda conda environment called jupyterlab_1.2.2, do the following on the command line:

module purge
module load Anaconda/3-5.0.0.1
conda create -n jupyterlab_1.2.2


After your jupyterlab_1.2.2 environment is created, you will see output on how to activate and use your jupyterlab_1.2.2 environment:

#
# To activate this environment, use:
# > source activate jupyterlab_1.2.2
#
# To deactivate an active environment, use:
# > source deactivate
#

Then you can install jupyterlab (specifying a version if needed) and add packages to your jupyterlab_1.2.2 environment:

source activate jupyterlab_1.2.2
conda install -c conda-forge jupyterlab=1.2.2
conda install -c conda-forge package-name

To remove downloads after packages are installed, run:

conda clean -t

Miniconda

JupyterLab v1.2.2 installed via Miniconda3 will install python v3.6.7 while Anaconda installs python 3.8.0.

Anaconda/3-5.0.0.1 and Miniconda3/4.7.10 both use python v3.6.7 with jupyterlab v1.2.0, but jupyterlab v1.2.2 installs python 3.8.0 in Anaconda. At the moment it is therefore best to use Anaconda for JupyterLab if you want jupyterlab v1.2.2 instead of v1.2.0.

To create a Miniconda conda environment called jupyterlab_1.2.0, do the following on the command line:

module purge
module load Miniconda3/4.7.10
conda create -p /scratch/user/your_netid/.conda/envs/jupyterlab_1.2.0 jupyterlab=1.2.0

After your jupyterlab_1.2.0 environment is created, you will see output on how to activate and use your jupyterlab_1.2.0 environment:

#
# To activate this environment, use
#
#     $ conda activate /scratch/user/your_netid/.conda/envs/jupyterlab_1.2.0
#
# To deactivate an active environment, use
#
#     $ conda deactivate

You can add packages to your Miniconda3 environment using either Anaconda/3-5.0.0.1 or Miniconda3/4.7.10, both of which use python v3.6.7.

When activating the conda environment using the Miniconda3 module, you must specify the full path. You should use a path in your $SCRATCH directory, since by default Miniconda will install in your $HOME/.conda directory, causing you to reach your $HOME file quota. When using the Anaconda module, you only need to specify the environment name, since Anaconda is configured to install environments in your $SCRATCH/.conda directory.

In this example, JupyterLab should be run using the portal JupyterLab app. You can use your Miniconda3/4.7.10 environment in the JupyterLab portal app by selecting the Anaconda/3-5.0.0.1 module in the portal app page and providing the name including full path of your Miniconda3/4.7.10 environment in the "JupyterLab Environment to be activated" box.

Jupyter Notebook

You can create your own Jupyter Notebook environment using either a Python module or an Anaconda module for use on the HPRC Portal, but you must use one of the module versions listed on the Jupyter Notebook HPRC portal web page.

Your custom Notebook environment must be created on the command line for later use on the Jupyter Notebook portal app.

Note that you will need enough available file quota (~10,000 files), since conda and pip create thousands of files.

This table can help you decide when to use a Python module and when to use an Anaconda module for installing python packages.

Example module
  Python:   module load Python/3.6.6-intel-2018b
  Anaconda: module load Anaconda/3-5.0.0.1

When to use
  Python:   When only python packages are required
  Anaconda: When C, C++ or R modules are required for installing a software package with an extensive dependency list (Example: qiime2). Can also install programming languages with specific versions such as Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, Julia and more within a conda environment

Python version
  Python:   Only the same version as the module loaded
  Anaconda: Can install any version of Python 3 within Anaconda

Env location
  Python:   Virtual environment can be saved in any directory. It's up to the user to remember where environments are saved
  Anaconda: Manages environments in a centralized location: $SCRATCH/.conda/envs

Env activation
  Python:   Must provide full or relative path when activating
            Command Line Example:
               source /scratch/user/netid/my_envs/env_name/bin/activate
            Terra Jupyter Notebook Portal App Example:
               /scratch/user/netid/my_envs/env_name/bin/activate
  Anaconda: Only need to provide environment name when activating
            Command Line Example:
               source activate env_name
            Jupyter Notebook Portal App Example:
               env_name

Available packages
  Python:   PyPI
  Anaconda: Anaconda Cloud (includes bioconda) and PyPI

Package install command
  Python:   pip
  Anaconda: conda (for Anaconda packages), pip (for PyPI packages)

Installation type
  Python:   Wheel or source
  Anaconda: Precompiled binaries (using 'conda install pkg_name'), wheel or source (using 'pip install --user pkg_name')

Software speed
  Python:   Specific software packages such as TensorFlow non-GPU are much faster when configured correctly than Anaconda binaries, since they are compiled from source and can take advantage of CPU features. However, the performance of GPU versions of the TensorFlow modules and Anaconda environments is relatively similar.
  Anaconda: Precompiled binaries may be slower for some software packages

Dependency checks
  Python:   Yes, but not completely (see link below)
  Anaconda: Yes

File usage
  Python:   Each virtual environment downloads its own packages
  Anaconda: Multiple conda environments share a common directory for downloaded packages, so if a package has been previously installed in a conda environment, it doesn't have to be downloaded again when used in a new conda environment (unless you did 'conda clean -t')

Remove install cache
  Python:   pip cache purge (for pip >=20.1b1)
  Anaconda: conda clean -t (to remove downloaded tar packages from the shared pkgs directory)

Delete virtual environment
  Python:   rm -rf env_name_directory
  Anaconda: conda env remove --name env_name

Possible issues
  Python:   Not all dependencies are resolved globally when installing multiple packages (see link below)
  Anaconda: Installing package dependencies from multiple channels (default vs conda-forge) may cause conflicts

understanding-conda-and-pip

Note: you must activate the python virtualenv or anaconda environment before installing packages with 'pip install --user' or 'conda install'

Python

A Python module can be used to create a virtual environment to be used in the portal Jupyter Notebook app when all you need is Python packages.

You can use a default Python virtual environment in the Jupyter Notebook portal app by leaving the "Optional Environment to be activated" field blank.

To create a Python virtual environment called my_notebook-python-3.6.6-intel-2018b (you can name it whatever you like), do the following on the command line. You can save your virtual environments in any $SCRATCH directory you want. In this example a directory called /scratch/user/mynetid/pip_envs is used, but you can use another name instead of pip_envs.

mkdir /scratch/user/mynetid/pip_envs

A good practice is to name your environment so that you can identify which Python version is in your virtualenv so that you know which module to load.

The next three lines will create your virtual environment.

module purge
module load Python/3.6.6-intel-2018b
virtualenv /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b
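The naming practice described above can be sketched as a small function that derives the environment name from the module name. venv_name is a hypothetical helper, and the scheme (prefix, then the module name with 'Python/' rewritten to 'python-') simply mirrors the example on this page.

```shell
#!/bin/bash
# Sketch: derive a virtualenv name that records the module it was built with.
# venv_name is a hypothetical helper name.
venv_name() {
    # $1 = prefix (e.g. my_notebook), $2 = module (e.g. Python/3.6.6-intel-2018b)
    local module_tag
    module_tag="$(printf '%s' "$2" | sed 's|^Python/|python-|')"
    printf '%s-%s\n' "$1" "$module_tag"
}

# Usage on the cluster:
#   module load Python/3.6.6-intel-2018b
#   virtualenv "/scratch/user/mynetid/pip_envs/$(venv_name my_notebook Python/3.6.6-intel-2018b)"
```

Keeping the module name inside the environment name makes it easy to tell later which module must be loaded before activating the environment.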

Then you can activate the virtual environment by using the full path to the activate command inside your virtual environment and install Python packages.

source /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b/bin/activate
pip install notebook
pip install optional-python-package-name

You can use your Python/3.6.6-intel-2018b environment in the Jupyter Notebook portal app by selecting the Python/3.6.6-intel-2018b module in the portal app page and providing the name including full path to the activate command for your Python/3.6.6-intel-2018b virtual environment in the "Optional Conda Environment to be activated" box. The activate command is found inside the bin directory of your virtual env. An example of what to put in the "Optional Conda Environment to be activated" box is the full path used in the source command above.


Loading additional Lmod modules

You can install the jupyterlmod package in your python virtual environment on Terra which will allow you to load additional system modules that you may need or may have used during the creation of your virtual environment on Terra for use with the Terra Jupyter Notebook portal app.

To add this feature to your existing Terra virtual environment, do the following on the command line prior to launching Jupyter Notebook on the portal (you can use Python/3.6.6-foss-2018b if the additional module(s) you need are not available with the intel-2018b toolchain):

module purge
module load Python/3.6.6-intel-2018b
source /scratch/user/mynetid/pip_envs/my_notebook-python-3.6.6-intel-2018b/bin/activate
pip install jupyter jupyterlmod

Then launch the Terra Jupyter Notebook portal app using your optional environment and click the 'Softwares' tab in your notebook and search for system modules.

Select one or more modules that match the toolchain and python version that you used in creating your virtual environment, and then press Enter to load them.

The 'Loaded Modules' list will update in a few seconds to reflect the additional module(s) loaded.

You can save your modules loaded using the 'collection' button at the right side of the notebook 'softwares' page so that you just have to select the collection instead of searching for modules each time you want to use your python virtual environment.

Anaconda

Anaconda is different from Python's virtualenv in that you can install other types of software, such as R and R packages, in your environment. Anaconda also manages the installation path and installs in your $SCRATCH/.conda directory, so you don't have to create a directory prior to creating an environment. To create an Anaconda conda environment called my_notebook (you can name it whatever you like), do the following on the command line:

module purge
module load Anaconda/3-5.0.0.1
conda create -n my_notebook


After your my_notebook environment is created, you will see output on how to activate and use your my_notebook environment:

#
# To activate this environment, use:
# > source activate my_notebook
#
# To deactivate an active environment, use:
# > source deactivate
#

Then install notebook, after which you can add optional packages to your my_notebook environment:

source activate my_notebook
conda install -c conda-forge notebook
conda install -c conda-forge optional-package-name

You can use your Anaconda/3-5.0.0.1 environment in the Jupyter Notebook portal app by selecting the Anaconda/3-5.0.0.1 module in the portal app page and providing just the name (without the full path) of your Anaconda/3-5.0.0.1 environment in the "Optional Conda Environment to be activated" box. In the example above, the value to enter is: my_notebook

Using Conda

Besides creating a virtual environment as discussed above, the 'conda' command can list, clone, remove and share a virtual environment. More details can be found at https://conda.io/docs/using/envs.html. You may also find the conda cheatsheet helpful: https://conda.io/docs/_downloads/conda-cheatsheet.pdf
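As an illustration of cloning and sharing, the sketch below wraps two conda subcommands in small functions. clone_env and export_env are hypothetical wrapper names, and the environment and file names in the usage lines are placeholders.

```shell
#!/bin/bash
# Sketch: clone an environment and export its package list to a file.
# clone_env and export_env are hypothetical wrappers around conda subcommands.
clone_env() {
    # $1 = existing environment, $2 = name of the copy
    conda create --name "$2" --clone "$1"
}

export_env() {
    # $1 = environment name, $2 = output file (YAML) that can be shared
    conda env export --name "$1" > "$2"
}

# Usage on the cluster (after loading an Anaconda module):
#   clone_env myenv myenv_copy
#   export_env myenv environment.yml
#   conda env create --file environment.yml    # recreate an environment from the file
#   conda env list                             # list environments
```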

Adding "channels"