
SW:Caffe


Caffe

Description

Caffe is a deep learning framework made with expression, speed, and modularity in mind.

Access

Caffe is open to all HPRC users.

Anaconda and Caffe

TAMU HPRC supports Caffe through the Anaconda module.
While a variety of Anaconda modules are available on Ada and Terra, it is simplest to use exactly the versions given in the following sections.

You can learn more about the module system on our SW:Modules page.

You can explore the available Anaconda environments on a per-module basis using the following:

[NetID@ada ~]$ module load Anaconda/[SomeVersion]
[NetID@ada ~]$ conda info --envs

To load and use the Caffe virtual environment:

[NetID@ada ~]$ module load Anaconda/3-5.0.0.1
[NetID@ada ~]$ source activate caffe-gpu-1.0
[NetID@ada ~]$ [Run your script accessing CAFFE here]
[NetID@ada ~]$ source deactivate

If you need Python 2, you may load the Anaconda/2-5.0.1 module instead.
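
For example, to switch to the Python 2 module and list the environments it provides (the name of any Python 2 Caffe environment may differ, so check the output of conda info --envs):

[NetID@ada ~]$ module load Anaconda/2-5.0.1
[NetID@ada ~]$ conda info --envs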

Example Caffe Script

As with most software on the clusters, Caffe should be used via the submission of a job file. Scripts using Caffe are written in Python, so the Caffe code should not be written directly inside a job file or entered in the shell line by line. Instead, create a separate file for the Python/Caffe script, which the job file then executes; a sketch of such a job file follows.
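
As a rough sketch, an LSF job file for Ada might look like the following. The resource values are illustrative, the script name testscript.py matches the example later on this page, and the GPU selector should be verified against the Ada batch documentation before use:

#BSUB -J caffe_test             # job name
#BSUB -L /bin/bash              # use bash as the job shell
#BSUB -W 0:30                   # wall-clock limit (hh:mm)
#BSUB -n 1                      # request one core
#BSUB -R "rusage[mem=2560]"     # request 2560 MB of memory
#BSUB -M 2560                   # per-process memory limit (MB)
#BSUB -R "select[gpu]"          # request a GPU node (illustrative; verify)
#BSUB -o caffe_test.%J          # combined stdout/stderr file

module load Anaconda/3-5.0.0.1
source activate caffe-gpu-1.0
python testscript.py
source deactivate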

Caffe was developed to represent deep networks in a modular way: the network is defined layer by layer in a plain-text model (prototxt) file. Before the script can be used, this model file must be created in the text editor of your choice. More about the anatomy of a Caffe model can be found in the official Caffe documentation.

Note: The layer file(s) and the script MUST be in the same directory (unless the script refers to the layer files by full path).

The following was designed for use with the Anaconda/3-5.0.0.1 module and the caffe-gpu-1.0 virtual environment. It is recommended to test your script with the same versions.

Creating the layer file, conv.prototxt:

name: "convolution"
input: "data"
input_dim: 1
input_dim: 1
input_dim: 100
input_dim: 100
layer {
  name: "conv"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  convolution_param {
    num_output: 3
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}  

Creating the script file. First, load Caffe:

import caffe

Pay careful attention to which node this script will run on, as not all nodes have GPUs.
(More information on the computing environment: For Ada or For Terra).

To make the script GPU exclusive:

caffe.set_device(0)
caffe.set_mode_gpu()

To make the script CPU exclusive:

caffe.set_mode_cpu()
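
If the same script might land on either a GPU node or a CPU-only node, one possible pattern is to choose the mode at runtime. This is only a sketch: it assumes the scheduler exports CUDA_VISIBLE_DEVICES on GPU nodes, which may not hold in every environment:

import os
import caffe

# Fall back to CPU when no GPU is visible to the process
# (assumes CUDA_VISIBLE_DEVICES is set on GPU nodes).
if os.environ.get('CUDA_VISIBLE_DEVICES'):
    caffe.set_device(0)
    caffe.set_mode_gpu()
else:
    caffe.set_mode_cpu()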

To load the network defined in conv.prototxt:

net = caffe.Net('conv.prototxt', caffe.TEST)
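
Putting the pieces above together, a minimal testscript.py (the file name is just an example) could look like this; the printed shapes follow from the dimensions given in conv.prototxt:

import caffe

# Run on the GPU; see the CPU-only alternative above if needed.
caffe.set_device(0)
caffe.set_mode_gpu()

# Load the network defined in conv.prototxt (in the working directory).
net = caffe.Net('conv.prototxt', caffe.TEST)

# Sanity check: print the input and output blob shapes.
print(net.blobs['data'].data.shape)   # (1, 1, 100, 100)
print(net.blobs['conv'].data.shape)   # (1, 3, 96, 96)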

It is recommended, though not required, to save this script with a .py file extension.

Once saved, the script can be tested on a login node by entering:

[NetID@ada ~]$ python testscript.py

NOTE: Make sure to run this command from the same directory that the script is saved in.

NOTE: While it is acceptable to test programs on the login nodes, please do not run extended or intensive computation on these shared resources. Use a batch job and the compute nodes for heavy processing.

Usage on the Login Nodes

Please limit interactive processing to short, non-intensive usage. Use non-interactive batch jobs for resource-intensive and/or multiple-core processing. Users are requested to be responsible and courteous to other users when using software on the login nodes.

The most important processing limits here are:

  • ONE HOUR of PROCESSING TIME per login session.
  • EIGHT CORES per login session on the same node or (cumulatively) across all login nodes.

Anyone found violating the processing limits will have their processes killed without warning. Repeated violation of these limits will result in account suspension.
Note: Your login session will disconnect after one hour of inactivity.

Usage on the Compute Nodes

Non-interactive batch jobs on the compute nodes allow for resource-demanding processing. Non-interactive jobs have higher limits on the number of cores, amount of memory, and runtime length.

For instructions on how to create and submit a batch job, please see the batch processing wiki page for the respective cluster.
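
For example, on Ada (which uses LSF) a job file is submitted with bsub, and on Terra (which uses SLURM) with sbatch; the file names below are hypothetical:

[NetID@ada ~]$ bsub < caffe_test.job
[NetID@terra ~]$ sbatch caffe_test.slurm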

Usage on the VNC Nodes

The VNC nodes allow for use of a graphical user interface (GUI) without disrupting other users.

VNC jobs and GUI usage do come with restrictions. All VNC jobs are limited to a single node (20 cores, 64 GB or 256 GB of memory). There are fewer VNC nodes than comparable compute nodes.

For more information, including instructions, on using software on the VNC nodes, please visit our Ada Remote Visualization page.