
Difference between revisions of "SW:Matlab"

Revision as of 14:29, 11 May 2017

Running Matlab interactively on login nodes

Matlab is accessible to all HPRC users within the terms of our license agreement. If you have particular concerns about whether specific usage falls within the TAMU HPRC license, please send an email to HPRC Helpdesk.

To be able to use matlab, the Matlab module needs to be loaded first. This can be done using the following command:

[ netID@cluster ~]$ module load Matlab/R2017a

This will set up the environment for Matlab version R2017a. To see a list of all installed versions, use the following command:

[ netID@cluster ~]$ module spider Matlab

Note: New versions of software become available periodically. Version numbers may change.

To start matlab, use the following command:

[ netID@cluster ~]$ matlab

Depending on your X server settings, this will start either the Matlab GUI or the Matlab command line interface. To start Matlab in command line interface mode, use the following command with the appropriate flags:

[ netID@cluster ~]$ matlab -nosplash -nodisplay

By default, Matlab will execute a large number of built-in operators and functions multi-threaded and will use as many threads (i.e. cores) as are available on the node. Since login nodes are shared among all users, HPRC restricts the number of computational threads to 8. This should suffice for most cases. Speedup achieved through multi-threading depends on many factors, and in certain cases using 8 threads might negatively affect runtime. To explicitly change the number of computational threads, use the following Matlab command:


This will set the number of computational threads to 4.
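
The command itself does not appear in this revision of the page. As a minimal sketch, one standard MATLAB function that sets the number of computational threads is maxNumCompThreads; whether the original page used this function is an assumption.

```matlab
% Limit this MATLAB session to 4 computational threads.
% maxNumCompThreads returns the previous setting, which is kept here
% so it can be restored later with maxNumCompThreads(previousThreads).
previousThreads = maxNumCompThreads(4);

% Calling the function without arguments reports the current setting.
fprintf('Computational threads: %d (previously %d)\n', ...
        maxNumCompThreads, previousThreads);
```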

To completely disable multi-threading, use the -singleCompThread option when starting Matlab:

[ netID@cluster ~]$ matlab -singleCompThread

Usage on the Login Nodes

Please limit interactive processing to short, non-intensive usage. Use non-interactive batch jobs for resource-intensive and/or multiple-core processing. Users are requested to be responsible and courteous to other users when using software on the login nodes.

The most important processing limits here are:

  • ONE HOUR of PROCESSING TIME per login session.
  • EIGHT CORES per login session on the same node or (cumulatively) across all login nodes.

Anyone found violating the processing limits will have their processes killed without warning. Repeated violation of these limits will result in account suspension.
Note: Your login session will disconnect after one hour of inactivity.

Running (parallel) Matlab Scripts on HPRC compute nodes

When your Matlab script needs more resources than are allowed during an interactive session (e.g. cpu time, number of cores) you CANNOT run it on a login node. In this section, we will discuss the various ways to run your Matlab scripts on the HPRC compute nodes.

Submit Matlab Scripts Remotely or Locally Using the HPRC Matlab App

HPRC developed an app to make it as straightforward as possible to submit Matlab jobs to the compute nodes. The app can be started from the Matlab GUI on HPRC login nodes as well as from your local Matlab GUI (i.e. running on your local desktop or laptop). NOTE: you have to be within the campus firewall; either on campus or through VPN.

Installing the App for remote job submission

To use the App in your local Matlab GUI, you need to install the TAMU HPRC Matlab Toolbox first. You can download the toolbox here. After downloading you can install it from within the Matlab GUI. Simply double-click on the file (or right click and select install). This will install both the toolbox (containing the actual code and functions to submit Matlab scripts) as well as the App. After installing, the app should show up in the Apps tab under the name TAMU HPRC.

Using the App to Submit your script

After installing you can start the app from the APPS tab. Alternatively, you can start the App directly by typing HPRC on the Matlab command line. The App window will then pop up.


To select the Matlab script you want to run, click on the Browse button; it will open a file selection dialog where you can select the Matlab script. You can also type the name of the script directly. NOTE: the file does not have to be in the current directory.

In the "MATLAB OPTIONS" tab you can set the following properties for your Matlab run:

  • #workers represents the number of additional workers you want to use. Only set this if your program contains explicit Matlab parallel constructs (such as parfor or spmd)
  • per node represents the number of workers to distribute per node. This is useful if you want every worker to have access to resources on a node (e.g. all the memory or a gpu)
  • #threads sets the number of computational threads available for every worker and client.
  • use gpu select this box if your Matlab script uses Matlab gpu operations.

You can also specify where to run your script. If you are running the Matlab GUI from an HPRC login node, choose local. If you are using the App from your local Matlab GUI (e.g. on your laptop or desktop) choose what HPRC cluster to run on; either ada or terra. In that case you have to enter your username as well.

In the "BATCH OPTIONS" tab you can set the following additional batch job properties:

  • Total Memory (MB) Total memory you need to run your Matlab job. If value is 0, memory will be set to max memory per core (e.g. on ada will be 2500MB)
  • Walltime (hh:mm) total wall time for your job.
  • batch options explicit batch options; LSF options for ada OR SLURM options for terra. For example, "-u user@tamu.edu -B -N" on LSF to send email when a job starts or ends.

To submit the job, click on the SUBMIT button. A new window will pop up that will show information about the job submission process.


NOTE: The first time you submit a job remotely (i.e. through a Matlab GUI on your local laptop/desktop) a directory selection dialog will pop up, where you have to select a directory where Matlab will store Job information. You will only need to do this once.

Retrieve results and information from Submitted Job

A variable named myjob of type Job will be copied to the workspace once the job has been successfully submitted. You can use this variable to retrieve information about the job. For example:

  • myjob.State will show the current status of the job. This can be queued, running, or finished.
  • myjob.diary will display all the redirected screen output from your Matlab run.
  • myjob.load will load all the variables from your Matlab run into the current workspace.

In addition, you can also get the Job information through the Parallel Job monitor (click Parallel --> Monitor Jobs). Use the TAMU or TAMUREMOTE cluster profile to see the corresponding jobs.
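
The job methods listed above can be tried on any Job object. The following is a minimal sketch that assumes the Parallel Computing Toolbox is available; it creates a stand-in job with the standard batch function on the default cluster profile rather than through the HPRC app, and the workload (summing 1 to 10) is purely illustrative.

```matlab
% Stand-in for the myjob variable the HPRC app would normally create.
% batch(fcn, N, {inputs}) runs a function with N outputs on a worker.
myjob = batch(@() sum(1:10), 1, {});

wait(myjob);                   % block until the job has stopped running
disp(myjob.State)              % current state as text

diary(myjob)                   % show any captured screen output

% Function jobs return outputs through fetchOutputs;
% script jobs would use load(myjob) instead.
result = fetchOutputs(myjob);
disp(result{1})

delete(myjob)                  % remove the stored job data when done
```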

Considerations for submitting scripts remotely

Depending on your local network connection, and the size of your input data, copying all your input data to the HPRC cluster might take time. Keep this in mind when running jobs remotely. Also, if your output data is large it might take considerable time to send everything back to your local workspace. In that case, it might be useful to clear variables that are not needed anymore. Another option is to save large matrices into a file instead. NOTE: currently, the files will not be copied back but remain remotely in your $HOME directory.
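
A minimal sketch of the last two suggestions, to be placed at the end of your own script. The variable name A and the file name large_result.mat are hypothetical.

```matlab
A = rand(2000);                          % stand-in for a large result matrix

% Write the matrix to a MAT-file instead of keeping it in the workspace.
% The -v7.3 format also supports variables larger than 2 GB.
save('large_result.mat', 'A', '-v7.3');

clear A                                  % nothing large left to send back

% List what was stored in the file without loading it.
info = whos('-file', 'large_result.mat');
fprintf('%s: %d bytes\n', info(1).name, info(1).bytes);
```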

Submit Matlab Scripts Remotely or Locally From the Matlab Command Line

Instead of using the App you can also call Matlab functions (developed by HPRC) directly to run your Matlab script on HPRC compute nodes. There are two steps involved in submitting your Matlab script:

  • Define the properties for your Matlab script (e.g. #workers). HPRC created a class named TAMUClusterProperties for this
  • Submit the Matlab script to run on HPRC compute nodes. HPRC created a function named tamu_run_batch for this.

For example, suppose you have a script named mysimulation.m, you want to use 4 workers and estimate it will need less than 7 hours of computing time:

>> tp=TAMUClusterProperties();
>> tp.workers(4);
>> tp.walltime('07:00');
>> myjob=tamu_run_batch(tp,'mysimulation.m');

NOTE: TAMUClusterProperties will use all default values for any of the properties that have not been set explicitly.

In case you want to submit your Matlab script remotely from your local Matlab GUI, you also have to specify the HPRC cluster name you want to run on and your username. For example, suppose you have a script that uses Matlab GPU functions and you want to run it on terra:

>> tp=TAMUClusterProperties();
>> tp.gpu(1);
>> tp.hostname('terra.tamu.edu');
>> tp.user('<USERNAME>');  
>> myjob=tamu_run_batch(tp,'mysimulation.m');

To see all available methods on objects of type TAMUClusterProperties you can use the Matlab help or doc functions: E.g.

  >> help TAMUClusterProperties/doc 

To see help page for tamu_run_batch, use:

   >> help tamu_run_batch
      tamu_run_batch  runs Matlab script on worker(s). 
         j = TAMU_RUN_BATH(tp,'script') runs the script
         script.m on the worker(s) using the TAMUClusterProperties object tp.
         Returns j, a handle to the job object that runs the script.

tamu_run_batch returns a variable of type Job. See the "Retrieve results and information from Submitted Job" section for how to get results and information from the submitted job.

Submit Matlab Scripts Directly from HPRC Login Shell

HPRC developed a tool named matlabsubmit to run Matlab simulations on the HPRC compute nodes without the need to create your own batch script and without the need to start a Matlab session. matlabsubmit will automatically generate a batch script with the correct requirements. matlabsubmit will also generate boilerplate Matlab code to set up the environment (e.g. set the number of computational threads) and, if needed, will start a parpool using the correct Cluster Profile (local if all workers fit on a single node and a TAMU cluster profile otherwise).

To submit your Matlab script, use the following command:

[ netID@cluster ~]$ matlabsubmit myscript.m

When executing, matlabsubmit will do the following:

  • generate boilerplate Matlab code to set up the Matlab environment (e.g. #threads, #workers)
  • generate a batch script with all resources set correctly and the command to run matlab
  • submit the generated batch script to the batch scheduler and return control back to the user

To see all options for matlabsubmit type:

[ netID@cluster ~]$ matlabsubmit -h

Example 1: basic use

The following example shows the simplest use of matlabsubmit. It will execute matlab script test.m using default values for batch resources and Matlab resources. matlabsubmit will also print some useful information to the screen. As can be seen in the example, it will show the Matlab resources requested (e.g. #threads, #workers), the submit command that will be used to submit the job, the batch scheduler JobID, and the location of output generated by Matlab and the batch scheduler.

-bash-4.1$ matlabsubmit test.m

Running Matlab script with following parameters
Script     : test.m
Workers    : 0
Nodes      : 1
Mem/proc   : 2500
#threads   : 8

bsub  -e MatlabSubmitLOG1/lsf.err -o MatlabSubmitLOG1/lsf.out  
      -L /bin/bash -n 8 -R span[ptile=8] -W 02:00 -M 2500 
      -R rusage[mem=2500]      
      -J test1 MatlabSubmitLOG1/submission_script

Verifying job submission parameters...
Verifying project account...
     Account to charge:   082839397478
         Balance (SUs):     81535.6542
         SUs to charge:        16.0000
Job <2847580> is submitted to default queue <sn_regular>.

matlabsubmit ID        : 1
matlab output file     : MatlabSubmitLOG1/matlab.log
LSF/matlab output file : MatlabSubmitLOG1/lsf.out
LSF/matlab error file  : MatlabSubmitLOG1/lsf.err

The matlab script test.m has to be in the current directory. Control will be returned immediately after executing the matlabsubmit command. To check the run status or kill a job, use the respective batch scheduler commands (e.g. bjobs and bkill on ada). matlabsubmit will create a sub directory named MatlabSubmitLOG<N> (where N is the matlabsubmit ID). In this directory matlabsubmit will store all its relevant files; the generated batch script, matlab driver, redirected output and error, and a copy of the workspace (after the job is done). A listing of this directory will show the following files:

  • lsf.err redirected error
  • lsf.out redirected output (both LSF and Matlab)
  • matlab.log redirected Matlab screen output
  • matlabsubmit_wrapper.m Matlab code that sets #threads and calls user function
  • submission_script the generated LSF batch script
  • workspace.mat a copy of the matlab workspace (after execution has finished)

Options with matlabsubmit

The example above showed the simplest case of using matlabsubmit. No options were specified and matlabsubmit used default values for requested resources. However, matlabsubmit provides a number of options to set batch resources (e.g. walltime, memory) as well as Matlab related options (e.g. number of threads to use, number of workers, etc). To see all the available options you can use the "-h" option. See below for the output of "matlabsubmit -h":

-bash-4.1$ matlabsubmit -h
/software/hprc/Matlab/bin/matlabsubmit: option requires an argument -- h
Usage: /software/hprc/Matlab/bin/matlabsubmit [options] SCRIPTNAME

This tools automates the process of running matlab codes on the compute nodes.

  -h Shows this message
  -m set the amount of requested memory in MEGA bytes(e.g. -m 20000)
  -t sets the walltime; form hh:mm (e.g. -t 03:27)
  -w sets the number of ADDITIONAL workers
  -g indicates script needs GPU  (no value needed)
  -b sets the billing account to use 
  -s set number of threads for multithreading (default: 8 ( 1  when -w > 0)
  -p set number of workers per node
  -f run function call instead of script
  -x add explicit batch scheduler option
  memory   : 2500 per core 
  time     : 02:00
  workers  : 0
  gpu      : no gpu 
  threading: on, 8 threads


For example, the command matlabsubmit -t "03:27" -m 17000 -s 20 myscript.m will request 17000 MB (about 17 GB) of memory and 3 hours and 27 minutes of computing time. It will also set the number of computational threads in Matlab to 20 and execute the Matlab script myscript.m.

NOTE when using the -f flag to execute a function instead of a script, the function call must be enclosed with double quotes when it contains parentheses. For example: matlabsubmit -f "myfunc(21)"

Example 2: Utilizing Matlab workers (single node)

To utilize additional workers used by Matlab's parallel features such as parfor, spmd, and distributed, matlabsubmit provides the option to specify the number of workers. This is done using the -w <N> flag (where <N> represents the number of workers). The following example shows a simple case of using additional workers; in this case 8 workers.

-bash-4.1$ matlabsubmit -w 8 test.m
Running Matlab script with following parameters
Script     : test.m
Workers    : 8
Nodes      : 1
Mem/proc   : 2500
#threads   : 1

bsub  -e MatlabSubmitLOG5/lsf.err -o MatlabSubmitLOG5/lsf.out  
      -L /bin/bash -n 9 -R span[ptile=9] -W 02:00 -M 2500 
      -R rusage[mem=2500] 
      -J test5 MatlabSubmitLOG5/submission_script

Verifying job submission parameters...
Verifying project account...
     Account to charge:   082839397478
         Balance (SUs):     80533.2098
         SUs to charge:        18.0000
Job <2901543> is submitted to default queue <sn_regular>.

matlabsubmit ID        : 5
matlab output file     : MatlabSubmitLOG5/matlab.log
LSF/matlab output file : MatlabSubmitLOG5/lsf.out
LSF/matlab error file  : MatlabSubmitLOG5/lsf.err


In this example, matlabsubmit will first execute matlab code to create a parpool with 8 workers (using the local profile). As can be seen in the output, in this case, matlabsubmit requests 9 cores: 1 core for the client and 8 cores for the workers. The only exception is when the user requests 20 workers. In that case, matlabsubmit will request 20 cores.
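
The core count described above can be written as a small calculation. This is a hypothetical helper for illustration only, not part of matlabsubmit; it assumes the single-node case with at least one worker requested and a 20-core node limit, as stated in the text.

```matlab
% Hypothetical helper: cores requested for a single-node run with w > 0 workers.
% One extra core is added for the Matlab client, capped at the 20 cores of a node.
coresPerNode   = 20;
requestedCores = @(w) min(w + 1, coresPerNode);

fprintf('%2d workers -> %2d cores\n', 8,  requestedCores(8));   % 9 cores
fprintf('%2d workers -> %2d cores\n', 20, requestedCores(20));  % 20 cores
```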

Example 3: Utilizing Matlab workers (multi node)

matlabsubmit also supports Matlab runs that need more than 20 workers (the maximum for a single node) and/or runs where the Matlab workers need to be distributed among multiple nodes. Reasons for distributing workers among different nodes include: the need to use certain resources such as a gpu on multiple nodes, enabling multi-threading on every worker, and using the available memory on multiple nodes. The following example shows how to run a Matlab simulation that utilizes 24 workers, where every node will run 4 workers (i.e. the workers will be distributed among 24/4 = 6 nodes).

-bash-4.1$ matlabsubmit -w 24 -p 4 test.m
Running Matlab script with following parameters
Script     : test.m
Workers    : 24
Nodes      : 6
Mem/proc   : 2500
#threads   : 1

... starting matlab batch. This might take some time. 
See MatlabSubmitLOG8/matlab-batch-commands.log
...Starting Matlab from host: login4
MATLAB is selecting SOFTWARE OPENGL rendering.

                                           < M A T L A B (R) >
                                 Copyright 1984-2016 The MathWorks, Inc.
                                 R2016a ( 64-bit (glnxa64)
                                            February 11, 2016

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
... Interactive Matlab session, multi threading reduced to 4

	Academic License

commandToRun =

bsub -L /bin/bash -J Job1 -o '/general/home/pennings/Job1/Job1.log' -n 25 -M 2500 
     -R rusage[mem=2500] -R "span[ptile=4]" -W 02:00       
     "source /general/home/pennings/Job1/mdce_envvars ;

job = 



                   ID: 1
                 Type: pool
             Username: pennings
                State: running
           SubmitTime: Mon Aug 01 12:15:15 CDT 2016
     Running Duration: 0 days 0h 0m 0s
      NumWorkersRange: [25 25]

      AutoAttachFiles: true
  Auto Attached Files: /general/home/pennings/MatlabSubmitLOG8/matlabsubmit_wrapper.m
        AttachedFiles: {}
      AdditionalPaths: {}

    Associated Tasks: 

       Number Pending: 25
       Number Running: 0
      Number Finished: 0
    Task ID of Errors: []
  Task ID of Warnings: []

matlabsubmit JOBID            : 8
batch  output file (client)   : Job1/Task1.diary.txt
batch  output files (workers) : Job1/Task[2-25].diary.txt


As can be seen, the output is very different from the previous examples. When a job uses multiple nodes, matlabsubmit takes a different approach. It will start a regular interactive Matlab session and from within it will run the Matlab batch command using the TAMUG cluster profile. It will then exit Matlab while the Matlab script is executed on the compute nodes.

The contents of the MatlabSubmitLOG directory are also slightly different. A listing will show the following files:

  • matlab-batch-commands.log screen output from Matlab
  • matlabsubmit_driver.m Matlab code that sets up the cluster profile and calls Matlab batch
  • matlabsubmit_wrapper.m Matlab code that sets #threads and calls user function
  • submission_script The actual command to start Matlab

In addition to the MatlabSubmitLOG directory created by matlabsubmit, Matlab will also create a directory named Job<N> used by the cluster profile to store meta data, log files, and screen output. The *.diary.txt text files will show screen output for the client and all the workers.

Using Matlab Parallel Toolbox on HPRC Resources


In addition to the 50 general Matlab licenses, HPRC also purchased a Matlab Distributed Computing Server license for a total of 96 tokens. These tokens are used to start additional Matlab workers and are used by parallel Matlab constructs like parfor, spmd, and distributed.

For parallel processing on the compute nodes Matlab uses Cluster profiles. A cluster profile acts as an interface between Matlab and the batch scheduler (e.g. LSF, SLURM) and lets you define certain properties of your cluster (e.g. how to submit jobs, submission parameters, job requirements, etc). Matlab will use the cluster profile to offload parallel (or sequential) matlab code to one or more workers.

For your convenience, HPRC already created a custom Cluster Profile. You can use this profile to define how many workers you want and how you want to distribute the workers over the nodes. Before you can use this profile you need to import it (you only need to do this once). This can be done by calling the following Matlab function.

Importing Cluster Profile


This function imports the cluster profile into the workspace and also creates a subdirectory structure in your scratch directory to store job information for that cluster.

We will discuss briefly some of the most common parallel matlab concepts. For more detailed information about these constructs, as well as additional parallel constructs consult the Parallel Computing Toolbox User Guide.

Starting a Parallel Pool

The parpool function enables the full functionality of the parallel language features (parfor and spmd, which will be discussed below). A parpool creates a special job on a pool of workers, and connects the pool to the MATLAB client. For example:

mypool = parpool(4)

This code starts a worker pool using the default cluster profile, with 4 additional workers.

NOTE: only instructions within parfor and spmd blocks are executed on the workers. All other instructions are executed on the client.

NOTE: all variables created on the workers will be destroyed once the parallel pool is closed.
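
A minimal sketch of the full life cycle of a pool, assuming the Parallel Computing Toolbox is available and the default cluster profile can provide 4 workers. Because parpool returns an error when a pool is already open, the sketch closes any existing pool first.

```matlab
% Close an existing pool, if any; gcp('nocreate') returns [] when none is open.
delete(gcp('nocreate'));

% Start a pool with 4 workers using the default cluster profile.
mypool   = parpool(4);
nWorkers = mypool.NumWorkers;
fprintf('Pool started with %d workers\n', nWorkers);

% ... parfor and spmd blocks would go here ...

% Shut the pool down; data held on the workers is lost at this point.
delete(mypool);
```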


The concept of a parfor-loop is similar to the standard Matlab for-loop. The difference is that parfor partitions the iterations among the available workers to run in parallel. For example:

parfor i=1:1024

This code will open a parallel pool with 2 workers using the default cluster profile and execute the loop in parallel.
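
The loop shown above is incomplete in this revision of the page. The following is a minimal complete sketch; the loop body (filling an array with sine values) and the pool of 2 workers are assumptions chosen for illustration.

```matlab
% Open a pool with 2 workers unless a pool is already running.
if isempty(gcp('nocreate'))
    parpool(2);
end

n = 1024;
A = zeros(1, n);               % preallocate the output array

% Iterations are independent, so parfor can hand them out to the workers.
% A is a "sliced" variable: each iteration writes only to its own element.
parfor i = 1:n
    A(i) = sin(2*pi*i/n);
end

fprintf('Largest value: %g\n', max(A));
```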

For more information please visit the Matlab parfor page.


spmd runs the same program on all workers concurrently. A typical use of spmd is when you need to run the same program on multiple sets of input. For example, suppose you have 4 inputs named data1, data2, data3, data4 and you want to run function myfun on all of them:

spmd (4)
    data = load(['data' num2str(labindex)]);
    myresult = myfun(data);
end

NOTE: labindex is a Matlab variable and is set to the worker id, values range from 1 to number of workers.

Every worker will have its own version of variable myresult. To access these variables outside the spmd block you append {i} to the variable name, e.g. myresult{3} represents variable myresult from worker 3.
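
A minimal self-contained sketch of this access pattern. It replaces the data files and myfun with a trivial computation (an assumption made so the example runs anywhere), and it assumes a pool of 4 workers can be started. Recent Matlab releases recommend spmdIndex over labindex; labindex is kept here to match the text.

```matlab
% Make sure a pool with 4 workers is available for spmd (4).
if isempty(gcp('nocreate'))
    parpool(4);
end

spmd (4)
    % Each worker computes its own value from its worker id.
    myresult = labindex * 10;
end

% On the client, myresult is a Composite: one entry per worker.
fromWorker3 = myresult{3};
fprintf('Worker 3 computed %d\n', fromWorker3);
```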

For more information please visit the Matlab spmd page.

Using GPU

Normally all variables reside in the client workspace and Matlab operations are executed on the client machine. However, Matlab also provides options to utilize available GPUs to run code faster. Running code on the gpu is straightforward. Matlab provides GPU versions for many built-in operations. These operations are executed on the GPU automatically when the variables involved reside on the GPU. The results of these operations will also reside on the GPU. To see what functions can be run on the GPU type:

methods('gpuArray')

This will show a list of all available functions that can be run on the GPU, as well as a list of available static functions to create data on the GPU directly (will be discussed later).

NOTE: There is significant overhead of executing code on the gpu because of memory transfers.

Another useful function is:

gpuDevice

This function shows all the properties of the GPU. When this function is called from the client (or a node without a GPU) it will just print an error message.

To copy variables from the client workspace to the GPU, you can use the gpuArray command. For example:

carr = ones(1000);
garr = gpuArray(carr);

will copy variable carr to the GPU with name garr. If variable carr is not used in the client workspace you can write it as:

garr = gpuArray(ones(1000));

The two versions have the same problem. They both need to copy the 1000x1000 matrix from client workspace to the GPU. We mentioned above that Matlab provides methods to create data directly on the GPU to avoid the overhead of copying data to the GPU. For example:


This will create a 1000x1000 matrix directly on the GPU consisting of all ones.
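
The command itself does not appear in this revision of the page. As a minimal sketch, standard Matlab creation functions such as ones accept a 'gpuArray' option that allocates the result directly on the GPU; whether the original page used this exact form is an assumption. A node with a supported GPU is required.

```matlab
% Create a 1000x1000 matrix of ones directly in GPU memory,
% without first building it in the client workspace.
garr = ones(1000, 'gpuArray');

% The variable lives on the GPU; class and size can be checked as usual.
disp(class(garr))
disp(size(garr))
```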

You can find a list of all methods to create data directly on the GPU here.

To copy data back to the client workspace Matlab provides the gather operation.

carr2 = gather(garr)

This will copy the array garr on the GPU back to variable carr2 in the client workspace.

Overhead

As mentioned before, there is considerable overhead involved when using the GPU. There are two types of overhead:

  • Warming up the GPU (the first time the GPU is used).
  • Data transfer.

Warming up

Here is a little example that performs a matrix multiplication on the client, a matrix multiplication on the GPU, and prints out elapsed times for both. The actual cpu-gpu matrix multiplication code can be written as:

a = rand(1000);
tic; b = a*a; toc;
tic; ag = gpuArray(a); bg = ag*ag; toc;
c = gather(bg);

Almost no additional steps are required to use the gpu. Actually, copying the results to the client workspace is not even needed. Variables that reside on the gpu can be printed or plotted just like variables in the client workspace.
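
To see the overhead separately, the GPU work can be timed in stages. The following is a minimal sketch, assuming a node with a supported GPU; the variable names are illustrative. GPU operations run asynchronously, so wait(gpu) is called before each toc to make sure the work has actually finished. No timings are shown because they depend on the hardware.

```matlab
a   = rand(1000);
gpu = gpuDevice;                 % select the default GPU

% Stage 1: copy the input to the GPU (data transfer overhead).
tic; ag = gpuArray(a); wait(gpu); tCopyIn = toc;

% Stage 2: first multiplication, which typically includes warm-up cost.
tic; bg = ag*ag; wait(gpu); tFirst = toc;

% Stage 3: same multiplication again, now without the warm-up.
tic; bg = ag*ag; wait(gpu); tSecond = toc;

% Stage 4: copy the result back to the client workspace.
tic; b = gather(bg); tCopyOut = toc;

fprintf('copy in %.4fs, first run %.4fs, second run %.4fs, copy out %.4fs\n', ...
        tCopyIn, tFirst, tSecond, tCopyOut);
```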