Difference between revisions of "SW:Matlab"
Phamminhtris (→Run Matlab Scripts Remotely Using the HPRC Matlab App)
Latest revision as of 17:18, 22 September 2021
Running Matlab interactively
Matlab is accessible to all HPRC users within the terms of our license agreement. If you have particular concerns about whether specific usage falls within the TAMU HPRC license, please send an email to the HPRC Helpdesk. You can start a Matlab session either directly on a login node or through our portal.
Running Matlab on a login node
To use Matlab, you first need to load the Matlab module. This can be done using the following command:
[ netID@cluster ~]$ module load Matlab/R2020b
This will set up the environment for Matlab version R2020b. To see a list of all installed versions, use the following command:
[ netID@cluster ~]$ module spider Matlab
Note: New versions of software become available periodically. Version numbers may change.
To start Matlab, use the following command:
[ netID@cluster ~]$ matlab
Depending on your X server settings, this will start either the Matlab GUI or the Matlab command-line interface. To start Matlab in command-line interface mode, use the following command with the appropriate flags:
[ netID@cluster ~]$ matlab -nosplash -nodisplay
By default, Matlab executes a large number of built-in operators and functions multi-threaded and uses as many threads (i.e. cores) as are available on the node. Since login nodes are shared among all users, HPRC restricts the number of computational threads to 8. This should suffice for most cases. The speedup achieved through multi-threading depends on many factors, and in certain cases using more threads may not improve performance. To explicitly change the number of computational threads, use the following Matlab command:
>>feature('NumThreads',4);
This will set the number of computational threads to 4.
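feature('NumThreads', ...) is an undocumented Matlab function. The sketch below uses the documented maxNumCompThreads function instead. That function returns the previous setting, so the setting can be restored afterwards. The matrix size is an arbitrary choice for illustration.

```matlab
% Query the current number of computational threads
n = maxNumCompThreads;
fprintf('Current computational threads: %d\n', n);

% Use 4 threads; the previous value is returned so it can be restored later
oldN = maxNumCompThreads(4);

% Built-in linear algebra such as matrix multiplication uses these threads
A = rand(1000);
B = A * A;

% Restore the previous setting
maxNumCompThreads(oldN);
```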
To completely disable multi-threading, use the -singleCompThread option when starting Matlab:
[ netID@cluster ~]$ matlab -singleCompThread
Usage on the Login Nodes
Please limit interactive processing to short, non-intensive usage. Use non-interactive batch jobs for resource-intensive and/or multiple-core processing. Users are requested to be responsible and courteous to other users when using software on the login nodes.
The most important processing limits here are:
- ONE HOUR of PROCESSING TIME per login session.
- EIGHT CORES per login session on the same node or (cumulatively) across all login nodes.
Anyone found violating the processing limits will have their processes killed without warning. Repeated violation of these limits will result in account suspension.
Note: Your login session will disconnect after one hour of inactivity.
Running Matlab through the HPRC portal
HPRC provides a portal through which users can start an interactive Matlab GUI session inside a web browser. For more information on how to use the portal, see our HPRC OnDemand Portal section.
Running Matlab through the batch system
HPRC developed a tool named matlabsubmit to run Matlab simulations on the HPRC compute nodes without the need to create your own batch script and without the need to start a Matlab session. matlabsubmit automatically generates a batch script with the correct requirements. In addition, matlabsubmit generates boilerplate Matlab code to set up the environment (e.g. the number of computational threads). If needed, it also starts a parpool using the correct Cluster Profile: local if all workers fit on a single node, and a cluster profile when workers are distributed over multiple nodes.
To submit your Matlab script, use the following command:
[ netID@cluster ~]$ matlabsubmit myscript.m
In the above example, matlabsubmit uses default values for the runtime, memory requirements, number of workers, etc. To specify resources, use the command-line options of matlabsubmit. For example:
[ netID@cluster ~]$ matlabsubmit -t 07:00 -s 4 myscript.m
This sets the wall-time to 7 hours and makes Matlab use 4 computational threads for its run (matlabsubmit will also request 4 cores).
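For illustration, here is a hypothetical myscript.m that could be submitted this way. It reports the number of computational threads, so you can check that a setting such as -s 4 took effect. This assumes the boilerplate generated by matlabsubmit sets the thread count through the standard mechanism that maxNumCompThreads reports. A batch job has no display, so the script also saves its results to a file. The file name and workload are placeholders.

```matlab
% myscript.m (hypothetical example script for matlabsubmit)

% Report how many computational threads this run uses
nThreads = maxNumCompThreads;
fprintf('Running with %d computational threads\n', nThreads);

% A multi-threaded workload: built-in QR factorization uses all threads
A = rand(1000);
tic;
[Q, R] = qr(A);
elapsed = toc;
fprintf('QR factorization took %.2f seconds\n', elapsed);

% Batch jobs have no display, so save the results for later inspection
save('myscript_results.mat', 'nThreads', 'elapsed', 'R');
```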
To see all options for matlabsubmit, use the -h flag:
[ netID@cluster ~]$ matlabsubmit -h
Usage: /sw/hprc/sw/Matlab/bin/matlabsubmit [options] SCRIPTNAME

This tool automates the process of running Matlab codes on the compute nodes.

OPTIONS:
  -h Shows this message
  -m set the amount of requested memory in MEGA bytes(e.g. -m 20000)
  -t sets the walltime; form hh:mm (e.g. -t 03:27)
  -w sets the number of ADDITIONAL workers
  -g indicates script needs GPU (no value needed)
  -b sets the billing account to use
  -s set number of threads for multithreading (default: 8 ( 1 when -w > 0)
  -p set number of workers per node
  -f run function call instead of script
  -x add explicit batch scheduler option

DEFAULT VALUES:
  memory : 2000 per core
  time : 02:00
  workers : 0
  gpu : no gpu
  threading: on, 8 threads
NOTE: When using the -f flag to execute a function instead of a script, the function call must be enclosed in double quotes when it contains parentheses. For example: matlabsubmit -f "myfunc(21)"
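Below is a hypothetical function file myfunc.m that matches this call. The computation (a sum of squares) is only a placeholder. The function also saves its result to a file because the batch job is not interactive.

```matlab
function result = myfunc(n)
% MYFUNC Hypothetical example for: matlabsubmit -f "myfunc(21)"
%   Computes 1^2 + 2^2 + ... + n^2, prints it, and saves it to a .mat file.
    result = sum((1:n).^2);
    fprintf('myfunc(%d) = %d\n', n, result);
    save(sprintf('myfunc_%d.mat', n), 'result');
end
```

Save it as myfunc.m in a folder on the Matlab path so that the function call can be resolved when the job runs.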
When executing, matlabsubmit will do the following:
- generate boilerplate Matlab code to set up the Matlab environment (e.g. #threads, #workers)
- generate a batch script with all resources set correctly and the command to run Matlab
- submit the generated batch script to the batch scheduler and return control to the user
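matlabsubmit starts the parallel pool itself when workers are requested, so a submitted script can use parfor directly. Below is a minimal sketch of a hypothetical script for a submission such as matlabsubmit -w 4 myparscript.m. It uses the built-in gcp('nocreate') to report the current pool without creating a new one. The loop body is a placeholder workload.

```matlab
% myparscript.m (hypothetical example), e.g. submitted with: matlabsubmit -w 4 myparscript.m

% Report the pool started by matlabsubmit; gcp('nocreate') returns [] if no pool is open
pool = gcp('nocreate');
if isempty(pool)
    fprintf('No parallel pool is open\n');
else
    fprintf('Using a parallel pool with %d workers\n', pool.NumWorkers);
end

% Independent iterations are distributed over the pool workers
n = 40;
results = zeros(1, n);
parfor i = 1:n
    results(i) = sum(svd(rand(300)));   % placeholder workload per iteration
end
save('myparscript_results.mat', 'results');
```

If no pool is open, parfor may start one automatically or run the loop serially on the client, depending on your parallel preferences.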
For detailed examples on using matlabsubmit, see the examples section.
Using Matlab Parallel Toolbox on HPRC Resources
In this section, we focus on using the Parallel Toolbox on HPRC clusters. For a general introduction to the Parallel Toolbox, see the parallel toolbox section on the MathWorks website. Here we discuss how to use Matlab Cluster Profiles to distribute workers over multiple nodes.
Cluster Profiles
Matlab uses the concept of Cluster Profiles to create parallel pools. When Matlab creates a parallel pool, it uses the cluster profile to determine how many workers to use, how many threads each worker can use, where to store metadata, and other settings. There are two kinds of profiles:
- local profiles: parallel processing is limited to the node the Matlab client is running on.
- cluster profiles: parallel processing can span multiple nodes; profile interacts with a batch scheduler (e.g. SLURM on terra).
NOTE: we will not discuss local profiles any further here. Processing using a local profile is exactly the same as processing using cluster profiles.
TAMU HPRC provides a framework to easily manage and update cluster profiles. The central concept in most of the discussion below is the TAMUClusterProperties object, which keeps track of all the properties needed to successfully create a parallel pool. That includes typical Matlab properties, such as the number of Matlab workers requested, as well as batch scheduler properties, such as wall-time and memory.
Importing Cluster Profile
For your convenience, HPRC has already created a custom Cluster Profile. Using the profile, you can define how many workers you want, how to distribute the workers over the nodes, how many computational threads to use, how long to run, etc. Before you can use this profile, you need to import it by calling the following Matlab function:
>>tamuprofile.importProfile()
This function imports the cluster profile and creates a directory structure in your scratch directory where Matlab stores meta-information during parallel processing. The default location is /scratch/$USER/MatlabJobs/TAMU<VERSION>, where <VERSION> represents the Matlab version. For example, for Matlab R2020b, it will be /scratch/$USER/MatlabJobs/TAMU2020b.
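To check that the import worked, you can list the cluster profiles known to Matlab with the built-in parallel.clusterProfiles function. This minimal sketch assumes the imported profile follows the TAMU<VERSION> naming used in this section (e.g. TAMU2020b).

```matlab
% Import the HPRC cluster profile (HPRC-provided function)
tamuprofile.importProfile();

% List all cluster profiles known to this Matlab installation
profiles = parallel.clusterProfiles();
disp(profiles);

% Expected profile name for this release, e.g. 'TAMU2020b' for R2020b
expected = ['TAMU' version('-release')];
if any(strcmp(profiles, expected))
    fprintf('Cluster profile %s is available\n', expected);
else
    warning('Cluster profile %s was not found', expected);
end
```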
Getting the Cluster Profile Object
To get a TAMUClusterProperties object you can do the following:
>> tp=TAMUClusterProperties;
tp is an object of type TAMUClusterProperties with default values for all the properties. To see all the properties, you can just print the value of tp. You can easily change the values using the convenience methods of TAMUClusterProperties.
For example, suppose you have Matlab code and want to use 4 workers for parallel processing.
>> tp=TAMUClusterProperties;
>> tp.workers(4);
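Before requesting resources, it can help to inspect the object. The short sketch below uses the standard Matlab functions disp, methods and help. The exact list of methods depends on the HPRC implementation of TAMUClusterProperties.

```matlab
tp = TAMUClusterProperties;   % HPRC class; all properties start at their defaults
disp(tp)                      % print the current property values
methods(tp)                   % list the convenience methods available on tp
help TAMUClusterProperties    % show the class documentation, if provided
tp.workers(4);                % request 4 workers, as in the example above
```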
Creating a Parallel Pool
To start a parallel pool, you can use the HPRC convenience function tamuprofile.parpool. It takes as argument a TAMUClusterProperties object that specifies all the requested resources.
For example:
mypool = tamuprofile.parpool(tp);
% ... your parallel code (parfor and spmd blocks) goes here ...
delete(mypool)
This code starts a worker pool using the default cluster profile, with 4 additional workers.
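The sketch below fills in the same pattern with a parfor loop between creating and deleting the pool. The loop body is a placeholder workload. parfor requires that iterations do not depend on each other.

```matlab
% Request 4 workers and start the pool with the HPRC convenience function
tp = TAMUClusterProperties;
tp.workers(4);
mypool = tamuprofile.parpool(tp);

% Each iteration is independent, so parfor can run them on different workers
n = 100;
radius = zeros(1, n);
parfor i = 1:n
    radius(i) = max(abs(eig(rand(200))));   % placeholder workload
end
fprintf('Mean spectral radius over %d runs: %.3f\n', n, mean(radius));

% Shut down the pool and release its workers
delete(mypool);
```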
NOTE: only instructions within parfor and spmd blocks are executed on the workers. All other instructions are executed on the client.
NOTE: all variables that live on the workers (for example, those created inside spmd blocks) are destroyed once the parallel pool is deleted.
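Below is an spmd sketch that assumes a pool is already open, for example one started with tamuprofile.parpool(tp) as described above. It uses labindex, numlabs and gplus, which are available in R2020b. Newer releases rename these to spmdIndex, spmdSize and spmdPlus.

```matlab
spmd
    % This block runs on every worker; labindex identifies the worker (1..numlabs)
    localSum = sum(labindex:numlabs:1000);   % each worker sums a strided share of 1..1000
    total = gplus(localSum);                 % global sum, available on every worker
end

% Outside spmd, total is a Composite; total{1} fetches worker 1's copy
fprintf('Sum of 1..1000 computed in parallel: %d\n', total{1});
```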
Alternative approach to creating a parallel pool
Matlab already provides functions to create parallel pools, namely parcluster(<string clustername>) and parpool(<parcluster object>). You can use these functions as well, but it is more complicated to set all the properties correctly (we will not discuss how to do that here). To create a parallel pool using the basic Matlab functions, you can do the following:
cp = parcluster('TAMU2020b');
% add code to set the number of workers manually. There is no uniform way to do this; it may depend on the type of cluster profile and the batch scheduler (e.g. Slurm)
mypool = parpool(cp);
% ... your parallel code goes here ...
delete(mypool)
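One possible way to set the worker count using only built-in Matlab functions is shown below. It sets the NumWorkers property of the cluster object and passes an explicit pool size to parpool. This sketch assumes the imported TAMU2020b profile is a generic scheduler profile in which NumWorkers can be set. It does not set scheduler options such as wall-time or memory, which is the part that depends on the profile and batch scheduler.

```matlab
% Cluster object from the imported HPRC profile (name for R2020b)
cp = parcluster('TAMU2020b');

% Allow this cluster object to use up to 4 workers
cp.NumWorkers = 4;

% Request a pool of exactly 4 workers
mypool = parpool(cp, 4);
fprintf('Pool started with %d workers\n', mypool.NumWorkers);

% ... your parallel code goes here ...

delete(mypool);
```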
TAMU HPRC also provides a convenience function that returns a fully populated parcluster object, which can be passed into the Matlab parpool function. The example below creates a pool with 4 workers:
tp = TAMUClusterProperties();
tp.workers(4);
cp = tamuprofile.parcluster(tp);
mypool = parpool(cp);
% ... your parallel code goes here ...
delete(mypool)