Revision as of 09:12, 19 August 2016 by Pingluo (talk | contribs)

After a model is built in Comsol, the next step is to compute its solution. This is often time consuming, so the computation is usually run in batch through a job script. This tutorial illustrates how to create Comsol LSF batch scripts on Ada.

All solvers in Comsol can run in parallel in one of three modes: shared memory mode, distributed mode, or hybrid mode. By default, a Comsol solver runs in shared memory mode. This works like OpenMP: the parallelism is limited by the total number of CPU cores available in one compute node of the cluster.

Example 1:

 #BSUB -n 20
 comsol -np 20 batch -inputfile input.mph -outputfile output.mph
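
For a complete job file, the two lines above need the usual LSF header around them. The following is a minimal sketch, not a tested Ada recipe: the job name, wall-clock limit, output file name, and the file name comsol_shared.job are placeholders, and the step that loads the Comsol environment is site-specific and left as a comment. The span[ptile=20] request asks LSF to place all 20 cores on one node, which shared memory mode requires (this assumes 20-core nodes, as in the example).

```shell
# Write a minimal LSF job file for a shared memory Comsol run.
# All values below (job name, time limit, file names) are placeholders.
cat > comsol_shared.job <<'EOF'
#BSUB -J comsol_shared
#BSUB -n 20
#BSUB -R "span[ptile=20]"
#BSUB -W 2:00
#BSUB -o comsol_shared.%J
# Load the Comsol environment here (site-specific, not shown on this page).
comsol -np 20 batch -inputfile input.mph -outputfile output.mph
EOF

# Submit with:  bsub < comsol_shared.job
```

LSF reads the #BSUB lines only when the file is passed on standard input, hence the "bsub < file" form.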

Comsol solvers can also run in distributed mode, which is enabled by checking the "distributed computing" checkbox of the solver when building the model. In this mode, the solver runs on multiple nodes and uses MPI for communication. All solvers except PARDISO support distributed mode. PARDISO also has a checkbox for distributed computing, but if it is selected, the solver actually used is MUMPS.

Example 2:

 #BSUB -n 40
 comsol -clustersimple batch -inputfile input.mph -outputfile output.mph

This is the same as:

 #BSUB -n 40
 comsol -f ./hostfile.$JOB_ID -nn 40 batch -inputfile input.mph -outputfile output.mph
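
This page does not show how ./hostfile.$JOB_ID is produced. One possible way, sketched here under assumptions, is to derive it from LSB_HOSTS, which LSF sets to the allocated hosts with one entry per slot. The function name write_hostfile is hypothetical, and the one-host-per-line layout should be checked against the Comsol version in use. The examples on this page name the file with $JOB_ID, while LSF itself exports the job id as LSB_JOBID.

```shell
# write_hostfile FILE: write each allocated host once, one per line.
# Assumes LSB_HOSTS holds the hosts separated by spaces, one entry per slot.
write_hostfile() {
    # $LSB_HOSTS is left unquoted on purpose so the shell splits it into words.
    printf '%s\n' $LSB_HOSTS | sort -u > "$1"
}

# Inside a job script:
# write_hostfile "./hostfile.$LSB_JOBID"
```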

Each mode has its pros and cons. Shared memory mode utilizes CPU cores better than distributed mode but can only run on one node, while distributed mode can use more than one node. It is usually best to run a solver in hybrid mode, which takes advantage of both. This is done at the command line by tuning three options: -nn (the total number of Comsol processes), -nnhost (the number of processes per host), and -np (the number of cores used by each process).

Example 3:

 #BSUB -n 40
 comsol batch -f ./hostfile.$JOB_ID -nn 2 -nnhost 1 -np 20 -inputfile input.mph -outputfile output.mph

Example 4:

 #BSUB -n 40
 comsol batch -f ./hostfile.$JOB_ID -nn 4 -nnhost 2 -np 10 -inputfile input.mph -outputfile output.mph
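
The arithmetic behind Examples 3 and 4 can be sketched as a small helper. It assumes an even split of the allocation over hosts of equal size. The function name comsol_layout and its argument order are hypothetical, and the 20 cores per host is taken from the examples rather than from any Comsol setting.

```shell
# comsol_layout TOTAL_CORES CORES_PER_HOST PROCS_PER_HOST
# Prints the -nn/-nnhost/-np options for an even split of the allocation.
comsol_layout() {
    total=$1; per_host=$2; nnhost=$3
    hosts=$(( total / per_host ))    # hosts in the allocation
    nn=$(( hosts * nnhost ))         # Comsol processes overall
    np=$(( per_host / nnhost ))      # cores for each process
    echo "-nn $nn -nnhost $nnhost -np $np"
}

comsol_layout 40 20 1   # prints: -nn 2 -nnhost 1 -np 20  (Example 3)
comsol_layout 40 20 2   # prints: -nn 4 -nnhost 2 -np 10  (Example 4)
```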

Comsol models configured with a parameter sweep can also benefit from parallel computing. Such a model needs to run under a range of parameters or combinations of parameters, and each set of parameters can be calculated independently. Once a model with a parameter sweep node is created in the Comsol GUI, it must also be configured with a cluster sweep so that the parameter sets are distributed and processed in parallel.

Example 5:

 #BSUB -n 40
 comsol batch -f ./hostfile.$JOB_ID -nn 10 -nnhost 5 -np 4 -inputfile input.mph -outputfile output.mph

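For models with a large memory requirement, spreading the job over more nodes with one process per node makes more total memory available to the solver.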

Example 6:

 #BSUB -n 200
 comsol -f ./hostfile.$JOB_ID -nn 10 -nnhost 1 -np 20
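
Before submitting a layout like this one, it can help to confirm that the number of processes times the cores per process matches the cores requested with #BSUB -n. The check below is only a sketch, and the function name check_layout is hypothetical.

```shell
# check_layout SLOTS NN NP: succeed only when NN processes of NP cores
# add up to the SLOTS cores requested with #BSUB -n.
check_layout() {
    if [ $(( $2 * $3 )) -eq "$1" ]; then
        echo "ok: $2 x $3 = $1 cores"
    else
        echo "mismatch: $2 x $3 != $1 cores" >&2
        return 1
    fi
}

check_layout 200 10 20   # Example 6: 10 processes of 20 cores each
```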

Common problems

1. Disk quota exceeded in home directory.

By default, Comsol stores all temporary files in your home directory. For large models, you are likely to get a "Disk quota exceeded" error because of the large amount of temporary files written there. To resolve this issue, redirect the temporary files to your scratch directory.

  comsol -tmpdir /scratch/user/username/comsol/tmp -recoverydir /scratch/user/username/comsol/recovery ...
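
In a job script it may be safer to create the two directories before Comsol starts, in case Comsol does not create them itself. The helper below is a sketch under that assumption. The name comsol_scratch_args is hypothetical, and the scratch path in the usage comment follows the pattern above with username as a placeholder.

```shell
# comsol_scratch_args BASE: create BASE/tmp and BASE/recovery if missing
# and print the matching -tmpdir/-recoverydir options.
comsol_scratch_args() {
    mkdir -p "$1/tmp" "$1/recovery" || return 1
    echo "-tmpdir $1/tmp -recoverydir $1/recovery"
}

# Example use in a job script (replace username with your own):
# comsol $(comsol_scratch_args /scratch/user/username/comsol) -np 20 batch \
#     -inputfile input.mph -outputfile output.mph
```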

For admins:

1. ??? Java exception occurred:java.lang.OutOfMemoryError: Java heap space

Increase the Java heap size in $COMSLPATH/bin/comsol.ini by changing -Xmx from 1024m to a larger value.
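
The change can also be scripted. The sketch below assumes the option sits on its own line in comsol.ini in the form -Xmx1024m. The function name set_java_heap is hypothetical, and 4g in the usage comment is only an illustrative value, not a recommendation.

```shell
# set_java_heap INI SIZE: replace the value of the -Xmx line in INI,
# keeping a backup copy of the original file in INI.bak.
set_java_heap() {
    sed -i.bak "s/^-Xmx[0-9][0-9]*[kKmMgG]\$/-Xmx$2/" "$1"
}

# Example (path as given above):
# set_java_heap "$COMSLPATH/bin/comsol.ini" 4g
```

If the file does not contain a line of that form, it is left unchanged, so it is worth checking the result before restarting Comsol.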