
Difference between revisions of "SW:Comsol"

From TAMU HPRC
Revision as of 08:16, 31 August 2016

When a model is built in the Comsol GUI, the next step is to compute the model for a solution, which is often time-consuming. To control the duration and computational resources of that computation, a job script should be created to run the model in batch mode. This tutorial illustrates how to create Comsol LSF batch scripts on Ada.

All solvers in Comsol can run in parallel in one of three parallel modes: shared memory mode, distributed mode, or hybrid mode. By default, a Comsol solver runs in shared memory mode. Like OpenMP, this limits the parallelism to the total number of CPU cores available on a single compute node of the cluster.

A Complete Example

Example 1:

 #BSUB -J comsoltest
 #BSUB -n 20 -R "span[ptile=20]"
 #BSUB -M 2800 -R "rusage[mem=2700]"
 #BSUB -o output.%J
 #BSUB -L /bin/bash
 #BSUB -W 2:00
 
 module load Comsol/xxx 
 comsol -np 20 batch -inputfile in.mph -outputfile out.mph

Note: xxx represents a Comsol version; replace it with the version you want to use.

Running Comsol in Different Parallel Modes

Assuming everything else is the same as in Example 1, the following examples show how to run in different parallel modes by changing the number of cores and the Comsol command-line parameters.

Shared Memory Mode

Example 2: Solving a model in shared memory mode using all 20 cores on one cluster node. This is the same as Example 1.

 #BSUB -n 20
 comsol -np 20 batch -inputfile input.mph -outputfile output.mph

Distributed Mode

Comsol solvers can also run in distributed mode by checking the "distributed computing" checkbox of the solver when building the model. In this mode, the solver runs on multiple nodes and uses MPI for communication. All solvers except PARDISO support distributed mode. PARDISO still shows a distributed-computing check box, but if it is selected, the solver actually used is MUMPS.

Example 3: Solving a model in distributed mode on two cluster nodes with a total of 40 cores

 #BSUB -n 40
 comsol -simplecluster -inputfile input.mph -outputfile output.mph

This is the same as:

 #BSUB -n 40
 cat $LSB_DJOB_HOSTFILE > hostfile.$LSB_JOBID
 comsol -f ./hostfile.$LSB_JOBID -nn 40 batch -inputfile input.mph -outputfile output.mph

Hybrid Mode

Each mode has its pros and cons: shared memory mode utilizes CPU cores better than distributed mode but is confined to one node, while distributed mode can span multiple nodes. It is usually best to run a solver in a way that takes advantage of both. This can be done at the command line by tuning the options -nn (total number of MPI tasks), -nnhost (MPI tasks per node), and -np (cores per MPI task).
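Before submitting a hybrid job, it helps to check that the three options are consistent with the core count requested from LSF: the total cores should equal -nn times -np, and -nn should be divisible by -nnhost so the tasks spread evenly across nodes. A minimal sketch of that check (not part of any job script on this page; the values mirror Example 4 below):

```shell
# Sanity-check hybrid settings (2 nodes, 40 cores, as in Example 4).
nn=2        # total MPI tasks (-nn)
nnhost=1    # MPI tasks per node (-nnhost)
np=20       # cores per MPI task (-np)
total=40    # cores requested with #BSUB -n

# Cores must match, and tasks must divide evenly across the nodes.
if [ $((nn * np)) -eq "$total" ] && [ $((nn % nnhost)) -eq 0 ]; then
    echo "settings consistent: $((nn / nnhost)) node(s), $np cores per task"
else
    echo "settings inconsistent" >&2
fi
```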

Example 4: Solving a model in hybrid mode on 2 cluster nodes with 40 cores. In this example, Comsol will spawn 2 MPI tasks in total (one on each cluster node). Each MPI task will be running on 20 cores.

 #BSUB -n 40
 cat $LSB_DJOB_HOSTFILE | uniq > hostfile.$LSB_JOBID
 comsol batch -f ./hostfile.$LSB_JOBID -nn 2 -nnhost 1 -np 20 -inputfile input.mph -outputfile output.mph
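The uniq step in the script above matters because LSF writes the job host file with one line per allocated slot, so each node name is repeated once per core; uniq collapses those adjacent duplicates into one line per node, which is the form Comsol's -f option expects here. A small self-contained demo with a mock host file (node1/node2 are made-up names, and only 3 slots per node are shown for brevity):

```shell
# Mock LSF host file: one line per slot (2 nodes x 3 slots each).
printf 'node1\nnode1\nnode1\nnode2\nnode2\nnode2\n' > mock_hostfile

# uniq collapses adjacent duplicate lines, leaving one line per node.
uniq mock_hostfile
```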

Example 5: Solving a model in hybrid mode on 2 cluster nodes with 40 cores. In this example, Comsol will spawn 4 MPI tasks in total (two on each cluster node). Each MPI task will run on 10 cores.

 #BSUB -n 40
 cat $LSB_DJOB_HOSTFILE | uniq > hostfile.$LSB_JOBID
 comsol batch -f ./hostfile.$LSB_JOBID -nn 4 -nnhost 2 -np 10 -inputfile input.mph -outputfile output.mph

Parameter Sweep

Comsol models configured with a parameter sweep can also benefit from parallel computing in different ways. A model with a parameter sweep needs to run over a range of parameters or combinations of parameters, and each set of parameters can be computed independently. Once a model with a parameter sweep node is created in the Comsol GUI, it must also be configured with a cluster sweep node to distribute the parameter sets for processing in parallel.

Example 6: Solving a parameter sweep model on 40 cores. In this example, 10 combinations of parameters will run concurrently on two cluster nodes, with 5 combinations on each node. Each combination will run on 4 cores.

 #BSUB -n 40
 cat $LSB_DJOB_HOSTFILE | uniq > hostfile.$LSB_JOBID
 comsol batch -f ./hostfile.$LSB_JOBID -nn 10 -nnhost 5 -np 4 -inputfile input.mph -outputfile output.mph

If each combination of parameters requires a large amount of memory to solve, we can instead place one combination per node so that the entire memory of the node is available to a single combination.

Example 7: Solving a parameter sweep model on 200 cores with each parameter combination taking an entire cluster node.

 #BSUB -n 200
 cat $LSB_DJOB_HOSTFILE | uniq > hostfile.$LSB_JOBID
 comsol batch -f ./hostfile.$LSB_JOBID -nn 10 -nnhost 1 -np 20 -inputfile input.mph -outputfile output.mph


Common Problems

1. Disk quota exceeded in home directory.

By default, Comsol stores all temporary files in your home directory. For large models, you are likely to get a "Disk quota exceeded" error due to the huge amount of temporary files written to your home directory. To resolve this issue, redirect the temporary files to your scratch directory:

  comsol -tmpdir /scratch/user/username/comsol/tmp -recoverydir /scratch/user/username/comsol/recovery ...