
Ada:Compile:PHI

From TAMU HPRC

Latest revision as of 15:36, 7 April 2017

PHI Programs

The main advantage of programming for the PHI is that programs can be written in standard languages such as C, C++, and Fortran, using common parallelization paradigms such as OpenMP and MPI. Math libraries such as Intel MKL are also supported. The PHI coprocessor can also be programmed using Intel Cilk Plus, Intel Threading Building Blocks, and pthreads.

Code can run on the PHI coprocessor in two ways: native mode and offload mode. In "native mode" the executable runs exclusively on the PHI coprocessor. In "offload mode" parts of the code run on the host (i.e., the CPU), while other parts (typically highly parallel segments of code) are offloaded to the PHI coprocessor and executed there.


Offload Mode

In "offload mode" the code is instrumented with pragmas (in C/C++) or directives (in Fortran) that tell the compiler which sections of the code to offload to the PHI coprocessor for execution at runtime. Currently there are two options for adding offload pragmas:

  • OpenMP 4.0 Offload pragmas
  • Intel's Language Extensions for Offload (LEO)


The code in an offload region can be multithreaded using parallelization paradigms such as OpenMP.

Example: A simple C++ program containing a LEO pragma that offloads computation to the PHI coprocessor with id 0. The program contains two OpenMP loops: one is executed on the host, and the other is offloaded to the PHI coprocessor.

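#include <iostream>
#include <omp.h>

#define N 1000  // loop trip count (value chosen for illustration)

int main(){
    int max_thread = 0;

// OpenMP region on HOST
#pragma omp parallel for reduction(max:max_thread)
    for(int i=0;i<N;i++){
        max_thread = omp_get_thread_num();
    }
    std::cout << max_thread << "\n";   // highest host thread id

    max_thread = 0;
// Offload the block to PHI coprocessor 0; the scalar max_thread is
// copied to the coprocessor and back to the host by default.
#pragma offload target(mic:0)
    {
// OpenMP region on PHI
#pragma omp parallel for reduction(max:max_thread)
        for(int i=0;i<N;i++){
            max_thread = omp_get_thread_num();
        }
    }
    std::cout << max_thread << "\n";   // highest PHI thread id
    return 0;
}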
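For comparison, the same offload idea can be written with the OpenMP 4.0 offload pragmas listed earlier instead of LEO. The sketch below is a minimal illustration, not code from the Intel documentation: it sums an array inside a target region, and the function name target_sum is hypothetical. The map clauses spell out the data movement explicitly: the input array is copied to the device, and the accumulator is copied in and back out.

```cpp
#include <cassert>

// Hypothetical helper: sums n doubles inside an OpenMP 4.0 target region.
// map(to: a[0:n])   copies the n input elements to the device before the region runs.
// map(tofrom: sum)  copies sum to the device and the reduced result back to the host.
// If no device is available (or OpenMP is disabled), the loop runs on the host.
double target_sum(const double* a, int n) {
    assert(n >= 0);
    double sum = 0.0;
#pragma omp target map(to: a[0:n]) map(tofrom: sum)
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Under the OpenMP 4.0 rules, a target region executes on the host when no target device is present, so the same source gives the same result with or without a PHI card; only where the loop runs changes.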


Compiling for Offloading

The Intel compiler recognizes both LEO and OpenMP 4.0 pragmas. No additional flags are needed to process LEO pragmas. The compiler automatically creates a fat binary that contains code for the host as well as code that can be executed on the PHI. The table below lists flags that enable or disable offloading.

Flag          Description
-openmp       Enables processing of OpenMP 4.0 pragmas, including the pragmas for offloading.
-no-offload   Disables offloading; all code is executed on the host.

Example: To compile the code from the example above, use the following command:

[login8]$ icpc -openmp -o phi_sample.x phi_sample.cpp
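To see the fat binary at work, a small sketch can report whether an offload region actually executed on the coprocessor. It relies on the __MIC__ macro, which the Intel compiler predefines only when it generates the coprocessor half of the binary; the function offload_sum_to and the OffloadResult struct are hypothetical names used for illustration.

```cpp
#include <cassert>

// Hypothetical result type: the computed value and where it was computed.
struct OffloadResult {
    long value;
    bool on_phi;
};

// Sums 1..n inside a LEO offload region. The loop is identical on both sides;
// only the __MIC__ branch differs between the host and coprocessor code.
OffloadResult offload_sum_to(int n) {
    assert(n >= 0);
    long value = 0;       // scalars are copied in and back out by default
    bool on_phi = false;
#pragma offload target(mic:0)
    {
        for (int i = 1; i <= n; i++) {
            value += i;
        }
#ifdef __MIC__
        on_phi = true;    // compiled only into the coprocessor code
#endif
    }
    OffloadResult r = {value, on_phi};
    return r;
}
```

Built with icpc -openmp on a node with a PHI card, on_phi would be expected to come back true. Built with -no-offload, the pragma is ignored, the same loop runs on the host, value is unchanged, and on_phi stays false.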

Running Code with Offload Sections

By default, all environment variables defined on the host are copied to the PHI coprocessor during offload. To change this behavior, set the environment variable MIC_ENV_PREFIX. When MIC_ENV_PREFIX is set, only the environment variables whose names start with its value are copied to the PHI coprocessor, and the prefix is removed from their names there: with MIC_ENV_PREFIX=MIC, the host variable MIC_OMP_NUM_THREADS becomes OMP_NUM_THREADS on the coprocessor. This feature is very useful when the host and the PHI coprocessor need different values for the same environment variable.

Example 1: Run the phi_sample.x program using 16 threads on the host and 128 threads on the PHI

[login8]$ export MIC_ENV_PREFIX=MIC
[login8]$ export OMP_NUM_THREADS=16
[login8]$ export MIC_OMP_NUM_THREADS=128
[login8]$ ./phi_sample.x
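The prefix rule can be summarized as a filter-and-rename step over the host environment. The sketch below is a simplified model of the behavior described above, not Intel's runtime code; the function name coprocessor_env is hypothetical, and treating "_" as the separator after the prefix is an assumption consistent with MIC_OMP_NUM_THREADS becoming OMP_NUM_THREADS.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified model (hypothetical, not the Intel runtime) of how MIC_ENV_PREFIX
// selects host variables for the coprocessor. host_env holds "NAME=value"
// strings; prefix is the MIC_ENV_PREFIX value, "" meaning it is unset.
std::map<std::string, std::string>
coprocessor_env(const std::vector<std::string>& host_env, const std::string& prefix) {
    std::map<std::string, std::string> result;
    // Assumption: the prefix is followed by '_' (MIC_OMP_NUM_THREADS -> OMP_NUM_THREADS).
    const std::string lead = prefix + "_";
    for (const std::string& entry : host_env) {
        std::string::size_type eq = entry.find('=');
        assert(eq != std::string::npos);  // environment entries have the form NAME=value
        std::string name = entry.substr(0, eq);
        std::string value = entry.substr(eq + 1);
        if (prefix.empty()) {
            result[name] = value;                      // default: copy every variable as-is
        } else if (name.size() > lead.size() &&
                   name.compare(0, lead.size(), lead) == 0) {
            result[name.substr(lead.size())] = value;  // strip the prefix, then copy
        }
        // Any other variable stays on the host only.
    }
    return result;
}
```

With the settings from Example 1, OMP_NUM_THREADS=16 stays on the host, while MIC_OMP_NUM_THREADS=128 reaches the coprocessor as OMP_NUM_THREADS=128.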


The following table lists some of the more common environment variables for directing offloading and execution on the coprocessor.

Environment variable    Description
MIC_ENV_PREFIX          Sets the prefix for Intel Xeon Phi environment variables.
MIC_OMP_NUM_THREADS     Sets the number of threads to use per Intel Xeon Phi.
MIC_KMP_AFFINITY        Sets the thread layout on the Intel Xeon Phi. Options: balanced, compact, scatter.
MIC_LD_LIBRARY_PATH     Sets the LD_LIBRARY_PATH value for the Intel Xeon Phi environment.
OFFLOAD_REPORT=[0..3]   Prints information about each offload as execution proceeds on the host and on the target.

Automatic Offloading

Some libraries are capable of offloading computation to a MIC coprocessor. MKL is probably the best-known example of such a library. Any existing application that is linked against MKL may take advantage of automatic offloading with no additional changes. On hosts with a Xeon Phi coprocessor, setting library-specific environment variables such as MKL_MIC_ENABLE instructs the library to offload functions to the coprocessor. For more information, see the MKL Automatic Offloading page.

Native Mode

Compiling for Native Execution on the PHI

PHI coprocessors don't have compilers installed, so the program must be cross-compiled on the host. The compiler then produces code targeted for execution on the PHI coprocessor. To enable cross-compilation for the PHI coprocessor architecture, use the following flag:

Flag    Description
-mmic   Builds an application that runs natively on Intel(R) MIC Architecture.


With this flag the compiler produces code that runs directly on the PHI coprocessor. The code will NOT run on the host.
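Native code is ordinary C++; nothing changes in the source except, optionally, a check for the coprocessor. A minimal sketch, assuming a hypothetical source file simple_phi.cpp (the run examples use an executable named simple_phi.mic): count_threads and running_on are hypothetical helpers, and __MIC__ is the macro the Intel compiler predefines when targeting the coprocessor with -mmic.

```cpp
#include <cassert>

// Counts the threads in one OpenMP parallel region. Built natively for the
// PHI, this typically reflects the coprocessor's hardware threads (or
// OMP_NUM_THREADS if set); without OpenMP the pragmas are ignored and it returns 1.
int count_threads() {
    int n = 0;
#pragma omp parallel
    {
#pragma omp atomic
        n++;
    }
    assert(n >= 1);  // a parallel region always has at least one thread
    return n;
}

// Reports which architecture this code was compiled for.
const char* running_on() {
#ifdef __MIC__
    return "PHI coprocessor";
#else
    return "host";
#endif
}
```

Adding a main() that prints both values and compiling with icpc -mmic -openmp -o simple_phi.mic simple_phi.cpp should yield an executable that reports "PHI coprocessor" together with the coprocessor's thread count.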


Running Native Code

One way to run code natively is to log in to the PHI coprocessor and run the code there.

Example 1: Run code directly on PHI

[login8]$ ssh mic0
[login8-mic0]$ ./simple_phi.mic 

Alternatively, the executable can be launched from the host with the micnativeloadex command.

Example 2: Run native application through host

[login8]$ micnativeloadex simple_phi.mic

In this case the environment variables from the host are copied to the PHI coprocessor, and the MIC_ENV_PREFIX environment variable can be used to control which variables are copied.

Monitoring PHI Performance

Intel provides a very useful GUI tool named micsmc to monitor utilization of PHI cards. To start the GUI, simply type the following on a node with a PHI card installed:

[login8]$ micsmc 

NOTE: X11 forwarding must be enabled

micsmc shows system/user utilization per card, memory usage, and other properties such as temperature and energy consumption, and can be very helpful during debugging and profiling.