Compilers

Global

Programming Models Support

The current compiler support for the existing programming models for GPU offload is the following:

Programming Model	GCC	IBM XL C/C++ IBM XL Fortran	PGI Accelerator
CUDA	Supported	Supported	Supported
CUDA Fortran	Supported	Supported	Supported
OpenACC	Experimental	Not Supported	Supported
OpenMP	>= Version 7	>= Versions 13.1.6 / 15.1.6	Not Supported

Invocation

The invocation commands of the compilers are the following:

Compiler	C	C++	Fortran 77	Fortran 90	Fortran 03	Fortran 08
GCC	gcc	g++	gfortran	gfortran	gfortran	gfortran
IBM XL	xlc_r	xlC_r	xlf_r	xlf90_r	xlf2003_r	xlf2008_r
PGI	pgcc	pgCC	pgf90	pgf90	pgf90	pgf90

For IBM XL C/C++ & Fortran, the commands specified above constitute the « thread-safe » variants, to be used when dealing with multithreaded applications. It is recommended to exclusively use these variants, even in the case of non-multithreaded applications, though.

Compiler Options Overview

The table below provides the main equivalences between the options of the different compilers:

Purpose	GCC	IBM XL C/C++ IBM XL Fortran	PGI Accelerator
Architecture	-mcpu=power8	-qarch=pwr8	-m64
Disabled Optimization	-O0	-O0 -qnoopt	-O0
Recommended Optimization	-O3 -mcpu=power8 -funroll-loops	-O3 [-qipa] [-qhot]	-fast -Mipa=fast,inline [-Msmartalloc]
Profile-Guided Optimization (PGO)	-fprofile-generate/-fprofile-use	-qpdf1/-qpdf2	-Mpfi/-Mpfo
Inter-Procedural Optimization	-flto	-qipa	-Mipa
Vectorization	-ftree-vectorize	-qsimd=auto	-Mvect
Automatic Parallelization	-floop-parallelize-all	-qsmp=auto	-Mconcur
OpenMP Support	-fopenmp	-qsmp=omp	-mp
OpenACC Offload	N/A	N/A	-acc -Minfo=accel -ta:tesla:cc70
OpenMP Offload	N/A	-qoffload -qtgtarch=sm_60	N/A
Loop Optimization	-fpeel-loops -funroll-loops	-qhot
Debugging Symbols	-g	-g	-g
Verbose		-V

GPU Device Target Specification

The target device can be explicitely specified through the following options:

Target	IBM XL C/C++ IBM XL Fortran	PGI Accelerator
Tesla K80	-qtgtarch=sm_37	-ta:cc37
Tesla P100	-qtgtarch=sm_60	-ta:cc60
Tesla V100	-qtgtarch=sm_70	-ta:cc70

For IBM XL C/C++ & IBM XL Fortran, the former -qxflag option has been replaced by the -qtgtarch option.

Compiler-Specific

IBM XL C/C++ & IBM XL Fortran

Optimization Levels

The IBM XL C/C++ and IBM XL Fortran compilers allow the following optimization levels:

Options	Purpose
-O0 -qnohot -qnosmp -qstrict	Disable any optimization
-O2	Limited optimization (safe)
-O3 -qstrict	Medium optimization without modification of the code semantics
-O3	Medium optimization with potential modifications of the code semantics
-O4	Agressive optimization + Light Inter-Procedural Analysis (IPA)
-O5	Equivalent to -O4 level with deeper IPA

Increasing the optimization level tends to lead to a potentially significant increase in compilation time. It is therefore recommended to keep limited optimization levels while porting an application onto the platform.

For a given optimization level, the IBM XL compilers automatically introduce additional options by default. These options can be in turn complemented with other additional options. The table below details the main possible combinations:

Optimization Level	Additional Options Introduced By Default	Additional Options	Other Options with Potential Benefits
-O0	-qsimd=auto	-qarch=pwr9
-O2	-qmaxmem=8192	-qarch=pwr9	-qmaxmem=-1
	-qsimd=auto	-qtune=pwr9	-qhot=level=0
	-qstrict=all
-O3	-qhot=noarraypad:level=0:novector:fastmath	-qarch=pwr9
	-qnostrict	-qtune=pwr9
	-qmaxmem=-1
	-qsimd=auto
-O4	-qarch=auto	-qarch=pwr9	-qsmp=auto
	-qcache=auto	-qcache
	-qhot=noarraypad:level=1:vector:fastmath	-qtune=pwr9
	-qipa
	-qmaxmem=-1
	-qnostrict
	-qsimd=auto
	-qtune=auto
-O5	-qarch=auto	-qarch=pwr9	-qsmp=auto
	-qcache=auto	-qcache
	-qhot=noarraypad:level=1:vector:fastmath	-qtune=pwr9
	-qipa=level=2
	-qmaxmem=-1
	-qnostrict
	-qsimd=auto
	-qtune=auto

Other Compiler Options

Floating-Point and Integer Control

Option	Purpose
-qautodbl={dbl4,dbl8,dbl}	Convert single-precision floating-point calculations to double-precision and double-precision calculations to extended-precision
-qdpc	Promote basic real constants as double-precision constants
-qintsize={2,4*,8}	Set default size of INTEGER / LOGICAL data entities
-qrealsize={4*,8}	Set default size of REAL / DOUBLE PRECISION / (DOUBLE) COMPLEX values

Object Code Control

Option	Purpose
-qsaveopt	Save compilation-related information to the corresponding object file

Note: Compilation-related information stored thanks to the -qsaveopt option can be read through the strings --all command:

$ strings -a <binary> | grep '^@(#)opt'
@(#)opt c /opt/ibm/xlC/13.1.6/bin/.orig/xlc_r -F/opt/ibm/xlC/13.1.6/etc/xlc.cfg.rhel.7.3.gcc.4.8.5.cuda.9.0 -DHAVE_CONFIG_H -I. -I/gpfs/shared/fftw -q64 -qfullpath -qsaveopt -O3 -qarch=pwr8 -qtune=pwr8 fftwf77.lo -MD -MP -MF .deps/fftwf77.Tpo -c fftwf77.c

PGI Accelerator

Default Compiler Options

A user-specific set of default compiler options can be defined through the following procedure:

Create Following File inside Home Directory:
```
~/.mypgirc
```
Populate File with Acceptable Directives:
- Default Compute Capability:
```
set DEFCOMPUTECAP=60;
```
- Default CUDA Version:
```
set DEFCUDAVERSION=10.1;
```

Note: These options would override any system-wide options defined by system administrator.

`-fast` Compilation Flag configuration

The -fast compilation flag is a shortcut for a consistent set of predefined optimization options.

Its current signification can be made explicit through the following command:

$ pgcc -fast -help
Reading rcfile /opt/pgi/linuxpower/20.02/bin/.pgccrc
-fast               Common optimizations; includes -O2 -Munroll=c:1 -Mlre -Mautoinline
                    == -Mvect=simd
-M[no]vect[=[no]simd|[no]assoc]
                    Control automatic vector pipelining
    [no]simd        Generate [don't generate] SIMD instructions
    [no]assoc       Allow [disallow] reassociation

Compilers

Compilers

Global

Programming Models Support

Invocation

Compiler Options Overview

GPU Device Target Specification

Compiler-Specific

IBM XL C/C++ & IBM XL Fortran

Optimization Levels

Other Compiler Options

PGI Accelerator

Default Compiler Options

`-fast` Compilation Flag configuration

results matching ""

No results matching ""

Compilers

Global

Programming Models Support

Invocation

Compiler Options Overview

GPU Device Target Specification

Compiler-Specific

IBM XL C/C++ & IBM XL Fortran

Optimization Levels

Other Compiler Options

PGI Accelerator

Default Compiler Options

-fast Compilation Flag configuration

results matching ""

No results matching ""

`-fast` Compilation Flag configuration