Some tests fail with Intel oneAPI

hmenke
Newbie
Posts: 2
Joined: Tue Apr 25, 2023 2:17 pm

#1 Post by hmenke » Tue May 16, 2023 8:42 am

When building VASP with Intel oneAPI (icc/icpc/ifort 2021.9.0, MPI 2021.9.0, MKL 2023.1.0), some of the tests fail:

Code:

==================================================================
SUMMARY:
==================================================================
The following tests failed, please check the output file manually:
bulk_InP_SOC_G0W0_sym bulk_InP_SOC_G0W0_sym_RPR bulk_SiO2_LOPTICS bulk_SiO2_LOPTICS_RPR bulk_SiO2_LPEAD bulk_SiO2_LPEAD_RPR SiC8_GW0R Tl_x Tl_x_RPR Tl_y Tl_y_RPR Tl_z Tl_z_RPR

In testsuite/testsuite.log, all of them show the same pattern:

Code:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3694556 RUNNING AT mpet-joker
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3694557 RUNNING AT mpet-joker
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 3694558 RUNNING AT mpet-joker
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 3694559 RUNNING AT mpet-joker
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

This is my makefile.include. Note that I am already building with debugging information (-g3), but I don't know how to run individual tests interactively.

Code:

# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dfock_dblbuf

CPP         = fpp -f_com=no -free -w0  $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

FC          = mpiifort
FCL         = mpiifort

FREE        = -free -names lowercase

FFLAGS      = -assume byterecl -w -g3

OFLAG       = -O2
OFLAG_IN    = $(OFLAG)
DEBUG       = -O0

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = $(FC)
CC_LIB      = icc
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = icpc
LLIBS       = -lstdc++

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##

# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -xHOST
FFLAGS     += $(VASP_TARGET_CPU)
 
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
# (Note: for Intel Parallel Studio's MKL use -mkl instead of -qmkl)
FCL        += -qmkl=sequential
MKLROOT    ?= /path/to/your/mkl/installation
LLIBS      += -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS        =-I$(MKLROOT)/include/fftw

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= ${HOME}/Code/VASP/hdf5/install
LLIBS      += -L$(HDF5_ROOT)/lib -l:libhdf5_fortran.a -l:libhdf5.a -lz
INCS       += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
CPP_OPTIONS    += -DVASP2WANNIER90
WANNIER90_ROOT ?= ${HOME}/Code/VASP/wannier90/
LLIBS          += -L$(WANNIER90_ROOT) -l:libwannier.a

# For the fftlib library (hardly any benefit in combination with MKL's FFTs)
#CPP_OPTION += -Dsysv
#FCL         = mpif90 fftlib.o -qmkl
#CXX_FFTLIB  = icpc -qopenmp -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS       += fftlib

andreas.singraber
Global Moderator
Posts: 231
Joined: Mon Apr 26, 2021 7:40 am

#2 Post by andreas.singraber » Tue May 16, 2023 10:01 am

Hello!

Could you please tell us which hardware you are using? You can find out by running this command on the machine you use for compilation:

Code:

lscpu

Also, please tell us which VASP version you are using!

Thank you!

Best,
Andreas Singraber

hmenke
Newbie
Posts: 2
Joined: Tue Apr 25, 2023 2:17 pm

#3 Post by hmenke » Tue May 16, 2023 10:17 am

The VASP version is 6.4.1; the output of lscpu is further below.

In the meantime I realized that I can run single tests using VASP_TESTSUITE_TESTS=bulk_InP_SOC_G0W0_sym make test. In the resulting log I found this:

Code:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
libpthread-2.31.s  00007FCB33125420  Unknown               Unknown  Unknown
vasp_ncl           0000000000542894  Unknown               Unknown  Unknown
vasp_ncl           000000000098289D  Unknown               Unknown  Unknown
vasp_ncl           000000000113C0F3  Unknown               Unknown  Unknown
vasp_ncl           00000000013EB49E  Unknown               Unknown  Unknown
vasp_ncl           0000000001E96BF9  Unknown               Unknown  Unknown
vasp_ncl           0000000001E6D861  Unknown               Unknown  Unknown
vasp_ncl           000000000040A93D  Unknown               Unknown  Unknown
libc-2.31.so       00007FCB32DF4083  __libc_start_main     Unknown  Unknown
vasp_ncl           000000000040A85E  Unknown               Unknown  Unknown

Following up on this error message led me to a page on segmentation faults in the Intel documentation, https://www.intel.com/content/www/us/en ... rrors.html, which strongly suggests that this is related to stack space exhaustion.
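
The single-test invocation above can be looped over every failed test, so that each one gets its own log. The following is only a sketch: it assumes the summary has been saved to a file in the layout quoted earlier in this thread (names on the line after "The following tests failed, ..."). The function names, the DRY_RUN switch and the rerun_*.log file names are made up for illustration and are not part of the VASP testsuite.

```shell
#!/bin/sh
# Print the names of the failed tests, one per line.
# $1: file containing the testsuite summary (assumed layout: the names
#     are on the line right after "The following tests failed, ...").
list_failed_tests() {
    awk '/^The following tests failed/ { if ((getline line) > 0) print line; exit }' "$1" |
        tr -s ' ' '\n' | sed '/^$/d'
}

# Re-run each failed test on its own and keep one log file per test.
# With DRY_RUN=1 the commands are only printed, not executed.
rerun_failed_tests() {
    for t in $(list_failed_tests "$1"); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "VASP_TESTSUITE_TESTS=$t make test"
        else
            # Same invocation as for a single test, output captured per test.
            VASP_TESTSUITE_TESTS="$t" make test > "rerun_$t.log" 2>&1
        fi
    done
}
```

Calling DRY_RUN=1 rerun_failed_tests summary.txt first (with summary.txt being wherever the summary was saved) shows what would be executed before anything is actually run.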

---

Code:

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          36
On-line CPU(s) list:             0-35
Thread(s) per core:              2
Core(s) per socket:              18
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) W-2295 CPU @ 3.00GHz
Stepping:                        7
CPU MHz:                         3719.067
CPU max MHz:                     4800.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        6000.00
Virtualization:                  VT-x
L1d cache:                       576 KiB
L1i cache:                       576 KiB
L2 cache:                        18 MiB
L3 cache:                        24.8 MiB
NUMA node0 CPU(s):               0-35
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:          Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512_vnni md_clear flush_l1d arch_capabilities

andreas.singraber
Global Moderator
Posts: 231
Joined: Mon Apr 26, 2021 7:40 am

#4 Post by andreas.singraber » Fri May 19, 2023 12:56 pm

Hello!

I was not able to reproduce the segmentation faults; all the tests you mentioned passed for me. However, here are two potential solutions you can try:

(1) Check the stack size limit with

Code:

ulimit -s

If this returns a number, e.g. 8192, then please increase it with

Code:

ulimit -s unlimited

Note: This change is not permanent; if you log out, you will need to issue the command again. To make it permanent, add it e.g. to your ~/.bashrc file.
Check whether this fixes the failing tests. If not, proceed with the next step.

(2) Avoid AVX512 instructions by replacing

Code:

-xHOST

in your makefile.include with

Code:

-march=core-avx2

We have had problems with AVX512 before; maybe this is the case here as well.
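
To see whether step (2) is relevant for a given machine at all, one can check whether the CPU advertises AVX-512. A small sketch, assuming a Linux system where the flag list is available from lscpu or /proc/cpuinfo; the function name has_avx512 is made up for illustration:

```shell
#!/bin/sh
# Succeeds (exit status 0) if the CPU flag list read from stdin contains
# avx512f, the AVX-512 "foundation" flag; fails otherwise.
has_avx512() {
    grep -qw 'avx512f'
}

# Typical use on Linux (commented out so that sourcing this file has no
# side effects):
#   if has_avx512 < /proc/cpuinfo; then
#       echo "CPU reports AVX-512; -xHOST may generate AVX-512 code"
#   fi
```

The lscpu output posted earlier in this thread does list avx512f. Since the makefile.include above sets VASP_TARGET_CPU with ?=, exporting VASP_TARGET_CPU=-march=core-avx2 in the environment before a clean rebuild should have the same effect as editing the file, assuming GNU make.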

I hope this helps to fix the issue. Please report back whether either of the options worked for you!
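
For step (1), here is a sketch of a helper that raises the soft stack limit of the current shell as far as the hard limit allows. On many Linux systems the hard limit is unlimited, in which case this is equivalent to ulimit -s unlimited. The function name raise_stack_limit is made up for illustration.

```shell
#!/bin/sh
# Raise the soft stack limit of the current shell up to the hard limit.
# Call it directly, not inside $(...), otherwise only a subshell is changed.
raise_stack_limit() {
    soft=$(ulimit -S -s)   # current soft limit ("unlimited" or a number)
    hard=$(ulimit -H -s)   # ceiling that an unprivileged user may set
    if [ "$soft" = "$hard" ]; then
        echo "stack limit already at the hard limit ($hard)"
    else
        ulimit -S -s "$hard" && echo "stack limit raised from $soft to $hard"
    fi
}
```

Processes started from the same shell afterwards, such as make test, inherit the new limit. For MPI runs that span several machines, the limit would also have to be raised on the other nodes.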

Best,
Andreas Singraber
