Vasp 6.1.1 fails SiC_HSE tests

Posted: Fri Sep 04, 2020 5:46 pm
by rkingsbury
Hello, I have compiled VASP 6.1.0 and VASP 6.1.1 on the NERSC CORI cluster using their official toolchain. The resulting binaries pass all tests in the "fast" test suite except for the four SiC_HSE tests. Has anyone else experienced a similar failure with VASP 6?

The tests were run with `OMP_NUM_THREADS=1` on a single node, with either 4 or 8 MPI ranks (`srun -n 8 -c 32` or `srun -n 4 -c 32`).

The test output is

==================================================================
SUMMARY:
==================================================================
The following tests failed, please check the output file manually:
SiC_HSE06_ALGO=A SiC_HSE06_ALGO=A_RPR SiC_HSE06_ALGO=D SiC_HSE06_ALGO=D_RPR

An example result from one of the failing tests is

RK
Ryan Kingsbury
Additional comments•2020-09-03 15:16:37
The NERSC-provided vasp binary (6.1.0-knl) fails the SiC_HSE06_* tests (4 tests total) in the built-in VASP testsuite. I encountered this problem because I am compiling a patched version of VASP 6.1 and wish to validate it using the built-in test suite. I understand that the VASP-provided test suite is designed to be run with 1, 2, 4 or 8 MPI ranks and have run the test suite on a single KNL node with the following srun commands:

OMP_NUM_THREADS=1
VASP_TESTSUITE_EXE_STD='srun -N 1 -n 8 -c 32 --cpu_bind=cores vasp_std'
and
VASP_TESTSUITE_EXE_STD='srun -N 1 -n 4 -c 32 --cpu_bind=cores vasp_std'

The exact test output was

==================================================================
SUMMARY:
==================================================================
The following tests failed, please check the output file manually:
SiC_HSE06_ALGO=A SiC_HSE06_ALGO=A_RPR SiC_HSE06_ALGO=D SiC_HSE06_ALGO=D_RPR

and an example output from a single failing test is

exiting run_recipe SiC_HSE06_ALGO=A
ERROR: the frequencies are different, please check
--------------------------------------------------
f= 820.46 cm-1
f= 820.46 cm-1
f= 820.46 cm-1
---------------------------------------------------------------------------
Comparing files: freq and freq.ref
3 number(s) differ.
Max diff.: 0.389999999999986
(at row number: 1 column number: 2 )
Tolerance: 0.250000000000000
---------------------------------------------------------------------------
ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
-17.68675223
-17.68675223
-17.68442408
-17.68442408
-17.68443044
-17.68443044
-17.68443016
-17.68443016
-17.68442175
-17.68442175
---------------------------------------------------------------------------
Comparing files: energy_outcar and energy_outcar.ref
10 number(s) differ.
Max diff.: 0.124483999999999
(at row number: 7 column number: 1 )
Tolerance: 5.000000000000000E-004
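
For checking such a failure by hand, the comparison reported above (largest element-wise difference against a tolerance) can be sketched in a few lines of shell. This is not the testsuite's own comparison code: `max_diff` is a hypothetical helper, and it assumes both files have the same number of whitespace-separated columns in every row.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the VASP testsuite.
# Usage: max_diff FILE REF TOL
# Prints the largest absolute element-wise difference and where it occurs;
# returns 1 if that difference exceeds TOL, 0 otherwise.
max_diff() {
    paste "$1" "$2" | awk -v tol="$3" '
        BEGIN { max = 0; row = 0; col = 0 }
        {
            n = NF / 2                      # first half of the fields: FILE, second half: REF
            for (i = 1; i <= n; i++) {
                d = $i - $(i + n)           # non-numeric fields evaluate to 0 on both sides
                if (d < 0) d = -d
                if (d > max) { max = d; row = NR; col = i }
            }
        }
        END {
            printf "max diff %g at row %d column %d (tolerance %g)\n", max, row, col, tol
            exit (max > tol) ? 1 : 0
        }'
}
```

Called as `max_diff freq freq.ref 0.25` in whichever directory holds the two files, it returns a non-zero status when the tolerance is exceeded.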

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Mon Sep 07, 2020 12:36 pm
by merzuk.kaltak
Most probably these tests fail because you are running the testsuite with more than 8 MPI ranks (in total).
We have successfully run the testsuite (on 1, 2, 4, 6, and 8 MPI ranks) using executables built with various toolchains as mentioned here.
Do you have failed tests when running with 1, 2, 4, 6 or 8 MPI ranks?

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Sep 09, 2020 2:57 am
by rkingsbury
Thank you for your reply. I get these failures using either 4 or 8 MPI ranks (via `srun -N 1 -n 8 -c 32`, for example).

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Sep 09, 2020 7:30 am
by merzuk.kaltak
Please attach ./testsuite/testsuite.log (or the stdout) including your makefile.include file as a zip file. Also, which compiler suite and libraries do you use?

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Sep 09, 2020 5:53 pm
by rkingsbury
Please see attached archive. In this instance I ran only the `SiC_HSE06_ALGO=A_RPR` test. I or a colleague will reply later on with the compiler details.
2020-09-09.zip

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Sep 09, 2020 6:17 pm
by rkingsbury
Regarding compiler details, we are on Xeon Phi (Knights Landing) CPUs (https://docs.nersc.gov/systems/cori/#knl-compute-nodes) and use the ifort compiler, version 19.0.3.199 20190206.

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Sep 23, 2020 7:28 pm
by rkingsbury
Hello, can you offer any further guidance for troubleshooting this failure? Please note that I submitted our makefile.include to the moderator. Thank you!

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Sep 30, 2020 11:59 am
by merzuk.kaltak
Could you run this specific SiC_HSE test on 2, 4 and 6 MPI ranks and post the testsuite.log and the OUTCAR (located in ./testsuite/tests/SiC_HSE_RPR/) of these runs?
Also, I would recommend compiling VASP with the same compiler suite on alternative hardware, preferably an ordinary Xeon (not Phi), and running only this HSE test.
To run a specific test, put this line into your batch script:

export VASP_TESTSUITE_TESTS="SiC_HSE SiC_HSE_RPR"
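
The request above (the same tests on 2, 4 and 6 MPI ranks) could be scripted roughly as follows. This is a sketch under assumptions: `setup_hse_test_env` is a hypothetical helper, the `srun` options are copied from the earlier posts in this thread, and the command that actually starts the testsuite is left as a comment because it depends on the local setup.

```bash
#!/usr/bin/env bash
# Sketch only: prepare the environment for a reduced testsuite run.
# setup_hse_test_env is a hypothetical helper; RANKS defaults to 4.
setup_hse_test_env() {
    local ranks="${1:-4}"
    export OMP_NUM_THREADS=1
    # srun options taken from the earlier posts; adjust for your node type.
    export VASP_TESTSUITE_EXE_STD="srun -N 1 -n ${ranks} -c 32 --cpu_bind=cores vasp_std"
    # Test names exactly as given in the line above.
    export VASP_TESTSUITE_TESTS="SiC_HSE SiC_HSE_RPR"
}

for ranks in 2 4 6; do
    setup_hse_test_env "$ranks"
    echo "ranks=${ranks}: ${VASP_TESTSUITE_EXE_STD}"
    # Start the testsuite here, then copy testsuite.log and the OUTCAR
    # aside before the next iteration overwrites them.
done
```
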
Also, it could be helpful to reduce compiler optimization. From your post I have extracted the following makefile.include (which I think is not the complete one):

CPP_OPTIONS= -DHOST=\"VASP6.1.1-r2SCAN-MP\"\
             -DMPI -DMPI_BLOCK=8000 -Duse_collective \
             -DscaLAPACK \
             -DCACHE_SIZE=4000 \
             -Davoidalloc \
             -Dvasp6 \
             -Duse_bse_te \
             -Dtbdyn \
             -Dfock_dblbuf \
             -D_OPENMP \
             -Duse_shmem \
             -Dshmem_bcast_buffer \
             -Dshmem_rproj \
             -Dmemalign64 \
             -D_OPENMP45 -DSIMD512 \
             -DIntelKNL \
             -DVASP2WANNIER90v2 \
             -Dlibbeef \
             -DPROFILING
CPP        = fpp -f_com=no -free -w0  $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC         = ftn -qopenmp
FCL        = ftn -qopenmp #-mkl
FREE       = -free -names lowercase
For instance, I would recompile VASP (after a `make veryclean`) without the `-DSIMD512` option and run the test again.
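
A minimal sketch of the flag removal, assuming the flag shares its line with other options as in the excerpt above. `strip_cpp_flag` is a hypothetical helper, not part of the VASP build system; it keeps a backup copy so the original makefile.include can be restored.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the VASP build system.
# Usage: strip_cpp_flag FILE FLAG
# Removes FLAG (together with one leading space) from FILE and
# leaves the unmodified original in FILE.bak.
strip_cpp_flag() {
    local file="$1" flag="$2"
    cp "$file" "$file.bak"                      # keep a backup of the original
    sed "s/ ${flag}//" "$file.bak" > "$file"    # first occurrence on each line
}

# Usage, from the directory that holds makefile.include:
#   strip_cpp_flag makefile.include -DSIMD512
#   make veryclean
#   ...then rebuild and rerun the test as before
```

Note that the pattern is a plain substring match, so it would also shorten a longer flag that merely starts with the same text; check the result with `diff makefile.include.bak makefile.include` before rebuilding.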

Re: Vasp 6.1.1 fails SiC_HSE tests

Posted: Wed Oct 14, 2020 10:11 pm
by rkingsbury
Thank you for the advice. Recompiling without the `-DSIMD512` option allowed the binary to pass all tests in the 'fast' test suite with 8 MPI ranks.

Can you elaborate on how the absence of this flag is expected to affect performance, or why it would be related to the HSE test failures?