Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Fri Sep 30, 2022 4:10 am
by richie_fong
Hi! I have compiled VASP 6.3.2 with OpenACC + OpenMP using the makefile.include.nvhpc_ompi_mkl_omp_acc shown below, and ran 'make test' successfully. However, when I ran the job script (mpirun command) shown below on an HPC allocation of 4 MPI ranks (1 rank per GPU) with 12 OpenMP threads per rank, on a node with 48 cores (24 cores per socket, AMD Milan 7413) and 4x Nvidia A100, it failed with the error message shown in the output file below. Hope to receive some advice on this issue. Thank you!

Job script

mpirun -np 4 --map-by ppr:2:socket:PE=12 --bind-to core \
              -x OMP_NUM_THREADS=12 -x OMP_STACKSIZE=512m \
              -x OMP_PLACES=cores -x OMP_PROC_BIND=close \
              --report-bindings vasp_std
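
As a sanity check on the layout described above, the arithmetic behind `ppr:2:socket:PE=12` can be sketched in a few lines of shell. The function name `rank_layout` and its argument order are hypothetical; only the numbers (48 cores, 2 sockets, 4 GPUs, one rank per GPU) come from this post.

```shell
#!/bin/sh
# Hypothetical helper: derive ranks per socket and cores per rank for a
# one-rank-per-GPU layout. Arguments: total cores, sockets, GPUs.
rank_layout() {
    cores=$1; sockets=$2; gpus=$3
    # Refuse layouts that do not divide evenly.
    if [ $((gpus % sockets)) -ne 0 ] || [ $((cores % gpus)) -ne 0 ]; then
        echo "uneven layout: cores=$cores sockets=$sockets gpus=$gpus" >&2
        return 1
    fi
    ppr=$((gpus / sockets))   # MPI ranks per socket
    pe=$((cores / gpus))      # cores (and OpenMP threads) per rank
    echo "--map-by ppr:${ppr}:socket:PE=${pe} -x OMP_NUM_THREADS=${pe}"
}

# Node from this post: 48 cores, 2 sockets, 4 GPUs.
rank_layout 48 2 4
```

For the node in question this reproduces the mapping and thread count used in the job script, so the options are at least consistent with the hardware described.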

Output file

----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    4 mpi-ranks, with   12 threads/rank
 distrk:  each k-point on    1 cores,    4 groups
 distr:  one band on    1 cores,    1 groups
 OpenACC runtime initialized ...    4 GPUs detected
 vasp.6.3.2 27Jun22 (build Sep 28 2022 21:15:38) complex
 POSCAR found type information on POSCAR LiMnNbO
 POSCAR found :  4 types and      64 ions
 Reading from existing POTCAR
 scaLAPACK will be used selectively (only on CPU)
FATAL ERROR: data in update device clause was not found on device 4: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/std/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 3: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/std/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 1: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/std/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 2: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/std/fock.f90 xc_fock_reader line:567

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9289,1],3]
  Exit code:    1
--------------------------------------------------------------------------

Makefile.include

# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENMP \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#       to one that comes with your NVIDIA-HPC SDK
FC          = mpif90 -acc -gpu=cc60,cc70,cc80,cuda11.7 -mp
FCL         = mpif90 -acc -gpu=cc60,cc70,cc80,cuda11.7 -mp -c++libs

FREE        = -Mfree

FFLAGS      = -Mbackslash -Mlarge_arrays

OFLAG       = -fast

DEBUG       = -Mfree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = nvfortran
CC_LIB      = nvc -w
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself , change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS     += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT      =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')

# If the above fails, then NVROOT needs to be set manually
#NVHPC      ?= /home/ljfong/VASP632/nvhpc
#NVVERSION   = 22.7
#NVROOT      = $(NVHPC)/Linux_x86_64/$(NVVERSION)

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
OFLAG_IN   = -fast -Mwarperf
SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD         ?= $(NVROOT)/compilers/extras/qd
LLIBS      += -L$(QD)/lib -lqdmod -lqd
INCS       += -I$(QD)/include/qd

# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT    ?= /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2022.1.0
LLIBS_MKL   = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
INCS       += -I$(MKLROOT)/include/fftw

# Use a separate scaLAPACK installation (optional but recommended in combination with OpenMPI)
# Comment out the two lines below if you want to use scaLAPACK from MKL instead
#SCALAPACK_ROOT ?= /path/to/your/scalapack/installation
#LLIBS_MKL   = -L$(SCALAPACK_ROOT)/lib -lscalapack -Mmkl

LLIBS      += $(LLIBS_MKL)

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /home/ljfong/VASP632/hdf5
LLIBS      += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS       += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS    += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /home/ljfong/VASP632/wannier/wannier90-3.1.0
#LLIBS          += -L$(WANNIER90_ROOT)/lib -lwannier

# For the fftlib library (hardly any benefit for the OpenACC GPU port, especially in combination with MKL's FFTs)
#CPP_OPTIONS+= -Dsysv
#FCL        += fftlib.o
#CXX_FFTLIB  = nvc++ -mp --no_warnings -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS       += fftlib
#LLIBS      += -ldl

Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Fri Sep 30, 2022 7:46 am
by martin.schlipf
Did you try your jobscript with one of the tests in the testsuite or on your own input?

If you have not done so already, please try to reproduce the failure with one of the tests in the testsuite. The README of the testsuite (specifically section 2.2) explains how to run the testsuite with your MPI options. If all the tests in the testsuite run successfully even with your MPI options, then the problem would be something triggered by your specific input.
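
A minimal sketch of how the MPI options could be handed to the testsuite. It assumes the testsuite reads the VASP_TESTSUITE_EXE_STD, VASP_TESTSUITE_EXE_GAM and VASP_TESTSUITE_EXE_NCL variables from the environment; the README section mentioned above is the authoritative description. The helper name `testsuite_exe` and the `VASP_BIN` path are placeholders, and the MPI options are copied from the job script in this thread.

```shell
#!/bin/sh
# Placeholder: directory that contains vasp_std, vasp_gam and vasp_ncl.
VASP_BIN=${VASP_BIN:-/path/to/vasp.6.3.2/bin}

# MPI options taken from the job script in this thread; adjust as needed.
MPI_OPTS="-np 4 --map-by ppr:2:socket:PE=12 --bind-to core -x OMP_NUM_THREADS=12 -x OMP_STACKSIZE=512m -x OMP_PLACES=cores -x OMP_PROC_BIND=close"

# Hypothetical helper: build the launch command for one VASP flavour.
testsuite_exe() {
    echo "mpirun $MPI_OPTS $VASP_BIN/vasp_$1"
}

export VASP_TESTSUITE_EXE_STD="$(testsuite_exe std)"
export VASP_TESTSUITE_EXE_GAM="$(testsuite_exe gam)"
export VASP_TESTSUITE_EXE_NCL="$(testsuite_exe ncl)"

# Show what the testsuite would be asked to run.
echo "$VASP_TESTSUITE_EXE_STD"
```

If that assumption holds, running 'make test' in the same shell afterwards should launch every test with these commands.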

Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Fri Sep 30, 2022 8:13 am
by martin.schlipf
One more thing you can try is suppressing the OpenMP parallelization by setting

export OMP_NUM_THREADS=1

If you cannot reproduce the failure with any test in the testsuite, please provide a complete set of input files, so that we can try to reproduce it locally.

Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Fri Sep 30, 2022 3:18 pm
by richie_fong
Thank you for the reply. As suggested, I tried both setting export OMP_NUM_THREADS=1 and running the testsuite with my MPI options. However, the same error still occurs. I have attached the testsuite.log.

Lmod is automatically replacing "intel/2020.1.217" with "nvhpc/22.7".


Lmod is automatically replacing "intel/2020.1.217" with "nvhpc/22.7".

==================================================================
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
VASP TESTSUITE SHA:

Reference files have been generated with 4 MPI ranks.
Note that tests might fail if an other number of ranks is used!

Executables and additional INCAR tags used for this test:

VASP_TESTSUITE_EXE_STD="mpirun -np 4 --map-by ppr:2:socket:PE=12 --bind-to core -x OMP_NUM_THREADS=1 -x OMP_STACKSIZE=512m -x OMP_PLACES=cores -x OMP_PROC_BIND=close /home/ljfong/VASP632/vasp.6.3.2/testsuite/../bin/vasp_std"
VASP_TESTSUITE_EXE_NCL="mpirun -np 4 --map-by ppr:2:socket:PE=12 --bind-to core -x OMP_NUM_THREADS=1 -x OMP_STACKSIZE=512m -x OMP_PLACES=cores -x OMP_PROC_BIND=close /home/ljfong/VASP632/vasp.6.3.2/testsuite/../bin/vasp_ncl"
VASP_TESTSUITE_EXE_GAM="mpirun -np 4 --map-by ppr:2:socket:PE=12 --bind-to core -x OMP_NUM_THREADS=1 -x OMP_STACKSIZE=512m -x OMP_PLACES=cores -x OMP_PROC_BIND=close /home/ljfong/VASP632/vasp.6.3.2/testsuite/../bin/vasp_gam"
VASP_TESTSUITE_INCAR_PREPEND=""
VASP_TESTSUITE_REFERENCE=""
VASP_TESTSUITE_SKIP_HYB=""
VASP_TESTSUITE_SKIP_NCL=""
VASP_TESTSUITE_SKIP_SOC=""
VASP_TESTSUITE_SKIP_MD=""
VASP_TESTSUITE_SKIP_TBMD=""
VASP_TESTSUITE_SKIP_RPA=""
VASP_TESTSUITE_SKIP_GW=""
VASP_TESTSUITE_SKIP_ACFDT=""
VASP_TESTSUITE_SKIP_CRPA=""
VASP_TESTSUITE_SKIP_BSE=""
VASP_TESTSUITE_SKIP_NOSYM=""
VASP_TESTSUITE_SKIP_VASP6=""
VASP_TESTSUITE_SKIP_GAMMA=""
VASP_TESTSUITE_SKIP_VASP45="Y"
VASP_TESTSUITE_SKIP_VASP46=""
VASP_TESTSUITE_SKIP_LREAL=""
VASP_TESTSUITE_SKIP_LRESP=""
VASP_TESTSUITE_SKIP_PEAD=""
VASP_TESTSUITE_SKIP_NCORE1=""
VASP_TESTSUITE_SKIP_WAN90=""
VASP_TESTSUITE_SKIP_KOPT=""
VASP_TESTSUITE_SKIP_ML=""
VASP_TESTSUITE_RUN_HYB=""
VASP_TESTSUITE_RUN_NCL=""
VASP_TESTSUITE_RUN_SOC=""
VASP_TESTSUITE_RUN_MD=""
VASP_TESTSUITE_RUN_TBMD=""
VASP_TESTSUITE_RUN_RPA=""
VASP_TESTSUITE_RUN_GW=""
VASP_TESTSUITE_RUN_ACFDT=""
VASP_TESTSUITE_RUN_CRPA=""
VASP_TESTSUITE_RUN_BSE=""
VASP_TESTSUITE_RUN_NOSYM=""
VASP_TESTSUITE_RUN_VASP6=""
VASP_TESTSUITE_RUN_GAMMA=""
VASP_TESTSUITE_RUN_LREAL=""
VASP_TESTSUITE_RUN_LRESP=""
VASP_TESTSUITE_RUN_PEAD=""
VASP_TESTSUITE_RUN_NCORE1=""
VASP_TESTSUITE_RUN_WAN90=""
VASP_TESTSUITE_RUN_KOPT=""
VASP_TESTSUITE_RUN_ML=""
VASP_TESTSUITE_RUN_FAST=""

Executed at: 10_22_09/30/22
==================================================================

------------------------------------------------------------------

CASE: andersen_nve
------------------------------------------------------------------
CASE: andersen_nve
entering run_recipe andersen_nve
andersen_nve step STD
------------------------------------------------------------------
andersen_nve step STD
entering run_vasp_g
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    4 mpi-ranks, with    1 threads/rank
 distrk:  each k-point on    2 cores,    2 groups
 distr:  one band on    1 cores,    2 groups
 OpenACC runtime initialized ...    4 GPUs detected
 vasp.6.3.2 27Jun22 (build Sep 28 2022 21:15:38) gamma-only                      
 POSCAR found type information on POSCAR C H 
 POSCAR found :  2 types and       8 ions
 Reading from existing POTCAR
 scaLAPACK will be used selectively (only on CPU)
FATAL ERROR: data in update device clause was not found on device 4: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 1: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 2: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 3: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[22846,1],3]
  Exit code:    1
--------------------------------------------------------------------------
exiting run_vasp_g

exiting run_recipe andersen_nve
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
	        -40.43139155
	        -40.43139155
	        -40.16490800
	        -40.41852499
	        -40.41852499
	        -40.22598800
	        -40.39746240
	        -40.39746240
	        -40.26328500
	        -40.38589715
	        -40.38589715
	        -40.20627900
	        -40.37920714
	        -40.37920714
	        -40.20310400
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files energy_outcar and 
 energy_outcar.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the test yields different results for the forces, please check
---------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files force and force.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the stress tensor is different, please check
---------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files stress and stress.ref
  disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 

CASE: andersen_nve_constrain
------------------------------------------------------------------
CASE: andersen_nve_constrain
entering run_recipe andersen_nve_constrain
andersen_nve_constrain step STD
------------------------------------------------------------------
andersen_nve_constrain step STD
entering run_vasp_g
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    4 mpi-ranks, with    1 threads/rank
 distrk:  each k-point on    2 cores,    2 groups
 distr:  one band on    1 cores,    2 groups
 OpenACC runtime initialized ...    4 GPUs detected
 vasp.6.3.2 27Jun22 (build Sep 28 2022 21:15:38) gamma-only                      
 POSCAR found type information on POSCAR C H 
 POSCAR found :  2 types and       8 ions
 Reading from existing POTCAR
 scaLAPACK will be used selectively (only on CPU)
FATAL ERROR: data in update device clause was not found on device 1: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 4: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 2: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 3: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[22640,1],1]
  Exit code:    1
--------------------------------------------------------------------------
exiting run_vasp_g

exiting run_recipe andersen_nve_constrain
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
	        -40.43139157
	        -40.43139157
	        -40.20288100
	        -40.42075479
	        -40.42075479
	        -40.26420400
	        -40.41036500
	        -40.41036500
	        -40.29119500
	        -40.40035358
	        -40.40035358
	        -40.23681400
	        -40.39177577
	        -40.39177577
	        -40.23577400
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files energy_outcar and 
 energy_outcar.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the test yields different results for the forces, please check
---------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files force and force.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the stress tensor is different, please check
---------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files stress and stress.ref
  disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 

CASE: andersen_nve_constrain_fixed
------------------------------------------------------------------
CASE: andersen_nve_constrain_fixed
entering run_recipe andersen_nve_constrain_fixed
andersen_nve_constrain_fixed step STD
------------------------------------------------------------------
andersen_nve_constrain_fixed step STD
entering run_vasp_g
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    4 mpi-ranks, with    1 threads/rank
 distrk:  each k-point on    2 cores,    2 groups
 distr:  one band on    1 cores,    2 groups
 OpenACC runtime initialized ...    4 GPUs detected
 vasp.6.3.2 27Jun22 (build Sep 28 2022 21:15:38) gamma-only                      
 POSCAR found type information on POSCAR C H 
 POSCAR found :  2 types and       8 ions
 Reading from existing POTCAR
 scaLAPACK will be used selectively (only on CPU)
FATAL ERROR: data in update device clause was not found on device 1: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 2: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 4: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

FATAL ERROR: data in update device clause was not found on device 3: name=lexch
 file:/home/ljfong/VASP632/vasp.6.3.2/build/gam/fock.f90 xc_fock_reader line:567

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[21019,1],3]
  Exit code:    1
--------------------------------------------------------------------------
exiting run_vasp_g

exiting run_recipe andersen_nve_constrain_fixed
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
	        -40.43139157
	        -40.43139157
	        -40.23702800
	        -40.41628138
	        -40.41628138
	        -40.24759500
	        -40.39892731
	        -40.39892731
	        -40.23702500
	        -40.37944239
	        -40.37944239
	        -40.20187700
	        -40.35865076
	        -40.35865076
	        -40.19547000
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files energy_outcar and 
 energy_outcar.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the test yields different results for the forces, please check
---------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files force and force.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------
 
Warning: ieee_inexact is signaling
FORTRAN STOP
ERROR: the stress tensor is different, please check
---------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files stress and stress.ref
  disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------


Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Mon Oct 03, 2022 9:39 am
by martin.schlipf
Hmm, there doesn't seem to be an obvious flaw in your setup. Perhaps you can try to load the modules very carefully to avoid messages like

Lmod is automatically replacing "intel/2020.1.217" with "nvhpc/22.7".

and make sure that the toolchain used during compilation and at execution is exactly the same.

Could you provide us with more information regarding your toolchain, i.e., which exact versions of the compiler, MPI, and LAPACK/BLAS are you using?
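
One way to collect this information is sketched below. The helper names `tool_version` and `toolchain_snapshot` are hypothetical, and the list of tools (nvfortran, mpif90, mpirun) and the MKLROOT variable are simply those that appear in the makefile.include and job script of this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first non-empty line of "<tool> --version",
# or a note if the tool is not on PATH.
tool_version() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | sed '/^$/d' | head -n 1)"
    else
        printf '%s: not found in PATH\n' "$1"
    fi
}

# Collect compiler, MPI wrapper and launcher versions plus MKLROOT.
toolchain_snapshot() {
    for tool in nvfortran mpif90 mpirun; do
        tool_version "$tool"
    done
    printf 'MKLROOT: %s\n' "${MKLROOT:-unset}"
}

toolchain_snapshot
```

Running it once in the shell used for compilation and once inside the job script, each time redirected to a file, gives two snapshots that can be compared with diff to see whether the build-time and run-time toolchains match.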

Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Mon Oct 03, 2022 9:53 am
by martin.schlipf
More ideas:

Can you use ldd vasp_std on your VASP executable and check if all the libraries are linked to the paths you expect?
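
A small filter can make the ldd output easier to scan. This is only a sketch: the function name `check_ldd` is hypothetical, and the library-name patterns are a guess based on the libraries named in the makefile.include above (MKL, scaLAPACK, HDF5, qd, the CUDA libraries and NCCL) plus MPI.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin, print unresolved libraries
# first, then the MPI, MKL, HDF5, qd, CUDA and NVHPC-related entries.
# Returns a non-zero status if any library is reported as "not found".
check_ldd() {
    awk '
        { sub(/^[ \t]+/, "") }                      # drop ldd indentation
        /not found/ { missing[++m] = $0; next }
        /mpi|mkl|scalapack|hdf5|libqd|cuda|cublas|cusolver|cufft|nccl|nvhpc/ {
            linked[++n] = $0
        }
        END {
            for (i = 1; i <= m; i++) print "MISSING: " missing[i]
            for (i = 1; i <= n; i++) print "LINKED: " linked[i]
            exit (m > 0)
        }'
}

# Typical use on a real build (run in the environment of the job script):
#   ldd bin/vasp_std | check_ldd
```

Any MISSING line, or a LINKED line pointing to a different installation than the one used at compile time, would be worth reporting here.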

Did you try to build without OpenMP support and does that influence whether you see the error?

Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Thu Oct 20, 2022 3:20 pm
by richie_fong
I tried to compile VASP 6.3.2 with OpenACC but without OpenMP using makefile.include.nvhpc_acc, but the same issue occurred when using the GPU, while the test suite works fine on CPU.

Re: Error with mpirun VASP 6.3.2 with OpenACC+OpenMP

Posted: Fri Oct 21, 2022 1:13 pm
by martin.schlipf
Can you try to get a bit more basic? Start with

module purge
module load nvhpc/22.7
module load fftw/3.3.10

and then try to recompile VASP with makefile.include.nvhpc_acc with as few modifications as possible. In particular, please do not use flexiblas or the "Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2" part of the file. You can also compile without HDF5 and Wannier90 support for now. You should also not need to link scaLAPACK explicitly if you use the -Mscalapack option written in the default makefile.include. Finally, please double-check that your path to FFTW is correct and that it is compatible with nvfortran.

Once you have checked all of that, run

make veryclean
make all
to rebuild VASP. If that version still fails the tests, please copy your terminal output, starting from the line where you type module purge, and attach it.
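
The steps above could be wrapped in a small script that also records the output for attaching. This is a sketch under assumptions: the names `run_step`, `rebuild`, `LOG` and `DRY_RUN` are hypothetical, the `module` command must be available in the shell that runs the script, and the script is meant to be started from the VASP root directory. By default it only prints the steps.

```shell
#!/bin/sh
LOG=${LOG:-rebuild.log}
DRY_RUN=${DRY_RUN:-1}   # default only prints the steps; set DRY_RUN=0 to execute

# Hypothetical wrapper: record the command, then run it unless DRY_RUN=1.
run_step() {
    echo "+ $*" | tee -a "$LOG"
    if [ "$DRY_RUN" = 1 ]; then
        return 0
    fi
    # Redirect instead of piping so that "module" still changes this shell.
    "$@" >>"$LOG" 2>&1
}

# Same sequence as in the post; stops at the first failing step.
rebuild() {
    run_step module purge &&
    run_step module load nvhpc/22.7 &&
    run_step module load fftw/3.3.10 &&
    run_step make veryclean &&
    run_step make all
}

rebuild
```

With DRY_RUN=0 the file named by LOG then contains every command and its output from module purge onwards, which is the part requested above.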