Installing VASP/5.4.4 for Parallel OpenMPI Cluster

samuel.mathews · #1 Post by **samuel.mathews** » Wed May 29, 2019 4:09 pm

Hello.
My goal is to compile VASP/5.4.4 on a large cluster that uses OpenMPI, with an internal compiler and Intel MKL. The compilation itself succeeds; however, I run into problems during execution that I have traced to something in my compilation, so I am posting the question here.
When I run simulations on a small system, such as a water molecule, VASP works perfectly and I receive no errors. However, when I run simulations on a larger system containing 60 atoms (as opposed to 3), VASP starts but stops before or during the first SCF step with the following errors:

Code:

 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
srun: error: blg7213: task 115: Segmentation fault (core dumped)
srun: error: blg7142: tasks 58,62,67: Segmentation fault (core dumped)
srun: error: blg7213: task 85: Segmentation fault (core dumped)
srun: error: blg7142: tasks 59,71-72: Segmentation fault (core dumped)
srun: error: blg9253: tasks 120,122,130,132,135,142,146,148,155-156,158: Segmentation fault (core dumped)
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: blg9253: tasks 121,123-124,131,136,139-140,144,151,153-154,157: Segmentation fault (core dumped)
srun: error: blg7213: tasks 80-84,86-114,116-119: Segmentation fault (core dumped)
srun: error: blg7142: tasks 40-57,60-61,63-66,68-70,73-79: Segmentation fault (core dumped)
srun: error: blg7120: tasks 0-39: Segmentation fault (core dumped)
srun: error: blg9253: tasks 125-129,133-134,137-138,141,143,145,147,149-150,152,159: Segmentation fault (core dumped)

srun and blgXXX are specific to the cluster; the error of interest is the segmentation fault (core dumped).
I searched the forums and found that increasing the stack size could help, but the cluster's support staff confirmed that my stack size is already unlimited.
I receive this error whether I use 1, 100, or any number of cores in between. I also tested various values of the parallelization tags in INCAR, to no avail. In Slurm, the scheduler, I request a large memory allocation to make sure the stack never reaches the available maximum, and I still have these issues.
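
For anyone repeating the stack-size check: the limit that matters is the one seen inside the job, which can differ from the one on the login node. Below is a minimal shell sketch, assuming a bash-like shell where `ulimit -s` reports the soft stack limit in kB. The function name `describe_stack_limit` is only illustrative and not part of VASP or Slurm.

```shell
# Format a stack limit value as reported by `ulimit -s`.
# The value is either the word "unlimited" or a number in kB.
describe_stack_limit() {
    limit="$1"
    if [ "$limit" = "unlimited" ]; then
        echo "stack limit: unlimited"
    else
        echo "stack limit: ${limit} kB"
    fi
}

# Print the soft stack limit of the current shell.
describe_stack_limit "$(ulimit -s)"
```

Placing these lines in the batch script just before VASP is launched shows the limit the job actually starts with.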

The makefile.include is at the end of this post, without the GPU section, which I do not need.

I welcome any suggestions or comments that might help with this issue.

Thanks,
Sam

Code:

# Precompiler options
CPP_OPTIONS= -DHOST=\"LinuxIFC\"\
             -DMPI -DMPI_BLOCK=8000 \
             -Duse_collective \
             -DscaLAPACK \
             -DCACHE_SIZE=4000 \
             -Davoidalloc \
             -Duse_bse_te \
             -Dtbdyn \
             -Duse_shmem

CPP        = fpp -f_com=no -free -w0  $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

FC         = mpifort
FCL        = mpifort -mkl=sequential -lstdc++

FREE       = -free -names lowercase

FFLAGS     = -assume byterecl -w
OFLAG      = -O2
OFLAG_IN   = $(OFLAG)
DEBUG      = -O0

MKL_PATH   = $(MKLROOT)/lib/intel64
BLAS       =
LAPACK     =
BLACS      = -lmkl_blacs_intelmpi_lp64
SCALAPACK  = $(MKL_PATH)/libmkl_scalapack_lp64.a $(BLACS)

OBJECTS    = fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d.o

INCS       =-I$(MKLROOT)/include/fftw

LLIBS      = $(SCALAPACK) $(LAPACK) $(BLAS)


OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB     = $(FC)
CC_LIB     = icc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB   = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# For the parser library
CXX_PARS   = icpc

LIBS       += parser
LLIBS      += -Lparser -lparser -lstdc++

# Normally no need to change this
SRCDIR     = ../../src
BINDIR     = ../../bin

samuel.mathews · #2 Post by **samuel.mathews** » Fri May 31, 2019 6:19 pm

Hello.

After recompiling several times, I realized that my error lay in the BLACS setting.
The original line was:

Code:

BLACS      = -lmkl_blacs_intelmpi_lp64

It should be replaced with:

Code:

BLACS      = -lmkl_blacs_openmpi_lp64

I am still running timing tests on some of the other tags, but VASP now runs correctly.
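
The BLACS mismatch described above can be caught before compiling by checking which MPI implementation is in the environment. What follows is a rough shell sketch, not part of the VASP build system. It assumes that the first line of `mpirun --version` contains "Open MPI" or "Intel(R) MPI", which may not hold for every installation, and the function name `blacs_flag_for` is made up for illustration.

```shell
# Map an MPI version banner to the matching MKL BLACS link flag.
# The banner patterns are assumptions about what `mpirun --version` prints.
blacs_flag_for() {
    case "$1" in
        *"Open MPI"*)     echo "-lmkl_blacs_openmpi_lp64" ;;
        *"Intel(R) MPI"*) echo "-lmkl_blacs_intelmpi_lp64" ;;
        *)                echo "unknown" ;;
    esac
}

# Only query mpirun if it is actually on the PATH.
if command -v mpirun >/dev/null 2>&1; then
    blacs_flag_for "$(mpirun --version 2>&1 | head -n 1)"
fi
```

The printed flag is what would go on the BLACS line of makefile.include. An "unknown" result means the banner was not recognized and the library has to be chosen by hand.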
