VASP 5.3.3 band parallelization and Call to ZHEGV failed.

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.

gevolo
Newbie
Posts: 11
Joined: Tue May 05, 2009 3:57 pm

#1 Post by gevolo » Fri Dec 06, 2013 7:09 pm

Dear all,

I am trying to compile VASP 5.3.3 using Intel Composer XE 2013 SP1 (on CentOS 6.4). After some modifications to the makefile I successfully built a vasp executable linked against MKL's ScaLAPACK (the makefile is attached below).

I ran a test case (a simple relaxation of Si) and found that the run crashed after one ionic step, first with warnings "Sub-Space-Matrix is not hermitian in DAV" and finally with "Error EDDDAV: Call to ZHEGV failed. Returncode = ...". At first I thought the LAPACK installation might be at fault, so I switched to VASP's own LAPACK (LAPACK= ../vasp.5.lib/lapack_double.o), but I got the same error. Setting IALGO = 48 does not help either.
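To see how many hermiticity warnings precede the crash, the run's standard output can be scanned for the two messages quoted above. The following is only a minimal sketch: the function name `scan_log` is hypothetical, and it assumes the messages appear in the log as literal substrings, one per line.

```python
from typing import Iterable

# Message fragments quoted in the post; everything else here is illustrative.
WARNING = "Sub-Space-Matrix is not hermitian in DAV"
ERROR = "Call to ZHEGV failed"


def scan_log(lines: Iterable[str]) -> dict:
    """Count hermiticity warnings and locate the first ZHEGV failure.

    Returns the number of warning lines and the 1-based line number of
    the first error line (None if the error never appears).
    """
    warnings = 0
    first_error_line = None
    for number, line in enumerate(lines, start=1):
        if WARNING in line:
            warnings += 1
        elif ERROR in line and first_error_line is None:
            first_error_line = number
    return {"warnings": warnings, "first_error_line": first_error_line}
```

It can be applied to an open file handle, for example `scan_log(open("vasp.out"))`, where `vasp.out` stands for wherever the job's standard output was redirected.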

I finally figured out that if I set NPAR=1 (or, equivalently, NCORE = number of cores), which turns off the parallelization over bands, the run succeeds with both executables (with and without ScaLAPACK).
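For anyone who wants to repeat the comparison with several NPAR values, here is a rough sketch of a helper that sets a tag in the INCAR text. The helper name `set_incar_tag` is made up for illustration, and it assumes one `TAG = value` pair per line with the tag written exactly as given (no `;`-separated tags, no trailing comments to preserve).

```python
import re


def set_incar_tag(incar_text: str, tag: str, value) -> str:
    """Return INCAR text with `tag` set to `value`.

    An existing line that starts with the tag is replaced; otherwise a
    new line is appended at the end.
    """
    pattern = re.compile(rf"^\s*{re.escape(tag)}\s*=")
    new_line = f"{tag} = {value}"
    lines = incar_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if pattern.match(line):
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Calling `set_incar_tag(text, "NPAR", 1)` reproduces the setting that made the run succeed; other values can be tried the same way to find where the failure starts.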

Furthermore, I compiled VASP 5.2.2 with the same settings (compiler, MKL, etc.), and the benchmark run succeeds regardless of the NPAR value.

Does anyone have an idea how to deal with this and build (with these settings) a version of VASP 5.3.3 that runs successfully with band parallelization?

Thank you in advance,
GV

This behaviour may be the same as in this previous forum post (http://cms.mpi.univie.ac.at/vasp-forum/ ... hp?3.12354)
and also resembles this post: http://cms.mpi.univie.ac.at/vasp-forum/ ... hp?2.10409.

======
Makefile
======
.SUFFIXES: .inc .f .f90 .F

SUFFIX=.f90

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

CPP_=fpp -f_com=no -free -w0 $*.F $*$(SUFFIX)

FFLAGS = -FR -names lowercase -assume byterecl
OFLAG=-O2 -ip

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

MKL_PATH=$(MKLROOT)/lib/intel64

MKL_FFTW_PATH=$(MKLROOT)/interfaces/fftw3xf/

FC=mpif90 -f90=ifort
FCL=$(FC)

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000 -Duse_collective -DscaLAPACK

#-----------------------------------------------------------------------
# libraries
#-----------------------------------------------------------------------

MKL = $(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_sequential.a -Wl,--end-gro$

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o \
$(MKL)

LINK =

FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \
radial.o pseudo.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o spinsym.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o hyperfine.o paw.o \
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
chain.o dyna.o k-proj.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o hamil_high.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock_multipole.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o npt_dynamics.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
nmr.o pead.o subrot.o subrot_scf.o \
force.o pwlhf.o gw_model.o optreal.o steep.o davidson.o david_inner.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
mlwf.o ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o \
local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o \
bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
rmm-diis_mlr.o linear_response_NMR.o wannier_interpol.o linear_response.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F
$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules have been tested for ifc.11 and ifc.12 only

fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
fftw3d.o : fftw3d.F
$(CPP)
$(FC) -FR -lowercase -O1 $(INCS) -c $*$(SUFFIX)
fftmpi.o : fftmpi.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
fftmpiw.o : fftmpiw.F
$(CPP)
$(FC) -FR -lowercase -O1 $(INCS) -c $*$(SUFFIX)
wave_high.o : wave_high.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
# the following rules are probably no longer required (-O3 seems to work)
wave.o : wave.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
paw.o : paw.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
cl_shift.o : cl_shift.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
us.o : us.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
LDApU.o : LDApU.F
$(CPP)

admin
Administrator
Posts: 2922
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#2 Post by admin » Thu Jan 16, 2014 4:33 pm

Please check whether the geometry after the first ionic update was reasonable; I rather think that this is the reason for the crash. Parallelization over bands is the default in vasp.5.3 (NPAR=NCPU), therefore I don't think there is a bug in VASP for this parameter.
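One quick way to judge whether a geometry is reasonable is to look at the shortest interatomic distance in the structure file written after the ionic step. The sketch below is illustrative only: the function names are hypothetical, it assumes a POSCAR-style file with a positive scaling factor and Direct coordinates, and it searches only the 27 nearest periodic images, which may not be enough for strongly skewed cells.

```python
import itertools
import math


def parse_poscar(text):
    """Return (lattice, fractional_positions) from POSCAR-style text."""
    lines = [ln.strip() for ln in text.splitlines()]
    scale = float(lines[1].split()[0])
    if scale <= 0:
        raise ValueError("this sketch only handles a positive scaling factor")
    lattice = [[scale * float(x) for x in lines[i].split()[:3]] for i in (2, 3, 4)]
    idx = 5
    # VASP 5 files carry a line of element symbols before the atom counts.
    if not lines[idx].split()[0].isdigit():
        idx += 1
    counts = [int(n) for n in lines[idx].split()]
    idx += 1
    if lines[idx][:1].lower() == "s":  # optional "Selective dynamics" line
        idx += 1
    if lines[idx][:1].lower() != "d":
        raise ValueError("this sketch only handles Direct coordinates")
    idx += 1
    total = sum(counts)
    frac = [[float(x) for x in lines[idx + k].split()[:3]] for k in range(total)]
    return lattice, frac


def min_distance(lattice, frac):
    """Shortest distance between two different atoms, with periodic images."""
    best = math.inf
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    for i, j in itertools.combinations(range(len(frac)), 2):
        delta = [frac[j][k] - frac[i][k] for k in range(3)]
        delta = [x - round(x) for x in delta]  # wrap into [-0.5, 0.5]
        for shift in shifts:
            f = [delta[k] + shift[k] for k in range(3)]
            cart = [sum(f[r] * lattice[r][c] for r in range(3)) for c in range(3)]
            best = min(best, math.sqrt(sum(x * x for x in cart)))
    return best
```

A value far below the expected bond length (for the Si test case, well under the usual Si-Si nearest-neighbour distance) would point to a bad ionic update rather than to the parallelization.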
