Performance loss due to context switching

Problems running VASP: crashes, internal errors, "wrong" results.

Moderators: Global Moderator, Moderator

jun
Newbie
Posts: 2
Joined: Fri Jun 24, 2016 4:08 am
License Nr.: 5-2411

#1 Post by jun » Wed Aug 10, 2016 3:50 am

Hi all,

VASP sometimes becomes much slower on our clusters, seemingly at random. Looking at the timing information in OUTCAR, I noticed that the slow jobs report a very high number of voluntary context switches. At first I suspected that MKL routines were spawning too many threads, so I recompiled VASP against sequential MKL and also tried explicitly exporting MKL_NUM_THREADS=1 to see whether that helped. However, the oversubscription still persists. I don't know whether this is caused by my build or by the configuration of our clusters.
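To compare slow and fast runs side by side, the context-switch count can be pulled out of each OUTCAR with a few lines of Python. This is only a rough sketch: it assumes the timing block contains a line of the form "Voluntary context switches: <integer>", and the function name is made up for illustration.

```python
import re
import sys
from typing import Optional

# Assumed line format in the OUTCAR timing block, e.g.
#   "                 Voluntary context switches:         1234"
_PATTERN = re.compile(r"Voluntary context switches:\s*(\d+)")


def voluntary_context_switches(outcar_text: str) -> Optional[int]:
    """Return the last reported count, or None if the line is absent."""
    matches = _PATTERN.findall(outcar_text)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    # Usage: python ctx_switches.py run1/OUTCAR run2/OUTCAR ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, voluntary_context_switches(handle.read()))
```

Running it over the OUTCAR files of several jobs gives one number per job, which makes it easier to see whether the slow runs cluster on particular nodes or job sizes.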

Here is the makefile.include I used:
# Precompiler options
CPP_OPTIONS= -DMPI -DHOST=\"IFC91_ompi_phoenix\" -DIFC \
-DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=8000 -Duse_collective \
-DnoAugXCmeta -Duse_bse_te \
-Duse_shmem -Dtbdyn

CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

# Changed to libmkl_sequential.a
FC = mpifort -I${MKLROOT}/include
FCL = mpifort -mkl=sequential -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
${MKLROOT}/lib/intel64/libmkl_core.a \
${MKLROOT}/lib/intel64/libmkl_sequential.a -Wl,--end-group

FREE = -free -names lowercase

FFLAGS = -assume byterecl -heap-arrays 64
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0

MKL_PATH = $(MKLROOT)/lib/intel64
BLAS =
LAPACK =
BLACS =
SCALAPACK =

OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o \
/home/a1692208/vasp5.4/fftw3xf/libfftw3xf_intel.a
INCS =-I$(MKLROOT)/include/fftw

LLIBS = $(SCALAPACK) $(LAPACK) $(BLAS) -lpthread -lm -ldl

OBJECTS_O1 += fft3dfurth.o fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = icc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin

Our clusters run SLURM, so I assume computational resources are assigned automatically and shouldn't be a problem. Am I right? Does anyone have experience with avoiding oversubscription?
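One way to test that assumption from inside a job might be to print, for each process, how many CPUs the scheduler actually lets it run on and how many threads the environment asks for. The sketch below is Linux-only (it relies on os.sched_getaffinity), the helper names are invented for illustration, and it only catches thread oversubscription within one process; it would not reveal several jobs or ranks being pinned to the same cores.

```python
import os
from typing import Mapping, Optional

# MKL consults MKL_NUM_THREADS before falling back to OMP_NUM_THREADS.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS")


def requested_threads(env: Mapping[str, str]) -> Optional[int]:
    """Thread count requested via the environment, or None if unset."""
    for name in THREAD_VARS:
        value = env.get(name, "").strip()
        if value:
            # OMP_NUM_THREADS may be a comma-separated list for nested
            # parallelism; only the first (outermost) level is used here.
            return int(value.split(",")[0])
    return None


def is_oversubscribed(threads: Optional[int], cpus_allowed: int) -> bool:
    """True if more threads are requested than CPUs are available."""
    return threads is not None and threads > cpus_allowed


if __name__ == "__main__":
    cpus = len(os.sched_getaffinity(0))  # CPUs this process may run on
    threads = requested_threads(os.environ)
    print("CPUs allowed:", cpus)
    print("threads requested:", threads)
    print("SLURM_CPUS_PER_TASK:", os.environ.get("SLURM_CPUS_PER_TASK"))
    print("oversubscribed:", is_oversubscribed(threads, cpus))
```

Launching this with the same srun or mpirun line as the VASP job shows what each rank sees. If the thread variables are unset, the result is reported as None, meaning the library defaults apply.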

Thanks in advance.

Jun.
