
VASP- GPU fails to converge

Posted: Thu Jun 16, 2022 5:25 am
by scanmat_centre
We are using the GPU version of VASP for hybrid-functional calculations and get the following error:

Device Memory Info:
Total: 16276.2 MB
Free: 1.2 MB
Used: 16275.0 MB
Requested: 1.9 MB

CUDA Error in cuda_mem.cu, line 179: out of memory
Failed to allocate device memory!

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 55048 RUNNING AT scanmatdgx1
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

But we have enough memory on the device.
How should we proceed?

Re: VASP- GPU fails to converge

Posted: Fri Jun 17, 2022 6:56 am
by martin.schlipf
Can you provide a bit more information about how you run these calculations? Did you try smaller systems successfully and are now running this larger calculation that fails, or do you get the same error for any case you use?
In the former case, how do you know that you have enough memory? What specifically did you compare against what?
In the latter case, could you provide the input files for the calculations you run?
Either way, can you also tell me which version of VASP you are using and whether you use the deprecated CUDA port or the OpenACC version?

Re: VASP- GPU fails to converge

Posted: Sat Jun 25, 2022 6:23 am
by scanmat_centre
Yes, it ran successfully for smaller systems. It fails only for supercells.
I am using VASP 5.4.1, and my input files are as follows:


INCAR

System = z

!Start parameters for this run:

ISTART = 1 !0 Start job: 1 restart constant energy cut-off 2 restart constant basis set
PREC = Accurate
LWAVE= .TRUE.
LREAL = TRUE !
!!Electronic relaxation :
EDIFF = 1E-6 ! accuracy required 1E-6
NELMIN = 5 !no of ELM steps !
LORBIT = 11
!!Ionic relaxation:
ENCUT = 400
ISMEAR = 0
SIGMA = 0.01
EDIFFG= -0.01
#GGA = PE
LHFCALC = .TRUE.
HFSCREEN = 0.2
PRECREEN = Fast
AEXX = 0.25
ALGO = All
LVDW= TRUE
IVDW = 1
NBANDS= 100




script


#!/bin/bash
#SBATCH --job-name=12.5Sbnd
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=14
#SBATCH --distribution=cyclic:cyclic
#SBATCH --time=420:00:00
#SBATCH --mem-per-cpu=8000
##SBATCH --mail-type=END,FAIL
##SBATCH --mail-user=email@ufl.edu
#SBATCH --partition=debug
#SBATCH --gres=gpu:1
date;hostname;pwd


ulimit -s unlimited
ulimit -l unlimited
ulimit -m unlimited

pwd; hostname; date |tee result
# Setting some variables

module load vasp
module load CUDA/9.0

#for i in 15 16; do
# echo "n = $i"

WORK=$SLURM_SUBMIT_DIR

echo $WORK

# making scratch directory
SCRATCH=/home/${USER}/example/${SLURM_JOBID}
echo ${SCRATCH}
mkdir -p $SCRATCH/test
RUN=$SCRATCH/test

# Goto run dir
cd $RUN

# Copy input files to common scratch
cp $WORK/INCAR_bnd $RUN/INCAR
cp $WORK/CONTCAR-opt $RUN/POSCAR
cp $WORK/POTCAR $RUN
cp $WORK/IBZKPT-bnd $RUN/KPOINTS
#cp $WORK/WAVECAR $RUN/WAVECAR
#cp $WORK/CHGCAR $RUN/CHGCAR
ls -ltr

mpirun vasp_gpu
#mpirun vasp_std

cp OUTCAR $WORK/OUTCAR-band
cp CONTCAR $WORK/CONTCAR-band
cp DOSCAR $WORK/DOSCAR
cp PROCAR $WORK/PROCAR
cp EIGENVAL $WORK/EIGENVAL
cp vasprun.xml $WORK/vasprunband.xml

cd $WORK
rm -rf $RUN

#done

Re: VASP- GPU fails to converge

Posted: Mon Jun 27, 2022 9:36 am
by martin.schlipf
I'm still not sure how you judge that you have enough memory. It seems that you would like to do band structure calculations. In this case, you can reduce the memory demand by splitting the calculation into multiple parts or by using fewer points per line.
Unfortunately, I cannot provide more specific advice for your system, because the old CUDA port is not maintained anymore. If you can reproduce this behavior with the OpenACC version, we would need to look into it more carefully.
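
To illustrate the splitting suggested above, here is a minimal sketch. It assumes the KPOINTS file is an explicit list (comment line, number of k-points, coordinate mode, then one "kx ky kz weight" line per point) in which the band-path points carry zero weight, as is usual for hybrid-functional band structures. The function name split_kpoints and the KPOINTS.part prefix are hypothetical and not part of VASP. Each part keeps all weighted points plus a subset of the zero-weight path points.

```shell
#!/bin/bash
# Hypothetical helper: split an explicit KPOINTS file into several smaller files.
# Usage: split_kpoints KPOINTS 10   -> writes KPOINTS.part1, KPOINTS.part2, ...
# Prints the number of parts written.
split_kpoints() {
    local infile=$1 chunk=$2 prefix=${3:-KPOINTS.part}
    # chunk = number of zero-weight (band path) points per part
    [ "$chunk" -ge 1 ] 2>/dev/null || { echo "chunk must be a positive integer" >&2; return 1; }
    awk -v chunk="$chunk" -v prefix="$prefix" '
        NR == 1 { comment = $0; next }      # comment line
        NR == 2 { total = $1; next }        # number of k-points in the list
        NR == 3 { mode = $0; next }         # coordinate mode line
        NR - 3 > total { next }             # ignore anything after the k-point list
        NF < 4 { next }                     # ignore lines without a weight column
        $4 + 0 != 0 { weighted[++nw] = $0; next }   # points that carry weight
        { path[++np] = $0 }                         # zero-weight path points
        END {
            part = 0
            for (start = 1; start <= np; start += chunk) {
                part++
                out = prefix part
                stop = start + chunk - 1
                if (stop > np) stop = np
                count = nw + stop - start + 1
                print comment > out
                print count > out
                print mode > out
                for (i = 1; i <= nw; i++) print weighted[i] > out
                for (i = start; i <= stop; i++) print path[i] > out
                close(out)
            }
            print part
        }
    ' "$infile"
}
```

Each part would then be run as its own job, and the eigenvalues of the zero-weight points collected from the separate runs afterwards. Whether this is enough to fit into device memory has to be tested for the system at hand.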

Re: VASP- GPU fails to converge

Posted: Mon Jun 27, 2022 2:07 pm
by scanmat_centre
The memory I am talking about is the memory of the device, that is, the supercomputer on which we are running the calculation.
Am I looking at the wrong memory?
Or is there another quantity I need to consider?

Re: VASP- GPU fails to converge

Posted: Mon Jun 27, 2022 3:33 pm
by martin.schlipf
Well, there are two parts to the comparison: the memory available on the device and the memory that VASP needs to perform the calculation.
In particular for band structure calculations, the memory requirement can be quite a bit larger than for the self-consistency calculation, because the number of k-points is often larger.

Then again, I don't know how efficient the hybrid functional in the old CUDA port was. This part was worked on a lot in the OpenACC port to enhance the performance on one or more GPUs.
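
The device side of this comparison can be read directly from the "Device Memory Info" report quoted in this thread. A minimal sketch, assuming the report keeps the "Total:/Free:/Used:/Requested:" layout shown there; the function name check_device_memory is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a "Device Memory Info" report on stdin and
# compare the free device memory with the size of the failed request.
# If the input contains several reports, the last one is used.
check_device_memory() {
    awk '
        $1 == "Free:"      { free = $2 }   # free device memory in MB
        $1 == "Requested:" { req = $2 }    # size of the failed allocation in MB
        END {
            if (free == "" || req == "") { print "no report found"; exit 1 }
            verdict = (req + 0 > free + 0) ? "request does not fit" : "request fits"
            printf "free %s MB, requested %s MB: %s\n", free, req, verdict
        }
    '
}
```

For the report in this thread, this prints "free 1.2 MB, requested 1.9 MB: request does not fit": the 16 GB card is already full, so even a small additional allocation fails. It could be fed from whichever Slurm output file contains the report, for example check_device_memory < slurm-12345.err (a hypothetical file name). While a job is running, nvidia-smi --query-gpu=memory.total,memory.used --format=csv shows the same totals live.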

Re: VASP- GPU fails to converge

Posted: Tue Jun 28, 2022 4:53 am
by scanmat_centre
Is there any way to reduce the memory VASP requires for the calculation?
Is there anything I need to change in the script?

The error looks like this:

Device Memory Info:
Total: 16276.2 MB
Free: 1.2 MB
Used: 16275.0 MB
Requested: 1.9 MB

Re: VASP- GPU fails to converge

Posted: Tue Jun 28, 2022 6:26 am
by martin.schlipf
scanmat_centre wrote: Tue Jun 28, 2022 4:53 am
Is there any way I can modify the memory requirement for vasp to perform the calculation?
Anything I have to do with the script.. ?

Smaller energy cutoffs, fewer k-points, PREC = Normal.

Of course you need to test whether this affects your results.
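
To get a feeling for how much these settings can save, here is a rough sketch. It assumes that the number of plane waves scales as ENCUT^(3/2) and that the orbitals are stored as double-precision complex coefficients (16 bytes each). The function names are hypothetical, and the result is only a lower bound: it ignores FFT grids, projectors, the Fock-exchange work arrays, and whatever buffers the GPU port allocates itself.

```shell
#!/bin/bash
# Hypothetical helpers for a back-of-the-envelope memory estimate.

# Relative number of plane waves when ENCUT changes from $1 to $2,
# assuming N_pw ~ ENCUT^(3/2).
pw_ratio() {
    awk -v e_old="$1" -v e_new="$2" 'BEGIN { printf "%.2f\n", (e_new / e_old) ^ 1.5 }'
}

# Rough orbital storage in MB: NKPTS * NBANDS * NPW * 16 bytes.
wavefunction_mb() {
    awk -v nk="$1" -v nb="$2" -v npw="$3" \
        'BEGIN { printf "%.0f\n", nk * nb * npw * 16 / 1048576 }'
}
```

For example, pw_ratio 400 300 gives 0.65, so lowering ENCUT from the 400 eV in the INCAR above to a hypothetical 300 eV would cut the plane-wave count by roughly a third. The orbital storage grows linearly with the number of k-points and bands, which is why a dense band-structure path is more demanding than the self-consistent run. PREC = Normal additionally selects coarser FFT grids than PREC = Accurate, which this estimate does not capture.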

Re: VASP- GPU fails to converge

Posted: Wed Jun 29, 2022 5:10 am
by scanmat_centre
I will check with them.

Re: VASP- GPU fails to converge

Posted: Wed Jul 06, 2022 8:13 am
by scanmat_centre
Thank you, it's working now.
I reduced ENCUT and changed PREC from Accurate to Normal.