Hi,
I encountered a reproducible issue running VASP 6.4.2 on an HPC system with NVIDIA V100 32GB SXM2 GPUs and Lustre-backed project directories. The job setup is standard: the gamma-only build of VASP, compiled with the NVHPC toolkit.
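For context, the job is launched in the usual way, roughly like this (module name and launcher are placeholders, not my exact script; the 2 ranks match the log below):

    module load nvhpc
    mpirun -np 2 vasp_gam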
When INCAR is located directly on the Lustre filesystem (i.e., inside the job's working directory), VASP fails with a CUDA out-of-memory error immediately after entering the main loop, at the start of the first SCF step:
running 2 mpi-ranks, with 1 threads/rank, on 1 nodes
distrk: each k-point on 2 cores, 1 groups
distr: one band on 1 cores, 2 groups
OpenACC runtime initialized ... 2 GPUs detected
vasp.6.4.2 20Jul23 (build Nov 18 2024 12:20:25) gamma-only
POSCAR found type information on POSCAR Ag
POSCAR found : 1 types and 577 ions
scaLAPACK will be used selectively (only on CPU)
LDA part: xc-table for Pade appr. of Perdew
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
entering main loop
N E dE d eps ncg rms rms(c)
Out of memory allocating 4040409600 bytes of device memory
Failing in Thread:1
Out of memory allocating 4040409600 bytes of device memory
total/free CUDA memory: 34079637504/2775973888
Failing in Thread:1
Present table dump for device[2]: NVIDIA Tesla GPU 1, compute capability 7.0, threadid=1
total/free CUDA memory: 34079637504/1972764672
However, I found that simply moving the INCAR file to the user's NFS-backed $HOME directory and symlinking it back into the Lustre job directory fully resolves the issue. I'm having trouble understanding why this works.
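For reference, the workaround amounts to something like the following, where $JOBDIR stands in for the Lustre-backed working directory (the path is a placeholder, not my actual directory):

    mv $JOBDIR/INCAR $HOME/INCAR
    ln -s $HOME/INCAR $JOBDIR/INCAR

With the symlink in place, the same job runs to completion without the device out-of-memory error.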