Trouble running CRAYHIP (AMD MI300A) port of VASP 6.6.0 on more than 1 MPI rank or 1 GPU
Dear VASP admins and devs,
Thanks so much for the hard work on the first release of the port for AMD/Intel GPUs. We have a Cray machine with some GPU nodes (four MI300A GPUs per node) on which we want to run VASP, and we are trying to build version 6.6.0 with GNU Make. We started from the "cray_omp_off" makefile.include template:
Code: Select all
# Precompiler options
CPP_OPTIONS = -DHOST=\"LinuxFTN\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-DMPI_INPLACE \
-Dvasp6 \
-Dtbdyn \
-Dfock_dblbuf
# activate OpenMP and gpu offloading
CPP_OPTIONS += -D_OPENMP \
-DOMP_OFFLOAD \
-DCRAYHIP
CPP = cpp --traditional -E -P -Wno-endif-labels $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)
FC = ftn -hnoacc -homp
FCL = $(FC)
FREE = -ffree -N 1023
FFLAGS = -dC -rmo -emEb
# lower the ipa level for inlining to 0 to avoid compiler problems
FFLAGS += -hipa0
# suppress warnings
FFLAGS += -m 4
# O2 recommended for optimal GPU performance, O1 significantly slower in certain
# GPU kernels
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
# fine grain control over lapack, by default ftn will link libsci with the
# appropriate configuration
# LAPACK = -L${CRAY_LIBSCI_PREFIX_DIR}/lib -lsci_cray_mpi
# LLIBS = $(LAPACK)
# FFTW_ROOT ?= /opt/cray/pe/fftw/3.3.8.11/x86_rome
LLIBS += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS = -I$(FFTW_ROOT)/include
# HIP
CLANG = cc
# ROCM_PATH ?= /opt/rocm
HIPCC ?= ${ROCM_PATH}/bin/hipcc
ROCM_INCS = -I${ROCM_PATH}/include -I${ROCM_PATH}/include/hip -I${ROCM_PATH}/include/rocblas -I${ROCM_PATH}/include/rocsolver -I${ROCM_PATH}/include/rocfft
ROCM_LIBS = -L${ROCM_PATH}/hip/lib -lamdhip64 \
-L${ROCM_PATH}/lib -lrocblas -lrocfft -lrocsolver -lcraymp
# using RCCL aka NCCL for direct multi-GPU communication, recommended for best
# performance
CPP_OPTIONS += -DUSENCCL
ROCM_LIBS += -lrccl
LLIBS += $(ROCM_LIBS)
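# (HipInterface library: as far as I understand, built from the HIP sources shipped with VASP and linked in)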
LIBS += HIP
LLIBS += -LHIP -lHipInterface
#
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = cc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB= linpack_double.o getshmem.o
# For the parser library
CXX_PARS = CC
LLIBS += -lstdc++
# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin
# HDF5-support (optional but strongly recommended, and mandatory for some
# features)
CPP_OPTIONS+= -DVASP_HDF5
# HDF5_ROOT ?= /path/to/your/hdf5/installation
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier
# Get major version of crayftn
CRAYFTNVER=$(shell crayftn --version 2>/dev/null | grep "Version" | sed -n 's/.*Version \([0-9]\+\)\..*/\1/p')
CPP_OPTIONS += -D__DCRAYFTN_VERSION=$(CRAYFTNVER)
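# (with cce/19.0.0 loaded this should expand to -D__DCRAYFTN_VERSION=19, assuming
#  "crayftn --version" reports a line containing "Version 19.0.0")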
### special cray workarounds cce v19.0.0, remove for cce20
# error Unsupported OpenMP construct Calls -- _cray_dv_broadcast : W_G%CPTWFP=0
OBJECTS_O2 += rot.o
# fexcg has to be compiled at a higher optimization level for the kernel not to spill
OBJECTS_O2 += fexcg.o mbj.o ldalib.o ggalib.o mggalib.o
# error: unexpected type in TYPE_DEREF l818 (copyin_wavefun1_array)
OBJECTS_O1 += openmp.o
# error: unexpected type in TYPE_DEREF l724 (twoelectron4o_acc)
OBJECTS_O1 += twoelectron4o.o
# error: unexpected type in TYPE_DEREF l377 (calculate_local_field_fock)
OBJECTS_O1 += local_field.o
# for the next problem we use OBJECTS_O3 to remove omp
FFLAGS_3 += -hnoomp
# error: Found inner_ref/inner_def object without Fortran internal procedure l5515
OBJECTS_O3 += bse.o
# error: Found inner_ref/inner_def object without Fortran internal procedure l1644
OBJECTS_O3 += GG_base.o
# MLFF problems with ISTART=2
OBJECTS_O1 += ml_ff_math.o ml_ff_ff2.o
#################
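For completeness, the gamma-only binary itself is built in the usual way (shown here schematically; the -j value is just an example):
Code: Select all
make veryclean
make DEPS=1 -j8 gam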
On the Cray machine, we have the following libraries/frameworks loaded:
Code: Select all
Currently Loaded Modulefiles:
craype-x86-genoa
libfabric/1.22.0
craype-network-ofi
perftools-base/25.03.0
xpmem/2.11.5-1.3_g73ade43320bc
cce/19.0.0
craype/2.7.34
cray-dsmml/0.3.1
cray-mpich/8.1.32
cray-libsci/25.03.0
PrgEnv-cray/8.6.0
cray-libpals/1.6.1
cray-pals/1.6.1
bct-env/0.2
mpscp/1.3a
rocm/6.3.0
cray-hdf5/1.14.3.5
craype-accel-amd-gfx942
cray-fftw/3.3.10.10
The build process works fine, but when I test the binary on a representative system, VASP fails before reaching the main loop whenever I use more than one GPU or MPI rank.
My Slurm submission script looks like this:
Code: Select all
#!/bin/bash
#SBATCH --job-name=test-amdgpu-vasp
#SBATCH --account=<account_name>
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --constraint=gpu
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-node=4
#SBATCH --ntasks-per-node=4
#SBATCH --exclusive
#SBATCH --time=30:00
#
#SBATCH --requeue
#SBATCH --open-mode=append
# Use 4-8 OpenMP threads, as recommended in
# https://vasp.at/wiki/GPU_ports_of_VASP#Environment_variables
export OMP_NUM_THREADS=4
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
# Setting offload env vars as described in
# https://vasp.at/wiki/GPU_ports_of_VASP#Environment_variables.
export MPICH_GPU_SUPPORT_ENABLED=1
export OMP_STACKSIZE=2048m
# Remove STOPCAR file so job isn't blocked
if [ -f "STOPCAR" ]; then
rm STOPCAR
fi
# Load dynamic libraries needed by VASP.
module load cray-mpich
module load rocm/6.3.0
module load cray-hdf5/1.14.3.5
module load craype-accel-amd-gfx942
module load cray-fftw/3.3.10.10
# Ensure that stack size is unlimited.
ulimit -s unlimited
# Start VASP binary.
# This fails.
# mpirun -np 4 --bind-to core ./vasp_gam
# This fails as well.
# srun --unbuffered --cpu-bind=cores --gpu-bind=none ./vasp_gam
# And this fails as well.
mpirun -np 4 --cpu-bind=core --gpu-bind=none ./vasp_gam
# mpirun -np 2 ./vasp_gam
wait
In stdout, I see all GPUs being detected and offloading being initialized successfully:
Code: Select all
running 4 mpi-ranks, with 4 threads/rank, on 1 nodes
distrk: each k-point on 4 cores, 1 groups
distr: one band on 1 cores, 4 groups
Offloading initialized ... 4 GPUs detected
vasp.6.6.0 06Mar2026 (build Mar 24 2026 09:52:28) gamma-only
POSCAR found type information on POSCAR <redacted>
POSCAR found : 7 types and 1038 ions
Reading from existing POTCAR
scaLAPACK will be used
Reading from existing POTCAR
-----------------------------------------------------------------------------
| |
| ----> ADVICE to this user running VASP <---- |
| |
| You enforced a specific xc type in the INCAR file but a different |
| type was found in the POTCAR file. |
| I HOPE YOU KNOW WHAT YOU ARE DOING! |
| |
-----------------------------------------------------------------------------
When running with a single MPI rank on a single MI300A GPU, VASP then successfully enters the main loop and starts the SCF cycles. But with multiple MPI ranks, the job instead fails with CPU and GPU core dumps, and the stderr typically looks like this:
Code: Select all
Memory access fault by GPU node-4 (Agent handle: 0x1f1e6770) on address 0x14614ec00000. Reason: Unknown.
Memory access fault by GPU node-4 (Agent handle: 0x2598e770) on address 0x152048e04000. Reason: Unknown.
Memory access fault by GPU node-4 (Agent handle: 0x17a4b770) on address 0x145c9c001000. Reason: Unknown.
srun: error: nid-ai05: task 2: Aborted (core dumped)
srun: Terminating StepId=135072.0
slurmstepd: error: *** STEP 135072.0 ON nid-ai05 CANCELLED AT 2026-03-24T18:38:23 ***
srun: error: nid-ai05: task 0: Terminated
srun: error: nid-ai05: tasks 1,3: Aborted (core dumped)
srun: Force Terminated StepId=135072.0
The errors persist whether I launch the job with mpirun (from Cray PALS) or Slurm's srun, and whether or not I enable RCCL via the -DUSENCCL option in makefile.include. I am using v19.0.0 of the Cray compilers, with the optimization-level workarounds this compiler version needs listed at the bottom of the makefile.include above.
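For reference, disabling RCCL simply means commenting out the two related lines in the makefile.include shown above and rebuilding:
Code: Select all
# CPP_OPTIONS += -DUSENCCL
# ROCM_LIBS += -lrccl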
I've asked my local HPC admins for help, but I wanted to ask whether you have seen anything similar in your testing, and whether there is anything I'm doing wrong.
Thanks in advance!