error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 6:33 am
by bhargabkakati
Dear experts, I have compiled VASP (without the Wannier90 interface) for my GPU. I got an error while trying to run VASP, as shown in the screenshot (note: I got the same error when trying to run Quantum ESPRESSO). Any help in resolving the issue would be highly appreciated. Thank you. Here are my system specifications:

OS: Ubuntu 22.04
CPU: 36 Core
GPU: Nvidia RTX A6000
CUDA Version: 12.2

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Error:

mpirun -np 8 /home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl
running 8 mpi-ranks, on 1 nodes

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO

libgomp: TODO
distrk: each k-point on 8 cores, 1 groups
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[45111,1],1]
Exit code: 1

Re: error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 10:11 am
by martin.schlipf
Could you provide the makefile.include and tell us which modules you load, please?

It also seems like the error stems from OpenMP, so you might want to compile without that option as a first attempt.

Re: error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 10:15 am
by bhargabkakati
Sure, here is the makefile.include file.
Thank you.

Re: error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 10:17 am
by bhargabkakati
Also, although I am not entirely sure (I am new to Ubuntu), I believe I used OpenMP and nvfortran.

Re: error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 10:24 am
by martin.schlipf
Thanks, I will try to reproduce this. At first glance, it seems strange that you get this error, since you do not have OpenMP in your makefile.include, so I do not know why you would need to link to libgomp.
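
One way to check whether the executable really pulls in libgomp is to list its shared-library dependencies with ldd. The following is only a sketch: it assumes a dynamically linked binary at the path used in the mpirun command from the first post, and VASP_BIN and filter_openmp are names made up for this illustration.

```shell
#!/bin/sh
# Path taken from the mpirun command in the first post; override via VASP_BIN.
VASP_BIN="${VASP_BIN:-/home/cms-gpu/softwares/vasp.6.4.2/bin/vasp_ncl}"

# Keep only lines that mention an OpenMP runtime:
# libgomp (GNU), libiomp5 (Intel), libomp (LLVM), libnvomp (NVIDIA).
# filter_openmp is a hypothetical helper, not part of any toolchain.
filter_openmp() {
    grep -E 'lib(gomp|iomp5|omp|nvomp)\.so' || true
}

if [ -x "$VASP_BIN" ]; then
    # ldd prints one line per shared library the binary depends on
    ldd "$VASP_BIN" | filter_openmp
else
    echo "binary not found: $VASP_BIN" >&2
fi
```

If a libgomp line shows up, the path printed next to it indicates where that library was picked up from.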

Re: error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 10:27 am
by bhargabkakati
I look forward to your insight. Thank you.

Re: error encountered while running VASP in GPU

Posted: Tue Mar 26, 2024 2:04 pm
by martin.schlipf
When looking at your makefile.include, I noticed that you had replaced mpif90 with the explicit path to the NVIDIA compiler. What was the reason for that? If I were to guess, I would assume that you did not add the compiler to your PATH and hence the system decided to use a built-in mpif90 or did not find an mpif90 at all.

Did you follow the same procedure for mpirun as well, or did you add mpirun to your PATH? If not, it is possible that you are using the mpirun of a different library, which will typically not work. You can check this with

Code:

which mpirun
This should show /opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpirun or something very similar. If it does not, please add mpirun to your PATH or explicitly use the path to that executable.
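
To compare all the relevant tools at once, the same check can be extended along these lines. This is only a sketch: resolve_tool is a helper name invented here, and it assumes readlink -f is available (as it is on Ubuntu) so that any symbolic links are followed to the real location.

```shell
#!/bin/sh
# Print the fully resolved location of a tool, or a note if it is not in PATH.
# resolve_tool is a hypothetical helper written for this illustration.
resolve_tool() {
    found=$(command -v "$1" 2>/dev/null) || { echo "(not in PATH)"; return 0; }
    # Follow symbolic links so that differently spelled paths can be compared
    readlink -f "$found"
}

# The MPI launcher, the MPI compiler wrapper and the compiler itself
for tool in mpirun mpif90 nvfortran; do
    echo "$tool: $(resolve_tool "$tool")"
done
```

If mpirun and mpif90 resolve to different installations, the executable is being launched with a different MPI library than the one it was built against.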

Re: error encountered while running VASP in GPU

Posted: Wed Mar 27, 2024 4:14 am
by bhargabkakati
Hello, "which mpirun" showed "/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/12.3/openmpi4/openmpi-4.1.5/bin/mpirun". What should I do now to solve the issue?
Thanks.

Re: error encountered while running VASP in GPU

Posted: Wed Mar 27, 2024 8:59 am
by martin.schlipf
I wrote a small example code.

Code:

! example.f90
program main

    implicit none

    real x(1000), y(1000), sum_
    integer ii

    sum_ = 0
    call random_number(x)
    call random_number(y)

    !$acc parallel reduction(+:sum_)
    do ii = 1, size(x)
        sum_ = sum_ + x(ii) * y(ii)
    end do
    !$acc end parallel

    write(0,*) sum_, sum(x * y)

end program main
Can you try to compile this with the same flags

Code:

/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.3 example.f90
and check whether you get the same error when you run the executable?

Another difference I noticed is the CUDA version. In our tests, we always use cuda11.0.
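
Note that compiling only produces the executable (a.out by default); the actual test is whether running it prints the libgomp message. A rough sketch of that step, where run_check is a helper name made up for this example and a.out is assumed to be in the current directory:

```shell
#!/bin/sh
# Run a command, show its output and exit status, and report whether
# the output (stdout and stderr combined) mentions libgomp.
# run_check is a hypothetical helper, not part of any toolchain.
run_check() {
    if output=$("$@" 2>&1); then status=0; else status=$?; fi
    printf '%s\n' "$output"
    echo "exit status: $status"
    case $output in
        *libgomp*) echo "libgomp message: yes" ;;
        *)         echo "libgomp message: no" ;;
    esac
}

if [ -x ./a.out ]; then
    run_check ./a.out
else
    echo "a.out not found; compile example.f90 first" >&2
fi
```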

Re: error encountered while running VASP in GPU

Posted: Wed Mar 27, 2024 9:35 am
by bhargabkakati
Hello,
I ran the command you gave and got an "a.out" file. I did not get any error message.

Re: error encountered while running VASP in GPU

Posted: Wed Mar 27, 2024 2:28 pm
by bhargabkakati
Hello, even though the example program you wrote ran without any error, VASP is still showing the same error. What can I do?
Thank You.

Re: error encountered while running VASP in GPU

Posted: Thu Mar 28, 2024 8:18 am
by martin.schlipf
When I tried to reproduce your setup, I ran into an issue with FFTW. Looking at your makefile.include, it may be that your FFTW is not compatible with nvfortran. In particular, it seems that you use the OpenMP version, which may explain the errors that you see. Perhaps you can modify the example to do one FFT and see whether that produces the error.
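
To see which FFTW and threading libraries the build refers to, the makefile.include can be searched for the relevant entries. A sketch, assuming the file is in the current directory; scan_makefile and the search pattern are illustrative choices, not an exhaustive list:

```shell
#!/bin/sh
# Show numbered lines of a makefile.include that mention FFTW or OpenMP.
# scan_makefile is a hypothetical helper written for this illustration.
# The pattern covers FFTW entries, OpenMP runtimes and threaded library
# variants (e.g. libfftw3_omp), and the nvfortran OpenMP flag -mp.
scan_makefile() {
    grep -n -i -E -e 'fftw|openmp|gomp|iomp|_omp|_thread|(^|[[:space:]])-mp([[:space:]]|$)' "$1" \
        || echo "no FFTW/OpenMP entries found in $1"
}

if [ -f makefile.include ]; then
    scan_makefile makefile.include
else
    echo "makefile.include not found in $(pwd)" >&2
fi
```

Any hit on an OpenMP-enabled FFTW or threading library is a candidate for where the libgomp dependency comes from.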

Re: error encountered while running VASP in GPU

Posted: Thu Mar 28, 2024 9:04 am
by bhargabkakati
Hello sir,
I am very new to this field, and I'm afraid I won't be able to modify the example to do an FFT on my own. Could you please assist me with that?
Thank you.

Re: error encountered while running VASP in GPU

Posted: Thu Mar 28, 2024 4:57 pm
by martin.schlipf
Something like this?

Code:

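program main
    implicit none
    ! FFTW constants such as FFTW_FORWARD and FFTW_ESTIMATE; the Fortran
    ! include statement works without running the preprocessor
    include 'fftw3.f'
    integer, parameter :: N = 100
    double complex in, out
    dimension in(N), out(N)
    integer*8 plan  ! opaque handle to the FFTW plan
    ! plan a 1D forward transform, then fill the input and execute it
    call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
    in = (1.0d0, 0.0d0)
    call dfftw_execute_dft(plan, in, out)
    call dfftw_destroy_plan(plan)
end program main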
which you can compile with

Code:

nvfortran example.f90 -I $FFTW_ROOT/include -L $FFTW_ROOT/lib -lfftw3
after you set FFTW_ROOT to the appropriate folder.

Re: error encountered while running VASP in GPU

Posted: Sat Mar 30, 2024 5:51 am
by bhargabkakati
Hello sir,
I ran "nvfortran example2.f90 -I /opt/intel/oneapi/mkl/2024.0/include -L /opt/intel/oneapi/mkl/2024.0/include/fftw -lfftw3" with the code you provided and got "a.out" without any error.