
Problem compiling VASP/GPU with CUDA7.5 and 8.0

Posted: Thu Nov 24, 2016 4:43 pm
by jperaltac
Hello,

I tried the easy way to compile VASP with GPU support. However, it doesn't work because of a double definition of a function. I first copied the template makefile:

cp arch/makefile.include.linux_intel_cuda makefile.include

Then, when I run make gpu, I get:

/usr/local/cuda//bin/nvcc -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -m64 --compiler-options -fno-strict-aliasing -DKERNEL_DP -DKERNEL_ZP -DDEBUG -DUSE_STREAM -DMPICH_IGNORE_CXX_SEEK -D__PARA -I/usr/local/cuda//include -I/include -I/opt/intel/impi_latest/include64 -I. -I/usr/local/cuda//include -I/usr/local/cuda//samples/common/inc -DUNIX -O3 -g -o obj/x86_64/release/hamil.cu.o -c hamil.cu
kernels.h(182): error: function "atomicAdd(double *, double)" has already been defined

1 error detected in the compilation of "/tmp/tmpxft_00002c1b_00000000-5_hamil.cpp4.ii".
common.mk:411: recipe for target 'obj/x86_64/release/hamil.cu.o' failed
make[3]: *** [obj/x86_64/release/hamil.cu.o] Error 2

Re: Problem compiling VASP/GPU with CUDA7.5 and 8.0

Posted: Mon Dec 12, 2016 4:00 pm
by admin
All details on the software, hardware, installation,
and people behind the GPU port of VASP can be found on the page
http://cms.mpi.univie.ac.at/wiki/index. ... rt_of_VASP

Re: Problem compiling VASP/GPU with CUDA7.5 and 8.0

Posted: Fri Jan 06, 2017 3:56 pm
by crivello
hi everyone,

I have exactly the same problem with cuda 8.0 installed.
I guess it is because I am using a different type of NVIDIA device rather than a Tesla card.
Am I right?

thanks,
JCC

Re: Problem compiling VASP/GPU with CUDA7.5 and 8.0

Posted: Tue Feb 07, 2017 7:53 pm
by Paranord
I set up a preprocessor directive to remove the offending routine from src/CUDA/kernels.h when CUDA already provides it.

This compiles and runs on at least one system. Others can comment on the advisability of this change.

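#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
// atomicAdd for double is already defined by CUDA 8.0 here; nothing to define.
#else
// Older device architectures: keep the original fallback.
__forceinline__ __device__ double atomicAdd(double* address, double val)
{
    // Treat the double as a 64-bit integer so that atomicCAS can be used on it.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Store the new sum only if *address still holds the value we read.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // another thread changed the value; retry
    return __longlong_as_double(old);
}
#endif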
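
For anyone wondering what the fallback routine above does: it builds a double-precision atomic add out of a 64-bit compare-and-swap. Below is a minimal host-side C++ sketch of the same retry loop, using std::atomic in place of atomicCAS. The helper names (double_to_bits, bits_to_double, cas_add_double) are made up for illustration and are not part of VASP or CUDA.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Host-side analogue of the kernels.h fallback (illustrative only).
// The double is stored as its 64-bit pattern so an integer CAS can act on it.
static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits wide");

// Reinterpret the bits of a double as an integer (the role of __double_as_longlong).
inline std::uint64_t double_to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Reinterpret an integer bit pattern as a double (the role of __longlong_as_double).
inline double bits_to_double(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Atomically add val to the double stored in cell.
// Returns the value held before the add, as atomicAdd does.
inline double cas_add_double(std::atomic<std::uint64_t>& cell, double val) {
    std::uint64_t assumed = cell.load();
    std::uint64_t desired;
    do {
        desired = double_to_bits(bits_to_double(assumed) + val);
        // On failure, compare_exchange_weak reloads the current value into
        // assumed, so the next iteration recomputes the sum from fresh data.
    } while (!cell.compare_exchange_weak(assumed, desired));
    return bits_to_double(assumed);
}
```

Calling cas_add_double(cell, 2.25) on a cell that holds 1.5 leaves 3.75 in the cell and returns 1.5, the value before the add.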

Re: Problem compiling VASP/GPU with CUDA7.5 and 8.0

Posted: Thu Mar 09, 2017 6:30 pm
by amirs
Our guy in HPC had difficulties installing the GPU version of VASP. Below are three of his emails about his efforts to make vasp_gpu work:

1.
Things aren't looking so good for "vasp_gpu". Their current version only compiles under cuda/7.5, because under cuda/8.0, the compilation runs into an error:

kernels.h(182): error: function "atomicAdd(double *, double)" has already been defined
because cuda/8.0 predefines it, apparently.

But it turns out that Nvidia's device drivers are tightly coupled with their cuda versions, so when I tried to run yesterday's vasp_gpu compiled with cuda/7.5, it generated this error:

CUDA driver version is insufficient for CUDA runtime version

In fact, any of cuda/7.5's utilities fail this way on compute-1-14, because its current Nvidia driver is associated with cuda/8.0. I thought at first that the driver was old and installed the newest one, but it's the same problem.

If I "module load cuda/8.0", then at least cuda/8.0's utilities will run on compute-1-14.

So there are only a couple of ways around this:

1) Install the older Nvidia driver associated with cuda/7.5 on compute-1-14 -- but then it won't work with any programs compiled against cuda/8.0.

or 2) Modify vasp's source code so it doesn't try to have its own "atomicAdd()" function.

or 3) Throw in the towel. vasp_gpu might be something better suited to individual servers, rather than a cluster.
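
A side note on the driver error quoted in this email. The CUDA runtime calls cudaDriverGetVersion and cudaRuntimeGetVersion report versions as integers encoded as 1000*major + 10*minor, and the "insufficient" error is normally raised when the driver's number is lower than the runtime's. The plain C++ sketch below only decodes and compares such integers. It does not call the CUDA API, and the names CudaVersion, decode_cuda_version and driver_supports_runtime are hypothetical.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; they do not call the CUDA API.
struct CudaVersion {
    int major;
    int minor;
};

// Decode the integer encoding 1000*major + 10*minor (for example 7050 is 7.5).
inline CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when the driver's supported CUDA version is at least the runtime's version.
inline bool driver_supports_runtime(int driver_version, int runtime_version) {
    return driver_version >= runtime_version;
}
```

Under this rule a driver reporting 8000 would be expected to accept a 7.5 runtime (7050), so it may be worth printing both numbers on the compute node before concluding that the driver has to be downgraded. This sketch cannot tell what actually went wrong on compute-1-14.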
2.
Better that the VASP folks fix their source code to accommodate cuda/8.0 -- or else propose a supported work-around -- than us. They probably haven't thought much about cuda/8 yet, if VASP mostly runs on vanilla Ubuntu and Redhat servers that come with older versions of cuda.
3.
Regarding the source code "fix" (in case you want to forward this to the vasp folks), I knew that vasp was defining the conflicting "atomicAdd(double *, double)" function in their file at "src/CUDA/kernels.h". Meanwhile, I found a useful web posting for a different program with a similar problem with CUDA 8.0:

https://github.com/vlfeat/matconvnet/issues/575

The issue is that the CUDA folks didn't define that "double" variant of "atomicAdd()" in previous versions of CUDA, but they did provide code suggesting how programmers should define it in their own code. Then in version 8.0, they decided to include it within CUDA after all, so now all of these "pre-8.0" programs are running into this atomicAdd() conflict, since the function is defined both in their own source code and in CUDA's headers. Following the suggestion on the web page above, I modified the "atomicAdd" section of "src/CUDA/kernels.h" as follows:
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
// atomicAdd for "double" already defined under CUDA 8.0; no need to define it here.
#else
<... vasp's old pre-cuda/8.0 atomicAdd definition code placed here...>
#endif
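
To spell out which compilation passes still get vasp's own definition under this guard, here is a small plain C++ sketch that mirrors the preprocessor condition as a function. The name fallback_is_compiled is hypothetical. The parameter arch stands in for the value of __CUDA_ARCH__, which nvcc defines only while compiling device code (300 for compute_30, 350 for compute_35, and so on).

```cpp
// Hypothetical helper: mirrors the kernels.h guard as an ordinary function.
// arch_defined == false models the host pass, where __CUDA_ARCH__ is absent.
constexpr bool fallback_is_compiled(bool arch_defined, int arch) {
    // The #if condition is (!arch_defined || arch >= 600);
    // the fallback sits in the #else branch, hence the negation.
    return !(!arch_defined || arch >= 600);
}

// Compile-time checks for the cases discussed in this thread.
static_assert(!fallback_is_compiled(false, 0), "host pass: fallback skipped");
static_assert(fallback_is_compiled(true, 300), "compute_30: fallback kept");
static_assert(fallback_is_compiled(true, 350), "compute_35: fallback kept");
static_assert(!fallback_is_compiled(true, 600), "compute_60: fallback skipped");
```

So a build for compute_30 and compute_35 keeps the old routine for device code and skips it only in the host pass, which is presumably where the duplicate definition was being reported.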
Then I ran "make gpu", and it finally created a version of "vasp_gpu" that runs under cuda/8.0. Woo!
Give it a try, and let us (HPC Support) know how it goes, and -- if so -- whether it's much of an improvement over regular vasp.