Useful compilation experience for IBM power9 with V100 gpu

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.

guiyang_huang1
Newbie
Posts: 12
Joined: Tue Nov 12, 2019 7:00 pm

#1 Post by guiyang_huang1 » Wed Nov 13, 2019 4:47 pm

In the source version I have, the compiler reports errors that can be resolved by two changes: "c_size_t" needs to be replaced by "c_int", and some arguments need to be wrapped in "c_loc", e.g. CHAM becomes c_loc(CHAM).

For the preprocessor, the line "CPP = gcc -E -P -C -w $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)" can be used with the GNU, PGI and XL compilers.

"-DUSE_PINNED_MEMORY" cannot be used for the GPU version built with the XL compiler; otherwise VASP reports errors at run time. I have not found a solution.

"-DUSE_PINNED_MEMORY" can be used with the PGI compiler. It gives a noticeable speedup if the system is not too small.

MAGMA is very important for the GPU VASP. Without it, I observed that for some calculations with a single k-point, using more nodes did not make the calculation noticeably faster.

When using MAGMA, OMP_NUM_THREADS should be set. A suitable OMP_NUM_THREADS can make the calculations much faster, even if the other parts are not compiled with shared-memory (OpenMP) options.
For fewer than 2 nodes, two MPI ranks can share one GPU; set OMP_NUM_THREADS so that all CPU cores are used.
For more than 2 nodes, one MPI rank should be used per GPU; again set OMP_NUM_THREADS so that all CPU cores are used.
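The rank/thread rule of thumb above can be written as a small calculation. This is a sketch under stated assumptions: the 40 physical cores and 4 GPUs per node in the example are hypothetical values (check your own node layout), SMT hardware threads are ignored, and the post does not say what to do on exactly two nodes, so the sketch assumes one rank per GPU there.

```python
def rank_thread_layout(nodes, cores_per_node, gpus_per_node):
    """Return (mpi_ranks_per_node, omp_num_threads) for the GPU build.

    Rule of thumb from the post: two MPI ranks per GPU on a single node,
    one MPI rank per GPU on more nodes. Exactly two nodes is not covered
    by the post; this sketch assumes one rank per GPU there.
    OMP_NUM_THREADS is chosen so that the physical cores are filled
    (integer division, so leftover cores stay idle).
    """
    if nodes < 1 or cores_per_node < 1 or gpus_per_node < 1:
        raise ValueError("nodes, cores and GPUs must all be positive")
    ranks_per_gpu = 2 if nodes == 1 else 1
    ranks_per_node = ranks_per_gpu * gpus_per_node
    omp_threads = cores_per_node // ranks_per_node
    if omp_threads < 1:
        raise ValueError("more MPI ranks than cores on a node")
    return ranks_per_node, omp_threads


if __name__ == "__main__":
    # Hypothetical node: 40 physical cores, 4 V100 GPUs.
    for nodes in (1, 4):
        ranks, threads = rank_thread_layout(nodes, cores_per_node=40, gpus_per_node=4)
        print(f"{nodes} node(s): {ranks} ranks/node, "
              f"{nodes * ranks} ranks total, export OMP_NUM_THREADS={threads}")
```

On the hypothetical node this gives 8 ranks with OMP_NUM_THREADS=5 on one node, and 4 ranks per node with OMP_NUM_THREADS=10 on four nodes.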

For the XL compiler, a trailing underscore needs to be added manually to the MAGMA function names so that the correct MAGMA functions are called. Search for "magma" in the source code to find them.
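The search can be scripted so that no call site is missed. A minimal sketch: it only lists identifiers starting with "magma" (case-insensitive, since Fortran is) in files with the given suffixes; the suffix list is my assumption, and deciding which names actually need the trailing underscore is still a manual step.

```python
import os
import re

# Identifiers beginning with "magma"; Fortran names are case-insensitive.
MAGMA_NAME = re.compile(r"\bmagma\w*", re.IGNORECASE)


def find_magma_names(lines):
    """Return (line_number, identifier) for every magma* identifier in lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in MAGMA_NAME.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def scan_tree(root, suffixes=(".F", ".F90", ".f90", ".c", ".cu")):
    """Walk a source tree and map each file path to its magma* hits."""
    report = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    hits = find_magma_names(handle)
                if hits:
                    report[path] = hits
    return report
```

Usage: `for path, hits in sorted(scan_tree("src").items()): print(path, hits)`, where "src" stands for wherever the VASP sources live.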

ESSL's FFTW wrapper is faster, but it limits the number of nodes that can be used: it cannot use too many nodes. With FFTW itself there is no such limit. Where possible, ESSL's FFTW wrapper should still be used, since it is faster.


For the XL-compiled GPU VASP, hybrid DFT calculations fail when the system is large: the wavefunction optimization diverges (it does not silently converge to a wrong result).
This does not happen with the PGI- or GNU-compiled GPU VASP.
However, for normal calculations the results are the same, and the XL-compiled GPU VASP is faster: 93 s vs 100 s (PGI) and 102 s (GNU) in one test, and 1292 s vs 1361 s (PGI) and 1623 s (GNU) in another.
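Expressed as ratios to the XL wall time, the timings above show the size of the gap: PGI is about 7.5 % and 5.3 % slower, GNU about 9.7 % and 25.6 % slower. A trivial sketch; the data are exactly the four timings quoted in the post, and the labels "test 1"/"test 2" are my own.

```python
# Wall times in seconds from the two test cases quoted above.
TIMINGS = {
    "test 1": {"xl": 93.0, "pgi": 100.0, "gnu": 102.0},
    "test 2": {"xl": 1292.0, "pgi": 1361.0, "gnu": 1623.0},
}


def relative_times(times, reference="xl"):
    """Divide each compiler's wall time by the reference compiler's wall time."""
    base = times[reference]
    return {compiler: seconds / base for compiler, seconds in times.items()}


if __name__ == "__main__":
    # Prints:
    # test 1 {'xl': 1.0, 'pgi': 1.075, 'gnu': 1.097}
    # test 2 {'xl': 1.0, 'pgi': 1.053, 'gnu': 1.256}
    for case, times in TIMINGS.items():
        ratios = relative_times(times)
        print(case, {name: round(r, 3) for name, r in ratios.items()})
```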


For all of them, hybrid DFT calculations with ISIF=3 seem to have bugs. Using ISIF=4 or ISIF=2 and changing the lattice constant manually gives the correct results.
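The post does not say how the lattice constant was varied. One common way, which is my assumption and not the author's stated procedure, is to run fixed-cell calculations (e.g. ISIF=2) at several lattice constants, take the final total energy of each run, and fit a parabola near the minimum. The grid spacing and number of points below are hypothetical; over a wider range an equation-of-state fit (e.g. Birch-Murnaghan) is more appropriate than a parabola.

```python
def scan_points(a0, rel_step=0.01, n_each_side=3):
    """Hypothetical scan grid: a0 * (1 + k * rel_step) for k = -n..n."""
    return [a0 * (1.0 + k * rel_step) for k in range(-n_each_side, n_each_side + 1)]


def fit_parabola_minimum(lattice_constants, energies):
    """Least-squares fit E ~ A*x^2 + B*x + C with x = a - mean(a); return a at the minimum.

    Centering on the mean lattice constant keeps the normal equations well conditioned.
    """
    if len(lattice_constants) != len(energies) or len(energies) < 3:
        raise ValueError("need at least three (a, E) pairs")
    m = sum(lattice_constants) / len(lattice_constants)
    xs = [a - m for a in lattice_constants]
    # Power sums S_k = sum x^k and moments T_k = sum E * x^k.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(e * x ** k for x, e in zip(xs, energies)) for k in range(3)]
    # Augmented normal equations for the unknowns (C, B, A).
    rows = [[s[0], s[1], s[2], t[0]],
            [s[1], s[2], s[3], t[1]],
            [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, 4):
                rows[r][k] -= factor * rows[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(rows[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (rows[r][3] - known) / rows[r][r]
    _c, b, a = coef
    if a <= 0:
        raise ValueError("energies are not convex over the scanned range")
    return m - b / (2.0 * a)
```

Usage: generate POSCARs for `scan_points(a0)`, run each one, then pass the lattice constants and their final energies to `fit_parabola_minimum`; a final run at the fitted value checks the result.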

esslsmp and esslsmpcuda seem to have almost no effect on speed. The noticeable improvement from multithreading should be attributed to MAGMA.
Perhaps, when VASP is built with MAGMA, large problems are handled by MAGMA, so esslsmpcuda never gets a chance to be used.

For the standard (non-GPU) VASP, esslsmpcuda can help, but a suitable OMP_NUM_THREADS and a suitable NCORE must be set; otherwise it is slower than essl.
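Finding a suitable OMP_NUM_THREADS/NCORE pair is a trial-and-error sweep. A minimal sketch that enumerates combinations filling one node exactly; restricting NCORE to divisors of the MPI ranks per node is a heuristic I assume here, not something stated in the post. Each combination would then be timed with a short test run.

```python
def candidate_settings(cores_per_node):
    """List (mpi_ranks_per_node, omp_num_threads, ncore) triples to benchmark.

    Every triple uses all cores (ranks * threads == cores_per_node), and
    NCORE is restricted to divisors of the ranks per node (assumed heuristic).
    """
    combos = []
    for ranks in range(1, cores_per_node + 1):
        if cores_per_node % ranks:
            continue  # this rank count would leave cores idle
        threads = cores_per_node // ranks
        for ncore in range(1, ranks + 1):
            if ranks % ncore == 0:
                combos.append((ranks, threads, ncore))
    return combos


if __name__ == "__main__":
    # Hypothetical 40-core node.
    for ranks, threads, ncore in candidate_settings(40):
        print(f"ranks/node={ranks:2d}  OMP_NUM_THREADS={threads:2d}  NCORE={ncore}")
```

For a 40-core node this yields 30 combinations, which is small enough to benchmark with a short run each.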

Re: Useful compilation experience for IBM power9 with V100 gpu

#2 Post by guiyang_huang1 » Wed Dec 18, 2019 10:35 pm

The LOOP and LOOP+ times of the PGI-compiled VASP are smaller than those of the XL-compiled VASP. This may be because pinned memory can be used with the PGI compiler.
But the total elapsed time of the XL-compiled VASP is always noticeably smaller than that of the PGI-compiled VASP.
It seems that the last step of each ionic step is much faster for the XL-compiled GPU VASP.
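The comparison of LOOP/LOOP+ against the total elapsed time can be done per build by reading each OUTCAR. A minimal sketch, assuming the OUTCAR line formats shown in the comments (check them against your own output). "LOOP+ minus LOOP" is the part of the ionic steps spent outside the electronic loop, and "outside LOOP+" is the elapsed time not covered by any ionic step; mapping either of these to the "last step" described above is my interpretation.

```python
import re

# Assumed OUTCAR formats, e.g.
#       LOOP:  cpu time      6.2155: real time      6.2295
#      LOOP+:  cpu time     20.0000: real time     20.5000
#                   Elapsed time (sec):       30.000
LOOP_RE = re.compile(r"^\s*(LOOP\+?):\s*cpu time\s*([\d.]+):\s*real time\s*([\d.]+)")
ELAPSED_RE = re.compile(r"Elapsed time \(sec\):\s*([\d.]+)")


def summarize_timings(lines):
    """Sum LOOP and LOOP+ real times and compare them with the total elapsed time."""
    loop_total = 0.0
    loop_plus_total = 0.0
    elapsed = None
    for line in lines:
        match = LOOP_RE.match(line)
        if match:
            real_time = float(match.group(3))
            if match.group(1) == "LOOP+":
                loop_plus_total += real_time
            else:
                loop_total += real_time
            continue
        match = ELAPSED_RE.search(line)
        if match:
            elapsed = float(match.group(1))
    return {
        "LOOP": loop_total,
        "LOOP+": loop_plus_total,
        "LOOP+ minus LOOP": loop_plus_total - loop_total,
        "elapsed": elapsed,
        # None if the run did not finish and no elapsed time was written.
        "outside LOOP+": None if elapsed is None else elapsed - loop_plus_total,
    }
```

Usage: `with open("OUTCAR") as f: print(summarize_timings(f))` for the XL and PGI runs of the same input, then compare the entries side by side.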

I could not compile VASP successfully with the "-mp" option for PGI: the resulting binary runs much slower than without "-mp", and strange messages are printed.

With XL, "-qsmp=omp" can be used successfully for VASP.
The XL-compiled VASP being faster in the last step may be due to "-qsmp=omp".

For normal structural relaxations, the XL-compiled GPU VASP has no problems. But for MD simulations, hybrid DFT calculations, or other untested features, the XL-compiled VASP can have problems.
