
Compile VASP with vectorization support

Posted: Wed Feb 08, 2023 7:24 am
by Dankomaister
Hi,

I have a question about compiling VASP with vectorization support, in particular AVX2 or AVX512.
Are there any additional precompiler flags that need to be added to fully utilize all vectorization code paths?

For example, I noticed that the file src/simd.inc contains the following commented-out section of code:

Code: Select all

!!#if   defined(__MIC__) || defined(__AVX512F__)
!!#define SIMD512
!!#undef  SIMD256
!!#elif defined(__AVX__) || defined(__AVX2__)
!!#define SIMD256
!!#undef  SIMD512
!!#endif
Does this mean that we have to manually set the SIMD256 or SIMD512 precompiler flags to fully utilize AVX2 or AVX512?
Any clarification on this would be helpful.

/Daniel

Re: Compile VASP with vectorization support

Posted: Mon Feb 13, 2023 2:05 am
by Dankomaister
Anyone have any insights on this?

Re: Compile VASP with vectorization support

Posted: Mon Feb 13, 2023 1:25 pm
by fabien_tran1
Hi,

Sorry for the late reply. Yes, -DSIMD256 or -DSIMD512 needs to be added to CPP_OPTIONS in makefile.include as an additional precompiler option. However, the implementation (https://doi.org/10.1002/qua.25851) is not supported and may be broken, so its use is not recommended.
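
As a minimal sketch, this could look as follows in makefile.include; the surrounding options are only placeholders taken from a typical Intel OpenMP template and will differ for your toolchain, the relevant addition being the -DSIMD256 (or -DSIMD512) line:

Code: Select all

CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
              -DMPI -DMPI_BLOCK=8000 \
              -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Dfock_dblbuf \
              -D_OPENMP \
              -DSIMD256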

Re: Compile VASP with vectorization support

Posted: Tue Feb 14, 2023 2:21 am
by Dankomaister
Great, thanks for clearing this up!

I have a few further questions: are -DSIMD256 and -DSIMD512 only relevant when compiling VASP with OpenMP,
or will they also benefit the pure MPI version? And are there any plans to support this vectorization in the future?

/Daniel

Re: Compile VASP with vectorization support

Posted: Tue Feb 14, 2023 8:38 am
by fabien_tran1
Yes, SIMD works only in conjunction with OpenMP. At the moment, no decision has been made about the future of SIMD.

Concerning the use of SIMD, I should elaborate a bit more. -DSIMD256 and -DSIMD512 activate SIMD in xclib_grad.F for GGA functionals (91,AM,B3,B5,BO,MK,ML,OR,PE,PS,RE,RP) in the non-spin-polarized case only (there is no SIMD implementation for spin-polarized GGA, nor for meta-GGA functionals). According to a quick test I have just made, -DSIMD256 seems to work (the results are correct) for all the aforementioned GGA functionals. So it may be ok to use the SIMD option, but preferably after prior test calculations to check the correctness of the results and the gain in speed.

Re: Compile VASP with vectorization support

Posted: Tue Feb 14, 2023 12:51 pm
by Dankomaister
Ok, so I need to compile with OpenMP. The reason I asked is that, for almost all of our systems/calculations, we have found the pure MPI version of VASP to be faster, since using an optimal value of NCORE always beats NCORE=1 with MPI+OpenMP. Is more than one OpenMP thread per MPI task required to unlock the SIMD optimizations, or can I compile with OpenMP and then run with only 1 thread per MPI task, so that higher values of NCORE can be used together with the SIMD optimizations?

The paper you linked reports a 9x speedup from SIMD, which is huge. If that is indeed the case, it would be beneficial to use MPI+OpenMP with SIMD over the pure MPI version. Maybe that could warrant official support for SIMD, and justify an implementation of SIMD for spin-polarized GGA as well? A speedup of up to 9x would certainly be useful.

Re: Compile VASP with vectorization support

Posted: Tue Feb 14, 2023 2:54 pm
by fabien_tran1
No, it is not necessary to set OMP_NUM_THREADS to a value larger than 1 to have SIMD activated (I have just checked). Note that the speedup shown in the paper is for the time spent in the GGAALL_GRID subroutine, not for the total time. I will test the SIMD implementation more carefully and write something in the VASP manual if we conclude that it is safe to use. Yes, it would certainly be good to have SIMD implemented in other parts of the code, and we will think about it.
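
As an illustration (the launcher, rank count, and executable name below are only placeholders for your own setup), running the OpenMP-enabled build with a single OpenMP thread per MPI rank would look something like:

Code: Select all

export OMP_NUM_THREADS=1
mpirun -np 64 ./vasp_std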

Re: Compile VASP with vectorization support

Posted: Wed Feb 22, 2023 11:13 am
by fabien_tran1
Update: The implementation of SIMD for range-separated hybrid functionals like HSE has a bug (wrong results). The bug can be fixed by deleting or commenting out the following line in xclib_grad.F:

Code: Select all

INIT_PRED = .FALSE.

For the GGA functionals (91,AM,B3,B5,BO,MK,ML,OR,PE,PS,RE,RP) there was no problem.