
[VASP 6.2 intel 2020.0] FeAl_333_RPAFORCE fails with 12 tasks with Fatal error in PMPI_Waitall:

Posted: Thu Mar 04, 2021 9:41 am
by thibautvery
Hello,

I ran the FeAl_333_RPAFORCE test with VASP 6.2 compiled with Intel Parallel Studio 2020.0 (as recommended on the wiki).
It runs smoothly with up to 10 MPI tasks.
With 12 tasks, it fails right after the first geometry step finishes, with the following error:


Abort(17) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code
The same test runs fine with a build compiled with Intel Parallel Studio 2018.5.

I have attached the stdout and OUTCAR files for both runs, the makefile.include (identical for both compilers), and the stack trace obtained with DDT.

Do you know if there is a workaround for the problem?

Thibaut Véry

Re: [VASP 6.2 intel 2020.0] FeAl_333_RPAFORCE fails with 12 tasks with Fatal error in PMPI_Waitall:

Posted: Thu Mar 04, 2021 12:42 pm
by merzuk.kaltak
Dear Thibaut,

This might be an Intel MPI problem. Could you try a different Intel MPI version while keeping the same compiler?
Alternatively, you could try one of the other compiler toolchains listed on our wiki.

Also, the tests in the test suite have only been validated with 1, 2, 3, 4, 6 and 8 MPI ranks, as mentioned here. In addition, the test you are running exercises an undocumented VASP feature that is still under development, so it is never run by "make test" or "make test_all". We are still working on making the RPA forces more stable and reliable.

with regards,
Merzuk

Re: [VASP 6.2 intel 2020.0] FeAl_333_RPAFORCE fails with 12 tasks with Fatal error in PMPI_Waitall:

Posted: Wed Sep 08, 2021 7:35 am
by andreas.singraber
Dear Thibaut,

We came across the same error message and similar behaviour in a different calculation and found that Intel compilers up to version 2021.2 have a bug affecting non-blocking broadcasts (MPI_Ibcast) as used in VASP. I was also able to write a simple reproducer that triggers the error.

I am not entirely sure that your error has the same origin as ours, but it is very likely. If so, the fix we found may work for you too: try upgrading your Intel compiler to the latest version, 2021.3, which resolves the MPI_Ibcast issue. There is one downside, however: with 2021.3 there are problems with the OpenMP version of VASP (see here), so it may be better to build without OpenMP support.
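As a rough summary of the advice above (not an official VASP or Intel tool), the version logic can be written as a small Python helper. The names `IntelAdvice`, `parse_version` and `advice_for` are hypothetical. The OpenMP rule only reflects the report for 2021.3. Thibaut's 2018.5 build ran fine, so `ibcast_bug_possible` means "may be affected", not "will fail".

```python
from typing import NamedTuple, Tuple


class IntelAdvice(NamedTuple):
    """Hypothetical summary of this thread's advice for one Intel compiler version."""
    ibcast_bug_possible: bool  # MPI_Ibcast bug reported for versions up to 2021.2
    avoid_openmp: bool         # OpenMP problems with VASP reported for 2021.3


def parse_version(text: str) -> Tuple[int, int]:
    """Turn '2020.0', '2021.3' or '2021.3.0' into (year, update)."""
    parts = text.strip().split(".")
    year = int(parts[0])
    update = int(parts[1]) if len(parts) > 1 else 0
    return year, update


def advice_for(version: str) -> IntelAdvice:
    v = parse_version(version)
    return IntelAdvice(
        # Tuple comparison: anything before 2021.3 may hit the MPI_Ibcast bug.
        ibcast_bug_possible=v < (2021, 3),
        # Only 2021.3 was reported to break the OpenMP build in this thread.
        avoid_openmp=v == (2021, 3),
    )


if __name__ == "__main__":
    for v in ("2018.5", "2020.0", "2021.2", "2021.3"):
        print(v, advice_for(v))
```

Versions newer than 2021.3 are simply reported as unaffected on both counts, because nothing about them is known from this thread.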

Also, the next version of VASP will provide a preprocessor flag to work around this compiler bug for Intel compiler versions < 2021.3.

If you are able to test 2021.3, please let us know whether it fixes your problem as well!

All the best,

Andreas Singraber