
"internal error in: mpi.F, M_sumb_d: invalid vector size" in wannier mode

Posted: Tue Jan 10, 2023 10:50 am
by dongshen_wen
Dear developers and users,


I am encountering a problem in a vasp_ncl calculation while it writes out the Wannier results. I am using VASP 6.3.2 with the wannier90 v3 interface.
The NSCF calculation ran fine until it entered the wannier90 library mode:

Code: Select all

 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.157375086006E+03   -0.15738E+03   -0.69218E-06345984   0.191E-02
 Calling wannier_setup of wannier90 in library mode
 Wannier90 mode
 Computing MMN (overlap matrix elements)
 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: mpi.F  at line: 1359                                 |
|                                                                             |
|     M_sumb_d: invalid vector size n -1953955840                             |
|                                                                             |
 -----------------------------------------------------------------------------
I've found a similar discussion at forum/viewtopic.php?p=23166#p23166, but my case does not seem to be the same. I applied the patch from that thread, but the error still exists.
As mentioned in the discussion at forum/viewtopic.php?f=4&t=18069&start=15, VASP 6.3.x has a better Wannier parallelization scheme.

I attach my inputs and error files. Please let me know if you need anything else. Thank you in advance for any hints.

Best,
Dongsheng

Re: internal error in mpi.F

Posted: Thu Jan 12, 2023 9:13 am
by dongshen_wen
A quick follow-up: I tried different parallelization settings in the bash job script, and here is what I found.
When I set (number of tasks)*(number of nodes) = NBANDS in the job script, vasp_ncl completed the calculation without the error I mentioned. This is based on a single test so far, and I will update if I find something new.
dongshen_wen wrote: Tue Jan 10, 2023 10:07 am
Dear Alexey,

I tried the patch above, but it did not seem to work. I still got the same error message. It seems to me that it comes from the M_sumb_d subroutine on line 1329 in mpi.F. I saw that the patch changes M_sum_d to M_sum_d8, so I'm not sure whether it applies to my case.
I've also tried commenting out the LORBIT tag, but the error persisted. So I guess it might not come from the SPHPRO_FAST subroutine you mentioned.

I will open a new thread to update the files.

Best,
Dongsheng


Re: "internal error in: mpi.F, M_sumb_d: invalid vector size" in wannier mode

Posted: Thu Jan 12, 2023 9:22 am
by andreas.singraber
For later reference: this thread originated from here: https://vasp.at/forum/viewtopic.php?f=3&t=18774

Re: "internal error in: mpi.F, M_sumb_d: invalid vector size" in wannier mode

Posted: Fri Jan 20, 2023 8:47 pm
by andreas.singraber
Dear Dongsheng,

sorry for the delay... I had a closer look and I suspect there is a similar problem with a 4-byte integer overflowing in mlwf.F in line 1234:

Code: Select all

CALLMPI( M_sum_z( WDES%COMM_KINTER, MLWF%M_matrix, SIZE(MLWF%M_matrix)))
The SIZE function returns a 4-byte integer, which may be too small to hold the total number of matrix elements and therefore overflows. The resulting negative number is passed on to M_sum_z, which internally calls M_sumb_d, and that routine raises the error you are observing.
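
To illustrate the wraparound described above, here is a minimal Python sketch (plain arithmetic, not VASP or wannier90 code). It shows how a count larger than 2^31 - 1 reads as a negative number once it is stored in a signed 4-byte integer, and which smallest count is consistent with the n printed in the error message. The helper names and the example array dimensions are hypothetical, and the count reported by M_sumb_d need not equal SIZE(MLWF%M_matrix) itself, since the thread does not show how M_sum_z forwards it.

```python
# Illustration only: plain Python arithmetic, not VASP or wannier90 code.

INT32_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold


def wrap_int32(n: int) -> int:
    """Return n as it reads after being stored in a signed 32-bit integer."""
    return (n + 2**31) % 2**32 - 2**31


def smallest_true_count(reported: int) -> int:
    """Smallest non-negative count that wraps to the reported 32-bit value."""
    return reported % 2**32


def element_count(shape) -> int:
    """Total number of elements of an array with the given shape."""
    total = 1
    for extent in shape:
        total *= extent
    return total


# Value printed in the error message of this thread.
reported_n = -1953955840
candidate = smallest_true_count(reported_n)
print(candidate, wrap_int32(candidate) == reported_n)

# HYPOTHETICAL dimensions (bands, bands, neighbours, k-points), chosen only
# so that the product exceeds INT32_MAX; not taken from the posted inputs.
hypothetical_shape = (600, 600, 12, 500)
count = element_count(hypothetical_shape)
print(count, count > INT32_MAX, wrap_int32(count))
```

Any count that differs from the candidate by a multiple of 2^32 wraps to the same negative value, so the sketch can only give a lower bound on the true number of elements.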

I will work on a fix, discuss it with my colleagues and present it to you... please stay tuned...

Best,
Andreas Singraber

Re: "internal error in: mpi.F, M_sumb_d: invalid vector size" in wannier mode

Posted: Thu Jan 26, 2023 10:38 am
by dongshen_wen
Dear Andreas,

Thank you for this update. It also appears to me that VASP 6.3.2 returns different overlap matrices, projections, or eigenvalues than VASP 6.2.1. I ran the same BCC Fe example with both versions. With the .amn, .mmn, and .eig files from 6.2.1, disentanglement and wannierization worked well (small spread, and the DFT bands were reproduced). With the files generated by 6.3.2, disentanglement and wannierization never converged, and the spread was extremely large.
I'll keep posting when I find something new.

Best,
Dongsheng