GW0: problems with CALCULATE_XI_REAL and memory insufficiency

pascal_boulet1
Newbie
Posts: 27
Joined: Thu Nov 14, 2019 7:38 pm

GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#1 Post by pascal_boulet1 » Thu Apr 18, 2024 12:57 pm

Dear all,

I am using VASP 6.4.2, and I am trying to calculate the dielectric function using GW0. I don’t think my system is too big for this kind of calculation: 79 occupied orbitals and 10 k-points (Gamma-centered 4x4x1 grid).

I have exactly diagonalized the Hamiltonian with 128 bands. Now I am trying to calculate the dielectric function, following the tutorial on Si.

Actually, I am facing two problems: one is related to the message “CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.”; the other is insufficient memory.

The INCAR is the following:
ALGO = EVGW0R
LREAL = auto
LOPTICS = .TRUE.
LSPECTRAL = .TRUE.
LPEAD = .TRUE.
NOMEGA = 12
NBANDS = 128
NELMGW = 4
ISMEAR = 0
SIGMA = 0.05
EDIFF = 1e-8
KPAR = 8

With this input I get “CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.” and the job stops. Although I understand the message, I am not sure which keyword it relates to. Could you please help me with this?

Now, if I switch to ALGO = EVGW0, the crash disappears. However, with ALGO = EVGW0, I run into a shortage of memory instead.

I am using an HPC supercomputer with 8 nodes, 2 processors/node, 64 cores/processor and 256 GB RAM/node. So that is 2 GB/core, and I should have over 2 TB of RAM for the job.
I am using MPI+OpenMP: 256 MPI ranks with 4 threads per rank. But using only MPI leads to the same result.

In the OUTCAR I have the following information:
running 256 mpi-ranks, with 4 threads/rank, on 8 nodes
distrk: each k-point on 32 cores, 8 groups
distr: one band on NCORE= 1 cores, 32 groups

total amount of memory used by VASP MPI-rank0 72479. kBytes
available memory per node: 6.58 GB, setting MAXMEM to 6742
...
OPTICS: cpu time 18.2553: real time 4.9331
...
files read and symmetry switched off, memory is now:
total amount of memory used by VASP MPI-rank0 116900. kBytes
...
| This job will probably crash, due to insufficient memory available. |
| Available memory per mpi rank: 6742 MB, required memory: 6841 MB. |

min. memory requirement per mpi rank 6841.5 MB, per node 218927.6 MB

Nothing more.

Note: the command “cat /proc/meminfo | grep MemAvailable” gives “252465220 kB”, but in the log file I get the information:
available memory per node: 6.58 GB, setting MAXMEM to 6742

The figures in the OUTCAR and the log file look contradictory to what I get from meminfo.

Is there something wrong in my setting or something I misunderstand regarding the memory management?

Thank you for your help and time,
Pascal

michael_wolloch
Global Moderator
Posts: 63
Joined: Tue Oct 17, 2023 10:17 am

Re: GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#2 Post by michael_wolloch » Thu Apr 18, 2024 1:53 pm

Hi Pascal,

please provide a minimal reproducible example of your problem when you post on the forum.

From what I can see from your post, you are having trouble with KPAR.
You set

Code: Select all

KPAR = 8
and the error you receive warns you that KPAR>1 is not implemented:

Code: Select all

CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.
The solution is to set KPAR = 1, which is also the default. However, the cubic-scaling GW routines require more memory than the old GW routines, so while this error will disappear, the memory problem may persist. In that case, you should switch back to the old GW routines, but still get rid of KPAR, or at least set it to a lower value (4 or 2).
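For reference, a cubic-scaling INCAR along these lines would simply be your original input with the KPAR line dropped, so that the default KPAR = 1 applies:

```
ALGO = EVGW0R
LREAL = Auto
LOPTICS = .TRUE.
LSPECTRAL = .TRUE.
LPEAD = .TRUE.
NOMEGA = 12
NBANDS = 128
NELMGW = 4
ISMEAR = 0
SIGMA = 0.05
EDIFF = 1e-8
! KPAR removed: defaults to 1, as required by CALCULATE_XI_REAL
```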

If you use KPAR, memory requirements increase. In your case, you set KPAR=8, so e.g. 16 k-points would be split over 8 groups of cores that work on 2 k-points each. However, all those groups work on all orbitals and have to keep a copy of all orbitals in memory, so your memory requirements will increase by roughly a factor of KPAR! By setting KPAR=8, you effectively negate the additional memory you get from increasing the number of compute nodes, because you have to store 8 times as much data.

If your problem is memory per core, you can increase it by decreasing the number of cores you use, e.g. fill every node with only 64 MPI ranks, but make sure that they are evenly distributed over both sockets. Now you have 4 GB/core instead of 2. This will also increase the memory bandwidth per core, which is often a bottleneck for VASP on very high core-count machines.
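As a sketch only (assuming a SLURM batch system; the script and option values are illustrative and must be adapted to your scheduler and node layout), half-populating the nodes could look like:

```
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=64     # only 64 of the 128 cores per node -> 4 GB per rank
#SBATCH --ntasks-per-socket=32   # spread the ranks evenly over both sockets
#SBATCH --cpus-per-task=2        # leave the idle cores for OpenMP threads

export OMP_NUM_THREADS=2
srun vasp_std
```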

Make sure to read up on GW calculations in the VASP online manual!

Let me know if this helps,
Michael

pascal_boulet1
Newbie
Newbie
Posts: 27
Joined: Thu Nov 14, 2019 7:38 pm

Re: GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#3 Post by pascal_boulet1 » Mon May 06, 2024 11:32 am

Hello,

Thank you for the hints. I have spent quite some time trying to run the GW0 calculation.

As you said, since I have few k-points, I can set KPAR to 1.

But it does not work! If I select, e.g., 512 cores for 128 orbitals, I have to set NCORE = #CORES/#ORBITALS. If I don't set it,
then NCORE defaults to 1, and in that case VASP changes the number of orbitals and does not read WAVEDER, since it was created for 128 orbitals.

But if I do set NCORE = #CORES/#ORBITALS, the job fails because VASP complains about a change in the number of k-points between the "DFT" and the "HF" calculations.
And as a workaround, VASP suggests setting NCORE to 1!

The snake bites its tail!

So, the number of bands depends on the number of parallel cores used. Is there a way to force VASP to use exactly the number of bands stored in WAVEDER?

Otherwise, I have tried what you suggested: I have run the job on various numbers of nodes (1 node = 128 cores) while setting the number of MPI ranks to 256, which is the number of bands. I reserved all the nodes to have the full memory available. Still, even with 16 nodes, the job fails with an out-of-memory message.

I could try with fewer bands... but on the website it is said that we should use as many orbitals as we can (NBANDS = maximum number of plane-waves). In my case that is over 17000 plane-waves, so this is unfeasible.

You can have a look at the files in the attached archive. Maybe they will give you some ideas.

Thank you,
Best regards,
Pascal
