GW0: problems with CALCULATE_XI_REAL and memory insufficiency

pascal_boulet1
Newbie
Posts: 27
Joined: Thu Nov 14, 2019 7:38 pm

GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#1 Post by pascal_boulet1 » Thu Apr 18, 2024 12:57 pm

Dear all,

I am using VASP 6.4.2 and trying to calculate the dielectric function with GW0. I don’t think my system is too big for this kind of calculation: 79 occupied orbitals and 10 k-points (Gamma-centered 4x4x1 grid).

I have exactly diagonalized the Hamiltonian with 128 bands. Now I am trying to calculate the dielectric function, following the tutorial on Si.

Actually, I am facing two problems: one is the message “CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.”; the other is insufficient memory.

The INCAR is the following:
ALGO = EVGW0R
LREAL = auto
LOPTICS = .TRUE.
LSPECTRAL = .TRUE.
LPEAD = .TRUE.
NOMEGA = 12
NBANDS = 128
NELMGW = 4
ISMEAR = 0
SIGMA = 0.05
EDIFF = 1e-8
KPAR = 8

With this input I get “CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.” and the job stops. Although I understand the message, I am not sure which keyword it relates to. Could you please help me with this?

Now, if I switch to ALGO = EVGW0, the crash disappears. However, with ALGO = EVGW0 I run into a memory shortage instead.

I am using an HPC supercomputer with 8 nodes, 2 processors per node, 64 cores per processor, and 256 GB RAM per node. So that is 2 GB/core, and I should have over 2 TB of RAM for the job.
I am using MPI+OpenMP: 256 MPI ranks with 4 threads per rank. Using pure MPI leads to the same result.

In the OUTCAR I have the following information:
running 256 mpi-ranks, with 4 threads/rank, on 8 nodes
distrk: each k-point on 32 cores, 8 groups
distr: one band on NCORE= 1 cores, 32 groups

total amount of memory used by VASP MPI-rank0 72479. kBytes
available memory per node: 6.58 GB, setting MAXMEM to 6742
...
OPTICS: cpu time 18.2553: real time 4.9331
...
files read and symmetry switched off, memory is now:
total amount of memory used by VASP MPI-rank0 116900. kBytes
...
| This job will probably crash, due to insufficient memory available. |
| Available memory per mpi rank: 6742 MB, required memory: 6841 MB. |

min. memory requirement per mpi rank 6841.5 MB, per node 218927.6 MB

Nothing more.

Note: the command “cat /proc/meminfo | grep MemAvailable” gives “252465220 kB”, but in the log file I get:
available memory per node: 6.58 GB, setting MAXMEM to 6742

The figures in the OUTCAR and the log file look contradictory to what I get from meminfo.

Is there something wrong in my setting or something I misunderstand regarding the memory management?

Thank you for your help and time,
Pascal

michael_wolloch
Global Moderator
Posts: 64
Joined: Tue Oct 17, 2023 10:17 am

Re: GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#2 Post by michael_wolloch » Thu Apr 18, 2024 1:53 pm

Hi Pascal,

please provide a minimal reproducible example of your problem when you post on the forum.

From what I can see from your post, you are having trouble with KPAR.
You set

Code: Select all

KPAR = 8
and the error you receive warns you that KPAR>1 is not implemented:

Code: Select all

CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.
The solution is to set KPAR = 1, which is also the default. However, the cubic-scaling GW routines require more memory than the old GW routines, so while this error will disappear, the memory problem may persist. In that case, you should switch to the old GW routines, but still get rid of KPAR, or at least lower it (to 4 or 2).
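For reference, a minimal INCAR along these lines might look like the following (a sketch based on the INCAR posted above; with KPAR simply omitted, it takes its default of 1):

```
ALGO   = EVGW0R    ! keep the cubic-scaling GW0 routines
NBANDS = 128
NOMEGA = 12
NELMGW = 4
ISMEAR = 0
SIGMA  = 0.05
! KPAR omitted -> defaults to 1
```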

If you use KPAR, memory requirements increase. You set KPAR=8, so, e.g., 16 k-points would be split into 8 groups of cores that work on 2 k-points each. However, every group works on all orbitals and has to keep a copy of all of them in memory, so your memory requirement grows by roughly a factor of KPAR! By setting KPAR=8 you effectively negate the additional memory you gained from adding compute nodes, because you have to store 8 times as much data.
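As a back-of-the-envelope illustration of this scaling (numbers taken from this thread), the effective amount of unique memory shrinks by a factor of KPAR:

```shell
# 8 nodes x 256 GB = 2048 GB in total; with KPAR=8 every k-point group
# keeps its own copy of all orbitals, so the replicated data can only
# be as large as total / KPAR.
total_gb=$(( 8 * 256 ))
kpar=8
echo $(( total_gb / kpar ))   # prints 256 -> as if you ran on a single node
```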

If your problem is memory per core, you can increase it by decreasing the number of cores you use, e.g. fill every node with only 64 MPI ranks, but make sure they are evenly distributed over both sockets. You then have 4 GB/core instead of 2. This also increases memory bandwidth per core, which is often a bottleneck for VASP on very high core-count machines.
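With Slurm, such a half-filled, socket-balanced placement could be sketched as the job-script fragment below (a configuration sketch only; the exact options depend on how your cluster is set up, and vasp_std is a placeholder for your binary):

```shell
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=64     # half-fill the 128-core nodes -> 4 GB/core
#SBATCH --ntasks-per-socket=32   # spread ranks evenly over both sockets
#SBATCH --cpus-per-task=2        # leave room for 2 OpenMP threads per rank

export OMP_NUM_THREADS=2
srun vasp_std
```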

Make sure to read up on GW calculations here!

Let me know if this helps,
Michael

pascal_boulet1
Newbie
Posts: 27
Joined: Thu Nov 14, 2019 7:38 pm

Re: GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#3 Post by pascal_boulet1 » Mon May 06, 2024 11:32 am

Hello,

Thank you for the hints. I have spent quite some time trying to run the GW0 calculation.

As you said, since I have few k-points, I can set KPAR to 1.

But there is no way! If I select, e.g., 512 cores with 128 orbitals, I have to set NCORE = #cores/#orbitals. If I do not, NCORE defaults to 1, and in that case VASP changes the number of orbitals and then does not read WAVEDER, since that file was created for 128 orbitals.
But if I do set NCORE = #cores/#orbitals, the job fails because VASP complains about a change in the number of k-points between the "DFT" and the "HF" calculations.
And as a workaround, VASP suggests setting NCORE to 1!

The snake bites its tail!

So the number of bands depends on the number of parallel cores used. Is there a way to force VASP to use exactly the number of bands stored in WAVEDER?

Otherwise, I have tried what you suggested: I ran the job on various numbers of nodes (1 node = 128 cores) while setting the number of MPI ranks to 256, which is the number of bands. I reserved full nodes to have all of their memory. Still, even with 16 nodes, the job fails with an out-of-memory message.

I could try with fewer bands... but the website says we should use as many orbitals as we can (NBANDS = maximum number of plane-waves). In my case that is 17000+ plane-waves, so unfeasible.

You can have a look at the files in the archive I have attached. Maybe you can have some ideas.

Thank you,
Best regards,
Pascal

michael_wolloch
Global Moderator
Posts: 64
Joined: Tue Oct 17, 2023 10:17 am

Re: GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#4 Post by michael_wolloch » Mon May 06, 2024 1:42 pm

Dear Pascal,

I am a bit confused by your attached files and the information in your post. In none of your INCAR files is NBANDS set, which corresponds to the single-step GW procedure. However, you mention problems with reading the WAVEDER file, which points to the traditional multi-step approach. What are you trying to do? If you are doing a DFT calculation first, you can set NBANDS there, and also in the GW step. Of course the KPOINTS file, which you did not include in your archive, must also be the same.

From your OUTCAR-4826388 it seems that you are running the single-step GW procedure, which indeed gives you 17000+ orbitals.

So maybe it is enough to simply redo the DFT step(s) of your calculation with an adequate number of bands (e.g. 512 or 1024, since fewer than 100 should be occupied for your system), recalculate WAVEDER and WAVECAR, copy them over, and then set the same NBANDS in your GW0 INCAR.
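The resulting two-step workflow could be sketched like this (a sketch only; the directory names and the 512-band choice are illustrative):

```shell
# Step 1: DFT with exact diagonalization and a fixed number of bands
#   INCAR: ALGO = Exact ; NBANDS = 512 ; LOPTICS = .TRUE.  (writes WAVEDER)
cd dft_step && srun vasp_std

# Step 2: GW0 restart from those orbitals, with the same NBANDS
cp WAVECAR WAVEDER ../gw_step/
#   INCAR: ALGO = EVGW0R ; NBANDS = 512 ; NOMEGA = 12
cd ../gw_step && srun vasp_std
```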

In the calculation on 16 nodes with 16 MPI ranks per node (OUTCAR-4827224) you get further along than in the others, but the memory is still insufficient, since you would need about 19 GB per rank, or ~300 GB per node:

Code: Select all

 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     This job will probably crash, due to insufficient memory available.     |
|     Available memory per mpi rank: 4271 MB, required memory: 18814 MB.      |
|     Reducing NTAUPAR or using more computing nodes might solve this         |
|     problem.                                                                |
|                                                                             |
 -----------------------------------------------------------------------------
You said that you have 256 GB available per node, but the information in the warning above indicates more like ~70 GB (16 ranks × 4271 MB per rank). Is it possible that something else is running on the node, or that you are somehow limiting the amount of accessible memory? Maybe you are not using all available CPU sockets? In your "Janus_GW0_256-4827224.o" file, there is a line in the execution summary:

Code: Select all

Limits    : time = 1-00:00:00 , memory/job = 1775 Mo
Could this mean that your Slurm configuration limits the memory per job in some way? Please talk to your cluster administrators if that is the case.
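If the scheduler is indeed capping memory, explicitly requesting the full node memory in the job script may already help (a configuration sketch; in Slurm, --mem=0 requests all memory available on each node, but check with your administrators whether your partition allows it):

```shell
#SBATCH --exclusive   # reserve the nodes entirely for this job
#SBATCH --mem=0       # request all of each node's memory
```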

You could also fall back to the quartic-scaling routines if you want to keep the single-step procedure, since they use significantly less memory. This will probably still not be enough, however, if your jobs cannot utilize the full memory of your nodes.

I hope that helps, please report back if you get this to work or if you have gathered more information,
Cheers, Michael
