Not enough memory: Difference between revisions

From VASP Wiki
(Created page with "First of all, the memory requirements of the serial version can be estimated using the ''makeparam'' utility (see {{TAG|Memory requirements}}). At present, there is however no...")
 
No edit summary
 
(11 intermediate revisions by 3 users not shown)
Line 1: Line 1:
First of all, the memory requirements of the serial version can be estimated using the ''makeparam'' utility (see {{TAG|Memory requirements}}). At present, however, there is no way to estimate the memory requirements of the parallel version.

Nowadays, for standard DFT and hybrid functional calculations, memory is usually not an issue. Furthermore, by increasing the number of cores, the memory requirements per core can be reduced significantly.


In fact, it might be difficult to run huge jobs on "thin" T3E or
SP2 nodes. Most tables (pseudopotentials etc.) and the executable
must be held on all nodes (10-20 Mbytes).
In addition one complex array of the size
<math>N_{\rm bands} \times N_{\rm bands}</math> is allocated on each node;
during dynamic simulation even up to three such arrays are allocated.
Upon reading and writing the charge density, a complex
array that can hold all data points of the charge density
is allocated (8*{{TAG|NGXF}}*{{TAG|NGYF}}*{{TAG|NGZF}}). Finally, three such arrays
are allocated (and deallocated) during the charge density symmetrisation
(the charge density symmetrisation usually takes the largest amount
of memory).
All other data are distributed among all nodes.
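
The replicated arrays listed above can be tallied with a short sketch. It is only a rough illustration: the function name and arguments are hypothetical, the 16 bytes per complex element for the band arrays is an assumption (double-precision complex), and the figure 8*NGXF*NGYF*NGZF is taken from the text and assumed to be in bytes.

```python
def replicated_bytes(nbands, ngxf, ngyf, ngzf, dynamic=False, bytes_per_complex=16):
    """Rough per-node estimate of the arrays that are NOT distributed.

    Assumptions (not taken from the VASP source code):
    - each element of the NBANDS x NBANDS array takes `bytes_per_complex` bytes;
    - the charge-density array takes 8*NGXF*NGYF*NGZF bytes, as quoted in the text.
    """
    # One NBANDS x NBANDS complex array, or up to three during dynamic simulations.
    band_arrays = 3 if dynamic else 1
    band_bytes = band_arrays * nbands * nbands * bytes_per_complex

    # One full charge-density array while reading or writing the charge density.
    density_io_bytes = 8 * ngxf * ngyf * ngzf

    # Three such arrays are allocated temporarily during symmetrisation.
    symmetrisation_bytes = 3 * density_io_bytes

    return {
        "band_arrays": band_bytes,
        "density_io": density_io_bytes,
        "symmetrisation": symmetrisation_bytes,
    }
```

For example, `replicated_bytes(1000, 120, 120, 120, dynamic=True)` returns the three contributions in bytes for a hypothetical job with 1000 bands on a 120 x 120 x 120 fine grid.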


The following things can be tried to reduce the memory requirements on each node.
If a memory shortage is encountered, the following steps can be taken to reduce the memory requirements per core.
*For large, many-atom systems, it is advisable to increase {{TAG|NCORE}} to larger values (say 4 or 8, potentially up to or even beyond the number of cores per node). This decreases the memory required per core for storing the non-local projectors. Furthermore, real-space projection, {{TAG|LREAL}}=A, also decreases the required memory per core.
*Possibly the executable becomes smaller if the options ''-G1'' (T3E) and ''-g'' are removed from the lines ''OFLAG'' and ''DEBUG'' in the makefile.
*{{TAG|KPAR}} distributes the k-points over cores. Unfortunately, only the calculations are distributed; the storage of the orbitals is not distributed over cores. This means that {{TAG|KPAR}}=1 results in the smallest memory footprint per core (but slower calculations, since VASP needs to rely on other, less efficient parallelization strategies).
*Switch off symmetrisation ({{TAG|ISYM}}=0). Symmetrisation is done locally on each node, requiring three huge arrays. VASP.4.4.2 (and newer versions) have a switch to run a more memory-conserving symmetrisation. This can be selected by specifying {{TAG|ISYM}}=2. Results might, however, differ somewhat from {{TAG|ISYM}}=1 (usually by only 1/100th of a meV). Also avoid writing or reading the {{TAG|CHGCAR}} file ({{TAG|LCHARG}}=''.FALSE.'').
*Switch off symmetrisation ({{TAG|ISYM}}=0). Charge symmetrisation is done locally on each node, requiring three fairly large arrays. VASP.4.4.2 (and newer versions) possess a switch to run a more memory-conserving symmetrisation. From VASP.5 onwards, the memory-conserving version, {{TAG|ISYM}}=2, is the default. Results might differ slightly from {{TAG|ISYM}}=1 (usually by about 1E-5 eV).
*Use {{TAG|NPAR}}=1.
* Make sure to use scaLAPACK if your system becomes large. If scaLAPACK is not available, VASP needs to store an {{TAG|NBANDS}} x {{TAG|NBANDS}} matrix on each core, in order to diagonalize the Hamiltonian in the subspace of the calculated orbitals. If scaLAPACK is compiled in and used, the matrix is distributed over all cores jointly handling one k-point. Note that decreasing {{TAG|KPAR}} reduces the memory demand for this matrix (if scaLAPACK is used).
 
A final hint is in order. At some key places, the VASP code reports the required memory per core in the {{TAG|OUTCAR}} file. Please search for the lines
 
total amount of memory used by VASP MPI-rank0  457796. kBytes
=======================================================================
  base      :      30000. kBytes
  nonlr-proj:      12085. kBytes
  fftplans  :      29652. kBytes
  grid      :      54584. kBytes
  one-center:        211. kBytes
  wavefun   :     331264. kBytes
 
and inspect how much memory VASP uses per core. "base" is the estimated memory use for the executable and libraries, "nonlr-proj" the memory required for the non-local projection operators, "grid" that for 3D arrays representing the charge density, potentials, etc., and "wavefun" the memory required for the one-electron wavefunctions (orbitals). Storage of the orbitals is usually the most memory-demanding part.


It should be mentioned that VASP relies heavily on dynamic memory
allocation (''ALLOCATE'' and ''DEALLOCATE''). As far as we know, there
is no memory leakage (''ALLOCATE'' without ''DEALLOCATE''); unfortunately,
it is impossible to be entirely sure that no leakage exists. Some users have
observed that the memory use of the code grows during
dynamic simulations on the T3E.
This is, however, most likely due to "problematic"
dynamic memory management by the f90 runtime system and not due to
a programming error in VASP. Unfortunately, the
dynamic memory subsystems of most f90 compilers are still
rather inefficient. As a result, the memory might become
more and more fragmented during the run, so that large pieces
of memory cannot be allocated. We can only hope for
improvements in the dynamic memory management (for instance,
the introduction of garbage collectors).


----
----
[[Category:Performance]][[Category:Howto]]
[[Category:Performance]][[Category:Howto]][[Category:Memory]]

Latest revision as of 09:10, 25 May 2022

Nowadays, for standard DFT and hybrid functional calculations, memory is usually not an issue. Furthermore, by increasing the number of cores, the memory requirements per core can be reduced significantly.


If a memory shortage is encountered, the following steps can be taken to reduce the memory requirements per core.

  • For large, many-atom systems, it is advisable to increase NCORE to larger values (say 4 or 8, potentially up to or even beyond the number of cores per node). This decreases the memory required per core for storing the non-local projectors. Furthermore, real-space projection, LREAL=A, also decreases the required memory per core.
  • KPAR distributes the k-points over cores. Unfortunately, only the calculations are distributed; the storage of the orbitals is not distributed over cores. This means that KPAR=1 results in the smallest memory footprint per core (but slower calculations, since VASP needs to rely on other, less efficient parallelization strategies).
  • Switch off symmetrisation (ISYM=0). Charge symmetrisation is done locally on each node, requiring three fairly large arrays. VASP.4.4.2 (and newer versions) possess a switch to run a more memory-conserving symmetrisation. From VASP.5 onwards, the memory-conserving version, ISYM=2, is the default. Results might differ slightly from ISYM=1 (usually by about 1E-5 eV).
  • Make sure to use scaLAPACK if your system becomes large. If scaLAPACK is not available, VASP needs to store an NBANDS x NBANDS matrix on each core, in order to diagonalize the Hamiltonian in the subspace of the calculated orbitals. If scaLAPACK is compiled in and used, the matrix is distributed over all cores jointly handling one k-point. Note that decreasing KPAR reduces the memory demand for this matrix (if scaLAPACK is used).
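
The effect of scaLAPACK and KPAR on the NBANDS x NBANDS matrix can be illustrated with a back-of-the-envelope sketch. The function name and parameters are hypothetical, and two assumptions go beyond the text: each matrix element occupies 16 bytes (double-precision complex), and with scaLAPACK the matrix is spread evenly over the cores that share one k-point (total cores divided by KPAR).

```python
def subspace_matrix_bytes_per_core(nbands, total_cores, kpar=1,
                                   scalapack=True, bytes_per_element=16):
    """Estimate the per-core memory for the NBANDS x NBANDS subspace matrix.

    Assumptions (illustrative only): 16 bytes per element and a perfectly
    even distribution over the cores that jointly handle one k-point.
    """
    full_matrix = nbands * nbands * bytes_per_element
    if not scalapack:
        # Without scaLAPACK every core stores the full matrix.
        return float(full_matrix)
    if total_cores % kpar != 0:
        raise ValueError("total_cores must be divisible by kpar")
    # With scaLAPACK the matrix is shared by the cores of one k-point group.
    cores_per_kpoint = total_cores // kpar
    return full_matrix / cores_per_kpoint
```

Under these assumptions, 1000 bands on 64 cores need about 16 MB per core without scaLAPACK, 1 MB per core with scaLAPACK and KPAR=4, and 0.25 MB per core with scaLAPACK and KPAR=1.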

A final hint is in order. At some key places, the VASP code reports the required memory per core in the OUTCAR file. Please search for the lines

total amount of memory used by VASP MPI-rank0   457796. kBytes
=======================================================================
  base      :      30000. kBytes
  nonlr-proj:      12085. kBytes
  fftplans  :      29652. kBytes
  grid      :      54584. kBytes
  one-center:        211. kBytes
  wavefun   :     331264. kBytes

and inspect how much memory VASP uses per core. "base" is the estimated memory use for the executable and libraries, "nonlr-proj" the memory required for the non-local projection operators, "grid" that for 3D arrays representing the charge density, potentials, etc., and "wavefun" the memory required for the one-electron wavefunctions (orbitals). Storage of the orbitals is usually the most memory-demanding part.
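
Searching for this report can be automated with a small script. The following is a minimal sketch, assuming the report has exactly the layout shown above; the function name and the regular expressions are illustrative and not part of VASP. If the text contains several reports, the last one is returned.

```python
import re

# Sample report copied from the text above.
SAMPLE = """\
 total amount of memory used by VASP MPI-rank0   457796. kBytes
=======================================================================
   base      :      30000. kBytes
   nonlr-proj:      12085. kBytes
   fftplans  :      29652. kBytes
   grid      :      54584. kBytes
   one-center:        211. kBytes
   wavefun   :     331264. kBytes
"""

_TOTAL = re.compile(r"total amount of memory used by VASP MPI-rank0\s+([\d.]+)\s*kBytes")
_ITEM = re.compile(r"^\s*([\w-]+)\s*:\s*([\d.]+)\s*kBytes")


def parse_memory_report(text):
    """Return (total_kbytes, {component: kbytes}) for the last report in `text`."""
    total = None
    parts = {}
    for line in text.splitlines():
        match = _TOTAL.search(line)
        if match:
            # A new report starts here: reset the per-component table.
            total = float(match.group(1))
            parts = {}
            continue
        match = _ITEM.match(line)
        if match and total is not None:
            parts[match.group(1)] = float(match.group(2))
    return total, parts


if __name__ == "__main__":
    total, parts = parse_memory_report(SAMPLE)
    largest = max(parts, key=parts.get)
    print(f"total: {total:.0f} kBytes, largest component: {largest}")
```

To analyse a real run, read the OUTCAR file into a string and pass it to `parse_memory_report` in place of `SAMPLE`.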