ML_MB

From VASP Wiki
{{DISPLAYTITLE:ML_MB}}
{{TAGDEF|ML_MB|[integer]|see below}}


Description: This tag sets the maximum number of local reference configurations (i.e. basis functions in the kernel) in the machine learning force field method.
----
The default for {{TAG|ML_MB}} depends on the {{TAG|ML_MODE}} setting. The defaults for each mode are:
*{{TAGO|ML_MODE|train}}:
**No {{TAG|ML_AB}} present (learning from scratch): <code>min(1500, max({{TAG|NSW}}, 2*{{TAG|ML_MCONF_NEW}}) * MAXAT_SP)</code>
**{{TAG|ML_AB}} present (continuation of learning): <code>MB_AB + min(1500, max({{TAG|NSW}}, 2*{{TAG|ML_MCONF_NEW}}) * MAXAT_SP)</code>
*{{TAGO|ML_MODE|select}}: <code>MB_AB + {{TAG|ML_MCONF_NEW}} * MAXAT_SP</code>
*{{TAGO|ML_MODE|refit}}: <code>MB_AB + MAXAT_SP</code>
*{{TAGO|ML_MODE|refitbayesian}}: <code>MB_AB + MAXAT_SP</code>
*{{TAGO|ML_MODE|run}}: <code>MB_AB</code>
using the following definitions:
*<code>MAXAT_SP</code> = largest number of atoms of any single species, among the current structures and the structures in the {{TAG|ML_AB}} file (if present).
*<code>MB_AB</code> = largest number of local reference configurations of any single species in the {{TAG|ML_AB}} file.
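The mode-dependent defaults above can be summarized in a short Python sketch. The function name and the example input values are illustrative only (not part of VASP), and the parenthesization of the <code>train</code> formula is an assumption based on the pattern shared by the other modes:

```python
def default_ml_mb(mode, nsw, ml_mconf_new, maxat_sp, mb_ab=0):
    """Default ML_MB for a given ML_MODE, following the rules above.

    mb_ab = 0 corresponds to learning from scratch (no ML_AB file).
    """
    if mode == "train":
        return mb_ab + min(1500, max(nsw, 2 * ml_mconf_new) * maxat_sp)
    if mode == "select":
        return mb_ab + ml_mconf_new * maxat_sp
    if mode in ("refit", "refitbayesian"):
        return mb_ab + maxat_sp
    if mode == "run":
        return mb_ab
    raise ValueError(f"unknown ML_MODE: {mode!r}")

# Training from scratch with NSW=10000, ML_MCONF_NEW=5, MAXAT_SP=32:
print(default_ml_mb("train", 10000, 5, 32))      # 1500 (the cap applies)
# Running an existing force field with 800 reference configurations:
print(default_ml_mb("run", 0, 0, 0, mb_ab=800))  # 800
```

For typical on-the-fly training runs the <code>min(1500, ...)</code> cap dominates, so the default is simply 1500 plus whatever the {{TAG|ML_AB}} file already contains.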


The default value for {{TAGO|ML_MODE|train}} and {{TAGO|ML_MODE|select}} is relatively safe for most materials. However, it may need to be increased for liquids, polymers, and amorphous systems, or when an MLFF is trained for many different polytypes. By default ({{TAGO|ML_LBASIS_DISCARD|.TRUE.}}), if the number of local reference configurations would exceed {{TAG|ML_MB}}, VASP continues the calculation and discards the excess configurations; one should then test extensively whether the generated MLFF is sufficiently accurate.


If VASP stops, subsequent training can be restarted from the existing ML_AB file. This avoids loss of already acquired training data.
 
Depending on the calculation mode, VASP allocates the {{TAG|ML_MB}} arrays with a small overhead that serves as a buffer for new candidate structures. The buffer sizes for each mode are:
*{{TAGO|ML_MODE|train}}:
**No {{TAG|ML_AB}} present (learning from scratch): <code>min({{TAG|NSW}} , {{TAG|ML_MCONF_NEW}}) * MAXAT_SP</code>
**{{TAG|ML_AB}} present (continuation of learning): <code>min({{TAG|NSW}} , {{TAG|ML_MCONF_NEW}}) * MAXAT_SP</code>
*{{TAGO|ML_MODE|select}}: <code>{{TAG|ML_MCONF_NEW}} * MAXAT_SP</code>
*{{TAGO|ML_MODE|refit}}: <code>MAXAT_SP</code>
*{{TAGO|ML_MODE|refitbayesian}}: <code>MAXAT_SP</code>
*{{TAGO|ML_MODE|run}}: <code>0</code>
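Analogously to the defaults, the buffer sizes can be written as a small Python sketch (function name and example values are illustrative, not taken from the VASP source):

```python
def ml_mb_buffer(mode, nsw, ml_mconf_new, maxat_sp):
    """Extra rows allocated on top of ML_MB, per the list above."""
    if mode == "train":
        return min(nsw, ml_mconf_new) * maxat_sp
    if mode == "select":
        return ml_mconf_new * maxat_sp
    if mode in ("refit", "refitbayesian"):
        return maxat_sp
    if mode == "run":
        return 0  # no new candidates are collected when only running the MLFF
    raise ValueError(f"unknown ML_MODE: {mode!r}")

# NSW=10000, ML_MCONF_NEW=5, MAXAT_SP=32:
print(ml_mb_buffer("train", 10000, 5, 32))  # 160
```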
 
The tag {{TAG|ML_MB}} also determines the maximum number of rows of the design matrix, which is usually a huge matrix. The design matrix is allocated statically at the beginning of the run, because several parts of the code use MPI shared memory, and dynamic reallocation of such arrays can cause issues on many operating systems. An estimate of the size of the design matrix and all other large arrays is printed to the {{TAG|ML_LOGFILE}} before allocation. The design matrix is fully distributed in a block-cyclic fashion for scaLAPACK, so the memory requirement per core scales inversely with the number of cores used.
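A back-of-the-envelope estimate of the per-core memory for such a distributed matrix can be sketched as follows. The double-precision storage, the example row/column counts, and the perfectly even block-cyclic distribution are assumptions for illustration; consult the estimate printed in the {{TAG|ML_LOGFILE}} for actual numbers:

```python
def matrix_gib_per_core(rows, cols, ncores, bytes_per_entry=8):
    """Approximate per-core memory (GiB) for a dense matrix distributed
    evenly across ncores, assuming double-precision (8-byte) entries."""
    return rows * cols * bytes_per_entry / ncores / 1024**3

# A hypothetical 100000 x 20000 dense matrix spread over 128 cores:
print(round(matrix_gib_per_core(100_000, 20_000, 128), 2))  # 0.12
```

Doubling the number of cores halves the per-core footprint, which is the practical consequence of the block-cyclic distribution mentioned above.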
 
== Related tags and articles ==
{{TAG|ML_LMLFF}}, {{TAG|ML_MODE}}, {{TAG|ML_MCONF_NEW}}, {{TAG|ML_MCONF}}, {{TAG|ML_LBASIS_DISCARD}}
 
{{sc|ML_MB|Examples|Examples that use this tag}}
[[Category:INCAR tag]][[Category:Machine-learned force fields]]

Latest revision as of 17:11, 15 February 2024
