Machine learning force field calculations: Basics

In general, to perform a machine-learning force field calculation, you need to set

ML_FF_LMLFF = .TRUE.

in the INCAR file. Then, depending on the particular calculation, you need to set the values of further INCAR tags. In the first few sections, we list the tags that a user typically encounters. Most of the other input tags use default values and should only be changed by experienced users in cases where the changes are essential.

Type of machine learning calculation

In this section, we describe the three modes in which machine-learning calculations can be run in VASP and show exemplary INCAR settings. A typical example showing these modes in action is learning a force field for a material with two phases A and B. Initially, we have no force field for the material, so we choose a small to medium-sized supercell of phase A to generate a new force field from scratch. In this step, ab initio calculations are performed whenever necessary, improving the force field for this phase until it is sufficiently accurate. When we turn to phase B, the force field learned on phase A might already contain useful information about the local configurations. We then run a continuation run, and the machine automatically collects the structure datasets from phase B that are needed to refine the force field. In many cases, only a few such structure datasets are required, but this should still be verified for every case. After the force field is sufficiently well trained, we can use it to describe much larger cells. Hence, we switch off learning for the larger cells and use only the force field, which is orders of magnitude faster than the ab initio calculation. If the sampled atomic environments are similar to the structure datasets used for learning, the force field is transferable for the same constituting elements, but one should still judge cautiously whether the force field can describe rare events in the larger cell.
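
Put together, and purely as a sketch (the directory names and the vasp_std executable name are placeholders for your own setup), the three stages can be chained as follows; the meaning of ML_FF_ISTART and of the copied files is explained in the sections below:

# Stage 1: on-the-fly learning from scratch on a supercell of phase A
( cd phase_A && vasp_std )             # INCAR: ML_FF_ISTART = 0
# Stage 2: continuation run that refines the force field on phase B
cp phase_A/ML_ABNCAR phase_B/ML_ABCAR
( cd phase_B && vasp_std )             # INCAR: ML_FF_ISTART = 1
# Stage 3: force-field-only molecular dynamics on a much larger cell
cp phase_B/ML_ABNCAR large_cell/ML_ABCAR
cp phase_B/ML_FFNCAR large_cell/ML_FFCAR
( cd large_cell && vasp_std )          # INCAR: ML_FF_ISTART = 2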

On-the-fly force field generation from scratch

To generate a new force field, we don't need any special input files. First, we set up a molecular dynamics calculation as usual (see Molecular Dynamics) and add the machine-learning-related tags to the INCAR file. To start from scratch, add

ML_FF_ISTART = 0

Running the calculation generates the main output files ML_LOGFILE, ML_ABNCAR, and ML_FFNCAR. The latter two are required for restarting from an existing force field.
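
A minimal INCAR sketch for such a run from scratch, with the MD settings borrowed from the liquid-Si example at the end of this page (adapt them to your system):

ML_FF_LMLFF  = .TRUE.     # switch on machine-learned force fields
ML_FF_ISTART = 0          # learn a new force field from scratch
IBRION = 0                # molecular dynamics
NSW    = 30000            # number of MD steps
POTIM  = 1.0              # MD time step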

Continuing on-the-fly learning from already existing force-fields

To continue from a previous run, copy the following files:

cp ML_ABNCAR ML_ABCAR
cp CONTCAR POSCAR

The file ML_ABCAR contains the ab initio reference data. You can also start from a new POSCAR file. To continue learning and obtain an improved force field, set

ML_FF_ISTART = 1

in the INCAR file.
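
Put together, a continuation run could look like the following sketch (vasp_std stands for whatever VASP executable you use):

cp ML_ABNCAR ML_ABCAR      # reuse the collected ab initio reference data
cp CONTCAR POSCAR          # continue from the last structure
# set ML_FF_ISTART = 1 in the INCAR file, then run
vasp_std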

Force field calculations without learning

Once you have a sufficiently accurate force field, you can use it to predict properties. Copy the structures and the force field information

cp ML_ABNCAR ML_ABCAR
cp ML_FFNCAR ML_FFCAR
cp CONTCAR POSCAR

The file ML_FFNCAR holds the force field parameters. You can also use a different POSCAR file, e.g., a larger supercell. In the INCAR file, select force-field-only calculations by setting

ML_FF_ISTART = 2
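
For example, a force-field-only run on a larger supercell could be prepared as in the following sketch (POSCAR_large is a placeholder for your own structure file; vasp_std for your VASP executable):

cp ML_ABNCAR ML_ABCAR      # ab initio reference structures
cp ML_FFNCAR ML_FFCAR      # trained force-field parameters
cp POSCAR_large POSCAR     # larger supercell to be simulated
# set ML_FF_ISTART = 2 in the INCAR file, then run
vasp_std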

Reference total energies

To obtain the force field, we need a reference total energy. For ML_FF_ISCALE_TOTEN_MB=2, this reference energy is set to the average total energy of the training data. However, for more accurate calculations, reference calculations of the isolated atoms should be performed (see Calculation of atoms). You can then specify that the atomic energies are to be used and provide a reference energy for each species by setting the following variables in the INCAR file:

ML_FF_ISCALE_TOTEN_MB=1
ML_FF_EATOM = E_at1 E_at2 ...

If the tag ML_FF_EATOM is not specified, default values of 0.0 eV/atom are assumed. Although sufficiently accurate force fields were obtained this way in most tested calculations, this feature has not been tested extensively, and we therefore suggest calculating the energies of the isolated atoms beforehand.
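
For a single-species system such as the liquid-Si example at the end of this page, this reduces to a single reference energy (the ML_FF_EATOM value below is the one used in that example):

ML_FF_ISCALE_TOTEN_MB = 1            # use atomic reference energies
ML_FF_EATOM = -0.7859510000          # reference energy of the isolated Si atom (eV)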

Converging a MLFF calculation

  • Possibly the most important parameters of the machine-learning force field to adjust are the following (an illustrative way of scanning one of them is sketched after this list):
ML_FF_CSF = 0.02
ML_FF_CTIFOR = 0.0
ML_FF_SION2_MB = 0.50
ML_FF_MRB2_MB = 9
ML_FF_RCUT2_MB = 5.0
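
Purely as an illustration (the directory names, the scanned values, and the vasp_std executable name are placeholders), one could repeat the training with different numbers of radial basis functions and monitor the radial-expansion error discussed further below:

# scan the number of radial basis functions ML_FF_MRB2_MB (see the description below)
for nrb in 6 9 12; do
    mkdir -p mrb_$nrb
    cp KPOINTS POSCAR POTCAR mrb_$nrb/
    # replace the ML_FF_MRB2_MB line of the template INCAR with the scanned value
    sed "s/^ML_FF_MRB2_MB.*/ML_FF_MRB2_MB = $nrb/" INCAR > mrb_$nrb/INCAR
    ( cd mrb_$nrb && vasp_std )
    # report the last radial-expansion error written to the machine-learning log
    grep "Error in radial expansion" mrb_$nrb/ML_LOGFILE | tail -n 1
done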

Caution: number of structures and basis functions

The maximum number of structure datasets ML_FF_MCONF and basis functions ML_FF_MB_MB is a memory bottleneck of the calculation, because the required arrays are allocated statically at the beginning of the calculation. Therefore, you must not set these input variables to too large values initially. If either the number of structure datasets or the size of the basis set exceeds its respective maximum, the calculation automatically stops with an error message. In that case, increase the appropriate number and restart the calculation. For a detailed description of what influences the size of the basis, see below. Note that you can switch off this default behavior with ML_FF_LBASIS_DISCARD=.TRUE. and ML_FF_LCONF_DISCARD=.TRUE., respectively; the calculation then discards the excess basis functions or structure datasets instead of stopping.
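
If the run does stop, a restart with increased limits could look like the following sketch (the increased values are only examples; vasp_std is a placeholder for your VASP executable):

cp ML_ABNCAR ML_ABCAR      # keep the already collected ab initio data
cp CONTCAR POSCAR          # continue from the last structure
# in the INCAR file, increase the limits and restart from the stored data, e.g.
#   ML_FF_ISTART = 1
#   ML_FF_MB_MB  = 2000
#   ML_FF_MCONF  = 2000
vasp_std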

Other important input tags

In this section, we describe other important input tags for standard machine learning force field calculations. Typically a user does not need to tweak the default values.

New structure dataset block

ML_FF_MCONF_NEW

This tag specifies the number of structure datasets that are stored temporarily as candidates for the training data. The purpose is to enable blocked operations for expensive computational steps that would otherwise be executed sequentially. This yields faster performance at the cost of a small memory overhead. The value of ML_FF_MCONF_NEW=5 was optimized empirically, but for other systems different choices might be more performant.

Radial basis set

ML_FF_MRB1_MB, ML_FF_MRB2_MB
These tags set the number of radial basis functions used to expand the radial and angular atomic density distributions. The required number depends very sensitively on the cut-off radius of the descriptor (ML_FF_RCUT1_MB and ML_FF_RCUT2_MB) and on the width of the Gaussian functions used to broaden the atomic distributions (ML_FF_SION1_MB and ML_FF_SION2_MB). The error arising from the expansion in radial basis functions can be monitored in the ML_LOGFILE file by looking for the line "Error in radial expansion: ..." (see the sketch after this list). A typical good value for the error threshold, determined purely empirically (by us and in reference [1]), is ±0.02, so the number of basis functions should be increased until the error written to the ML_LOGFILE is smaller than this value. A more detailed description of the basis sets is given in appendix A of reference [2].
ML_FF_SION1_MB, ML_FF_SION2_MB
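
A minimal way to extract this error from the log, assuming a standard Unix shell:

grep "Error in radial expansion" ML_LOGFILE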

Angular momentum quantum numbers

ML_FF_LMAX2_MB
This tag specifies the maximum angular momentum quantum number of spherical harmonics used to expand atomic distributions.
ML_FF_LAFILT2_MB
This tag specifies whether angular momentum filtering is active or not. By activating the angular filtering (ML_FF_LAFILT2_MB=.TRUE.) and using the filtering function from reference [3], the computation can be sped up noticeably without losing much accuracy. With angular filtering, the maximum angular momentum cut-off of ML_FF_LMAX2_MB=6 can also be lowered to a value of 4, gaining further computational speed. Users are still advised to check the accuracy of the angular filtering for their application.
ML_FF_IAFILT2_MB
This tag selects the type of angular filtering. We advise using the default (ML_FF_IAFILT2_MB=2).
ML_FF_AFILT2_MB
This parameter sets the filtering parameter of the filtering function from reference [3]. The default of ML_FF_AFILT2_MB=0.002 worked well in most tested applications, but we advise users to check this parameter for their application.
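
For reference, these are the angular-filtering settings used in the liquid-Si example at the end of this page (note the lowered ML_FF_LMAX2_MB together with the active filtering, as discussed above):

ML_FF_LMAX2_MB   = 4          # lowered angular momentum cut-off
ML_FF_LAFILT2_MB = .TRUE.     # activate angular filtering
ML_FF_IAFILT2_MB = 2          # type of filtering function
ML_FF_AFILT2_MB  = 2E-3       # filtering parameter (0.002)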

Example input for liquid Si

SYSTEM = Si_liquid
### Electronic structure part
PREC = FAST
ALGO = FAST
SIGMA = 0.1
ISPIN = 1
ISMEAR = 0
ENCUT = 325
NELM = 100
EDIFF = 1E-4
NELMIN = 6
LREAL = A
ISYM = -1

### MD part
IBRION = 0
ISIF = 2
NSW = 30000
POTIM = 1.0

### Output part
NBLOCK = 1
NWRITE = 1
INIWAV = 1
IWAVPR = 1
ISTART = 0
LWAVE = .FALSE.
LCHARG = .FALSE.

### Machine Learning part
### Major tags for machine learning
ML_FF_LMLFF = .TRUE.
ML_FF_ISTART = 0
ML_FF_EATOM = -0.7859510000
ML_FF_MB_MB = 1000
ML_FF_MCONF = 1000
ML_FF_MCONF_NEW = 5
ML_FF_CSF = 0.02
ML_FF_CTIFOR = 0.0
ML_FF_CSIG = 2E-1
ML_FF_WTOTEN = 1.0
ML_FF_WTIFOR = 1.0
ML_FF_WTSIF =  1.0

### Descriptor related tags for machine learning
ML_FF_W1_MB = 0D-1
ML_FF_W2_MB = 10D-1
ML_FF_LNORM1_MB = .FALSE.
ML_FF_LNORM2_MB = .TRUE.
ML_FF_RCUT1_MB = 6.0
ML_FF_RCUT2_MB = 5.0
ML_FF_MRB1_MB = 6
ML_FF_MRB2_MB = 9
ML_FF_SION1_MB = 0.50
ML_FF_SION2_MB = 0.50
ML_FF_NHYP1_MB = 1
ML_FF_NHYP2_MB = 4
ML_FF_LMAX2_MB = 4
ML_FF_LAFILT2_MB = .TRUE.
ML_FF_IAFILT2_MB = 2
ML_FF_AFILT2_MB = 2D-3

### Less important tags for machine learning
ML_FF_MSPL1_MB = 100
ML_FF_MSPL2_MB = 100
ML_FF_NR1_MB = 100
ML_FF_NR2_MB = 100
ML_FF_NWRITE = 2
ML_FF_NDIM_SCALAPACK = 2
ML_FF_ISAMPLE = 3
ML_FF_IERR = 3
ML_FF_LMLMB = .TRUE.
ML_FF_CDOUB = 2.0
ML_FF_LCRITERIA = .TRUE.
ML_FF_CSLOPE = 1E-1
ML_FF_NMDINT = 10
ML_FF_MHIS = 10
ML_FF_IWEIGHT = 3
ML_FF_LEATOM_MB = .FALSE.
ML_FF_LHEAT_MB = .FALSE.
ML_FF_ISOAP1_MB = 1
ML_FF_ISOAP2_MB = 1
ML_FF_ICUT1_MB = 1
ML_FF_ICUT2_MB = 1
ML_FF_IBROAD1_MB = 2
ML_FF_IBROAD2_MB = 2
ML_FF_IREG_MB = 2
ML_FF_SIGV0_MB = 1.0
ML_FF_SIGW0_MB = 1.0

References