Requests for technical support from the VASP group should be posted in the VASP-forum.

Machine learning force field calculations: Basics

From Vaspwiki

Revision as of 15:05, 21 August 2019

How to run machine learning force field calculations

In this section, we describe the three modes in which machine learning calculations can be run in VASP and show example INCAR settings. Many of the tags are covered in more detail below. In all modes, you set

ML_FF_LMLFF = .TRUE.

to switch on the force field calculation.

A typical example showing these modes in action is learning a force field for a material with two phases, A and B.

Initially, we have no force field for the material, so we choose a small to medium-sized supercell of phase A to generate a new force field from scratch. In this step, ab initio calculations are performed whenever necessary to improve the force field for this phase until it is sufficiently accurate.

When we move on to phase B, the force field learned on phase A may already contain useful information about the local configurations. We therefore run a continuation run, and the machine automatically collects the structure datasets from phase B that are needed to refine the force field. In many cases, only a few such datasets are required, but this must still be verified for every case.

After the force field is sufficiently trained, we can use it to describe much larger cells. Hence, we switch off learning for the larger cells and use only the force field, which is orders of magnitude faster than the ab initio calculation. If the sampled atomic environments are similar to the structure datasets used for learning, the force field is transferable for the same constituent elements. However, one should still judge carefully whether the force field can describe rare events in the larger cell.

On-the-fly force field generation from scratch

To generate a new force field, we don't need any special input files. First, we set up a molecular dynamics calculation as usual (see Molecular Dynamics) and add the machine learning tags to the INCAR file. To start from scratch, add

ML_FF_ISTART = 0

Running the calculation generates the main output files ML_LOGFILE, ML_ABNCAR, and ML_FFNCAR. The latter two are required for restarting from an existing force field.
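The shell functions below are a minimal sketch of the from-scratch setup. They add the two required tags to an existing MD INCAR. The function names `set_incar_tag` and `enable_mlff_from_scratch` are hypothetical and not part of VASP. The sketch also assumes one `TAG = value` assignment per INCAR line.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VASP: set "TAG = VALUE" in an INCAR file.
# An existing assignment of TAG is replaced; otherwise a new line is appended.
# Assumes one tag per line (no ";"-separated assignments).
set_incar_tag() {
  local incar="$1" tag="$2" value="$3"
  touch "$incar"
  if grep -qE "^[[:space:]]*${tag}[[:space:]]*=" "$incar"; then
    # -i.bak keeps the in-place edit portable between GNU and BSD sed.
    sed -i.bak -E "s|^[[:space:]]*${tag}[[:space:]]*=.*|${tag} = ${value}|" "$incar"
    rm -f "${incar}.bak"
  else
    printf '%s = %s\n' "$tag" "$value" >> "$incar"
  fi
}

# Turn an ordinary MD INCAR into a from-scratch on-the-fly learning run.
enable_mlff_from_scratch() {
  local incar="${1:-INCAR}"
  set_incar_tag "$incar" ML_FF_LMLFF .TRUE.
  set_incar_tag "$incar" ML_FF_ISTART 0
}
```

For example, `enable_mlff_from_scratch INCAR` leaves all other tags untouched. If an ML_FF_ISTART line is left over from an earlier run, it is replaced rather than duplicated.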

Continuing on-the-fly learning from already existing force-fields

To continue from a previous run, copy the following files:

cp ML_ABNCAR ML_ABCAR
cp CONTCAR POSCAR

The file ML_ABCAR contains the ab initio reference data. Instead of reusing CONTCAR, you can also start from a new POSCAR file. To continue learning and obtain an improved force field, set

ML_FF_ISTART = 1

in the INCAR file.

  • Run the calculation.
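The copy-and-edit steps for a continuation run can be collected in a small shell function. This is an illustrative sketch: `prepare_continuation` is a hypothetical name, not a VASP tool. It assumes the INCAR of the previous run already contains an ML_FF_ISTART line.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VASP: prepare the current directory for
# continued on-the-fly learning (ML_FF_ISTART = 1).
# Usage: prepare_continuation [structure]
#   default: continue from the last structure of the previous run (CONTCAR);
#   otherwise the given file (e.g. a POSCAR of another phase) becomes POSCAR.
prepare_continuation() {
  local structure="${1:-CONTCAR}" f
  for f in ML_ABNCAR INCAR "$structure"; do
    [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
  done
  cp ML_ABNCAR ML_ABCAR                  # ab initio reference data collected so far
  [ "$structure" = POSCAR ] || cp "$structure" POSCAR
  # Assumes INCAR already has an ML_FF_ISTART line from the previous run.
  sed -i.bak -E 's|^[[:space:]]*ML_FF_ISTART[[:space:]]*=.*|ML_FF_ISTART = 1|' INCAR
  rm -f INCAR.bak
  grep -qxF 'ML_FF_ISTART = 1' INCAR || { echo "no ML_FF_ISTART line in INCAR" >&2; return 1; }
}
```

Calling `prepare_continuation` with no argument continues on the same phase. Passing a file such as `POSCAR_phaseB` (a hypothetical file name) switches training to a new structure.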

Force field calculations without learning

Once you have a sufficiently accurate force field, you can use it to predict properties. Copy the reference data, the force field, and the structure:

cp ML_ABNCAR ML_ABCAR
cp ML_FFNCAR ML_FFCAR
cp CONTCAR POSCAR

The file ML_FFNCAR holds the force field parameters. You can also use a different POSCAR file, e.g., a larger supercell. In the INCAR file, select force-field-only calculations by setting

ML_FF_ISTART = 2
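A hypothetical helper, `prepare_ff_only` (not part of VASP), can prepare a force-field-only run in the same way. It is a sketch that assumes one tag per INCAR line and an existing ML_FF_ISTART line. Before copying anything, it checks that both ML_ABNCAR and ML_FFNCAR exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VASP: prepare a force-field-only run
# (ML_FF_ISTART = 2, no learning, no ab initio steps).
# Usage: prepare_ff_only [structure]
#   e.g. pass a larger supercell; defaults to the last structure (CONTCAR).
prepare_ff_only() {
  local structure="${1:-CONTCAR}" f
  # Both the reference data and the fitted force field are required.
  for f in ML_ABNCAR ML_FFNCAR INCAR "$structure"; do
    [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
  done
  cp ML_ABNCAR ML_ABCAR
  cp ML_FFNCAR ML_FFCAR                  # force field parameters
  [ "$structure" = POSCAR ] || cp "$structure" POSCAR
  sed -i.bak -E 's|^[[:space:]]*ML_FF_ISTART[[:space:]]*=.*|ML_FF_ISTART = 2|' INCAR
  rm -f INCAR.bak
  grep -qxF 'ML_FF_ISTART = 2' INCAR || { echo "no ML_FF_ISTART line in INCAR" >&2; return 1; }
}
```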


The remaining steps for setting up a force field generation from scratch are:
  • Calculate the energy of an isolated atom (see Calculation of atoms) for every constituent atom type, and add the energies as a list to the INCAR file (the order of the elements must match the order in the POTCAR file):
ML_FF_EATOM = E_at1 E_at2 ...
This tag is only used if ML_FF_ISCALE_TOTEN_MB=1. For ML_FF_ISCALE_TOTEN_MB=2, the reference energy is the average total energy of the training data.
  • If necessary, adjust the most important parameters of the machine learning force field:
ML_FF_MB_MB = 1000
ML_FF_MCONF = 1000
ML_FF_CSF = 0.02
ML_FF_CTIFOR = 0.0
ML_FF_SION2_MB = 0.50
ML_FF_MRB2_MB = 9
ML_FF_RCUT2_MB = 5.0
  • Run the force field generation and monitor the learning. The calculation stops automatically with an error message in two cases: when the number of reference structures exceeds the maximum allowed number ML_FF_MCONF, or when the basis set exceeds the maximum allowed number of basis functions ML_FF_MB_MB. In either case, increase the corresponding value and continue the calculation as described in Continuing on-the-fly learning from already existing force-fields. MIND: do not set ML_FF_MCONF and ML_FF_MB_MB to very large values in advance. They are the dimensions of the design matrix, which is the memory bottleneck of the calculation, and this matrix is allocated statically before the calculation starts to make the algorithm efficient.
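Before continuing a run that stopped because ML_FF_MCONF or ML_FF_MB_MB was exhausted, the corresponding INCAR value has to be increased. The hypothetical helper below (not part of VASP) raises an integer tag by a percentage. The percentage is an arbitrary choice of the user. Because both tags set the dimensions of the statically allocated design matrix, the new value should be checked against the available memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VASP: raise an integer limit such as
# ML_FF_MCONF or ML_FF_MB_MB in INCAR by PERCENT (150 -> factor 1.5).
# Usage: raise_ml_limit INCAR TAG PERCENT
raise_ml_limit() {
  local incar="$1" tag="$2" percent="$3" old new
  # Last value assigned to TAG, without blanks or "!"/"#" comments.
  old=$(awk -F= -v t="$tag" '
    { key = $1; gsub(/[[:space:]]/, "", key) }
    key == t { val = $2; sub(/[!#].*/, "", val); gsub(/[[:space:]]/, "", val) }
    END { print val }' "$incar")
  case "$old" in
    ''|*[!0-9]*) echo "$tag has no integer value in $incar" >&2; return 1 ;;
  esac
  new=$(( old * percent / 100 ))         # integer arithmetic, rounds down
  [ "$new" -gt "$old" ] || { echo "PERCENT must exceed 100" >&2; return 1; }
  sed -i.bak -E "s|^[[:space:]]*${tag}[[:space:]]*=.*|${tag} = ${new}|" "$incar"
  rm -f "${incar}.bak"
  echo "$tag: $old -> $new"
}
```

For example, `raise_ml_limit INCAR ML_FF_MB_MB 150` changes 1000 to 1500. The run is then continued with ML_FF_ISTART = 1 after copying ML_ABNCAR to ML_ABCAR.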

For a detailed description of the tags see below.

Major input tags

In this section, we describe the most important input tags for standard machine learning force field calculations. Most tags have defaults that should only be changed by experienced users, and only where the change is essential. All tags controlling the method are set in the INCAR file.

Enabling machine learning

To enable machine-learned force fields, ML_FF_LMLFF=.TRUE. must always be set.

Type of machine learning calculation

The second most important tag is ML_FF_ISTART, which selects how the machine-learned force field is used. It can take three values:

  1. ML_FF_ISTART=0: The force field is learned on the fly, starting from scratch.
  2. ML_FF_ISTART=1: Training data gathered in previous runs is read, and on-the-fly learning continues from it.
  3. ML_FF_ISTART=2: The force field is used for the calculation, and learning is turned off.

All three modes are described in more detail in the section How to run machine learning force field calculations above.
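Each ML_FF_ISTART value expects different input files, so a quick consistency check before submitting a job can catch a forgotten copy step. The sketch below uses hypothetical helper names (not part of VASP). It takes the required files from the workflow on this page: none for 0, ML_ABCAR for 1, and ML_ABCAR and ML_FFCAR for 2. It also checks that ML_FF_LMLFF is set to true and does not assume any default for ML_FF_ISTART.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of VASP.

# Print the last value assigned to TAG in an INCAR file (upper case, without
# blanks or trailing "!"/"#" comments); prints an empty line if TAG is absent.
incar_value() {
  awk -F= -v t="$2" '
    { key = $1; gsub(/[[:space:]]/, "", key) }
    key == t { val = $2; sub(/[!#].*/, "", val); gsub(/[[:space:]]/, "", val) }
    END { print toupper(val) }' "$1"
}

# Check that DIR (default: .) is consistent with its ML_FF_ISTART mode.
check_ml_setup() {
  local dir="${1:-.}" lmlff istart f
  local -a needed=()
  [ -f "$dir/INCAR" ] || { echo "no INCAR in $dir" >&2; return 1; }
  lmlff=$(incar_value "$dir/INCAR" ML_FF_LMLFF)
  case "$lmlff" in
    .TRUE.|.T.|T|TRUE) ;;
    *) echo "ML_FF_LMLFF must be .TRUE. (found '$lmlff')" >&2; return 1 ;;
  esac
  istart=$(incar_value "$dir/INCAR" ML_FF_ISTART)
  case "$istart" in
    0) needed=() ;;
    1) needed=(ML_ABCAR) ;;
    2) needed=(ML_ABCAR ML_FFCAR) ;;
    *) echo "ML_FF_ISTART must be 0, 1 or 2 (found '$istart')" >&2; return 1 ;;
  esac
  for f in "${needed[@]}"; do
    [ -f "$dir/$f" ] || { echo "ML_FF_ISTART=$istart needs $f" >&2; return 1; }
  done
  echo "ok: ML_FF_ISTART=$istart"
}
```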

Reference total energies

For accurate calculations, the reference total energies of the isolated atoms in the system should be provided via the ML_FF_EATOM tag. They are given as a list, one entry per atomic species, in the order in which the species appear in the POSCAR and POTCAR files. A sample input for three species looks like this:

ML_FF_EATOM = E_1 E_2 E_3

The energies of the atoms should be obtained from previous calculations of an isolated atom in a sufficiently large box. This step is computationally cheap. Using the ML_FF_EATOM tag is not mandatory but strongly advised (see Calculation of atoms). If the tag is not specified, default values of 0.0 eV/atom are assumed. Sufficiently accurate force fields were obtained with these defaults in most tested calculations. However, this feature has not been tested extensively, so we suggest calculating the energies of the isolated atoms beforehand. ML_FF_EATOM is only used if ML_FF_ISCALE_TOTEN_MB=1. For ML_FF_ISCALE_TOTEN_MB=2, the reference energy is the average total energy of the training data.
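The ML_FF_EATOM entries must follow the species order, so it can help to read that order directly from the POTCAR file. The hypothetical helper below (not part of VASP) assumes that each POTCAR block contains a line such as `TITEL  = PAW_PBE Si 05Jan2001`. It also assumes the energies (in eV) are already passed in that order. It only reports the order and rejects a mismatched number of entries; it does not compute any energies.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VASP: print an ML_FF_EATOM line together
# with the species order read from the POTCAR file.
# Usage: eatom_line POTCAR E_1 E_2 ...   (energies in eV, in POTCAR order)
# Assumes lines of the form "TITEL  = PAW_PBE Si 05Jan2001" (one per species).
eatom_line() {
  local potcar="$1" species n
  shift
  # 4th field is the potential name; drop suffixes such as "_sv" or "_s".
  species=$(awk '$1 == "TITEL" { name = $4; sub(/_.*/, "", name); printf "%s ", name }' "$potcar")
  species=${species% }
  n=$(awk '$1 == "TITEL" { c++ } END { print c + 0 }' "$potcar")
  if [ "$n" -ne "$#" ]; then
    echo "POTCAR has $n species ($species) but $# energies were given" >&2
    return 1
  fi
  echo "# ML_FF_EATOM order: $species"
  echo "ML_FF_EATOM = $*"
}
```

For the single-species Si example below, `eatom_line POTCAR -0.7859510000` prints the comment line `# ML_FF_EATOM order: Si` followed by `ML_FF_EATOM = -0.7859510000`.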

Important parameters controlling basis sets and training sets

  • ML_FF_MB_MB: This tag controls the maximum number of basis functions. The default value of 1000 was found to be sufficient in many cases, but some systems, e.g., MgO, need higher values. If the maximum number of basis functions is reached, the calculation stops by default with a warning message (ML_FF_LBASIS_DISCARD=.FALSE.). In that case, the calculation has to be restarted with a higher value of ML_FF_MB_MB. Mind: if ML_FF_MB_MB is set too high, memory problems can occur, since the arrays depending on this tag are allocated statically at the beginning of the calculation.
  • ML_FF_MCONF: This tag sets the maximum number of configurations used for training. It behaves very similarly to ML_FF_MB_MB and should also be monitored. By default (ML_FF_LCONF_DISCARD=.FALSE.), the calculation stops with an error message if the number of configurations reaches the value set in the input. Mind: if ML_FF_MCONF is set too high, memory problems can occur, since the arrays depending on this tag are allocated statically at the beginning of the calculation.
  • ML_FF_MCONF_NEW: This tag sets the number of configurations that are stored temporarily as candidates for the training data. Collecting candidates in this way allows expensive computational steps to be carried out in blocks instead of sequentially. This improves performance at the cost of a small memory overhead. The value ML_FF_MCONF_NEW=5 was obtained purely empirically and is not set in stone.
  • ML_FF_RCUT1_MB and ML_FF_RCUT2_MB: Cut-off radii of the descriptors.
  • Radial basis set:
    • ML_FF_MRB1_MB, ML_FF_MRB2_MB: These tags set the number of radial basis functions used to expand the atomic distribution of the radial and angular density. The required values depend very sensitively on two things: the cut-off radius of the descriptor (ML_FF_RCUT1_MB and ML_FF_RCUT2_MB), and the width of the Gaussian functions used to broaden the atomic distributions (ML_FF_SION1_MB and ML_FF_SION2_MB). The error caused by the radial basis expansion is reported in the ML_LOGFILE file in the line "Error in radial expansion: ...". A typical good value for the error threshold, determined purely empirically (by us and in reference [1]), is . The number of basis functions should be adjusted until the error written in the ML_LOGFILE is smaller than this value. A more detailed description of the basis sets is given in appendix A of reference [2].
    • ML_FF_SION1_MB, ML_FF_SION2_MB: Widths of the Gaussian functions used to broaden the atomic distributions.
  • Angular momentum quantum numbers:
    • ML_FF_LMAX2_MB: This tag specifies the maximum angular momentum quantum number of spherical harmonics used to expand atomic distributions.
    • ML_FF_LAFILT2_MB: This tag specifies whether angular momentum filtering is active. Activating the angular filtering (ML_FF_LAFILT2_MB=.TRUE.) with the filtering function from reference [3] noticeably speeds up the computation without losing much accuracy. With angular filtering, the maximum angular momentum cut-off can also be lowered from ML_FF_LMAX2_MB=6 to 4, which gains further speed. Users are still advised to check the accuracy of the angular filtering for their application.
    • ML_FF_IAFILT2_MB: This tag selects the type of angular filtering. We advise using the default (ML_FF_IAFILT2_MB=2).
    • ML_FF_AFILT2_MB: This tag sets the parameter of the filtering function from reference [3]. The default of ML_FF_AFILT2_MB=0.002 worked well in most tested applications, but we advise users to check this parameter for their application.
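The convergence check for the radial basis can be scripted. The sketch below uses a hypothetical function that is not part of VASP. It assumes that each relevant line in ML_LOGFILE contains the text "Error in radial expansion" and ends with a single number. The threshold is passed as an argument rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VASP: compare the radial-expansion errors
# reported in ML_LOGFILE with a user-chosen threshold.
# Assumes each relevant line contains "Error in radial expansion" and ends
# with a single number. Exit status: 0 below threshold, 1 not, 2 no line found.
check_radial_error() {
  local logfile="$1" threshold="$2"
  awk -v thr="$threshold" '
    /Error in radial expansion/ {
      found = 1
      err = $NF + 0                      # last field on the line
      if (err > max) max = err
    }
    END {
      if (!found) { print "no radial-expansion error found" > "/dev/stderr"; exit 2 }
      printf "largest radial-expansion error: %g (threshold %g)\n", max, thr + 0
      exit (max < thr + 0 ? 0 : 1)
    }' "$logfile"
}
```

An exit status of 1 means the largest reported error is not below the chosen threshold. In that case, ML_FF_MRB1_MB or ML_FF_MRB2_MB should be adjusted as described above.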

Example input for liquid Si

SYSTEM = Si_liquid
### Electronic structure part
PREC = FAST
ALGO = FAST
SIGMA = 0.1
ISPIN = 1
ISMEAR = 0
ENCUT = 325
NELM = 100
EDIFF = 1E-4
NELMIN = 6
LREAL = A
ISYM = -1

### MD part
IBRION = 0
ISIF = 2
NSW = 30000
POTIM = 1.0

### Output part
NBLOCK = 1
NWRITE = 1
INIWAV = 1
IWAVPR = 1
ISTART = 0
LWAVE = .FALSE.
LCHARG = .FALSE.

### Machine Learning part
### Major tags for machine learning
ML_FF_LMLFF = .TRUE.
ML_FF_ISTART = 0
ML_FF_EATOM = -0.7859510000
ML_FF_MB_MB = 1000
ML_FF_MCONF = 1000
ML_FF_MCONF_NEW = 5
ML_FF_CSF = 0.02
ML_FF_CTIFOR = 0.0
ML_FF_CSIG = 2E-1
ML_FF_WTOTEN = 1.0
ML_FF_WTIFOR = 1.0
ML_FF_WTSIF =  1.0

### Descriptor related tags for machine learning
ML_FF_W1_MB = 0D-1
ML_FF_W2_MB = 10D-1
ML_FF_LNORM1_MB = .FALSE.
ML_FF_LNORM2_MB = .TRUE.
ML_FF_RCUT1_MB = 6.0
ML_FF_RCUT2_MB = 5.0
ML_FF_MRB1_MB = 6
ML_FF_MRB2_MB = 9
ML_FF_SION1_MB = 0.50
ML_FF_SION2_MB = 0.50
ML_FF_NHYP1_MB = 1
ML_FF_NHYP2_MB = 4
ML_FF_LMAX2_MB = 4
ML_FF_LAFILT2_MB = .TRUE.
ML_FF_IAFILT2_MB = 2
ML_FF_AFILT2_MB = 2D-3

### Lesser important tags for machine learning
ML_FF_MSPL1_MB = 100
ML_FF_MSPL2_MB = 100
ML_FF_NR1_MB = 100
ML_FF_NR2_MB = 100
ML_FF_NWRITE = 2
ML_FF_NDIM_SCALAPACK = 2
ML_FF_ISAMPLE = 3
ML_FF_IERR = 3
ML_FF_LMLMB = .TRUE.
ML_FF_CDOUB = 2.0
ML_FF_LCRITERIA = .TRUE.
ML_FF_CSLOPE = 1E-1
ML_FF_NMDINT = 10
ML_FF_MHIS = 10
ML_FF_IWEIGHT = 3
ML_FF_LEATOM_MB = .FALSE.
ML_FF_LHEAT_MB = .FALSE.
ML_FF_ISOAP1_MB = 1
ML_FF_ISOAP2_MB = 1
ML_FF_ICUT1_MB = 1
ML_FF_ICUT2_MB = 1
ML_FF_IBROAD1_MB = 2
ML_FF_IBROAD2_MB = 2
ML_FF_IREG_MB = 2
ML_FF_SIGV0_MB = 1.0
ML_FF_SIGW0_MB = 1.0

References