Requests for technical support from the VASP group should be posted in the VASP-forum.

Difference between revisions of "Liquid Si - MLFF"

From Vaspwiki
Jump to navigationJump to search
 
(25 intermediate revisions by 3 users not shown)
Line 97: Line 97:
 
  {{TAGBL|SIGMA}} = 0.1
 
  {{TAGBL|SIGMA}} = 0.1
 
  {{TAGBL|LREAL}} = Auto
 
  {{TAGBL|LREAL}} = Auto
{{TAGBL|ALGO}} = FAST
 
 
  {{TAGBL|ISYM}} = -1
 
  {{TAGBL|ISYM}} = -1
 
  {{TAGBL|NELM}} = 100
 
  {{TAGBL|NELM}} = 100
Line 115: Line 114:
 
  {{TAGBL|NSW}} = 10000
 
  {{TAGBL|NSW}} = 10000
 
  {{TAGBL|POTIM}} = 3.0
 
  {{TAGBL|POTIM}} = 3.0
 
#Parameters for initialization of wavefunctions and charge density
 
{{TAGBL|INIWAV}} = 1
 
{{TAGBL|IWAVPR}} = 1
 
{{TAGBL|ISTART}} = 0
 
{{TAGBL|ICHARG}} = 0
 
 
   
 
   
 
  #Machine learning paramters
 
  #Machine learning paramters
 
  {{TAGBL|ML_FF_LMLFF}} = .TRUE.
 
  {{TAGBL|ML_FF_LMLFF}} = .TRUE.
 
  {{TAGBL|ML_FF_ISTART}} = 0
 
  {{TAGBL|ML_FF_ISTART}} = 0
{{TAGBL|ML_FF_NWRITE}} = 2
 
 
  {{TAGBL|ML_FF_EATOM}} = -.70128086E+00
 
  {{TAGBL|ML_FF_EATOM}} = -.70128086E+00
  
 
;{{TAG|ML_FF_LMLFF}} = ''.TRUE.'': switches on the machine learning of the force field
 
;{{TAG|ML_FF_LMLFF}} = ''.TRUE.'': switches on the machine learning of the force field
 
;{{TAG|ML_FF_ISTART}} = 0: corresponds to learning from scratch
 
;{{TAG|ML_FF_ISTART}} = 0: corresponds to learning from scratch
;{{TAG|ML_FF_NWRITE}} = 2: leads to a more verbose output where the error on energies, forces and stress are written to the {{TAG|ML_LOGFILE}} file. This setting is very handy to check the accuracy of the force field.
 
 
;{{TAG|ML_FF_EATOM}} = -.70128086E+00: sets the atomic reference energy for each species. We show to obtain this energy below.
 
;{{TAG|ML_FF_EATOM}} = -.70128086E+00: sets the atomic reference energy for each species. We show to obtain this energy below.
;{{TAG|IWAVPR}} = 1: extrapolates the total charge density from the atomic ones. This is more advisable than the default, because during the on-the-fly learning several hundreds or thousands of molecular dynamics steps can lie between two ab initio calculations.
 
  
 
== Calculation ==
 
== Calculation ==
Line 147: Line 137:
 
  {{TAG|SIGMA}} = 0.1
 
  {{TAG|SIGMA}} = 0.1
 
  {{TAG|LREAL}} = Auto
 
  {{TAG|LREAL}} = Auto
{{TAG|ALGO}} = FAST
 
 
  {{TAG|ISYM}} = 0
 
  {{TAG|ISYM}} = 0
 
  {{TAG|NELM}} = 100
 
  {{TAG|NELM}} = 100
Line 191: Line 180:
 
  ====================================================================================================
 
  ====================================================================================================
 
You can see that additional information about the Bayesian error and the spilling factor is reported.
 
You can see that additional information about the Bayesian error and the spilling factor is reported.
The first entry of the Bayesian error is the estimated error from our model. If the estimated error exceeds the threshold listed next to it a new structure dataset is generated. The threshold is automatically determined during the calculations. The first and second entry for the spilling factor are the calculated spilling factor and the threshold, respectively. The threshold for the spilling factor is usually always kept constant during the calculations.
+
The first entry of the Bayesian error is the estimated error from our model. If the estimated error exceeds the threshold listed next to it a new structure dataset is generated. The threshold is automatically determined during the calculations. The first and second entry for the spilling factor are the calculated spilling factor and the threshold, respectively. The threshold for the spilling factor is usually kept constant during the calculations.
  
 
=== Structral properties of the force field ===
 
=== Structral properties of the force field ===
Next we will look at the accuracy of structural properties of the force field. For the we first run a 3000 fs molecular dynamics calculation with and without the force field starting from the new {{TAG|CONTCAR}} file. First we save that new {{TAG|CONTCAR}} file to
+
To examine the accuracy of structural properties, we compare the deviations between a 3 ps molecular dynamics run using the force field and a full ab initio calculation. For a meaningful comparison, it is best to start from the same initial structure. We will use the liquid structure, we obtained in the previous step and back it up
 
  cp CONTCAR POSCAR.T2000_relaxed
 
  cp CONTCAR POSCAR.T2000_relaxed
Now do the following steps to run the force field calculation:
 
*Copy ''POSCAR.T2000_relaxed'' to {{TAG|POSCAR}}.
 
*Copy {{TAG|ML_ABNCAR}} to {{TAG|ML_ABCAR}}.
 
*Copy {{TAG|ML_FFNCAR}} to {{TAG|ML_FFCAR}}.
 
*Change {{TAG|INCAR}} tags:
 
**Change {{TAG|ML_FF_ISTART}}=0 to {{TAG|ML_FF_ISTART}}=2: This will switch of learning and only use the machine-learned force field.
 
**Change {{TAG|NSW}}=10000 to {{TAG|NSW}}=1000.
 
*Run calculation.
 
*Copy {{TAG|XDATCAR}} to ''XDATCAR.MLFF_3ps''.
 
  
Now do the following steps to run the ab initio reference calculation:
+
Now, we proceed with the force field calculation and set up the required files
*Copy ''POSCAR.T2000_relaxed'' to {{TAG|POSCAR}}.
+
cp POSCAR.T2000_relaxed {{TAGBL|POSCAR}}
*Change {{TAG|ML_FF_LMLFF}}=''.TRUE.'' to {TAG|ML_FF_LMLFF}}=''.FALSE.'': This will turn off the machine learning completely.
+
cp {{TAGBL|ML_ABNCAR}} {{TAGBL|ML_ABCAR}}
*Run caclulation.
+
cp {{TAGBL|ML_FFNCAR}} {{TAGBL|ML_FFCAR}}
*Copy {{TAG|XDATCAR}} to ''XDATCAR.AI_3ps''.
+
To run a shorter simulation using only the force field, we change the following {{TAG|INCAR}} tags to
 +
{{TAGBL|ML_FF_ISTART}} = 2
 +
{{TAGBL|NSW}} = 1000
 +
After the calculation finished, we backup the history of the atomic positions
 +
cp {{TAGBL|XDATCAR}} XDATCAR.MLFF_3ps
  
To analyze the pair correlation function use the PERL script <span class="mw-customtoggle-perlscript">''pair_correlation_function.pl'' (click to expand)</span>
+
We repeat the same process for the ab initio simulation recovering the same initial structure
<div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-perlscript">
+
cp POSCAR.T2000_relaxed {{TAGBL|POSCAR}}
#!/usr/bin/perl
+
switching the machine learning off in the {{TAG|INCAR}} file by setting
+
{{TAGBL|ML_FF_LMLFF}} = .FALSE.
use strict;
+
We store the trajectory once the calculation completed
use warnings;
+
cp {{TAGBL|XDATCAR}} XDATCAR.AI_3ps
+
 
#configuration for which ensemble average is to be calculated
+
To analyze the pair correlation function, we use the PERL script ''pair_correlation_function.pl''
my $confmin=1;            #starting index of configurations in XDATCAR file for pair correlation function
+
<div class="toccolours mw-customtoggle-script">'''Click to show/hide pair_correlation_function.pl'''</div>
my $confmax=20000;          #last index of configurations in XDATCAR file for pair correlation function
+
<div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-script">{{:Pair correlation script}}</div>
my $confskip=1;          #stepsize for configuration loop
+
and process the previously saved {{TAG|XDATCAR}} files
my $species_1 = 1;        #species 1 for which pair correlation function is going to be calculated
 
my $species_2 = 1;        #species 2 for which pair correlation function is going to be calculated
 
#setting radial grid
 
my $rmin=0.0;            #minimal value of radial grid
 
my $rmax=10.0;            #maximum value of radial grid
 
my $nr=300;                #number of equidistant steps in radial grid
 
my $dr=($rmax-$rmin)/$nr; #stepsize in radial grid
 
my $tol=0.0000000001;    #tolerance limit for r->0
 
 
 
my $z=0;                  #counter
 
my $numelem;              #number of elements
 
my @elements;            #number of atoms per element saved in list/array
 
my $lattscale;            #scaling factor for lattice
 
my @b;                    #Bravais matrix
 
my $nconf=0;              #number of configurations in XDATCAR file
 
my @cart;                #Cartesian coordinates for each atom and configuration
 
my $atmin_1=0;            #first index of species one
 
my $atmax_1;              #last index of species one
 
my $atmin_2=0;            #first index of species two
 
my $atmax_2;              #last index of species two
 
my @vol;                  #volume of cell (determinant of Bravais matrix)
 
my $pi=4*atan2(1, 1);    #constant pi
 
my $natom=0;              #total number of atoms in cell
 
my @pcf;                  #pair correlation function (list/array)
 
my $mult_x=1;            #periodic repetition of cells in x dimension
 
my $mult_y=1;            #periodic repetition of cells in y dimension
 
my $mult_z=1;            #periodic repetition of cells in z dimension
 
my @cart_super;          #Cartesian cells over multiple cells
 
my @vec_len;              #Length of lattice vectors in 3 spatial coordinates
 
#my $ensemble_type="NpT";  #Set Npt or NVT. Needs to be set since both have different XDATCAR file.
 
my $ensemble_type="NVT";  #Set Npt or NVT. Needs to be set since both have different XDATCAR file.
 
my $av_vol=0;              #Average volume in cell
 
 
#reading in XDATCAR file
 
while (<>)
 
{
 
    chomp;
 
    $_=~s/^/ /;
 
    my @help=split(/[\t,\s]+/);
 
    $z++;
 
    if ($z==2)
 
    {
 
      $lattscale = $help[1];
 
    }
 
    if ($z==3)
 
    {
 
      $b[$nconf+1][1][1]=$help[1]*$lattscale;
 
      $b[$nconf+1][1][2]=$help[2]*$lattscale;
 
      $b[$nconf+1][1][3]=$help[3]*$lattscale;
 
    }
 
    if ($z==4)
 
    {
 
      $b[$nconf+1][2][1]=$help[1]*$lattscale;
 
      $b[$nconf+1][2][2]=$help[2]*$lattscale;
 
      $b[$nconf+1][2][3]=$help[3]*$lattscale;
 
    }
 
    if ($z==5)
 
    {
 
      $b[$nconf+1][3][1]=$help[1]*$lattscale;
 
      $b[$nconf+1][3][2]=$help[2]*$lattscale;
 
      $b[$nconf+1][3][3]=$help[3]*$lattscale;
 
    }
 
    if ($z==7)
 
    {
 
      if ($nconf==0)
 
      {
 
          $numelem=@help-1;
 
          for (my $i=1;$i<=$numelem;$i++)
 
          {
 
            $elements[$i]=$help[$i];
 
            $natom=$natom+$help[$i];
 
          }
 
      }
 
    }
 
    if ($_=~m/Direct/)
 
    {
 
      $nconf=$nconf+1;
 
      #for NVT ensemble only one Bravais matrix exists, so it has to be copied
 
      if ($ensemble_type eq "NVT")
 
      {
 
          for (my $i=1;$i<=3;$i++)
 
          {
 
            for (my $j=1;$j<=3;$j++)
 
            {
 
                $b[$nconf][$i][$j]=$b[1][$i][$j];
 
            }
 
          }
 
      }
 
      for (my $i=1;$i<=$natom;$i++)
 
      {
 
          $_=<>;
 
          chomp;
 
          $_=~s/^/ /;
 
          my @helpat=split(/[\t,\s]+/);
 
          $cart[$nconf][$i][1]=$b[1][1][1]*$helpat[1]+$b[1][1][2]*$helpat[2]+$b[1][1][3]*$helpat[3];
 
          $cart[$nconf][$i][2]=$b[1][2][1]*$helpat[1]+$b[1][2][2]*$helpat[2]+$b[1][2][3]*$helpat[3];
 
          $cart[$nconf][$i][3]=$b[1][3][1]*$helpat[1]+$b[1][3][2]*$helpat[2]+$b[1][3][3]*$helpat[3];
 
      }
 
      if ($ensemble_type eq "NpT")
 
      {
 
          $z=0;
 
      }
 
    }
 
    last if eof;
 
}
 
 
if ($confmin>$nconf)
 
{
 
    print "Error, confmin larger than number of configurations. Exiting...\n";
 
    exit;
 
}
 
if ($confmax>$nconf)
 
{
 
    $confmax=$nconf;
 
}
 
 
 
for (my $i=1;$i<=$nconf;$i++)
 
{
 
    #calculate lattice vector lengths
 
    $vec_len[$i][1]=($b[$i][1][1]*$b[$i][1][1]+$b[$i][1][2]*$b[$i][1][2]+$b[$i][1][3]*$b[$i][1][3])**0.5;
 
    $vec_len[$i][2]=($b[$i][2][1]*$b[$i][2][1]+$b[$i][2][2]*$b[$i][2][2]+$b[$i][2][3]*$b[$i][2][3])**0.5;
 
    $vec_len[$i][3]=($b[$i][3][1]*$b[$i][3][1]+$b[$i][3][2]*$b[$i][3][2]+$b[$i][3][3]*$b[$i][3][3])**0.5;
 
    #calculate volume of cell
 
    $vol[$i]=$b[$i][1][1]*$b[$i][2][2]*$b[$i][3][3]+$b[$i][1][2]*$b[$i][2][3]*$b[$i][3][1]+$b[$i][1][3]*$b[$i][2][1]*$b[$i][3][2]-$b[$i][3][1]*$b[$i][2][2]*$b[$i][1][3]-$b[$i][3][2]*$b[$i][2][3]*$b[$i][1][1]-$b[$i][3][3]*$b[$i][2][1]*$b[$i][1][2];
 
    $av_vol=$av_vol+$vol[$i];
 
}
 
$av_vol=$av_vol/$nconf;
 
 
#choose species 1 for which pair correlation function is going to be calculated
 
$atmin_1=1;
 
if ($species_1>1)
 
{
 
    for (my $i=1;$i<$species_1;$i++)
 
    {
 
      $atmin_1=$atmin_1+$elements[$i];
 
    }
 
}
 
$atmax_1=$atmin_1+$elements[$species_1]-1;
 
#choose species 2 to which paircorrelation function is calculated to
 
$atmin_2=1;
 
if ($species_2>1)
 
{
 
    for (my $i=1;$i<$species_2;$i++)
 
    {
 
      $atmin_2=$atmin_2+$elements[$i];
 
    }
 
}
 
$atmax_2=$atmin_2+$elements[$species_2]-1;
 
#initialize pair correlation function
 
for (my $i=0;$i<=($nr-1);$i++)
 
{
 
    $pcf[$i]=0.0;
 
}
 
# loop over configurations, make histogram of pair correlation function
 
for (my $j=$confmin;$j<=$confmax;$j=$j+$confskip)
 
{
 
    for (my $k=$atmin_1;$k<=$atmax_1;$k++)
 
    {
 
        for (my $l=$atmin_2;$l<=$atmax_2;$l++)
 
        {
 
          if ($k==$l) {next};
 
          for (my $g_x=-$mult_x;$g_x<=$mult_x;$g_x++)
 
          {
 
              for (my $g_y=-$mult_y;$g_y<=$mult_y;$g_y++)
 
              {
 
                for (my $g_z=-$mult_y;$g_z<=$mult_z;$g_z++)
 
                {
 
                    my $at2_x=$cart[$j][$l][1]+$vec_len[$j][1]*$g_x;
 
                    my $at2_y=$cart[$j][$l][2]+$vec_len[$j][2]*$g_y;
 
                    my $at2_z=$cart[$j][$l][3]+$vec_len[$j][3]*$g_z;
 
                    my $dist=($cart[$j][$k][1]-$at2_x)**2.0+($cart[$j][$k][2]-$at2_y)**2.0+($cart[$j][$k][3]-$at2_z)**2.0;
 
                    $dist=$dist**0.5;
 
                    #determine integer multiple
 
                    my $zz=int(($dist-$rmin)/$dr+0.5);
 
                    if ($zz<$nr)
 
                    {
 
                      $pcf[$zz]=$pcf[$zz]+1.0;
 
                    }
 
                }
 
              }
 
          }
 
        }
 
    }
 
}
 
 
 
#make ensemble average, rescale functions and print
 
for (my $i=0;$i<=($nr-1);$i++)
 
{
 
    my $r=$rmin+$i*$dr;
 
    if ($r<$tol)
 
    {
 
      $pcf[$i]=0.0;
 
    }
 
    else
 
    {
 
      $pcf[$i]=$pcf[$i]*$av_vol/(4*$pi*$r*$r*$dr*(($confmax-$confmin)/$confskip)*($atmax_2-$atmin_2+1)*($atmax_1-$atmin_1+1));#*((2.0*$mult_x+1.0)*(2.0*$mult_y+1.0)*(2.0*$mult_z+1.0)));
 
    }
 
    print $r," ",$pcf[$i],"\n";
 
}
 
</div>
 
Obtain the pair correlation function from the previously saved {{TAG|XDATCAR}} files:
 
 
  perl pair_correlation_function.pl XDATCAR.MLFF_3ps > pair_MLFF_3ps.dat
 
  perl pair_correlation_function.pl XDATCAR.MLFF_3ps > pair_MLFF_3ps.dat
 
  perl pair_correlation_function.pl XDATCAR.AI_3ps > pair_AI_3ps.dat
 
  perl pair_correlation_function.pl XDATCAR.AI_3ps > pair_AI_3ps.dat
Plot the pair correlation functions using the following command:
+
We can compare the pair correlation functions, e.g. with gnuplot using the following command
 
  gnuplot -e "set terminal jpeg; set xlabel 'r(Ang)'; set ylabel 'PCF'; set style data lines; plot 'pair_MLFF_3ps.dat', 'pair_AI_3ps.dat' " > PC_MLFF_vs_AI_3ps.jpg
 
  gnuplot -e "set terminal jpeg; set xlabel 'r(Ang)'; set ylabel 'PCF'; set style data lines; plot 'pair_MLFF_3ps.dat', 'pair_AI_3ps.dat' " > PC_MLFF_vs_AI_3ps.jpg
The pair correlation functions obtained that way should look similar like that:
+
The pair correlation functions obtained that way should look similar to this figure
  
 
[[File:PC MLFF vs AI 3ps.jpg|400px]]
 
[[File:PC MLFF vs AI 3ps.jpg|400px]]
  
We see that pair correlation is quite well reproduced although the error on the force is a little bit large. Nevertheless we just used this learning to very create a liquid structure very fast, since creating a liquid from a perfect crystal can be a time consuming task.
+
We see that pair correlation is quite well reproduced although the error in the force of 0.247664 eV/<math>\AA</math> shown above is a little bit too large. This error is maybe too large for accurate production calculations (usually an accuracy of approximately 0.1 eV/<math>\AA</math> is targeted), but since the pair correlation function is well reproduced it is perfectly fine to use this on-the-fly force field in the time-consuming melting of the crystal.
  
=== Obtaining more accurate force field ===
+
=== Obtaining a more accurate force field ===
  
Including the melting phase in the force field may impact the accuracy of the force field. To improve it is usually advisable to learn on the pure structures, which in our case this means to use the {{TAG|CONTCAR}} file obtained after the melting. Furthermore, the force field was learned using only a single k point so that we can improve the accuracy of the reference data by including more k points.
+
Including the melting phase in the force field may impact the accuracy of the force field. To improve it is usually advisable to learn on the pure structures, which in our case this means to use the {{TAG|CONTCAR}} file obtained after the melting. Furthermore, the force field was learned using only a single k point so that we can improve the accuracy of the reference data by including more k points. In most calculations, it is important to conduct accurate ab initio calculations since bad reference data can limit the accuracy or even inhibit the learning of a force field.
After learning how to get a good starting structure for the the liquid we will restart the learning but with better ab initio parameters using the {{TAG|CONTCAR}} from the previous subsection. It is important in most calculations to have quite accurate ab initio calculations since bad ab initio parameters can really limit the accuracy of the learning or even inhibit a proper learning.
 
  
We do the following steps to restart learning:
+
We restart from the liquid structure obtained [[#Creating the liquid structure|before]]
*Copy ''POSCAR.T2000_relaxed'' to {{TAG|POSCAR}}.
+
cp POSCAR.T2000_relaxed {{TAGBL|POSCAR}}
*Change {{TAG|INCAR}} tags:
+
and change the following {{TAG|INCAR}} tags
**Set {{TAG|ALGO}}=''Normal''.
+
{{TAGBL|ALGO}} = Normal
**Set {{TAG|ML_FF_LMLFF}}=''.FALSE.'' back to {{TAG|ML_FF_LMLFF}}=''.TRUE.''.
+
{{TAGBL|ML_FF_LMLFF}} = .TRUE.
**Set {{TAG|ML_FF_ISTART}}=2 back to {{TAG|ML_FF_ISTART}}=0.
+
{{TAGBL|ML_FF_ISTART}} = 0
**Set {{TAG|NSW}}=1000 back to {{TAG|NSW}}=10000. We will learn again for 30 ps.
+
{{TAGBL|NSW}} = 10000
**Optionally add k point parallelization wth the {{TAG|KPAR}} tag (if you are not familiar with this tag leave it be).
+
If you run have resources to run in parallel, you can reduce the computation time further by adding k point parallelization with the {{TAG|KPAR}} tag.
*Increase the k mesh in the {{TAG|KPOINTS}} file. The new file looks like:
+
We use a denser k-point mesh in the {{TAG|KPOINTS}} file
  test
+
  2x2x2 Gamma-centered mesh
 
  0 0 0
 
  0 0 0
 
  Gamma
 
  Gamma
 
   2 2 2
 
   2 2 2
 
   0 0 0
 
   0 0 0
*Run the calculation using the '''standard''' version of VASP (usually ''vasp_std'').
+
We will learn a new force field on a run of 30 ps. Keep in mind to run the calculation using the '''standard''' version of VASP (usually ''vasp_std'').
 
+
After running the calculation, we examine the error in the {{TAG|ML_LOGFILE}}, where the obtained values should be close to
After running the calculation we will again look at the error in the {{TAG|ML_LOGFILE}}, where the obtained values should be close to:
 
 
  ====================================================================================================
 
  ====================================================================================================
 
       Information on error estimations
 
       Information on error estimations
Line 462: Line 243:
 
               Error in force  (eV Angst^-1):    0.165678
 
               Error in force  (eV Angst^-1):    0.165678
 
                       Error in stress (kB):    2.553148
 
                       Error in stress (kB):    2.553148
                 Bayesian error (eV Angst-1):    0.044695    0.101357
+
                 Bayesian error (eV Angst-1):    0.044695    '''0.101357'''
 
                         Spilling factor (-):    0.002285    0.020000
 
                         Spilling factor (-):    0.002285    0.020000
 
  ====================================================================================================
 
  ====================================================================================================
  
We immediately see that the errors are significantly lower than in the training with one k point.
+
We immediately see that the errors are significantly lower than in the previous calculation with only one k point. This is due to the less noisy ab initio data which is easier to learn.
 
 
Next we try to look at how well the force field is learned and try to monitor how much training data is added to a structure after each learning cycle.
 
  
The {{TAG|ML_ABNCAR}} file contains the training data. The number of reference structures and the basis-set size are written at the beginning of this file. The values in this example at this step should be close to the following:
+
To understand how the force field is learned, we inspect the {{TAG|ML_ABNCAR}} file containing the training data. In the beginning of this file, you will find information about the number of reference structures
 
  **************************************************
 
  **************************************************
 
       The number of configurations
 
       The number of configurations
 
  --------------------------------------------------
 
  --------------------------------------------------
 
           55
 
           55
and
+
and the size of the basis set
 
  **************************************************
 
  **************************************************
 
       The numbers of basis sets per atom type
 
       The numbers of basis sets per atom type
 
  --------------------------------------------------
 
  --------------------------------------------------
 
         396
 
         396
for the training-set size and basis-set size, respectively.
 
  
Now we want to further continue the learning to see how much data is added and how the accuracy of the force field changes.
+
We will monitor how much training data is added to a structure after each learning cycle and what impact this has on the accuracy of the force field. First we will perform a continuation run keeping the Bayesian threshold fixed
Do the following steps for a continuation run with on-the-fly learning:
+
cp {{TAGBL|ML_ABNCAR}} {{TAGBL|ML_ABCAR}}
*Copy {{TAG|ML_ABNCAR}} to {{TAG|ML_ABCAR}}.
+
cp {{TAGBL|CONTCAR}} {{TAGBL|POSCAR}}
*Copy {{TAG|CONTCAR}} to {{TAG|POSCAR}}.
+
and set the following {{TAG|INCAR}} tags
*Modify the {{TAG|INCAR}} file in the following way:
+
{{TAGBL|ML_FF_ISTART}} = 1
**Change {{TAG|ML_FF_ISTART}}=0 to {{TAG|ML_FF_ISTART}}=1.
+
{{TAGBL|ML_FF_CTIFOR}} = ''x''
**Add {{TAG|ML_FF_CTIFOR}}=''x'', where ''x'' is the last criterion for the threshold of the Bayesian error obtained from the {{TAG|ML_LOGFILE}} (from the entry above it would be 0.101357, but please use the value that you obtained in your calculation). By setting already a good guess for the threshold several ab initio steps can be skipped from the beginning which would be required to determine the Bayesian error criterion. '''Mind''': When continuation runs are performed where the crystal structure is changed, the last obtained error criterion for the Bayesian error can be too large and then many ab initio steps can be skipped wrongly (this can especially happen when going from first a liquid to a solid phase). In that case it is safer to use the default of {{TAG|ML_FF_CTIFOR}}=10^{-16}</math>}}.
+
where ''x'' is the threshold of the Bayesian error obtained from the {{TAG|ML_LOGFILE}} (highlighted bold above; please use the value obtained in your calculation). By setting a good estimate for the threshold several ab initio steps are skipped in the first few steps, which would be required to determine the threshold automatically. '''Mind''': When continuation runs are performed for different crystal structures, the last previous threshold for the Bayesian error might be too large leading to premature skipping of ab initio steps. This is particular relevant when studying the liquid phase, first, and applying its threshold to the solid phase. In that case it is safer to use the default value for {{TAG|ML_FF_CTIFOR}} = <math>10^{-16}</math>.
  
After learning we again look at the last entry of the errors in the {{TAG|ML_LOGFILE}}, which should look like:
+
After running the calculation, we inspect the last instance of the errors in the {{TAG|ML_LOGFILE}}, which should look similar to
 
  ====================================================================================================
 
  ====================================================================================================
 
       Information on error estimations
 
       Information on error estimations
Line 500: Line 278:
 
                         Spilling factor (-):    0.001114    0.020000
 
                         Spilling factor (-):    0.001114    0.020000
 
  ====================================================================================================
 
  ====================================================================================================
We see that the accuracy has not significantly changed. Also the threshold for the Bayesian error has not change (by typing ''grep "Bayesian error (eV Angst-1):" ML_LOGFILE we see that it was the same the whole time). By looking at the actual errors we see that that the threshold was overwritten only very few times. To see how often the threshold was overstepped type the following command on the command line:
+
We see that the accuracy has not changed significantly and the threshold for the Bayesian error is set to the one specified in the {{TAG|INCAR}} file. Looking at the actual errors we observe that the threshold was exceeded only a few times. You can find all these instances with the command
  grep "Bayesian error (eV Angst-1):" ML_LOGFILE |awk '{if ($5>0.101357) {print $5}}'
+
  grep "Bayesian error (eV Angst-1):" ML_LOGFILE | awk '{if ($5 > $6) { print $5 }}'
Consequently only very few structures were added to the ab initio data in the {{TAG|ML_ABNCAR}} file. The entry for the reference structures and basis sets should look like:
+
Consequently only very few structures were added to the ab initio data in the {{TAG|ML_ABNCAR}} file. The entry for the reference structures and basis sets should look similar to
 
  **************************************************
 
  **************************************************
 
       The number of configurations
 
       The number of configurations
 
  --------------------------------------------------
 
  --------------------------------------------------
 
           65
 
           65
and
+
...
 
  **************************************************
 
  **************************************************
 
       The numbers of basis sets per atom type
 
       The numbers of basis sets per atom type
 
  --------------------------------------------------
 
  --------------------------------------------------
 
         407
 
         407
This means that either the force field is already accurate enough or that the threshold is set too high and ab initio calculations are skipped because the force field is too accurate for that criterion.
+
From this observation, we conclude either that the force field is already accurate enough or that the threshold is set too high. In the latter case, the machine would not conduct ab initio calculations because the force field is judged accurate enough according to the threshold.
  
Next we will test the learning with the default value for {{TAG|ML_FF_CTIFOR}} (which is practically zero), to see if the force field changes significantly. For that the following steps have to be done:
+
We compare this behavior to the case of using the default value for {{TAG|ML_FF_CTIFOR}}. We continue from the existing structure and force field
*Copy {{TAG|CONTCAR}} to {{TAG|POSCAR}}.
+
cp {{TAGBL|CONTCAR}} {{TAGBL|POSCAR}}
*Copy {{TAG|ML_ABNCAR}} to {{TAG|ML_ABCAR}}.
+
cp {{TAGBL|ML_ABNCAR}} {{TAG|ML_ABCAR}}
*Remove the {{TAG|ML_FF_ISTART}} line in the {{TAG|INCAR}} file.
+
Remove the {{TAG|ML_FF_CTIFOR}} tag from the {{TAG|INCAR}} file.
  
After rerunning the calculation we see that many structures have been added in the {{TAG|ML_ABNCAR}} file:
+
The default value for {{TAG|ML_FF_CTIFOR}} is practically zero, so we expect more structures to be added. We inspect the {{TAG|ML_ABNCAR}} to investigate how the number of structures and the basis set size changed
 
   **************************************************
 
   **************************************************
 
       The number of configurations
 
       The number of configurations
Line 529: Line 307:
 
  --------------------------------------------------
 
  --------------------------------------------------
 
         726
 
         726
Nevertheless the accuracy of the force field did not improve significantly in the {{TAG|ML_LOGFILE}}:
+
Even though the number of structure datasets and the basis set size increased, the accuracy of the force field did not improve significantly
 
  ====================================================================================================
 
  ====================================================================================================
 
       Information on error estimations
 
       Information on error estimations
Line 540: Line 318:
 
  ====================================================================================================
 
  ====================================================================================================
  
Ideally one should continue learning until no structures need to be added to the training data and basis set any more. Practically this is hard to achieve, since the prediction of the Bayesian error has some inaccuracies and learning can be executed even if the force field is accurate enough. Nevertheless for practical calculations one has to stop the calculation at a given point. In this example we stop here because the accuracy on test structures compared to the ab initio data is not improving in the {{TAG|ML_LOGFILE}} by further addition of training data and this example is only a tutorial example. For high quality production calculations suited for publications we would advise to learn until almost no data is added to the training structures and basis set. In many cases this involves a length of 100 ps for learning.
+
Ideally, one should continue learning until no structures need to be added to the training data and basis set anymore. In practice, the prediction of the Bayesian error exhibits numerical inaccuracies so that an ab initio calculation is conducted even if the force field is accurate enough. Hence, in this tutorial, we stop at this point because further addition of training data does not improve the accuracy on the test structures compared to the ab initio data. For high quality production calculations suited for publications, we advise to learn until almost no data is added to the training structures and basis set. In many cases this involves a runtime of at least 100 ps for learning.
 +
 
 +
;Exercise: Continue learning and observe the convergence of the number of reference structures. Most likely the maximum allowed basis set size {{TAG|ML_FF_MB_MB}} needs to be increased. How does the pair correlation function compare to ab initio data?
  
'''User task''': The user should continue learning and observe the convergence of the number of reference structures. Most likely the maximum allowed basis set size {{TAG|ML_FF_MB_MB}} needs to be increased. The user should also look at the pair correlation function with respect to the ab initio data. Is it still ok?
+
=== Optimizing weights of energy, forces and stress ===
  
=== Optimizing weights on energy, forces and stress ===
+
Even after a force field has been learned, there is still the possibility to refine it by changing the weights for energy {{TAG|ML_FF_WTOTEN}}, forces {{TAG|ML_FF_WTIFOR}} and stress {{TAG|ML_FF_WTSIF}}. By default all weights are set to 1.0, so energy, force and stress are equally important. Typically the error for a component decreases with increasing weight of that component. For methods where the free energy is important (such as e.g. {{TAG|Interface pinning calculations}}, {{TAG|Metadynamics}} etc.) introducing a larger weight for the energy can reduce the error of the energy significantly while only slightly increasing the errors of the forces and the stress. Note that the weights are set relative to each other, i.e., by increasing one the other ones decrease automatically.
  
Next we learn how to refine an existing force after it has been learned. We will use the force field already obtained from the previous subsection. In this wiki we show the results for the force field learned for 30 ps. If one has learned for a longer time (user task in the previous example) one can also use that force field, but should of course expect larger deviations from the results presented here. The input parameters {{TAG|ML_FF_WTOTEN}}, {{TAG|ML_FF_WTIFOR}} and {{TAG|ML_FF_WTSIF}} are the weights for energy, force and stress in the linear fitting, respectively. By default they are set to 1.0, so energy, force and stress are weighted equally. The weights are relative weights, this means it is sufficient to increase only the weight of interest, while keeping the other parameters at 1.0. If the weight for a component is increased the error of that component usually becomes lower with higher weight. Simultaneously the error on the other components gets worse, but at a different rate. For a given optimal value of the weight the overall error can be minimized, such that the component of interest is reproduced with much more accuracy. Usually this can be important for methods where the free energy is important (such as e.g. {{TAG|Interface pinning calculations}}, {{TAG|Metadynamics}} etc.) and this weighting can improve the error on energy significantly while only slightly making the errors on forces and stress worse.
+
We will start from the force field obtained above, but you can use a more refined one if you completed the user task
 +
cp {{TAGBL|INCAR}} INCAR.run
 +
cp {{TAGBL|CONTCAR}} POSCAR.run
 +
The ''INCAR.run'' file  will serve as a template for several short calculations with different weights for the energy. Set
 +
{{TAG|ML_FF_ISTART}} = 1
 +
{{TAG|NSW}} = 1
 +
to generate a single new test structure.
  
To run the weight optimizations calculations the following steps are necessary:
+
We can now use a script to set {{TAG|ML_FF_WTOTEN}} to different values and examine how the average error changes
*Copy {{TAG|INCAR}} to ''INCAR.run''. This will serve as a template since several short calculations will be executed with different weights for the energy.
 
*Copy {{TAG|CONTCAR}} to ''POSCAR.run''. For each calculation the same {{TAG|POSCAR}} file needs to be used as a starting point.
 
*Copy {{TAG|ML_ABNCAR}} to {{TAG|ML_ABCAR}}.
 
*Modify ''INCAR.run'' as follows:
 
**Set {{TAG|ML_FF_ISTART}}=1.
 
**Set {{TAG|NSW}}=1. We will only need a single new test structure.
 
**The {{TAG|ML_FF_WTOTEN}} test will be added by the runscript.
 
*Use the bash script ''run_param_opt.sh'' (or use your own) which contains the following lines:
 
 
  #!/bin/bash
 
  #!/bin/bash
 
   
 
   
Line 574: Line 352:
 
     awk -v var="$i" '/Error in energy \(eV atom\^-1\)\:/{energy=$6} /Error in force  \(eV Angst\^-1\)\:/ {force=$6} /Error in stress \(kB\)\:/{stress=$5} /Energy SD. \(eV atom\^-1\)\:/{energy_var=$5}  /Force SD. \(eV Angst\^-1\)\:/{force_var=$5} /Stress SD. \(kB\)\:/ {stress_var=$4} END{print var,energy/energy_var+force/force_var+stress/stress_var}' ML_LOGFILE.$i >> error_vs_eweight.dat
 
     awk -v var="$i" '/Error in energy \(eV atom\^-1\)\:/{energy=$6} /Error in force  \(eV Angst\^-1\)\:/ {force=$6} /Error in stress \(kB\)\:/{stress=$5} /Energy SD. \(eV atom\^-1\)\:/{energy_var=$5}  /Force SD. \(eV Angst\^-1\)\:/{force_var=$5} /Stress SD. \(kB\)\:/ {stress_var=$4} END{print var,energy/energy_var+force/force_var+stress/stress_var}' ML_LOGFILE.$i >> error_vs_eweight.dat
 
  done
 
  done
The user needs to set the executable path ''$executable_path'' in this directory (and maybe the executable name).
+
To run this script you need to
This script adds the weight for the energy ''ML_FF_WTOTEN = $i'' to the {{TAG|INCAR}} file, where ''$i'' runs from 1.0 to 5.0 with a step size of 0.5. After that a single step calculation is performed. The errors compared to ab initio and standard deviations for energies, forces and stress are output to the {{TAG|ML_LOGFILE}} and from that file the sum of the normalized errors is plotted against the weight of the energy. The normalization is done by dividing the errors by the standard deviation for each component.  
+
export executable_path=''path to the vasp executable''
 +
export np=''number of processes you want to use''
 +
This script sets the weight of the energy {{TAG|ML_FF_WTOTEN}} to values between 1.0 and 5.0. It calculates a single structure and compares the error of energy, forces and stress between ab initio calculation and force field. These errors are normalized with respect to the standard deviations of these quantities. The sum of the normalized errors is plotted against the weight of the energy with the following command
 +
gnuplot -e "set terminal jpeg; set xlabel 'Energy weight'; set ylabel 'Error'; set style data lines; plot 'error_vs_eweight.dat'" > error_vs_eweight.jpg
  
The normalized total error vs. weight for the energy is plotted from the file ''error_vs_eweight.dat'':
+
It should look similar to
gnuplot -e "set terminal jpeg; set xlabel 'Energy weight'; set ylabel 'Error'; set style data lines; plot 'error_vs_eweight.dat'" > error_vs_eweight.jpg
 
It should look similar to:
 
  
 
[[File:Error vs eweight.jpg|400px]]
 
[[File:Error vs eweight.jpg|400px]]
  
From this plot we see that an ideal value is around {{TAG|ML_FF_WTOTEN}}=2.5. Next we take a look at the actual errors for that weighting parameter in the file ''ML_LOGFILE.2.5'':
+
From this plot we see that an ideal value is around {{TAG|ML_FF_WTOTEN}}=2.5. Next we take a look at the actual errors for that weighting parameter in the file ''ML_LOGFILE.2.5'':
 
  ====================================================================================================
 
  ====================================================================================================
 
       Information on error estimations
 
       Information on error estimations
Line 603: Line 382:
 
                         Spilling factor (-):    0.000368    0.020000
 
                         Spilling factor (-):    0.000368    0.020000
 
  ====================================================================================================
 
  ====================================================================================================
In this example the decrease of the energy is only small but in many examples this can lead to signifcant increase in the precision.
+
In this example the decrease of the energy is only small but in many examples this can lead to significant increase in the precision.
 
+
Compare these values to the errors in ''ML_LOGFILE_5.0'':
Here are the errors for ''ML_LOGFILE_5.0'':
 
 
  ====================================================================================================
 
  ====================================================================================================
 
       Information on error estimations
 
       Information on error estimations
Line 615: Line 393:
 
                         Spilling factor (-):    0.000368    0.020000
 
                         Spilling factor (-):    0.000368    0.020000
 
  ====================================================================================================
 
  ====================================================================================================
We see that for higher values of the weight the energy is still getting more accurate but the forces and stress are getting loosing accuracy at a higher, such that the overall error gets worse. We advise to use the optimal value.
+
We see that for higher values of the weight the energy is even more accurate, but at the expense of the forces and the stress. Consequently the normalized error increases so that the trajectories during the MD simulation may suffer. We advise not to push the weight of the energy to these extreme values and instead use the value optimizing the overall error.
  
If needed this waiting can be done in a similar way for the forces and stress.
+
If needed this weighting can be done in a similar way for the forces and stress.
  
 
=== Production runs using the optimized parameters ===
 
=== Production runs using the optimized parameters ===
  
The production runs using the optimized parameters are run as normal calculations only using an additional line for the weight on energy. We repeat here which steps have to be performed:
+
In the production run, we rely on the structure dataset and the corresponding accurate force field
*Copy {{TAG|CONTCAR}} to {{TAG|POSCAR}} if needed.
+
cp {{TAGBL|CONTCAR}} {{TAG|POSCAR}}
*Copy {{TAG|ML_ABNCAR}} to {{TAG|ML_ABCAR}}.
+
cp {{TAGBL|ML_ABNCAR}} {{TAG|ML_ABCAR}}.
*Copy {{TAG|ML_FFNCAR}} to {{TAG|ML_FFCAR}}.
+
cp {{TAGBL|ML_FFNCAR}} {{TAG|ML_FFCAR}}.
*Modify {{TAG|INCAR}} file:
+
In the {{TAG|INCAR}} file, we switch of the updating of the force field and use the optimized energy
**Set {{TAG|ML_FF_ISTART}}=2.
+
{{TAGBL|ML_FF_ISTART}} = 2
**Set {{TAG|ML_FF_WTOTEN}}=2.5.
+
{{TAGBL|ML_FF_WTOTEN}} = 2.5
**Set necessary molecular dynamics parameters (like e.g. {{TAG|NSW}} etc.).
+
and adjust the necessary molecular dynamics parameters (like e.g. {{TAG|NSW}} etc.).
  
'''User task''': Calculate the pair correlation function as above with and without machine learning starting from the same {{TAG|POSCAR}} and compare the results.
+
;Exercise: Calculate the pair correlation function as above with the machine learned force field and compare to results obtained ab initio when using the same {{TAG|POSCAR}}.
  
 
== Download ==
 
== Download ==
[http://www.vasp.at/vasp-workshop/examples/Liquid_Si_MLFF.tgz Liquid_Si_MLFF.tgz]
+
[[Media:MLFF Liquid Si.tgz| MLFF_Liquid_Si.tgz]]
  
 
{{Template:Machine learning force field - Tutorial}}
 
{{Template:Machine learning force field - Tutorial}}
  
 
[[Category:Examples]]
 
[[Category:Examples]]

Latest revision as of 09:07, 14 November 2019

Task

Generating a machine learning force field for liquid Si. For this tutorial, we expect that the user is already familiar with running conventional ab initio molecular dynamic calculations.

Input

POSCAR

In this example we start from a 64 atom super cell of diamond-fcc Si (the same as in Liquid Si - Standard MD):

Si cubic diamond 2x2x2 super cell of conventional cell
     5.43090000000000
    2.00000000   0.00000000   0.00000000
    0.00000000   2.00000000   0.00000000
    0.00000000   0.00000000   2.00000000
   Si
   64
Direct
   0.00000000   0.00000000   0.00000000
   0.50000000   0.00000000   0.00000000
   0.00000000   0.50000000   0.00000000
   0.50000000   0.50000000   0.00000000
   0.00000000   0.00000000   0.50000000
   0.50000000   0.00000000   0.50000000
   0.00000000   0.50000000   0.50000000
   0.50000000   0.50000000   0.50000000
   0.37500000   0.12500000   0.37500000
   0.87500000   0.12500000   0.37500000
   0.37500000   0.62500000   0.37500000
   0.87500000   0.62500000   0.37500000
   0.37500000   0.12500000   0.87500000
   0.87500000   0.12500000   0.87500000
   0.37500000   0.62500000   0.87500000
   0.87500000   0.62500000   0.87500000
   0.00000000   0.25000000   0.25000000
   0.50000000   0.25000000   0.25000000
   0.00000000   0.75000000   0.25000000
   0.50000000   0.75000000   0.25000000
   0.00000000   0.25000000   0.75000000
   0.50000000   0.25000000   0.75000000
   0.00000000   0.75000000   0.75000000
   0.50000000   0.75000000   0.75000000
   0.37500000   0.37500000   0.12500000
   0.87500000   0.37500000   0.12500000
   0.37500000   0.87500000   0.12500000
   0.87500000   0.87500000   0.12500000
   0.37500000   0.37500000   0.62500000
   0.87500000   0.37500000   0.62500000
   0.37500000   0.87500000   0.62500000
   0.87500000   0.87500000   0.62500000
   0.25000000   0.00000000   0.25000000
   0.75000000   0.00000000   0.25000000
   0.25000000   0.50000000   0.25000000
   0.75000000   0.50000000   0.25000000
   0.25000000   0.00000000   0.75000000
   0.75000000   0.00000000   0.75000000 
   0.25000000   0.50000000   0.75000000
   0.75000000   0.50000000   0.75000000
   0.12500000   0.12500000   0.12500000
   0.62500000   0.12500000   0.12500000
   0.12500000   0.62500000   0.12500000
   0.62500000   0.62500000   0.12500000
   0.12500000   0.12500000   0.62500000
   0.62500000   0.12500000   0.62500000
   0.12500000   0.62500000   0.62500000
   0.62500000   0.62500000   0.62500000
   0.25000000   0.25000000   0.00000000
   0.75000000   0.25000000   0.00000000
   0.25000000   0.75000000   0.00000000
   0.75000000   0.75000000   0.00000000
   0.25000000   0.25000000   0.50000000
   0.75000000   0.25000000   0.50000000
   0.25000000   0.75000000   0.50000000
   0.75000000   0.75000000   0.50000000
   0.12500000   0.37500000   0.37500000
   0.62500000   0.37500000   0.37500000
   0.12500000   0.87500000   0.37500000
   0.62500000   0.87500000   0.37500000
   0.12500000   0.37500000   0.87500000
   0.62500000   0.37500000   0.87500000
   0.12500000   0.87500000   0.87500000
   0.62500000   0.87500000   0.87500000

KPOINTS

We will start with a single k point in this example:

K-Points
 0
Gamma
 1  1  1
 0  0  0

INCAR

#Basic parameters
ISMEAR = 0
SIGMA = 0.1
LREAL = Auto
ISYM = -1
NELM = 100
EDIFF = 1E-4
LWAVE = .FALSE.
LCHARG = .FALSE.

#Parallelization of ab initio calculations
NCORE = 2

#MD
IBRION = 0
MDALGO = 2
ISIF = 2
SMASS = 1.0
TEBEG = 2000
NSW = 10000
POTIM = 3.0

#Machine learning paramters
ML_FF_LMLFF = .TRUE.
ML_FF_ISTART = 0
ML_FF_EATOM = -.70128086E+00
ML_FF_LMLFF = .TRUE.
switches on the machine learning of the force field
ML_FF_ISTART = 0
corresponds to learning from scratch
ML_FF_EATOM = -.70128086E+00
sets the atomic reference energy for each species. We show to obtain this energy below.

Calculation

Reference energy

To obtain the force field for liquid Si, we require the atomic energy of a single Si atom. We create a new working directory for this calculation

 mkdir Si_ATOM
 cd Si_ATOM
 ln -s ../POTCAR

In this directory, we create the following INCAR file

#Basic parameters
ISMEAR = 0
SIGMA = 0.1
LREAL = Auto
ISYM = 0
NELM = 100
EDIFF = 1E-4
LWAVE = .FALSE.
LCHARG = .FALSE.
ISPIN = 2

The POSCAR file contains a single atom in a large enough box (the box should be orthorhombic to have enough degrees of freedom for electronic relaxation)

Si atom
     1.00090000000000
    12.00000000   0.00000000   0.00000000
     0.00000000  12.01000000   0.00000000
     0.00000000   0.00000000  12.02000000
   Si
   1
Direct
   0.00000000   0.00000000   0.00000000

Due to the large unit cell, we can use a Gamma point calculation set up in the KPOINTS file

Gamma point only
0 0 0
Gamma
 1 1 1
 0 0 0

After running the calculation, we obtain the total energy from the OUTCAR or OSZICAR file. We use this energy for the ML_FF_EATOM tag. If multiple atom types are present in the structure, this step has to be repeated for each atom type separately and the reference energies are provided as a list after the ML_FF_EATOM tag. The ordering of the energies corresponds to the ordering of the elements in the POTCAR file.

Creating the liquid structure

Because we don't have a structure of liquid silicon readily available, we first create that structure by starting from a super cell of crystalline silicon with 64 atoms. The temperature is set to 2000 K so that the crystal melts rapidly in the MD run. To improve the simulation speed drastically, we utilize the on-the-fly machine learning. Most of the ab initio steps will be replaced by very fast force-field ones. Within 10000 steps equivalent to 30 ps, we have obtained a good starting position for the subsequent simulations in the CONTCAR file. You can copy the input files or download them.

After running the calculation, we obtained a force field, but its initial trajectory might be tainted but the unreasonable starting position. Nevertheless, it is instructive to inspect the output to understand how to assess the accuracy of a force field, before refining it in subsequent calculations. The main output files for the machine learning are

ML_ABNCAR
contains the ab initio structure datasets used for the learning. It will be needed for continuation runs as ML_ABCAR.
ML_FFNCAR
contains the regression results (weights, parameters, etc.). It will be needed for continuation runs as ML_FFCAR.
ML_LOGFILE
logging the proceedings of the machine learning. With ML_FF_NWRITE = 2 set, this file lists the error between force-field and ab initio calculation for energy, forces and the stress. The entry for the last step should be similar to the following (the inherent randomness of a molecular dynamics simulation will lead to slightly different results):
====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.016994
             Error in force  (eV Angst^-1):     0.247664
                      Error in stress (kB):     5.368218
               Bayesian error (eV Angst-1):     0.064904    0.127195
                       Spilling factor (-):     0.000690    0.020000
====================================================================================================

You can see that additional information about the Bayesian error and the spilling factor is reported. The first entry of the Bayesian error is the estimated error from our model. If the estimated error exceeds the threshold listed next to it a new structure dataset is generated. The threshold is automatically determined during the calculations. The first and second entry for the spilling factor are the calculated spilling factor and the threshold, respectively. The threshold for the spilling factor is usually kept constant during the calculations.

Structral properties of the force field

To examine the accuracy of structural properties, we compare the deviations between a 3 ps molecular dynamics run using the force field and a full ab initio calculation. For a meaningful comparison, it is best to start from the same initial structure. We will use the liquid structure, we obtained in the previous step and back it up

cp CONTCAR POSCAR.T2000_relaxed

Now, we proceed with the force field calculation and set up the required files

cp POSCAR.T2000_relaxed POSCAR
cp ML_ABNCAR ML_ABCAR
cp ML_FFNCAR ML_FFCAR

To run a shorter simulation using only the force field, we change the following INCAR tags to

ML_FF_ISTART = 2
NSW = 1000

After the calculation finished, we backup the history of the atomic positions

cp XDATCAR XDATCAR.MLFF_3ps

We repeat the same process for the ab initio simulation recovering the same initial structure

cp POSCAR.T2000_relaxed POSCAR

switching the machine learning off in the INCAR file by setting

ML_FF_LMLFF = .FALSE.

We store the trajectory once the calculation completed

cp XDATCAR XDATCAR.AI_3ps

To analyze the pair correlation function, we use the PERL script pair_correlation_function.pl

Click to show/hide pair_correlation_function.pl
#!/usr/bin/perl

use strict;
use warnings;

#configuration for which ensemble average is to be calculated
my $confmin=1;            #starting index of configurations in XDATCAR file for pair correlation function
my $confmax=20000;           #last index of configurations in XDATCAR file for pair correlation function
my $confskip=1;           #stepsize for configuration loop
my $species_1 = 1;        #species 1 for which pair correlation function is going to be calculated
my $species_2 = 1;        #species 2 for which pair correlation function is going to be calculated
#setting radial grid 
my $rmin=0.0;             #minimal value of radial grid
my $rmax=10.0;            #maximum value of radial grid
my $nr=300;                #number of equidistant steps in radial grid
my $dr=($rmax-$rmin)/$nr; #stepsize in radial grid
my $tol=0.0000000001;     #tolerance limit for r->0 
 
my $z=0;                  #counter
my $numelem;              #number of elements
my @elements;             #number of atoms per element saved in list/array
my $lattscale;            #scaling factor for lattice
my @b;                    #Bravais matrix
my $nconf=0;              #number of configurations in XDATCAR file
my @cart;                 #Cartesian coordinates for each atom and configuration
my $atmin_1=0;            #first index of species one
my $atmax_1;              #last index of species one
my $atmin_2=0;            #first index of species two
my $atmax_2;              #last index of species two
my @vol;                  #volume of cell (determinant of Bravais matrix)
my $pi=4*atan2(1, 1);     #constant pi
my $natom=0;              #total number of atoms in cell
my @pcf;                  #pair correlation function (list/array)
my $mult_x=1;             #periodic repetition of cells in x dimension
my $mult_y=1;             #periodic repetition of cells in y dimension
my $mult_z=1;             #periodic repetition of cells in z dimension
my @cart_super;           #Cartesian cells over multiple cells
my @vec_len;              #Length of lattice vectors in 3 spatial coordinates
#my $ensemble_type="NpT";  #Set Npt or NVT. Needs to be set since both have different XDATCAR file.
my $ensemble_type="NVT";  #Set Npt or NVT. Needs to be set since both have different XDATCAR file.
my $av_vol=0;               #Average volume in cell

#reading in XDATCAR file
while (<>)
{
   chomp;
   $_=~s/^/ /;
   my @help=split(/[\t,\s]+/);
   $z++;
   if ($z==2)
   {
      $lattscale = $help[1];
   }
   if ($z==3)
   {
      $b[$nconf+1][1][1]=$help[1]*$lattscale;
      $b[$nconf+1][1][2]=$help[2]*$lattscale;
      $b[$nconf+1][1][3]=$help[3]*$lattscale;
   }
   if ($z==4)
   {
      $b[$nconf+1][2][1]=$help[1]*$lattscale;
      $b[$nconf+1][2][2]=$help[2]*$lattscale;
      $b[$nconf+1][2][3]=$help[3]*$lattscale;
   }
   if ($z==5)
   {
      $b[$nconf+1][3][1]=$help[1]*$lattscale;
      $b[$nconf+1][3][2]=$help[2]*$lattscale;
      $b[$nconf+1][3][3]=$help[3]*$lattscale;
   }
   if ($z==7)
   {
      if ($nconf==0)
      {
         $numelem=@help-1;
         for (my $i=1;$i<=$numelem;$i++)
         {
            $elements[$i]=$help[$i];
            $natom=$natom+$help[$i];
         }
      }
   }
   if ($_=~m/Direct/)
   {
      $nconf=$nconf+1;
      #for NVT ensemble only one Bravais matrix exists, so it has to be copied
      if ($ensemble_type eq "NVT")
      {
         for (my $i=1;$i<=3;$i++)
         { 
            for (my $j=1;$j<=3;$j++)
            {
               $b[$nconf][$i][$j]=$b[1][$i][$j];
            }
         }
      }
      for (my $i=1;$i<=$natom;$i++)
      {
         $_=<>;
         chomp;
         $_=~s/^/ /;
         my @helpat=split(/[\t,\s]+/);
         $cart[$nconf][$i][1]=$b[1][1][1]*$helpat[1]+$b[1][1][2]*$helpat[2]+$b[1][1][3]*$helpat[3];
         $cart[$nconf][$i][2]=$b[1][2][1]*$helpat[1]+$b[1][2][2]*$helpat[2]+$b[1][2][3]*$helpat[3];
         $cart[$nconf][$i][3]=$b[1][3][1]*$helpat[1]+$b[1][3][2]*$helpat[2]+$b[1][3][3]*$helpat[3];
      }
      if ($ensemble_type eq "NpT")
      {
         $z=0;
      }
   } 
   last if eof;
}

if ($confmin>$nconf)
{
   print "Error, confmin larger than number of configurations. Exiting...\n";
   exit;
}
if ($confmax>$nconf)
{
   $confmax=$nconf;
}
 
for (my $i=1;$i<=$nconf;$i++)
{
   #calculate lattice vector lengths
   $vec_len[$i][1]=($b[$i][1][1]*$b[$i][1][1]+$b[$i][1][2]*$b[$i][1][2]+$b[$i][1][3]*$b[$i][1][3])**0.5;
   $vec_len[$i][2]=($b[$i][2][1]*$b[$i][2][1]+$b[$i][2][2]*$b[$i][2][2]+$b[$i][2][3]*$b[$i][2][3])**0.5;
   $vec_len[$i][3]=($b[$i][3][1]*$b[$i][3][1]+$b[$i][3][2]*$b[$i][3][2]+$b[$i][3][3]*$b[$i][3][3])**0.5;
   #calculate volume of cell
   $vol[$i]=$b[$i][1][1]*$b[$i][2][2]*$b[$i][3][3]+$b[$i][1][2]*$b[$i][2][3]*$b[$i][3][1]+$b[$i][1][3]*$b[$i][2][1]*$b[$i][3][2]-$b[$i][3][1]*$b[$i][2][2]*$b[$i][1][3]-$b[$i][3][2]*$b[$i][2][3]*$b[$i][1][1]-$b[$i][3][3]*$b[$i][2][1]*$b[$i][1][2];
   $av_vol=$av_vol+$vol[$i];
}
$av_vol=$av_vol/$nconf;

#choose species 1 for which pair correlation function is going to be calculated
$atmin_1=1;
if ($species_1>1)
{
   for (my $i=1;$i<$species_1;$i++)
   {
     $atmin_1=$atmin_1+$elements[$i];
   }
}
$atmax_1=$atmin_1+$elements[$species_1]-1;
#choose species 2 to which paircorrelation function is calculated to
$atmin_2=1;
if ($species_2>1)
{
   for (my $i=1;$i<$species_2;$i++)
   {
     $atmin_2=$atmin_2+$elements[$i];
   }
}
$atmax_2=$atmin_2+$elements[$species_2]-1;
#initialize pair correlation function
for (my $i=0;$i<=($nr-1);$i++)
{
   $pcf[$i]=0.0;
}
# loop over configurations, make histogram of pair correlation function 
for (my $j=$confmin;$j<=$confmax;$j=$j+$confskip)
{
   for (my $k=$atmin_1;$k<=$atmax_1;$k++)
   {
       for (my $l=$atmin_2;$l<=$atmax_2;$l++)
       {
          if ($k==$l) {next};
          for (my $g_x=-$mult_x;$g_x<=$mult_x;$g_x++)
          {
             for (my $g_y=-$mult_y;$g_y<=$mult_y;$g_y++)
             {
                for (my $g_z=-$mult_y;$g_z<=$mult_z;$g_z++)
                {
                   my $at2_x=$cart[$j][$l][1]+$vec_len[$j][1]*$g_x;
                   my $at2_y=$cart[$j][$l][2]+$vec_len[$j][2]*$g_y;
                   my $at2_z=$cart[$j][$l][3]+$vec_len[$j][3]*$g_z;
                   my $dist=($cart[$j][$k][1]-$at2_x)**2.0+($cart[$j][$k][2]-$at2_y)**2.0+($cart[$j][$k][3]-$at2_z)**2.0;
                   $dist=$dist**0.5;
                   #determine integer multiple 
                   my $zz=int(($dist-$rmin)/$dr+0.5);
                   if ($zz<$nr)
                   {
                      $pcf[$zz]=$pcf[$zz]+1.0;
                   }
                }
             }
          }
       }
   }
}
 
#make ensemble average, rescale functions and print
for (my $i=0;$i<=($nr-1);$i++)
{
   my $r=$rmin+$i*$dr;
   if ($r<$tol)
   {
      $pcf[$i]=0.0;
   }
   else
   {
      $pcf[$i]=$pcf[$i]*$av_vol/(4*$pi*$r*$r*$dr*(($confmax-$confmin)/$confskip)*($atmax_2-$atmin_2+1)*($atmax_1-$atmin_1+1));#*((2.0*$mult_x+1.0)*(2.0*$mult_y+1.0)*(2.0*$mult_z+1.0)));
   }
   print $r," ",$pcf[$i],"\n";
}

and process the previously saved XDATCAR files

perl pair_correlation_function.pl XDATCAR.MLFF_3ps > pair_MLFF_3ps.dat
perl pair_correlation_function.pl XDATCAR.AI_3ps > pair_AI_3ps.dat

We can compare the pair correlation functions, e.g. with gnuplot using the following command

gnuplot -e "set terminal jpeg; set xlabel 'r(Ang)'; set ylabel 'PCF'; set style data lines; plot 'pair_MLFF_3ps.dat', 'pair_AI_3ps.dat' " > PC_MLFF_vs_AI_3ps.jpg

The pair correlation functions obtained that way should look similar to this figure

PC MLFF vs AI 3ps.jpg

We see that pair correlation is quite well reproduced although the error in the force of 0.247664 eV/ shown above is a little bit too large. This error is maybe too large for accurate production calculations (usually an accuracy of approximately 0.1 eV/ is targeted), but since the pair correlation function is well reproduced it is perfectly fine to use this on-the-fly force field in the time-consuming melting of the crystal.

Obtaining a more accurate force field

Including the melting phase in the force field may impact the accuracy of the force field. To improve it is usually advisable to learn on the pure structures, which in our case this means to use the CONTCAR file obtained after the melting. Furthermore, the force field was learned using only a single k point so that we can improve the accuracy of the reference data by including more k points. In most calculations, it is important to conduct accurate ab initio calculations since bad reference data can limit the accuracy or even inhibit the learning of a force field.

We restart from the liquid structure obtained before

cp POSCAR.T2000_relaxed POSCAR

and change the following INCAR tags

ALGO = Normal
ML_FF_LMLFF = .TRUE.
ML_FF_ISTART = 0
NSW = 10000

If you run have resources to run in parallel, you can reduce the computation time further by adding k point parallelization with the KPAR tag. We use a denser k-point mesh in the KPOINTS file

2x2x2 Gamma-centered mesh
0 0 0
Gamma
 2 2 2
 0 0 0

We will learn a new force field on a run of 30 ps. Keep in mind to run the calculation using the standard version of VASP (usually vasp_std). After running the calculation, we examine the error in the ML_LOGFILE, where the obtained values should be close to

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.006079
             Error in force  (eV Angst^-1):     0.165678
                      Error in stress (kB):     2.553148
               Bayesian error (eV Angst-1):     0.044695    0.101357
                       Spilling factor (-):     0.002285    0.020000
====================================================================================================

We immediately see that the errors are significantly lower than in the previous calculation with only one k point. This is due to the less noisy ab initio data which is easier to learn.

To understand how the force field is learned, we inspect the ML_ABNCAR file containing the training data. In the beginning of this file, you will find information about the number of reference structures

**************************************************
     The number of configurations
--------------------------------------------------
         55

and the size of the basis set

**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       396

We will monitor how much training data is added to a structure after each learning cycle and what impact this has on the accuracy of the force field. First we will perform a continuation run keeping the Bayesian threshold fixed

cp ML_ABNCAR ML_ABCAR
cp CONTCAR POSCAR

and set the following INCAR tags

ML_FF_ISTART = 1
ML_FF_CTIFOR = x

where x is the threshold of the Bayesian error obtained from the ML_LOGFILE (highlighted bold above; please use the value obtained in your calculation). By setting a good estimate for the threshold several ab initio steps are skipped in the first few steps, which would be required to determine the threshold automatically. Mind: When continuation runs are performed for different crystal structures, the last previous threshold for the Bayesian error might be too large leading to premature skipping of ab initio steps. This is particular relevant when studying the liquid phase, first, and applying its threshold to the solid phase. In that case it is safer to use the default value for ML_FF_CTIFOR = .

After running the calculation, we inspect the last instance of the errors in the ML_LOGFILE, which should look similar to

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.006252
             Error in force  (eV Angst^-1):     0.168549
                      Error in stress (kB):     2.606332
               Bayesian error (eV Angst-1):     0.037569    0.101357
                       Spilling factor (-):     0.001114    0.020000
====================================================================================================

We see that the accuracy has not changed significantly and the threshold for the Bayesian error is set to the one specified in the INCAR file. Looking at the actual errors we observe that the threshold was exceeded only a few times. You can find all these instances with the command

grep "Bayesian error (eV Angst-1):" ML_LOGFILE | awk '{if ($5 > $6) { print $5 }}'

Consequently only very few structures were added to the ab initio data in the ML_ABNCAR file. The entry for the reference structures and basis sets should look similar to

**************************************************
     The number of configurations
--------------------------------------------------
         65
...
**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       407

From this observation, we conclude either that the force field is already accurate enough or that the threshold is set too high. In the latter case, the machine would not conduct ab initio calculations because the force field is judged accurate enough according to the threshold.

We compare this behavior to the case of using the default value for ML_FF_CTIFOR. We continue from the existing structure and force field

cp CONTCAR POSCAR
cp ML_ABNCAR ML_ABCAR

Remove the ML_FF_CTIFOR tag from the INCAR file.

The default value for ML_FF_CTIFOR is practically zero, so we expect more structures to be added. We inspect the ML_ABNCAR to investigate how the number of structures and the basis set size changed

 **************************************************
     The number of configurations
--------------------------------------------------
        160
...
**************************************************
     The numbers of basis sets per atom type
--------------------------------------------------
       726

Even though the number of structure datasets and the basis set size increased, the accuracy of the force field did not improve significantly

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.005950
             Error in force  (eV Angst^-1):     0.162064
                      Error in stress (kB):     2.628567
               Bayesian error (eV Angst-1):     0.029723    0.061348
                       Spilling factor (-):     0.000406    0.020000
====================================================================================================

Ideally, one should continue learning until no structures need to be added to the training data and basis set anymore. In practice, the prediction of the Bayesian error exhibits numerical inaccuracies so that an ab initio calculation is conducted even if the force field is accurate enough. Hence, in this tutorial, we stop at this point because further addition of training data does not improve the accuracy on the test structures compared to the ab initio data. For high quality production calculations suited for publications, we advise to learn until almost no data is added to the training structures and basis set. In many cases this involves a runtime of at least 100 ps for learning.

Exercise
Continue learning and observe the convergence of the number of reference structures. Most likely the maximum allowed basis set size ML_FF_MB_MB needs to be increased. How does the pair correlation function compare to ab initio data?

Optimizing weights of energy, forces and stress

Even after a force field has been learned, there is still the possibility to refine it by changing the weights for energy ML_FF_WTOTEN, forces ML_FF_WTIFOR and stress ML_FF_WTSIF. By default all weights are set to 1.0, so energy, force and stress are equally important. Typically the error for a component decreases with increasing weight of that component. For methods where the free energy is important (such as e.g. Interface pinning calculations, Metadynamics etc.) introducing a larger weight for the energy can reduce the error of the energy significantly while only slightly increasing the errors of the forces and the stress. Note that the weights are set relative to each other, i.e., by increasing one the other ones decrease automatically.

We will start from the force field obtained above, but you can use a more refined one if you completed the user task

cp INCAR INCAR.run
cp CONTCAR POSCAR.run

The INCAR.run file will serve as a template for several short calculations with different weights for the energy. Set

ML_FF_ISTART = 1
NSW = 1

to generate a single new test structure.

We can now use a script to set ML_FF_WTOTEN to different values and examine how the average error changes

#!/bin/bash

if [ -f error_vs_eweight.dat ]; then
   rm error_vs_eweight.dat
fi

touch error_vs_eweight.dat

for i in 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
do
   cat INCAR.run > INCAR
   echo "ML_FF_WTOTEN = $i" >> INCAR
   cp POSCAR.run POSCAR
   mpirun -np $np $executable_path/vasp_std
   cp ML_LOGFILE ML_LOGFILE.$i
   awk -v var="$i" '/Error in energy \(eV atom\^-1\)\:/{energy=$6} /Error in force  \(eV Angst\^-1\)\:/ {force=$6} /Error in stress \(kB\)\:/{stress=$5} /Energy SD. \(eV atom\^-1\)\:/{energy_var=$5}  /Force SD. \(eV Angst\^-1\)\:/{force_var=$5} /Stress SD. \(kB\)\:/ {stress_var=$4} END{print var,energy/energy_var+force/force_var+stress/stress_var}' ML_LOGFILE.$i >> error_vs_eweight.dat
done

To run this script you need to

export executable_path=path to the vasp executable
export np=number of processes you want to use

This script sets the weight of the energy ML_FF_WTOTEN to values between 1.0 and 5.0. It calculates a single structure and compares the error of energy, forces and stress between ab initio calculation and force field. These errors are normalized with respect to the standard deviations of these quantities. The sum of the normalized errors is plotted against the weight of the energy with the following command

gnuplot -e "set terminal jpeg; set xlabel 'Energy weight'; set ylabel 'Error'; set style data lines; plot 'error_vs_eweight.dat'" > error_vs_eweight.jpg

It should look similar to

Error vs eweight.jpg

From this plot we see that an ideal value is around ML_FF_WTOTEN=2.5. Next we take a look at the actual errors for that weighting parameter in the file ML_LOGFILE.2.5:

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.005322
             Error in force  (eV Angst^-1):     0.169356
                      Error in stress (kB):     2.849817
               Bayesian error (eV Angst-1):     0.025548    0.000000
                       Spilling factor (-):     0.000368    0.020000
====================================================================================================

We see that the error on energy decreased while it increased for forces and stress compared to ML_LOGFILE.1.0:

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.005954
             Error in force  (eV Angst^-1):     0.162006
                      Error in stress (kB):     2.628128
               Bayesian error (eV Angst-1):     0.028542    0.000000
                       Spilling factor (-):     0.000368    0.020000
====================================================================================================

In this example the decrease of the energy is only small but in many examples this can lead to significant increase in the precision. Compare these values to the errors in ML_LOGFILE_5.0:

====================================================================================================
     Information on error estimations
----------------------------------------------------------------------------------------------------
              Error in energy (eV atom^-1):     0.004692
             Error in force  (eV Angst^-1):     0.179448
                      Error in stress (kB):     3.180711
               Bayesian error (eV Angst-1):     0.023225    0.000000
                       Spilling factor (-):     0.000368    0.020000
====================================================================================================

We see that for higher values of the weight the energy is even more accurate, but at the expense of the forces and the stress. Consequently the normalized error increases so that the trajectories during the MD simulation may suffer. We advise not to push the weight of the energy to these extreme values and instead use the value optimizing the overall error.

If needed this weighting can be done in a similar way for the forces and stress.

Production runs using the optimized parameters

In the production run, we rely on the structure dataset and the corresponding accurate force field

cp CONTCAR POSCAR
cp ML_ABNCAR ML_ABCAR.
cp ML_FFNCAR ML_FFCAR.

In the INCAR file, we switch of the updating of the force field and use the optimized energy

ML_FF_ISTART = 2
ML_FF_WTOTEN = 2.5

and adjust the necessary molecular dynamics parameters (like e.g. NSW etc.).

Exercise
Calculate the pair correlation function as above with the machine learned force field and compare to results obtained ab initio when using the same POSCAR.

Download

MLFF_Liquid_Si.tgz