Liquid Si - MLFF
Task
Generating a machine learning force field for liquid Si.
Input
POSCAR
- In this example we start from a 64 atom super cell of diamond-fcc Si (the same as in this example: Liquid Si - Standard MD:
Si cubic diamond 2x2x2 super cell of conventional cell 5.43090000000000 2.00000000 0.00000000 0.00000000 0.00000000 2.00000000 0.00000000 0.00000000 0.00000000 2.00000000 Si 64 Direct 0.00000000 0.00000000 0.00000000 0.50000000 0.00000000 0.00000000 0.00000000 0.50000000 0.00000000 0.50000000 0.50000000 0.00000000 0.00000000 0.00000000 0.50000000 0.50000000 0.00000000 0.50000000 0.00000000 0.50000000 0.50000000 0.50000000 0.50000000 0.50000000 0.37500000 0.12500000 0.37500000 0.87500000 0.12500000 0.37500000 0.37500000 0.62500000 0.37500000 0.87500000 0.62500000 0.37500000 0.37500000 0.12500000 0.87500000 0.87500000 0.12500000 0.87500000 0.37500000 0.62500000 0.87500000 0.87500000 0.62500000 0.87500000 0.00000000 0.25000000 0.25000000 0.50000000 0.25000000 0.25000000 0.00000000 0.75000000 0.25000000 0.50000000 0.75000000 0.25000000 0.00000000 0.25000000 0.75000000 0.50000000 0.25000000 0.75000000 0.00000000 0.75000000 0.75000000 0.50000000 0.75000000 0.75000000 0.37500000 0.37500000 0.12500000 0.87500000 0.37500000 0.12500000 0.37500000 0.87500000 0.12500000 0.87500000 0.87500000 0.12500000 0.37500000 0.37500000 0.62500000 0.87500000 0.37500000 0.62500000 0.37500000 0.87500000 0.62500000 0.87500000 0.87500000 0.62500000 0.25000000 0.00000000 0.25000000 0.75000000 0.00000000 0.25000000 0.25000000 0.50000000 0.25000000 0.75000000 0.50000000 0.25000000 0.25000000 0.00000000 0.75000000 0.75000000 0.00000000 0.75000000 0.25000000 0.50000000 0.75000000 0.75000000 0.50000000 0.75000000 0.12500000 0.12500000 0.12500000 0.62500000 0.12500000 0.12500000 0.12500000 0.62500000 0.12500000 0.62500000 0.62500000 0.12500000 0.12500000 0.12500000 0.62500000 0.62500000 0.12500000 0.62500000 0.12500000 0.62500000 0.62500000 0.62500000 0.62500000 0.62500000 0.25000000 0.25000000 0.00000000 0.75000000 0.25000000 0.00000000 0.25000000 0.75000000 0.00000000 0.75000000 0.75000000 0.00000000 0.25000000 0.25000000 0.50000000 0.75000000 0.25000000 0.50000000 0.25000000 0.75000000 0.50000000 0.75000000 0.75000000 0.50000000 0.12500000 0.37500000 0.37500000 0.62500000 0.37500000 0.37500000 0.12500000 0.87500000 0.37500000 0.62500000 0.87500000 0.37500000 0.12500000 0.37500000 0.87500000 0.62500000 0.37500000 0.87500000 0.12500000 0.87500000 0.87500000 0.62500000 0.87500000 0.87500000
KPOINTS
- We will start with a single k point in this example:
K-Points 0 Gamma 1 1 1 0 0 0
INCAR
#Basic parameters ISMEAR = 0 SIGMA = 0.1 LREAL = Auto PREC = FAST ALGO = FAST ISYM = -1 NELM = 100 EDIFF = 1E-4 LWAVE = .FALSE. LCHARG = .FALSE. #Parallelization of ab initio calculations NCORE = 2 #MD IBRION = 0 MDALGO = 2 ISIF = 2 SMASS = 1.0 TEBEG = 2000 NSW = 10000 POTIM = 3.0 #Machine learning paramters ML_FF_LMLFF = .TRUE. ML_FF_ISTART = 0 ML_FF_NWRITE = 2 ML_FF_EATOM = -.70128086E+00
- The user should be familiar at this step how to run a basic molecular dynamics calculations. If not please go through the example here: Liquid Si - Standard MD.
- Machine learning is switched on by setting the following tag: ML_FF_LMLFF=.TRUE..
- By setting the tag ML_FF_ISTART to zero learning from scratch is selected.
- The flag ML_FF_NWRITE=2 selects a more verbose output where the error on energies, forces and stress are output to the ML_LOGFILE file. This setting is very handy to check the accuracy of the force field.
- The tag ML_FF_EATOM=-.70128086E+00 sets the atomic reference energy for each species. How to obtain that energy is explained below.
Calculation
Reference energy
Before the force field for liquid Si can be calculated, the atomic energy of a single Si atom in a large enough box has to be calculated. For that the following steps have to be done:
- Create a new directory Si_ATOM by typing mkdir Si_ATOM and go to that directory cd Si_ATOM.
- Create an INCAR file with the following parameters:
#Basic parameters ISMEAR = 0 SIGMA = 0.1 LREAL = Auto PREC = FAST ALGO = FAST ISYM = 0 NELM = 100 EDIFF = 1E-4 LWAVE = .FALSE. LCHARG = .FALSE. ISPIN = 2
- Create a POSCAR file with a single atom in a large enough box (the box should be orthorombic to have enough degrees of freedom for electronic relaxation):
Si atom 1.00090000000000 12.00000000 0.00000000 0.00000000 0.00000000 12.01000000 0.00000000 0.00000000 0.00000000 12.02000000 Si 1 Direct 0.00000000 0.00000000 0.00000000
- Create a KPOINTS file with a single k point (or copy the one delivered with this example):
test 0 0 0 Gamma 1 1 1 0 0 0
- Copy the POTCAR from the previous directory to this directory (cp ../POTCAR).
- Run the calculation and look at the total energy in the OUTCAR file (or OSZICAR). AFter that switch back to the original directory. That energy will be used for the ML_FF_EATOM tag. If multiple atom types are present in the structure than this step has to be repeated for each atom type separately and the reference energies are provided as a list after the ML_FF_EATOM tag, where the ordering of the energies corresponds to the ordering of the elements in the POTCAR file.
Creating the liquid structure
We will start this example from a perfect super cell of crystalline silicon (fcc diamond structure) containing 64 atoms. The temperature is set to 2000 K so that when an MD is run the melting should occur relatively fast. We will do the melting with on-the-fly learning. This will greatly accelerate the melting since after some time most of the ab initio calculations are skipped and the very fast force field takes over. This calculation will be executed for 10000 steps with a step size of 3 fs.
Please run now the calculation.
After running the calculation we should obtain a fairly good liquid in the CONTCAR file.
As a side effect we have also learned a force field, but with maybe a quite bad trajectory at the beginning. Usually it is more systematic to learn on the pure structures. In our case this means to use the CONTCAR file obtained after the melting. Also the force field was only learned using a single k point. To obtain a better accuracy we will use more k points. So after this step we will start the learning from scratch using the CONTCAR obtained so and using more accurate parameters.
But before we do that we will take a look at the accuracy of the force obtained now.
The main output files for the machine learning are:
- ML_LOGFILE: This contains the main output of the machine learning. Since we use ML_FF_NWRITE=2 the error on energy, forces and stress of the force field compared to ab initio is also written out on this file for every step.
- ML_ABNCAR: This contains the ab initio data used for the learning. It will be needed for continuation runs as ML_ABCAR.
- ML_FFNCAR: This contains the regression results (weights, parameters, etc.). It will be needed for continuation runs as ML_FFCAR.
In the ML_LOGFILE we will look for the last entry on the error of energy, forces and stress, Bayesian error and spilling factor, which should look similar to this (since we run a molecular dynamics calculation in parallel on different computers actual results can deviate a little):
==================================================================================================== Information on error estimations ---------------------------------------------------------------------------------------------------- Error in energy (eV atom^-1): 0.011371 Error in force (eV Angst^-1): 0.160309 Error in stress (kB): 2.387189 Bayesian error (eV Angst-1): 0.142874 0.101240 Spilling factor (-): 0.006869 0.020000 ====================================================================================================
The first entry of the Bayesian error is the estimated error from our model. The second entry is the error threshold. In our case it is newly determined during the calculations. The first and second entry for the spilling factor are the calculated spilling factor and the threshold, respectively. The threshold for the spilling factor is usually always kept constant during the calculations.
Next we will look at the accuracy of structural properties of the force field. For the we first run a 3000 fs molecular dynamics calculation with and without the force field starting from the new CONTCAR file. First we save that new CONTCAR file to
cp CONTCAR POSCAR.T2000_relaxed
Now do the following steps to run the force field calculation:
- Copy POSCAR.T2000_relaxed to POSCAR.
- Copy ML_ABNCAR to ML_ABCAR.
- Copy ML_FFNCAR to ML_FFCAR.
- Change INCAR tags:
- Change ML_FF_ISTART=0 to ML_FF_ISTART=2: This will switch of learning and only use the machine-learned force field.
- Change NSW=10000 to NSW=1000.
- Run calculation.
- Copy XDATCAR to XDATCAR.MLFF_3ps.
Now do the following steps to run the ab initio reference calculation:
- Copy POSCAR.T2000_relaxed to POSCAR.
- Change ML_FF_LMLFF=.TRUE. to {TAG|ML_FF_LMLFF}}=.FALSE.: This will turn off the machine learning completely.
- Run caclulation.
- Copy XDATCAR to XDATCAR.AI_3ps.
To analyze the pair correlation function use the PERL script pair_correlation_function.pl:
#!/usr/bin/perl use strict; use warnings; #configuration for which ensemble average is to be calculated my $confmin=1; #starting index of configurations in XDATCAR file for pair correlation function my $confmax=20000; #last index of configurations in XDATCAR file for pair correlation function my $confskip=1; #stepsize for configuration loop my $species_1 = 1; #species 1 for which pair correlation function is going to be calculated my $species_2 = 1; #species 2 for which pair correlation function is going to be calculated #setting radial grid my $rmin=0.0; #minimal value of radial grid my $rmax=10.0; #maximum value of radial grid my $nr=300; #number of equidistant steps in radial grid my $dr=($rmax-$rmin)/$nr; #stepsize in radial grid my $tol=0.0000000001; #tolerance limit for r->0 my $z=0; #counter my $numelem; #number of elements my @elements; #number of atoms per element saved in list/array my $lattscale; #scaling factor for lattice my @b; #Bravais matrix my $nconf=0; #number of configurations in XDATCAR file my @cart; #Cartesian coordinates for each atom and configuration my $atmin_1=0; #first index of species one my $atmax_1; #last index of species one my $atmin_2=0; #first index of species two my $atmax_2; #last index of species two my @vol; #volume of cell (determinant of Bravais matrix) my $pi=4*atan2(1, 1); #constant pi my $natom=0; #total number of atoms in cell my @pcf; #pair correlation function (list/array) my $mult_x=1; #periodic repetition of cells in x dimension my $mult_y=1; #periodic repetition of cells in y dimension my $mult_z=1; #periodic repetition of cells in z dimension my @cart_super; #Cartesian cells over multiple cells my @vec_len; #Length of lattice vectors in 3 spatial coordinates #my $ensemble_type="NpT"; #Set Npt or NVT. Needs to be set since both have different XDATCAR file. my $ensemble_type="NVT"; #Set Npt or NVT. Needs to be set since both have different XDATCAR file. my $av_vol=0; #Average volume in cell #reading in XDATCAR file while (<>) { chomp; $_=~s/^/ /; my @help=split(/[\t,\s]+/); $z++; if ($z==2) { $lattscale = $help[1]; } if ($z==3) { $b[$nconf+1][1][1]=$help[1]*$lattscale; $b[$nconf+1][1][2]=$help[2]*$lattscale; $b[$nconf+1][1][3]=$help[3]*$lattscale; } if ($z==4) { $b[$nconf+1][2][1]=$help[1]*$lattscale; $b[$nconf+1][2][2]=$help[2]*$lattscale; $b[$nconf+1][2][3]=$help[3]*$lattscale; } if ($z==5) { $b[$nconf+1][3][1]=$help[1]*$lattscale; $b[$nconf+1][3][2]=$help[2]*$lattscale; $b[$nconf+1][3][3]=$help[3]*$lattscale; } if ($z==7) { if ($nconf==0) { $numelem=@help-1; for (my $i=1;$i<=$numelem;$i++) { $elements[$i]=$help[$i]; $natom=$natom+$help[$i]; } } } if ($_=~m/Direct/) { $nconf=$nconf+1; #for NVT ensemble only one Bravais matrix exists, so it has to be copied if ($ensemble_type eq "NVT") { for (my $i=1;$i<=3;$i++) { for (my $j=1;$j<=3;$j++) { $b[$nconf][$i][$j]=$b[1][$i][$j]; } } } for (my $i=1;$i<=$natom;$i++) { $_=<>; chomp; $_=~s/^/ /; my @helpat=split(/[\t,\s]+/); $cart[$nconf][$i][1]=$b[1][1][1]*$helpat[1]+$b[1][1][2]*$helpat[2]+$b[1][1][3]*$helpat[3]; $cart[$nconf][$i][2]=$b[1][2][1]*$helpat[1]+$b[1][2][2]*$helpat[2]+$b[1][2][3]*$helpat[3]; $cart[$nconf][$i][3]=$b[1][3][1]*$helpat[1]+$b[1][3][2]*$helpat[2]+$b[1][3][3]*$helpat[3]; } if ($ensemble_type eq "NpT") { $z=0; } } last if eof; } if ($confmin>$nconf) { print "Error, confmin larger than number of configurations. Exiting...\n"; exit; } if ($confmax>$nconf) { $confmax=$nconf; } for (my $i=1;$i<=$nconf;$i++) { #calculate lattice vector lengths $vec_len[$i][1]=($b[$i][1][1]*$b[$i][1][1]+$b[$i][1][2]*$b[$i][1][2]+$b[$i][1][3]*$b[$i][1][3])**0.5; $vec_len[$i][2]=($b[$i][2][1]*$b[$i][2][1]+$b[$i][2][2]*$b[$i][2][2]+$b[$i][2][3]*$b[$i][2][3])**0.5; $vec_len[$i][3]=($b[$i][3][1]*$b[$i][3][1]+$b[$i][3][2]*$b[$i][3][2]+$b[$i][3][3]*$b[$i][3][3])**0.5; #calculate volume of cell $vol[$i]=$b[$i][1][1]*$b[$i][2][2]*$b[$i][3][3]+$b[$i][1][2]*$b[$i][2][3]*$b[$i][3][1]+$b[$i][1][3]*$b[$i][2][1]*$b[$i][3][2]-$b[$i][3][1]*$b[$i][2][2]*$b[$i][1][3]-$b[$i][3][2]*$b[$i][2][3]*$b[$i][1][1]-$b[$i][3][3]*$b[$i][2][1]*$b[$i][1][2]; $av_vol=$av_vol+$vol[$i]; } $av_vol=$av_vol/$nconf; #choose species 1 for which pair correlation function is going to be calculated $atmin_1=1; if ($species_1>1) { for (my $i=1;$i<$species_1;$i++) { $atmin_1=$atmin_1+$elements[$i]; } } $atmax_1=$atmin_1+$elements[$species_1]-1; #choose species 2 to which paircorrelation function is calculated to $atmin_2=1; if ($species_2>1) { for (my $i=1;$i<$species_2;$i++) { $atmin_2=$atmin_2+$elements[$i]; } } $atmax_2=$atmin_2+$elements[$species_2]-1; #initialize pair correlation function for (my $i=0;$i<=($nr-1);$i++) { $pcf[$i]=0.0; } # loop over configurations, make histogram of pair correlation function for (my $j=$confmin;$j<=$confmax;$j=$j+$confskip) { for (my $k=$atmin_1;$k<=$atmax_1;$k++) { for (my $l=$atmin_2;$l<=$atmax_2;$l++) { if ($k==$l) {next}; for (my $g_x=-$mult_x;$g_x<=$mult_x;$g_x++) { for (my $g_y=-$mult_y;$g_y<=$mult_y;$g_y++) { for (my $g_z=-$mult_y;$g_z<=$mult_z;$g_z++) { my $at2_x=$cart[$j][$l][1]+$vec_len[$j][1]*$g_x; my $at2_y=$cart[$j][$l][2]+$vec_len[$j][2]*$g_y; my $at2_z=$cart[$j][$l][3]+$vec_len[$j][3]*$g_z; my $dist=($cart[$j][$k][1]-$at2_x)**2.0+($cart[$j][$k][2]-$at2_y)**2.0+($cart[$j][$k][3]-$at2_z)**2.0; $dist=$dist**0.5; #determine integer multiple my $zz=int(($dist-$rmin)/$dr+0.5); if ($zz<$nr) { $pcf[$zz]=$pcf[$zz]+1.0; } } } } } } } #make ensemble average, rescale functions and print for (my $i=0;$i<=($nr-1);$i++) { my $r=$rmin+$i*$dr; if ($r<$tol) { $pcf[$i]=0.0; } else { $pcf[$i]=$pcf[$i]*$av_vol/(4*$pi*$r*$r*$dr*(($confmax-$confmin)/$confskip)*($atmax_2-$atmin_2+1)*($atmax_1-$atmin_1+1));#*((2.0*$mult_x+1.0)*(2.0*$mult_y+1.0)*(2.0*$mult_z+1.0))); } print $r," ",$pcf[$i],"\n"; }
Obtain the pair correlation function from the previously saved XDATCAR files:
perl pair_correlation_function.pl XDATCAR.MLFF_3ps > pair_MLFF_3ps.dat perl pair_correlation_function.pl XDATCAR.MLFF_3ps > pair_AI_3ps.dat
Plot the pair correlation functions using the following command:
gnuplot -e "set terminal jpeg; set xlabel 'r(Ang)'; set ylabel 'PCF'; set style data lines; plot 'pair_MLFF_3ps.dat', 'pair_AI_3ps.dat' " > PC_MLFF_vs_AI_3ps.jpg