Here we present the theory for on-the-fly machine learning force fields, which will be only available from VASP6.0 or higher. The theory will be presented in a very condensed manner and for a more detailed description of the methods we refer the readers to Refs. and .
Fig. 1: On-the-fly machine learning force field generation scheme.
Before presenting the algorithm we want to define the following terms:
- Structure datasets: A single structure dataset consist of the Bravais lattice, the atomic positions, the total energy, the forces and the stress tensor for one specific structure calculated by first principles.
- Local configurations: Local configuration around each atom mapped onto descriptors. Several structure datasets and local configurations are selected and the machine-learning force field is fitted to those.
The on-the-fly force field generation scheme is given by the following steps (a flowchart of the algorithm is shown in Fig. 1):
- The machine predicts the energy, forces and stress tensor and their uncertainties on a given structure using existing force field.
- The machine decides whether to do first principles calculations or not. If machine decides not to do then algorithm skips steps 3 and 4.
- The first principles calculation is carried out and the data is stored as a candidate for new reference dataset.
- If the number of newly collected structures reaches a certain threshold, or if the uncertainty in the prediction becomes too large, the machine updates the set of reference structure datasets and local reference configurations and creates a new force field.
- Atomic positions are velocities are updated before returning to step 1 or ending the calculation if the total number of ionic steps is reached. The update is done in the following way:
- If machine judges that the force field is not accurate enough, the first principles energy, forces and stress tensor are used.
- Otherwise the force field is used.
The potential energy of a structure with atoms is approximated as
The local energies are functionals of the probability density to find another atom at the position around the atom within a cut-off radius defined as
Here is a cut-off function acting outside of . In the Smooth Overlap of Atomic Positions (SOAP) method the delta function is approximated as
Fig. 2: Radial and angular descriptors.
Unfortunately is not rotationally invariant. To deal with this problem intermediate functions or descriptors depending on possessing rotational invariance are introduced:
This is the simplest descriptor which relies on the radial distribution function
where denotes the unit vector the vector between atoms and [see Fig. 2 (a)].
The Radial descriptor can also be regarded as a two-body descriptor.
In most cases the radial descriptor is not enough to distinguish different probability densities , since two different can yield the same . To improve on this angular information between two radial descriptors is also incorporated within an angular descriptor
where denotes the angle between two vectors and [see Fig. 2 (b)]. The function is also commonly referred to as angular distribution function and it is equivalent to the power spectrum used in the Gaussian Approximation Potential (GAP) method.
Basis set expansion
The atomic probability density can be also expanded in terms of basis functions
where , and denote expansion coefficients, radial basis functions and spherical harmonics, respectively. The indices , and denote main, angular and magnetic quantum numbers, respectively.
By using the above equation the radial descriptor and angular descriptor (power spectrum) can be written as
where represent Legendre polynomials of order .
In many cases is multiplied with an angular filtering function , which can noticably reduce the necessary basis set size without losing accuracy in the calculations
where is a paramater controling the extent of the filtering. A larger value for the parameter should provide more filtering.
Potential energy fitting
It is convenient to express the local potential energy of atom in structure in terms of linear coefficients and a kernel as follows
where is the basis set size. The kernel measures the similarity between a local configuration from the training set and basis set . Using the radial and angular descriptors it is written as
Here the vectors and contain all coefficients and , respectively. The notation indicates that it is a normalized vector of . The parameters and control the weighting of the radial and angular terms, respectively. The parameter controls the sharpness of the kernel and the order of the many-body interactions.
Matrix vector form of linear equations
Similarly to the energy the forces and the stress tensor are also described as linear functions of the coefficients . All three are fitted simultaneously which leads to following matrix-vector form
where is a super vector consisting of the sub vectors . Here each contains the first principle energies per atom, forces and stress tensors for each structure . denotes the total number of structures. The size of is .
The matrix is also called as design matrix. The rows of this matrix are blocked for each structure , where the first line of each block consists of the kernel used to calculate the energy. The subsequent lines consist of the derivatives of the kernel with respect to the atomic coordinates used to calculate the forces. The final 6 lines within each structure consist of the derivatives of the kernel with respect to the unit cell coordinates used to calculate the stress tensor components. The overall size of is looking like as follows
We use two different estimators for the errors in the force field:
- Bayesian error estimation.
- Spilling factor.
Bayesian error estimation
Ultimately for error prediction we want to get the maximized probability of observing a new structure on basis of the training set , which is denoted as . For this we need to get from the error of the linear fitting coefficients in the reproduction of the training data to which is explained in the following.
First we obtain from the Bayesian theorem
where we assume multivariate Gaussian distributions for the likelihood function
and the (conjugate) prior
The parameters and need to be optimized to balance the accuracy and robustness of the force field (see below).The ormalization is obtained by
Using the equations from above and the completing square method is obtained as follows
Since the prior and the likelood function are described by multivariate Gaussian distributions the posterior describes a multivariate Gaussian written as
where we use the following definitions for the center of the Guassian distribution and the covariance matrix
By using the relation
and the completing square method the distribution of is written as
Here is similar to but only for the new structure .
The mean vector contains the results of the predictions on the dimensionless energy per atom, forces and stress tensor.
The diagonal elements of , which correspond to the variances of the predicted results, are used as the uncertainty in the prediction.
Finally to get the best results and to prevent overfitting the parameters and have to be optimized. To achieve this, we use the evidence approximation (also called as empirical bayes, 2 maximum likelihood or generalized maximum likelihood), which maximizes the evidence function (also called model evidence) defined as
The function corresponds to the probability that the regression model with parameters and provides the reference data . Hence the best fit is optimized by maximizing this probability. The maximization is carried out by simultaneously solving the following equations
where are the eigenvalues of the matrix .
The spilling factor is a measure of the density of a local reference configuration near the new local configuration given as
It should be noted that the equation is slightly modified from its orginial equation given by Miwa and Ohno to make it applicable to a non-normalized similarity measure such as, e.g. the radial descriptor, too.
- ↑ R. Jinnouchi, F. Karsai and G. Kresse, Phys. Rev. B 100. 014105 (2019).
- ↑ A. P. Bartók, R. Kondor, and G. Csányi, Phys. Rev. B 87, 184115 (2013).
- ↑ A. P. Bartók, M.C. Payne, R. Kondor, and G. Csányi, Phys. Rev. Lett 104, 136403 (2010).
- ↑ [ J. P. Boyd, Chebyshev and Fourier Spectral Methods (Dover Publications, New York, 2000).]
- ↑ a b c C. M. Bishop, Pattern Recognition and Machine Learning, (New York: Springer), (2006).
- ↑ a b K. Miwa, and H. Ohno, Phys. Rev. B 94, 184109 (2016).
- ↑ S.F. Gull, and J. Skilling, Maximum Entropy Bayesian Methods, Fundam. Theor. Phys., 28th ed. (Springer, Dordrecht, 1989).
- ↑ D. J. C. Mackay, Neural Computation 4, 415 (1992).
- ↑ R. Jinnouchi, and R. Asahi, J. Phys. Chem. Lett. 8, 4279 (2017).