Force field query
sGDML force fields (see force field reconstruction) are straightforward to use. In the following example, we use a pre-trained model to predict the energy and forces for an ethanol geometry stored as an XYZ file:
import numpy as np
from sgdml.predict import GDMLPredict
from sgdml.utils import io

model = np.load('m_ethanol.npz')
gdml = GDMLPredict(model)

r,_ = io.read_xyz('ethanol.xyz') # 9 atoms
e,f = gdml.predict(r)

print(r.shape) # (1,27)
print(e.shape) # (1,)
print(f.shape) # (1,27)
Here, the sGDML predictor is instantiated with the model file m_ethanol.npz and queried using the ethanol geometry r imported from ethanol.xyz. It returns the energy e and all interatomic forces f for this structure.
In this example, r is an array of dimension 1 x 3N containing the Cartesian coordinates of N atoms. We could just as well have passed an M x 3N array containing M geometries at once to generate multiple energy and force predictions simultaneously.
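As a rough illustration of a batched query, the sketch below flattens a stack of geometries into the M x 3N layout and reshapes the returned forces back to one vector per atom. The helper name predict_batch is hypothetical, and the assumption that each geometry is flattened atom by atom (x1, y1, z1, x2, ...) is ours and should be checked against your data; predictor stands for any object with a predict(r) method like gdml above.

```python
import numpy as np


def predict_batch(predictor, geometries):
    """Query a predictor with M geometries in a single call.

    `geometries` has shape (M, N, 3). Assumption: the predictor expects
    each geometry flattened atom by atom into a row of length 3N.
    """
    geometries = np.asarray(geometries, dtype=float)
    n_geometries, n_atoms, _ = geometries.shape

    r = geometries.reshape(n_geometries, 3 * n_atoms)  # (M, 3N)
    e, f = predictor.predict(r)  # e: (M,), f: (M, 3N)

    # Reshape forces so that forces[m, i] is the force vector on atom i.
    forces = np.asarray(f).reshape(n_geometries, n_atoms, 3)
    return np.asarray(e), forces
```

With the predictor from the example above, predict_batch(gdml, geoms) with geoms of shape (M, 9, 3) would return M energies and a force array of shape (M, 9, 3).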
The order of atoms in ethanol.xyz must be consistent with the dataset on which the model m_ethanol.npz was originally trained.
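A cheap sanity check is to compare the element sequence of the input file with the one stored alongside the model. The sketch below rests on two assumptions that should be verified against your sGDML version: that the model file stores the atomic numbers under the key 'z', and that the second return value of io.read_xyz (discarded in the example above) holds the atomic numbers read from the file. same_element_order is a hypothetical helper name. Matching elements are necessary but not sufficient, since two swapped hydrogen atoms would still pass.

```python
import numpy as np


def same_element_order(model_z, xyz_z):
    """Return True if both atomic-number sequences are identical."""
    model_z = np.asarray(model_z).ravel()
    xyz_z = np.asarray(xyz_z).ravel()
    return model_z.shape == xyz_z.shape and bool(np.all(model_z == xyz_z))


# Intended use (key name and return value are assumptions, see above):
#   r, z = io.read_xyz('ethanol.xyz')
#   assert same_element_order(model['z'], z)
```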
The distance unit of the input geometry (here: Ångström) must match the unit in the dataset that was used for model training.
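If a geometry comes from a program that writes coordinates in a different unit, it has to be converted before the query. The sketch below assumes, purely for illustration, a model trained on Ångström and an input given in Bohr; the helper name is hypothetical and the factor is the CODATA 2018 value of the Bohr radius.

```python
import numpy as np

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in Angstrom


def bohr_to_angstrom(r_bohr):
    """Convert an array of Cartesian coordinates from Bohr to Angstrom."""
    return np.asarray(r_bohr, dtype=float) * BOHR_TO_ANGSTROM


# Intended use with the predictor from the example above:
#   e, f = gdml.predict(bohr_to_angstrom(r_bohr))
```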
The sgdml package is able to parallelize calculations across multiple CPU cores, which is especially beneficial when querying multiple geometries at once in every call to gdml.predict(). To ensure optimal performance in a particular compute environment, given a particular model file, we recommend running gdml.prepare_parallel() with the n_bulk argument right after initialization of the model. This function will determine the optimal parallelization settings and apply them in one step. Here, n_bulk is the number of geometries M that we plan on querying in each call to gdml.predict().
The maximum number of processes (= CPU cores) that sGDML is allowed to use can be globally limited by specifying max_processes during initialization of the predictor:
gdml = GDMLPredict(model, max_processes=12)
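One way to pick a value is to derive it from the number of cores on the machine while leaving some free for other work. The helper below is a hypothetical sketch that uses only the Python standard library; its result would be passed as max_processes.

```python
import os


def pick_max_processes(reserve=1):
    """Return a process limit that leaves `reserve` cores unused (at least 1)."""
    n_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, n_cores - reserve)


# Intended use with the model from the example above:
#   gdml = GDMLPredict(model, max_processes=pick_max_processes())
```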
Running the benchmark can take some time (seconds to minutes, depending on the model). However, the result for each configuration of training points, number of atoms, and choice of n_bulk is cached, so that gdml.prepare_parallel() returns instantly on subsequent calls (after repeating the benchmark several times).
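Because the parallelization settings are tuned for a particular n_bulk, it can help to query a large set of geometries in chunks of that size. The following is a sketch and not part of the sgdml API: predict_in_chunks is a hypothetical helper, and predictor is any object with a predict(r) method that returns energies and forces as in the example above. The final chunk may contain fewer than n_bulk geometries.

```python
import numpy as np


def predict_in_chunks(predictor, r_all, n_bulk):
    """Predict energies and forces for r_all (M_total x 3N) in calls of n_bulk rows."""
    r_all = np.asarray(r_all, dtype=float)
    energies, forces = [], []
    for start in range(0, r_all.shape[0], n_bulk):
        # Each call receives at most n_bulk geometries.
        e, f = predictor.predict(r_all[start:start + n_bulk])
        energies.append(np.asarray(e))
        forces.append(np.asarray(f))
    return np.concatenate(energies), np.concatenate(forces)
```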
Setting use_torch=True when instantiating the predictor redirects all calculations to PyTorch, which automatically uses GPUs if available.
gdml = GDMLPredict(model, use_torch=True)
The PyTorch dependency is required for this option (see installation).
PyTorch must be installed with GPU support; otherwise, it falls back to the CPU. However, we recommend running CPU calculations without the PyTorch flag, as our own CPU implementation is faster.
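In line with this recommendation, one could enable the PyTorch path only when a GPU is actually visible. The sketch below is a convenience under stated assumptions: predictor_options is a hypothetical helper, and it checks only for CUDA devices via torch.cuda.is_available(). If PyTorch is not installed, it returns no options, so the predictor uses the default CPU implementation.

```python
def predictor_options():
    """Return keyword arguments for GDMLPredict based on GPU availability."""
    try:
        import torch  # optional dependency
    except ImportError:
        return {}
    # Only enable the PyTorch path when a CUDA device is usable.
    return {'use_torch': True} if torch.cuda.is_available() else {}


# Intended use with the model from the example above:
#   gdml = GDMLPredict(model, **predictor_options())
```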