Force field query

sGDML force fields (see force field reconstruction) are straightforward to use. In the following example, we use a pre-trained model to predict the energy and forces for an ethanol geometry stored as an XYZ file (download):

import numpy as np
from sgdml.predict import GDMLPredict
from sgdml.utils import io

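# Load the pre-trained model and set up the predictor.
model = np.load('m_ethanol.npz')
gdml = GDMLPredict(model)

# Read the geometry; the second return value is not needed here.
r, _ = io.read_xyz('')  # 9 atoms
e, f = gdml.predict(r)

print(r.shape)  # (1, 27)
print(e.shape)  # (1,)
print(f.shape)  # (1, 27)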

Here, the sGDML predictor is instantiated with the model file m_ethanol.npz and queried using the ethanol geometry r imported from the XYZ file. It returns the energy e and all interatomic forces f for this structure.

In this example, r is an array of dimension 1 x 3N containing the Cartesian coordinates of N atoms. We could just as well have passed an M x 3N array containing M geometries to obtain multiple energy and force predictions in a single call.
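The batched input layout described above can be sketched with plain NumPy. This is a minimal illustration, not part of the sgdml package: the helper names stack_geometries and unflatten_forces are hypothetical, the coordinates are random placeholders rather than real ethanol geometries, and we assume that each row is flattened atom by atom (x1, y1, z1, x2, ...).

```python
import numpy as np


def stack_geometries(geometries):
    """Flatten a list of (N, 3) coordinate arrays into one (M, 3N) array."""
    geoms = [np.asarray(g, dtype=float) for g in geometries]
    n_atoms = geoms[0].shape[0]
    for g in geoms:
        if g.shape != (n_atoms, 3):
            raise ValueError("every geometry must have shape (N, 3)")
    # Row-major reshape keeps the x, y, z of each atom next to each other.
    return np.stack(geoms).reshape(len(geoms), -1)


def unflatten_forces(f, n_atoms):
    """Reshape an (M, 3N) force array into (M, N, 3) for per-atom access."""
    return np.asarray(f).reshape(-1, n_atoms, 3)


# Placeholder data: 4 random "geometries" with 9 atoms each.
rng = np.random.default_rng(0)
geoms = [rng.normal(size=(9, 3)) for _ in range(4)]

r_batch = stack_geometries(geoms)
print(r_batch.shape)  # (4, 27)

# With a predictor set up as in the example above, one call would then
# return 4 energies and a (4, 27) force array:
# e, f = gdml.predict(r_batch)
```

The forces come back in the same flattened layout as the input, so unflatten_forces(f, 9) would give one force vector per atom and geometry.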


The order of atoms in the input geometry must be consistent with the dataset on which the model m_ethanol.npz was originally trained.
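A simple guard against mismatched atom order could look like the sketch below. It rests on two assumptions that should be verified against your sgdml version: that the model file stores the atomic numbers of the training set under the key 'z', and that the second return value of io.read_xyz holds the atomic numbers of the input geometry. The helper assert_same_atom_order and the ordering in z_train are hypothetical.

```python
import numpy as np


def assert_same_atom_order(z_model, z_input):
    """Raise ValueError if the two atomic-number sequences differ."""
    z_model = np.asarray(z_model).ravel()
    z_input = np.asarray(z_input).ravel()
    if z_model.shape != z_input.shape:
        raise ValueError(
            f"expected {z_model.size} atoms, got {z_input.size}"
        )
    mismatch = np.flatnonzero(z_model != z_input)
    if mismatch.size:
        raise ValueError(
            f"atom order differs at indices {mismatch.tolist()}"
        )


# Hypothetical ordering for ethanol (2 C, 1 O, 6 H); a real check would
# compare model['z'] with the atomic numbers read from the XYZ file
# (both assumptions, see above).
z_train = [6, 6, 8, 1, 1, 1, 1, 1, 1]
assert_same_atom_order(z_train, z_train)  # passes silently
```

This check only catches differences in element type. Two hydrogens that swapped places have identical atomic numbers and would go unnoticed, so it complements, rather than replaces, care when preparing the input.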


The distance unit of the input geometry (here: Ångström) must match the unit in the dataset that was used for model training.
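If a geometry is available in a different unit than the training data, it has to be converted before the query. As an illustration, the sketch below converts coordinates from Bohr to Ångström; the helper bohr_to_angstrom and the sample coordinates are hypothetical, and the example assumes a model trained on Ångström data.

```python
import numpy as np

# Bohr radius in Ångström (CODATA 2018 value).
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(r_bohr):
    """Convert a coordinate array from Bohr to Ångström."""
    return np.asarray(r_bohr, dtype=float) * BOHR_TO_ANGSTROM


# Placeholder: two atoms, 1 Bohr apart along x, in the 1 x 3N layout.
r_bohr = np.array([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0]])
r_angstrom = bohr_to_angstrom(r_bohr)
print(r_angstrom.shape)  # (1, 6)
```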

Multi-CPU support

The sgdml package can parallelize calculations across multiple CPU cores, which is especially beneficial when multiple geometries are queried in each call to gdml.predict(). To ensure optimal performance in a particular compute environment for a particular model file, we recommend calling gdml.prepare_parallel() with the n_bulk argument right after initialization of the model. This function determines the optimal parallelization settings and applies them in one step. Here, n_bulk is the number of geometries M that we plan to query in each call to gdml.predict().
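Because the settings are tuned for a specific n_bulk, it can help to query a large set of geometries in batches of that size. The following sketch shows one way to do this; predict_in_batches is a hypothetical helper and toy_predict is a stand-in with the same call shape as gdml.predict(), used here only so that the example runs without a model file.

```python
import numpy as np


def predict_in_batches(predict, r_all, n_bulk):
    """Call predict() on consecutive slices of at most n_bulk geometries."""
    r_all = np.atleast_2d(r_all)
    energies, forces = [], []
    for start in range(0, len(r_all), n_bulk):
        e, f = predict(r_all[start:start + n_bulk])
        energies.append(e)
        forces.append(f)
    return np.concatenate(energies), np.concatenate(forces)


def toy_predict(r):
    """Stand-in for gdml.predict(): harmonic energy and matching forces."""
    r = np.atleast_2d(r)
    return 0.5 * np.sum(r ** 2, axis=1), -r


# Placeholder data: 10 geometries with 9 atoms each.
r_all = np.arange(10 * 27, dtype=float).reshape(10, 27)
e_all, f_all = predict_in_batches(toy_predict, r_all, n_bulk=4)
print(e_all.shape, f_all.shape)  # (10,) (10, 27)
```

With a real predictor, gdml.predict would be passed in place of toy_predict. Note that the last batch is smaller than n_bulk whenever the number of geometries is not a multiple of it.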

The maximum number of processes (= CPU cores) that sGDML is allowed to use can be globally limited by specifying max_processes during initialization of the predictor:

gdml = GDMLPredict(model, max_processes=12)


Running the benchmark can take some time (seconds to minutes, depending on the model). However, the result for each configuration of training points, number of atoms and choice of n_bulk is cached, so gdml.prepare_parallel() returns instantly on subsequent calls (once the benchmark has been repeated several times).

Multi-GPU support

Setting use_torch=True when instantiating the predictor redirects all calculations to PyTorch, which automatically uses GPUs, if available.

gdml = GDMLPredict(model, use_torch=True)


The PyTorch dependency is required for this option (see installation).


PyTorch must be installed with GPU support; otherwise, calculations fall back to the CPU. However, we recommend running CPU calculations without the use_torch flag, as our own CPU implementation is faster.
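One way to follow this recommendation is to enable use_torch only when PyTorch can actually see a GPU. The sketch below is a suggestion rather than part of the sgdml package: torch_gpu_available is a hypothetical helper, and it only checks for CUDA devices through torch.cuda.is_available().

```python
def torch_gpu_available():
    """Return True only if PyTorch is installed and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Use the PyTorch backend only when a GPU is present; otherwise keep the
# default CPU implementation.
use_torch = torch_gpu_available()
print(use_torch)

# With the model loaded as in the first example:
# gdml = GDMLPredict(model, use_torch=use_torch)
```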