Force Field Reconstruction
To reconstruct an sGDML force field, we need a dataset file, which can be generated from various file formats (see Data Preparation). For a quick start, we can also simply continue with one of the included benchmark datasets.
The easiest way to create a model is with the automated model creation assistant:
$ sgdml all ethanol.npz <n_train> <n_validate> <n_test>
Here, <n_train>, <n_validate> and <n_test> specify the sample sizes for the training, validation and test datasets, respectively. All three are drawn from the provided bulk dataset ethanol.npz, without overlap.
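To illustrate what "without overlap" means for the three sample sizes, here is a minimal sketch that draws disjoint index sets from a bulk dataset. The function name split_indices, the seed, and the dataset size of 5000 are hypothetical and chosen only for illustration; the sampling strategy that sgdml itself uses may well differ (this sketch samples uniformly at random).

```python
import random


def split_indices(n_total, n_train, n_validate, n_test, seed=0):
    """Draw three disjoint index sets from range(n_total).

    Hypothetical helper for illustration only; not part of sgdml.
    """
    n_needed = n_train + n_validate + n_test
    if n_needed > n_total:
        raise ValueError(
            "dataset too small: need %d points, have %d" % (n_needed, n_total)
        )

    rng = random.Random(seed)
    # Sampling without replacement guarantees that no index is picked twice,
    # so the three slices below cannot share a data point.
    picked = rng.sample(range(n_total), n_needed)

    train = picked[:n_train]
    validate = picked[n_train:n_train + n_validate]
    test = picked[n_train + n_validate:]
    return train, validate, test


# Assumed bulk dataset of 5000 geometries, split as 200 / 1000 / 1000.
train_idx, valid_idx, test_idx = split_indices(5000, 200, 1000, 1000)
```

Because the three sets are disjoint, the bulk dataset must contain at least <n_train> + <n_validate> + <n_test> points.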
The size of the training set is the most important parameter: a larger value will yield a more accurate model, but at the cost of increased training time and memory requirement. Increasing the size of the validation and test dataset carries no such penalty. In fact, it is desirable to use large test datasets to get a reliable estimate for the generalization performance of the trained model.
There is not much more to it: the command line call above will perform all steps necessary to reconstruct an sGDML force field. Once the program finishes, it will output a fully trained and tested model.
Model training is a memory-intensive task that is best executed on a powerful computer.
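To get a feeling for how memory grows with the training set size, the following rough sketch estimates the size of a dense kernel matrix. It rests on an assumption that is not stated on this page: that training on forces builds a dense square matrix with 3 * n_atoms * n_train rows, stored as 8-byte floats. The helper kernel_matrix_bytes is hypothetical, and the actual memory footprint of sgdml may be lower or higher depending on the solver and any symmetry reductions it applies.

```python
def kernel_matrix_bytes(n_train, n_atoms, bytes_per_entry=8):
    """Estimate the size of a dense (3*n_atoms*n_train)^2 matrix in bytes.

    Hypothetical helper; the dense-matrix layout is an assumption.
    """
    dim = 3 * n_atoms * n_train  # one row per force component per training point
    return dim * dim * bytes_per_entry


# Ethanol has 9 atoms; compare a few training set sizes.
for n_train in (200, 1000, 5000):
    gib = kernel_matrix_bytes(n_train, n_atoms=9) / 2**30
    print("n_train = %5d: about %.1f GiB" % (n_train, gib))
```

Under this assumption the estimate grows quadratically: doubling the training set size quadruples the memory needed.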
If the reconstruction process is terminated prematurely, the same command can simply be reissued to resume it.
In the example above, we used the all command to automate force field reconstruction. However, each training step can also be called individually, as described here.
The same functionality is also exposed via the Python API, which is particularly useful when developing new models based on the existing sGDML implementation.
Here is how to train a single model (without the cross-validation or testing performed by the assistant used above) for a particular choice of hyper-parameters, sig = 10 and lam = 1e-15:
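import sys
import numpy as np
from sgdml.train import GDMLTrain

# Load the bulk dataset and choose the training set size.
dataset = np.load('d_ethanol.npz')
n_train = 200

# Set up a training task with fixed hyper-parameters (sig, lam).
gdml_train = GDMLTrain()
task = gdml_train.create_task(dataset, n_train,
                              valid_dataset=dataset, n_valid=1000,
                              sig=10, lam=1e-15)

try:
    model = gdml_train.train(task)
except Exception as err:
    # Abort with the error message if training fails.
    sys.exit(err)
else:
    # Save the trained model to disk.
    np.savez_compressed('m_ethanol.npz', **model)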