# Data Preparation

sGDML uses a native format for its datasets, but we include scripts to convert from and to Extended XYZ files and other popular file formats. It is easy to create custom converters by using one of those scripts as a template.

To create a dataset from an Extended XYZ file (Example), run:

    $ sgdml_dataset_from_xyz.py <ext_xyz_file>

This will create a dataset file in the format supported by the sGDML package. Vice versa, we can also convert from this native format back to Extended XYZ, using:

    $ sgdml_dataset_to_xyz.py <ext_xyz_file>
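When writing a custom converter, most of the work is the parsing step. The following is a minimal sketch of such a parser, not the code used by the bundled scripts. It assumes a simplified Extended XYZ layout in which each frame consists of an atom count, a comment line whose first token is the energy, and one line per atom holding the element symbol, three coordinates and three force components. The function name `read_ext_xyz` is hypothetical.

```python
import io


def read_ext_xyz(handle):
    """Yield (symbols, positions, forces, energy) for each frame.

    Assumed layout per frame (an assumption of this sketch):
      line 1:   number of atoms
      line 2:   comment line whose first token is the energy
      line 3..: symbol x y z fx fy fz
    """
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input (or a blank line after the last frame)
        n_atoms = int(header)
        energy = float(handle.readline().split()[0])

        symbols, positions, forces = [], [], []
        for _ in range(n_atoms):
            fields = handle.readline().split()
            symbols.append(fields[0])
            positions.append([float(v) for v in fields[1:4]])
            forces.append([float(v) for v in fields[4:7]])
        yield symbols, positions, forces, energy


# Tiny in-memory example with made-up numbers, for illustration only.
example = """\
2
-1.5
H 0.0 0.0 0.0 0.1 0.0 0.0
H 0.0 0.0 0.74 -0.1 0.0 0.0
"""
frames = list(read_ext_xyz(io.StringIO(example)))
symbols, positions, forces, energy = frames[0]
```

A real converter would additionally need to validate that every frame has the same atoms in the same order and then write the arrays in the native dataset format; the bundled scripts are the better template for that part.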


Note

Any metadata, such as the dataset name, the level of theory, or the dataset checksum, is dropped when exporting to external file formats.

## Importing proprietary formats

Note

The output format of external programs may change without notice, in which case the following scripts will need to be adjusted.

To create a dataset from FHI-aims molecular dynamics output files (Example), run:

    $ sgdml_dataset_from_aims.py <aims_output_file>

To create a dataset from i-PI molecular dynamics trajectories (Example), run:

    $ sgdml_dataset_from_ipi.py <xyz_geometries> <xyz_forces> <energies> [<energy_col>]


i-PI stores geometries, forces, and energies in separate files. The desired column in its energy output file is selected via the optional parameter <energy_col>.
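To illustrate what selecting an energy column means, here is a small sketch, not the implementation in `sgdml_dataset_from_ipi.py`. It assumes the energy file is whitespace-separated text with header lines starting with `#`, and that the column index is zero-based. The function name `read_energy_column` and the sample values are hypothetical.

```python
def read_energy_column(lines, energy_col=0):
    """Return the values of one column from an i-PI-style property file.

    Assumptions of this sketch: header lines start with '#', data lines
    are whitespace-separated, and energy_col is a zero-based index.
    """
    energies = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        energies.append(float(line.split()[energy_col]))
    return energies


# Made-up sample: column 0 is the step, column 1 the potential energy.
sample = [
    "# column 1 --> step",
    "# column 2 --> potential",
    "0 -1.50",
    "1 -1.48",
]
energies = read_energy_column(sample, energy_col=1)
```

Because geometries, forces and energies come from separate files, a converter should also check that all three contain the same number of frames before combining them.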