sgdml.utils package

sgdml.utils.desc module

sgdml.utils.desc.from_r(r, lat_and_inv=None)[source]

Generate descriptor and its Jacobian for a molecular geometry in Cartesian coordinates.

Parameters:
  • r (numpy.ndarray) – Array of size 1 x 3N containing the Cartesian coordinates of each atom.
  • lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.
Returns:

  • numpy.ndarray – Descriptor representation as 1D array of size N(N-1)/2
  • numpy.ndarray – Array of size N(N-1)/2 x 3N containing all partial derivatives of the descriptor.
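The array sizes listed above can be illustrated with a minimal sketch. It assumes that the descriptor consists of inverse pairwise distances and that pairs are ordered as in the lower triangle returned by np.tril_indices; neither is stated in this section. The helper inv_dist_descriptor is hypothetical and is not the sgdml implementation.

```python
import numpy as np


def inv_dist_descriptor(r):
    """Hypothetical stand-in: inverse pairwise distances and their Jacobian."""
    xyz = np.asarray(r, dtype=float).reshape(-1, 3)   # (N, 3)
    n_atoms = xyz.shape[0]
    i, j = np.tril_indices(n_atoms, k=-1)             # all atom pairs with i > j
    diff = xyz[i] - xyz[j]                            # (N(N-1)/2, 3)
    dist = np.linalg.norm(diff, axis=1)
    desc = 1.0 / dist

    # d(1/|d|)/d r_i = -d/|d|^3 and d(1/|d|)/d r_j = +d/|d|^3
    grad_i = -diff / dist[:, None] ** 3
    jac = np.zeros((desc.size, 3 * n_atoms))
    rows = np.arange(desc.size)
    for axis in range(3):
        jac[rows, 3 * i + axis] = grad_i[:, axis]
        jac[rows, 3 * j + axis] = -grad_i[:, axis]
    return desc, jac


# Three atoms, coordinates flattened to an array of size 1 x 3N.
r = np.array([[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0]])
desc, jac = inv_dist_descriptor(r)
print(desc.shape, jac.shape)  # (3,) (3, 9)
```

For N = 3 atoms the descriptor has N(N-1)/2 = 3 entries and the Jacobian has 3 x 9 entries, matching the sizes given in the return values.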

sgdml.utils.desc.init(n_atoms)[source]
sgdml.utils.desc.pbc_diff(diffs, lat_and_inv)[source]

Clamp differences of vectors to super cell.

Parameters:
  • diffs (numpy.ndarray) – N x 3 matrix of N pairwise differences between vectors u - v.
  • lat_and_inv (tuple of numpy.ndarray) – Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.
Returns:

N x 3 matrix of clamped differences.

Return type:

numpy.ndarray
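One common way to clamp difference vectors to a periodic cell is the minimum-image convention in fractional coordinates. The sketch below shows that technique under the assumption that lat_and_inv holds the lattice matrix (vectors as columns) followed by its inverse, as described in the parameter list. The helper wrap_diffs is hypothetical and may differ from what pbc_diff does internally.

```python
import numpy as np


def wrap_diffs(diffs, lat_and_inv):
    """Hypothetical helper: map each difference vector to its nearest periodic image."""
    lat, lat_inv = lat_and_inv
    frac = lat_inv @ diffs.T                  # fractional coordinates, shape 3 x N
    return diffs - (lat @ np.rint(frac)).T    # subtract whole lattice translations


lat = 10.0 * np.eye(3)                        # cubic cell with edge length 10
lat_and_inv = (lat, np.linalg.inv(lat))
diffs = np.array([[9.0, 0.0, 0.0],
                  [-6.0, 2.0, 0.0],
                  [4.0, -4.0, 1.0]])
wrapped = wrap_diffs(diffs, lat_and_inv)
print(wrapped)
```

In this example, a component of 9 becomes -1 and a component of -6 becomes 4, while components that are already shorter than half the cell edge are left unchanged.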

sgdml.utils.desc.pbc_diff_torch(diffs, lat_and_inv)[source]

Clamp differences of vectors to super cell (for torch tensors).

Parameters:
  • diffs (numpy.ndarray) – N x 3 matrix of N pairwise differences between vectors u - v.
  • lat_and_inv (tuple of numpy.ndarray) – Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.
Returns:

N x 3 matrix of clamped differences.

Return type:

numpy.ndarray

sgdml.utils.desc.pdist(r, lat_and_inv=None)[source]
sgdml.utils.desc.perm(perm)[source]

Convert atom permutation to descriptor permutation.

A permutation of N atoms is converted to a permutation that acts on the corresponding descriptor representation. Applying the converted permutation to a descriptor is equivalent to permuting the atoms first and then generating the descriptor.

Parameters: perm (numpy.ndarray) – Array of size N containing the atom permutation.
Returns: Array of size N(N-1)/2 containing the corresponding descriptor permutation.
Return type: numpy.ndarray
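The equivalence described above (permute the atoms and then build the descriptor, or build the descriptor and then apply the converted permutation) can be checked with a small pure-Python sketch. It assumes an inverse-distance descriptor and a lower-triangle pair ordering. All helper names are hypothetical, and the ordering used by sgdml may differ.

```python
import math


def pair_list(n_atoms):
    """Atom pairs (i, j) with i > j in lower-triangle order (assumed ordering)."""
    return [(i, j) for i in range(n_atoms) for j in range(i)]


def descriptor(coords):
    """Inverse pairwise distances (assumed descriptor)."""
    return [1.0 / math.dist(coords[i], coords[j]) for i, j in pair_list(len(coords))]


def descriptor_perm(perm):
    """Convert an atom permutation into a permutation of descriptor entries."""
    pairs = pair_list(len(perm))
    index_of = {pair: k for k, pair in enumerate(pairs)}
    out = []
    for i, j in pairs:
        a, b = perm[i], perm[j]
        out.append(index_of[(max(a, b), min(a, b))])
    return out


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
perm = [2, 0, 3, 1]

d = descriptor(coords)
d_perm = descriptor_perm(perm)

permute_then_describe = descriptor([coords[p] for p in perm])
describe_then_permute = [d[k] for k in d_perm]
print(all(math.isclose(a, b) for a, b in zip(permute_then_describe, describe_then_permute)))
```

For N = 4 atoms the converted permutation has N(N-1)/2 = 6 entries, one per atom pair.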
sgdml.utils.desc.r_to_d_desc(r, pdist, lat_and_inv=None)[source]

Generate descriptor Jacobian for a set of atom positions in Cartesian coordinates. This method can apply the minimum-image convention as periodic boundary condition for distances between atoms, given the lattice vectors of the unit cell and their inverse.

Parameters:
  • r (numpy.ndarray) – Array of size 1 x 3N containing the Cartesian coordinates of each atom.
  • pdist (numpy.ndarray) – Array of size N x N containing the Euclidean distance (2-norm) for each pair of atoms.
  • lat_and_inv (tuple of numpy.ndarray, optional) – Tuple of 3x3 matrix containing lattice vectors as columns and its inverse.
Returns:

Array of size N(N-1)/2 x 3N containing all partial derivatives of the descriptor.

Return type:

numpy.ndarray

sgdml.utils.desc.r_to_desc(r, pdist)[source]

Generate descriptor for a set of atom positions in Cartesian coordinates.

Parameters:
  • r (numpy.ndarray) – Array of size 3N containing the Cartesian coordinates of each atom.
  • pdist (numpy.ndarray) – Array of size N x N containing the Euclidean distance (2-norm) for each pair of atoms.
Returns:

Descriptor representation as 1D array of size N(N-1)/2

Return type:

numpy.ndarray

sgdml.utils.io module

sgdml.utils.io.dataset_md5(dataset)[source]
sgdml.utils.io.filter_file_type(dir, type, md5_match=None)[source]

Filters all files from a directory that match a given type and (optionally) a given fingerprint.

Parameters:
  • dir (str) – Directory path.
  • type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.
  • md5_match (str, optional) – Fingerprint string.
Returns:

List of file names that match the specified type and fingerprint (if provided).

Return type:

list of str

Raises:

ArgumentTypeError – If the directory contains unreadable .npz files.

sgdml.utils.io.generate_xyz_str(r, z, e=None, f=None, lattice=None)[source]
sgdml.utils.io.is_dir_with_file_type(arg, type, or_file=False)[source]

Validate directory path and check if it contains files of the specified type.

Note

If a file path is provided, this function acts like it’s a directory with just one file.

Parameters:
  • arg (str) – File path.
  • type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.
  • or_file (bool) – If arg contains a file path, act like it’s a directory with just a single file inside.
Returns:

Tuple of directory path (as provided) and a list of contained file names of the specified type.

Return type:

(str, list of str)

Raises:
  • ArgumentTypeError – If the provided directory path does not lead to a directory.
  • ArgumentTypeError – If directory contains unreadable files.
  • ArgumentTypeError – If directory contains no files of the specified type.
sgdml.utils.io.is_file_type(arg, type)[source]

Validate file path and check if the file is of the specified type.

Parameters:
  • arg (str) – File path.
  • type ({‘dataset’, ‘task’, ‘model’}) – Possible file types.
Returns:

Tuple of file path (as provided) and data stored in the file. The returned instance of NpzFile class must be closed to avoid leaking file descriptors.

Return type:

(str, dict)

Raises:
  • ArgumentTypeError – If the provided file path does not lead to a NpzFile.
  • ArgumentTypeError – If the file is not readable.
  • ArgumentTypeError – If the file is of wrong type.
  • ArgumentTypeError – If path/fingerprint is provided, but the path is not valid.
  • ArgumentTypeError – If fingerprint could not be resolved.
  • ArgumentTypeError – If multiple files with the same fingerprint exist.
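The requirement to close the returned NpzFile can be met with a context manager, since the object returned by numpy.load for .npz archives supports the with statement. The sketch below uses plain NumPy on a temporary file rather than is_file_type itself. The stored keys type and z are hypothetical placeholders for whatever a real dataset, task, or model file contains.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in archive (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(path, type="d", z=np.array([8, 1, 1]))

# The with block closes the NpzFile on exit, releasing the file descriptor.
with np.load(path) as data:
    file_type = data["type"].item()   # 0-d string array -> Python str
    z = data["z"].copy()              # copy what is needed before the file closes

print(file_type, z)
```

Copying the required arrays inside the with block keeps them usable after the archive has been closed.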
sgdml.utils.io.is_strict_pos_int(arg)[source]

Validate strictly positive integer input.

Parameters: arg (str) – Integer as string.
Returns: Parsed integer.
Return type: int
Raises: ArgumentTypeError – If integer is not > 0.
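A validator of this shape is typically passed to argparse as the type of a command-line option. The sketch below shows that pattern with a hypothetical re-implementation, strict_pos_int, and a hypothetical option name, --n_train. It is not the sgdml code.

```python
import argparse


def strict_pos_int(arg):
    """Hypothetical validator: parse a string and require a value > 0."""
    value = int(arg)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{arg!r} is not a strictly positive integer")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--n_train", type=strict_pos_int)

args = parser.parse_args(["--n_train", "200"])
print(args.n_train)  # 200
```

When the validator raises ArgumentTypeError during parsing, argparse reports the message as a usage error instead of showing a traceback.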
sgdml.utils.io.is_task_dir_resumeable(train_dir, train_dataset, test_dataset, n_train, n_test, sigs, gdml)[source]

Check if a directory contains task and/or model files that match the configuration of a training process specified in the remaining arguments.

Check that the training and test datasets in each task match train_dataset and test_dataset, that the numbers of training and test points match, and that the choices for the kernel hyper-parameter \(\sigma\) are contained in sigs. Also check whether the existing tasks/models contain symmetries and whether that is consistent with the flag gdml. This function is useful for determining whether a training process can be resumed using the existing files.

Parameters:
  • train_dir (str) – Path to training directory.
  • train_dataset (dataset) – Dataset from which training points are sampled.
  • test_dataset (dataset) – Dataset from which test points are sampled (may be the same as train_dataset).
  • n_train (int) – Number of training points to sample.
  • n_test (int) – Number of test points to sample.
  • sigs (list of int) – List of \(\sigma\) kernel hyper-parameter choices (usually: the hyper-parameter search grid)
  • gdml (bool) – If True, don’t include any symmetries in model (GDML), otherwise do (sGDML).
Returns:

False, if any of the files in the directory do not match the training configuration.

Return type:

bool

sgdml.utils.io.is_valid_file_type(arg_in)[source]

Check if file is either a valid dataset, task or model file.

Parameters: arg_in (str) – File path.
Returns: Tuple of file path (as provided) and data stored in the file. The returned instance of NpzFile class must be closed to avoid leaking file descriptors.
Return type: (str, dict)
Raises: ArgumentTypeError – If the provided file path does not point to a supported file type.
sgdml.utils.io.lattice_vec_to_par(lat)[source]
sgdml.utils.io.model_file_name(task_or_model, is_extended=False)[source]
sgdml.utils.io.parse_list_or_range(arg)[source]

Parses a string that represents either an integer or a range in the notation <start>:<step>:<stop>.

Parameters: arg (str) – Integer or range string.
Returns: Parsed integer or list of integers.
Return type: int or list of int
Raises: ArgumentTypeError – If input can neither be interpreted as an integer nor a valid range.
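The <start>:<step>:<stop> notation can be sketched as follows. The helper parse_int_or_range is hypothetical, and the sketch assumes that the stop value is included in the resulting list and that only non-negative integers are accepted. The behavior of the real function may differ on both points.

```python
import argparse
import re


def parse_int_or_range(arg):
    """Hypothetical parser for '<int>' or '<start>:<step>:<stop>' strings."""
    if re.fullmatch(r"\d+", arg):
        return int(arg)
    if re.fullmatch(r"\d+:\d+:\d+", arg):
        start, step, stop = map(int, arg.split(":"))
        if step > 0:
            # Assumption: the stop value itself belongs to the range.
            return list(range(start, stop + 1, step))
    raise argparse.ArgumentTypeError(f"{arg!r} is neither an integer nor a valid range")


print(parse_int_or_range("10"))         # 10
print(parse_int_or_range("10:10:40"))   # [10, 20, 30, 40]
```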
sgdml.utils.io.read_xyz(file_path)[source]
sgdml.utils.io.task_file_name(task)[source]
sgdml.utils.io.train_dir_name(dataset, n_train, use_sym, use_cprsn, use_E, use_E_cstr)[source]
sgdml.utils.io.write_geometry(filename, r, z, comment_str='')[source]
sgdml.utils.io.z_str_to_z(z_str)[source]
sgdml.utils.io.z_to_z_str(z)[source]

sgdml.utils.perm module

sgdml.utils.perm.bipartite_match(R, z, max_processes=None)[source]
sgdml.utils.perm.complete_sym_group(perms)[source]
sgdml.utils.perm.find_perms(R, z, max_processes=None)[source]
sgdml.utils.perm.inv_perm(perm)[source]
sgdml.utils.perm.share_array(arr_np, typecode)[source]
sgdml.utils.perm.sync_perm_mat(match_perms_all, match_cost, n_atoms)[source]

sgdml.utils.ui module

sgdml.utils.ui.color_str(str, fore_color=7, back_color=0, bold=False)[source]
sgdml.utils.ui.gen_lattice_str(lat)[source]
sgdml.utils.ui.gen_mat_str(mat)[source]

Converts a matrix to a multiline string such that the decimal points align in each column. Trailing zeros are replaced with spaces.

Parameters: mat (numpy.ndarray)
Returns: String representation of matrix.
Return type: str
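The layout described above (aligned decimal points, trailing zeros replaced by spaces) can be approximated with fixed-width formatting. The sketch below is a hypothetical re-implementation that operates on nested lists. The number of decimals and the single-space column separator are assumptions, not properties of gen_mat_str.

```python
def format_matrix(rows, decimals=3):
    """Hypothetical formatter: align decimal points, blank out trailing zeros."""

    def blank_trailing_zeros(cell):
        # Strip trailing zeros (and a dangling decimal point), then pad with
        # spaces so the cell keeps its original width.
        stripped = cell.rstrip("0").rstrip(".")
        return stripped.ljust(len(cell))

    cells = [[blank_trailing_zeros(f"{v:.{decimals}f}") for v in row] for row in rows]
    n_cols = len(cells[0])
    widths = [max(len(row[c]) for row in cells) for c in range(n_cols)]
    return "\n".join(
        " ".join(row[c].rjust(widths[c]) for c in range(n_cols)) for row in cells
    )


text = format_matrix([[1.5, 10.25], [12.0, 0.125]])
print(text)
```

Because every cell is first formatted with the same number of decimals and then right-aligned within its column, the decimal points line up even after the zeros are blanked out.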
sgdml.utils.ui.gen_range_str(min, max)[source]

Generates a string that shows a minimum and maximum value, as well as the range.

Example: <min> |-- <range> --| <max>

Parameters:
  • min (float) – Minimum value.
  • max (float) – Maximum value.
Returns:

Return type:

str

sgdml.utils.ui.gray_str(str)[source]
sgdml.utils.ui.indent_str(str, indent)[source]

Indents all lines of a multiline string right by a given number of characters.

Parameters:
  • str (str) – Multiline string.
  • indent (int) – Number of characters added in front of each line.
Returns:

Return type:

str

sgdml.utils.ui.info_str(str)[source]
sgdml.utils.ui.merge_col_str(col_str1, col_str2)[source]

Merges two multiline strings that represent columns in a table by concatenating each pair of lines.

Note

Both strings must have the same number of lines.

Parameters:
  • col_str1 (str) – First multiline string.
  • col_str2 (str) – Second multiline string.
Returns:

Return type:

str
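Line-by-line concatenation of two column strings can be sketched as below. The helper merge_columns is hypothetical. Raising ValueError on a line-count mismatch is an assumption, since this section only notes that both strings must have the same number of lines.

```python
def merge_columns(col_str1, col_str2):
    """Hypothetical helper: concatenate two multiline strings line by line."""
    lines1 = col_str1.split("\n")
    lines2 = col_str2.split("\n")
    if len(lines1) != len(lines2):
        raise ValueError("both strings must have the same number of lines")
    return "\n".join(a + b for a, b in zip(lines1, lines2))


left = "name  \nwater \nethane"
right = " | atoms\n | 3\n | 8"
table = merge_columns(left, right)
print(table)
```

Padding the lines of the first column to a common width beforehand, as in the left string here, keeps the second column aligned.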

sgdml.utils.ui.pass_str(str)[source]
sgdml.utils.ui.print_lattice(lat=None)[source]
sgdml.utils.ui.print_step_title(title_str, sec_title_str='', underscore=True)[source]
sgdml.utils.ui.print_two_column_str(str, sec_str='')[source]
sgdml.utils.ui.progr_bar(current, total, disp_str='', sec_disp_str=None)[source]

Print progress bar.

Example: [ 45%] Task description (secondary string)

Parameters:
  • current (int) – Number of items already processed.
  • total (int) – Total number of items.
  • disp_str (str, optional) – Task description.
  • sec_disp_str (str, optional) – Additional string shown in gray.
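The example line above can be reproduced with a short formatting sketch. The helper progress_line is hypothetical. It returns the string instead of printing it and leaves out the gray styling of the secondary string, and the way the percentage is rounded is an assumption.

```python
def progress_line(current, total, disp_str="", sec_disp_str=None):
    """Hypothetical helper: build a '[ 45%] description (secondary)' line."""
    percent = int(100 * current / total)
    line = f"[{percent:3d}%] {disp_str}"
    if sec_disp_str is not None:
        line += f" {sec_disp_str}"
    return line


print(progress_line(45, 100, "Task description", "(secondary string)"))
# [ 45%] Task description (secondary string)
```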
sgdml.utils.ui.progr_toggle(is_done, disp_str='', sec_disp_str=None)[source]

Print progress toggle.

Example (not done): [ .. ] Task description (secondary string)

Example (done): [DONE] Task description (secondary string)

Parameters:
  • is_done (bool) – Whether the task is done.
  • disp_str (str, optional) – Task description.
  • sec_disp_str (str, optional) – Additional string shown in gray.
sgdml.utils.ui.str_plen(str)[source]

Returns printable length of string. This function can only account for invisible characters due to string styling with color_str.

Parameters: str (str) – String.
Returns: Printable length of the string.
Return type: int
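Measuring the printable length of a styled string can be sketched by stripping the styling sequences before counting characters. The sketch assumes the styling consists of ANSI SGR escape sequences, which this section does not confirm for color_str. The helper printable_len is hypothetical.

```python
import re

# Matches ANSI SGR sequences such as "\x1b[1;31m" and "\x1b[0m" (assumed styling).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def printable_len(s):
    """Hypothetical helper: length of a string without ANSI styling sequences."""
    return len(ANSI_SGR.sub("", s))


styled = "\x1b[1;31mDONE\x1b[0m"
print(len(styled), printable_len(styled))  # 15 4
```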
sgdml.utils.ui.underline_str(str)[source]
sgdml.utils.ui.unicode_str(s)[source]
sgdml.utils.ui.white_back_str(str)[source]
sgdml.utils.ui.white_bold_str(str)[source]
sgdml.utils.ui.wrap_indent_str(label, str, width=93)[source]

Wraps and indents a multiline string to arrange it with the provided label in two columns. The default maximum line width already accounts for the indentation due to the logging level label.

Example: <label><multiline string>

Parameters:
  • label (str) – Label
  • str (str) – Multiline string.
  • width (int, optional) – Max number of characters in a line.
Returns:

Return type:

str
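The two-column arrangement of a label and a wrapped multiline string can be sketched with the standard textwrap module. The helper wrap_with_label is hypothetical. Treating width as the total line width including the label is an assumption about how the real function interprets the parameter.

```python
import textwrap


def wrap_with_label(label, text, width=93):
    """Hypothetical helper: wrap text and indent continuation lines under a label."""
    body_width = width - len(label)
    lines = []
    for paragraph in text.split("\n"):
        # Keep empty lines of the input as empty lines in the output.
        lines.extend(textwrap.wrap(paragraph, body_width) or [""])
    pad = " " * len(label)
    return "\n".join(
        (label if k == 0 else pad) + line for k, line in enumerate(lines)
    )


wrapped_note = wrap_with_label("Note: ", "alpha beta gamma delta", width=17)
print(wrapped_note)
```

The first output line starts with the label, and every following line is indented by the label's length so the text forms a second column.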

sgdml.utils.ui.wrap_str(str, width=93)[source]

Wrap multiline string after a given number of characters. The default maximum line width already accounts for the indentation due to the logging level label.

Parameters:
  • str (str) – Multiline string.
  • width (int, optional) – Max number of characters in a line.
Returns:

Return type:

str

sgdml.utils.ui.yellow_back_str(str)[source]
sgdml.utils.ui.yes_or_no(question)[source]

Ask for yes/no user input on a question.

Any response besides y yields a negative answer.

Parameters: question (str) – User question.