NNucleate package

Submodules

NNucleate.data_augmentation module

NNucleate.data_augmentation.augment_evenly(n: int, trajname: str, topology: str, cvname: str, savename: str, box: float, n_min=0, col=3, bins=25, n_max=inf)

Takes in a trajectory and adds degenerate rotated frames such that the resulting trajectory represents an even histogram. Writes a new trajectory and CV file.

Parameters

n (int) – The height of the target histogram.
trajname (str) – Path to the trajectory file (.xtc or .xyz).
topology (str) – Path to the topology file (.pdb).
cvname (str) – Path to the CV file. Text file with CVs organised in columns.
savename (str) – String without file ending under which the final CV and traj will be saved.
box (float) – Box length for applying PBC.
n_min (int, optional) – The minimum number of frames to add per frame, defaults to 0.
col (int, optional) – The column in the CV file from which to read the CV (0 indexing), defaults to 3.
bins (int, optional) – Number of bins in the target histogram, defaults to 25.
n_max (int, optional) – Maximal height of a histogram column, defaults to math.inf.
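
The bookkeeping behind an even histogram can be sketched as follows. This is a minimal illustration and not the package's implementation: the helper name copies_per_frame is hypothetical, n_max is ignored, and it assumes that every frame in an underpopulated bin receives the same number of rotated copies.

```python
import numpy as np


def copies_per_frame(cv, n, bins=25, n_min=0):
    """Number of rotated copies to add for each frame (hypothetical helper)."""
    cv = np.asarray(cv, dtype=float)
    counts, edges = np.histogram(cv, bins=bins)
    # Bin index of every frame; the clip puts the maximum value into the last bin.
    idx = np.clip(np.digitize(cv, edges) - 1, 0, bins - 1)
    # Frames missing in each bin to reach the target height n.
    deficit = np.maximum(n - counts, 0)
    per_bin = np.zeros(bins, dtype=int)
    filled = counts > 0
    per_bin[filled] = np.ceil(deficit[filled] / counts[filled]).astype(int)
    # Every frame gets at least n_min copies.
    per_bin = np.maximum(per_bin, n_min)
    return per_bin[idx]


# Three frames in the first bin, one in the second, target height 3.
extra = copies_per_frame([0.0, 0.0, 0.0, 1.0], n=3, bins=2)
```

In this toy case only the lone frame in the second bin needs extra copies.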

NNucleate.data_augmentation.transform_frame_to_knn_list(k: int, traj: ndarray, box_length: float) → ndarray

Transforms the cartesian representation of a given trajectory frame to a list of sorted distances including the distance of each atom to its k nearest neighbours. This guarantees symmetry invariances but at significant cost and risk of kinks in the CV space.

Parameters

k (int) – Number of neighbours to consider for each atom.
traj (ndarray of float) – List of coordinates to be transformed.
box_length (float) – Length of the cubic box.

Returns

Returns an array of shape n_atoms x k*n_atoms/2.

Return type

ndarray of float
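
A nearest-neighbour distance representation under periodic boundary conditions can be sketched with the minimum image convention. The helper knn_distances is hypothetical, it assumes a cubic box, and it returns one row of k sorted distances per atom rather than trying to reproduce the exact output layout documented above.

```python
import numpy as np


def knn_distances(frame, k, box_length):
    """Sorted distances of each atom to its k nearest neighbours (sketch)."""
    frame = np.asarray(frame, dtype=float)
    # Pairwise displacement vectors, shape (n_atoms, n_atoms, 3).
    diff = frame[:, None, :] - frame[None, :, :]
    # Minimum image convention for a cubic box.
    diff -= box_length * np.round(diff / box_length)
    dist = np.linalg.norm(diff, axis=-1)
    # An atom is not its own neighbour.
    np.fill_diagonal(dist, np.inf)
    return np.sort(dist, axis=1)[:, :k]


frame = [[0.1, 0.0, 0.0], [0.9, 0.0, 0.0], [0.5, 0.0, 0.0]]
nearest = knn_distances(frame, k=1, box_length=1.0)
```

Atoms 0 and 1 are 0.8 apart inside the box but only 0.2 apart through the periodic boundary, which is the distance the sketch reports.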

NNucleate.data_augmentation.transform_frame_to_ndist_list(n_dist: int, traj: ndarray, box_length: float) → ndarray

Transforms the cartesian coordinates of a given trajectory frame into a sorted list of the n_dist shortest distances in the system.

Parameters

n_dist (int) – Number of distances to include (max: n*(n-1)/2).
traj (ndarray of float) – List of list of coordinates to transform.
box_length (float) – Length of the cubic box.

Returns

Array of shape n_atoms x n_dists.

Return type

ndarray of float
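
The bound n*(n-1)/2 on n_dist is the number of unique atom pairs. A minimal sketch of the idea, assuming a cubic box and using the hypothetical helper shortest_distances (it returns a flat vector and does not try to reproduce the documented output shape):

```python
import numpy as np


def shortest_distances(frame, n_dist, box_length):
    """The n_dist shortest pair distances of one frame, sorted (sketch)."""
    frame = np.asarray(frame, dtype=float)
    # Indices of all unique pairs i < j: n*(n-1)/2 of them.
    i, j = np.triu_indices(len(frame), k=1)
    diff = frame[i] - frame[j]
    # Minimum image convention for a cubic box.
    diff -= box_length * np.round(diff / box_length)
    return np.sort(np.linalg.norm(diff, axis=1))[:n_dist]


frame = [[0.1, 0.0, 0.0], [0.9, 0.0, 0.0], [0.5, 0.0, 0.0]]
two_shortest = shortest_distances(frame, n_dist=2, box_length=1.0)
all_pairs = shortest_distances(frame, n_dist=10, box_length=1.0)
```

Requesting more distances than there are pairs simply returns all of them in this sketch.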

NNucleate.data_augmentation.transform_traj_to_knn_list(k: int, traj: ndarray, box_length: float) → ndarray

Transforms the cartesian representation of a given trajectory to a list of sorted distances including the distance of each atom to its k nearest neighbours. This guarantees symmetry invariances but at significant cost and risk of kinks in the CV space.

Parameters

k (int) – Number of neighbours to consider for each atom.
traj (ndarray of ndarray of float) – List of coordinates to be transformed.
box_length (float) – Length of the cubic box.

Returns

Returns an array of shape n_frames x n_atoms x k*n_atoms/2.

Return type

ndarray of ndarray of float

NNucleate.data_augmentation.transform_traj_to_ndist_list(n_dist: int, traj: ndarray, box_length: float) → ndarray

Transform the cartesian coordinates of a given trajectory into a sorted list of the n_dist shortest distances in the system.

Parameters

n_dist (int) – Number of distances to include (max: n*(n-1)/2).
traj (ndarray of ndarray of float) – Trajectory that is to be transformed.
box_length (float) – Length of the cubic box.

Returns

Array of shape n_frames x n_atoms x n_dists.

Return type

ndarray of ndarray of float

NNucleate.dataset module

class NNucleate.dataset.CVTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, transform=None, start=0, stop=-1, stride=1, root=1)

Bases: Dataset

Instantiates a dataset from a trajectory file in xtc/xyz format and a text file containing the nucleation CVs. Assumes a cubic cell.

Warning

For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.

Parameters

cv_file (str) – Path to text file structured in columns containing the CVs.
traj_name (str) – Path to the trajectory in .xtc or .xyz file format.
top_file (str) – Path to the topology file in .pdb file format.
cv_col (int) – Indicates the column in which the desired CV is written in the CV file (0 indexing).
box_length (float) – Length of the cubic cell.
transform (function, optional) – A function to be applied to the configuration before returning e.g. to_dist(), defaults to None.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
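
The effect of cv_col and root can be illustrated without a trajectory. The sketch below is an assumption about the behaviour described above, not the class's code: load_cv is a hypothetical helper that reads a whitespace-separated CV file, selects one column (0 indexing) and takes its n-th root.

```python
import io

import numpy as np


def load_cv(handle, cv_col, root=1):
    """Read one CV column and compress its range with an n-th root (sketch)."""
    data = np.loadtxt(handle, ndmin=2)
    return data[:, cv_col] ** (1.0 / root)


# A stand-in for a CV file with the CV of interest in column 3.
cv_text = io.StringIO("0 1.0 5.0 8.0\n1 1.0 5.0 27.0\n2 1.0 5.0 64.0\n")
cv = load_cv(cv_text, cv_col=3, root=3)
```

With root=3 the values 8, 27 and 64 are mapped to roughly 2, 3 and 4, which narrows the numerical range the network has to fit.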

class NNucleate.dataset.GNNMolecularTrajectory(cv_file, traj_name, top_file, cv_col, box_length, rc, n_mol, n_at, start=0, stop=-1, stride=1, root=1)

Bases: Dataset

Generates a dataset from a trajectory in .xtc/.xyz format for the training of a GNN. The edges are generated from the neighbourlist graph between the COMs of the molecules.

Parameters

cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
rc (float) – Cut-off radius for the construction of the graph.
n_mol (int) – Number of molecules in the system
n_at (int) – Number of atoms per molecule
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.

class NNucleate.dataset.GNNTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, rc: float, start=0, stop=-1, stride=1, root=1)

Bases: Dataset

Generates a dataset from a trajectory in .xtc/.xyz format for the training of a GNN.

Warning

For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.

Parameters

cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
rc (float) – Cut-off radius for the construction of the graph.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.

class NNucleate.dataset.GNNTrajectory_mult(cv_file: str, traj_name: str, top_file: str, box_length: float, rc: float, start=0, stop=-1, stride=1, root=1)

Bases: Dataset

Generates a dataset from a trajectory in .xtc/.xyz format for the training of a GNN with a multidimensional output. This object loads all the columns in the provided CV file. Make sure to only use it with other functions that can account for that.

Warning

For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.

Parameters

cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
box_length (float) – Length of the cubic box.
rc (float) – Cut-off radius for the construction of the graph.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.

class NNucleate.dataset.KNNTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, k: int, start=0, stop=-1, stride=1, root=1)

Bases: Dataset

Generates a dataset from a trajectory in .xtc/xyz format. The trajectory frames are represented via the sorted distances of all atoms to their k nearest neighbours.

Warning

For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.

Parameters

cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
k (int) – Number of neighbours to consider.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.

class NNucleate.dataset.NdistTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, n_dist: int, start=0, stop=-1, stride=1, root=1)

Bases: Dataset

Generates a dataset from a trajectory in .xtc/xyz format. The trajectory frames are represented via the n_dist sorted distances.

Warning

For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.

Parameters

cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
n_dist (int) – Number of distances to consider.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.

NNucleate.models module

class NNucleate.models.GCL(hidden_nf: int, act_fn=ReLU())

Bases: Module

The graph convolutional layer for the graph-based model. Do not instantiate this directly.

Parameters

hidden_nf (int) – Hidden dimensionality of the latent node representation.
act_fn (torch.nn.modules.activation, optional) – PyTorch activation function to be used in the multi-layer perceptrons, defaults to nn.ReLU()

edge_model(source, target)

forward(h, edge_index)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

node_model(x, edge_index, edge_attr)

training: bool

class NNucleate.models.GNNCV(in_node_nf=3, hidden_nf=3, device='cpu', act_fn=ReLU(), pool_fn=<built-in method sum of type object>, n_layers=1)

Bases: Module

Graph neural network class for approximating nucleation CVs.

Parameters

in_node_nf (int, optional) – Dimensionality of the data in the graph nodes, defaults to 3.
hidden_nf (int, optional) – Hidden dimensionality of the latent node representation, defaults to 3.
device (str, optional) – Device the model should be stored on (For GPU support), defaults to “cpu”.
act_fn (torch.nn.modules.activation, optional) – PyTorch activation function to be used in the multi-layer perceptrons, defaults to nn.ReLU().
pool_fn (function, optional) – Pooling function used in the final layer. Should behave analogously to torch.sum(), defaults to torch.sum
n_layers (int, optional) – The number of graph convolutional layers, defaults to 1.

forward(x, edges, n_nodes)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

class NNucleate.models.GNNCV_mult(out_nodes, in_node_nf=3, hidden_nf=3, device='cpu', act_fn=ReLU(), pool_fn=<built-in method sum of type object>, n_layers=1)

Bases: Module

Graph neural network class for approximating multiple nucleation CVs at once.

Parameters

out_nodes (int) – Dimensionality of the prediction.
in_node_nf (int, optional) – Dimensionality of the data in the graph nodes, defaults to 3.
hidden_nf (int, optional) – Hidden dimensionality of the latent node representation, defaults to 3.
device (str, optional) – Device the model should be stored on (For GPU support), defaults to “cpu”.
act_fn (torch.nn.modules.activation, optional) – PyTorch activation function to be used in the multi-layer perceptrons, defaults to nn.ReLU().
pool_fn (function, optional) – Pooling function used in the final layer. Should behave analogously to torch.sum(), defaults to torch.sum
n_layers (int, optional) – The number of graph convolutional layers, defaults to 1.

forward(x, edges, n_nodes)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

class NNucleate.models.NNCV(insize: int, l1: int, l2=0, l3=0)

Bases: Module

Instantiates an NN for approximating CVs. Supported are architectures with up to 3 layers.

Parameters

insize (int) – Size of the input layer.
l1 (int) – Size of dense layer 1.
l2 (int, optional) – Size of dense layer 2, defaults to 0.
l3 (int, optional) – Size of dense layer 3, defaults to 0.
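
How the layer sizes might translate into weight matrices can be sketched in plain Python. Two assumptions are made here that the documentation does not state: a size of 0 means the layer is omitted, and the network ends in a single output node. The helper layer_shapes is hypothetical.

```python
def layer_shapes(insize, l1, l2=0, l3=0, outsize=1):
    """(inputs, outputs) of each dense layer for the given sizes (sketch)."""
    # Optional layers with size 0 are skipped (assumption).
    widths = [insize, l1] + [w for w in (l2, l3) if w > 0] + [outsize]
    return list(zip(widths[:-1], widths[1:]))


one_hidden = layer_shapes(30, 16)
three_hidden = layer_shapes(30, 16, 8, 4)
```

Under these assumptions, NNCV(30, 16) would correspond to two weight matrices and NNCV(30, 16, 8, 4) to four.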

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

NNucleate.models.initialise_weights(model: Module)

Initialises the weights of a custom model using the globally set seed. Usage: model.apply(initialise_weights)

Parameters: model (nn.Module) – Model that is to be initialised

NNucleate.pycv_link module

NNucleate.pycv_link.write_cv_link(model, n_hid, n_layers, n_at, box_l, rc, fname)

Function that writes an input file for the coupling with plumed, based on a model that is passed in the parameters. This function assumes the following architecture:

- Embedding layer n_at x n_hid
- N_layers GCLs
- Edge layer n_hid x n_hid, ReLU, n_hid x n_hid
- Node layer 2*n_hid x n_hid, ReLU, n_hid x n_hid
- Node decoder n_hid x n_hid, ReLU, n_hid x n_hid
- Graph decoder n_hid x n_hid, ReLU, n_hid x 1

Parameters

model (GNNCV) – The model for which the input file shall be written. (only for graph-based models)
n_hid (int) – Number of dimensions in the model latent space.
n_layers (int) – Number of GCL layers.
n_at (int) – Number of nodes in the graph.
box_l (float) – Size of the simulation box of the system that is used in the MTD simulation.
rc (float) – Cut off radius for the neighbourlist generation
fname (str) – Name of file that is created.

NNucleate.pycv_link.write_fast_link(model, n_hid, n_layers, n_at, box_l, rc, fname)

Function that writes an input file for the coupling with plumed, based on a model that is passed in the parameters. This version of the input file will be faster but requires a Cython package “neighborlist” with the same signature as the one provided in the github repository. This function assumes the following architecture:

- Embedding layer n_at x n_hid
- N_layers GCLs
- Edge layer n_hid x n_hid, ReLU, n_hid x n_hid
- Node layer 2*n_hid x n_hid, ReLU, n_hid x n_hid
- Node decoder n_hid x n_hid, ReLU, n_hid x n_hid
- Graph decoder n_hid x n_hid, ReLU, n_hid x 1

Parameters

model (GNNCV) – The model for which the input file shall be written. (only for graph-based models)
n_hid (int) – Number of dimensions in the model latent space.
n_layers (int) – Number of GCL layers.
n_at (int) – Number of nodes in the graph.
box_l (float) – Size of the simulation box of the system that is used in the MTD simulation.
rc (float) – Cut off radius for the neighbourlist generation
fname (str) – Name of file that is created.

NNucleate.pycv_link.write_fast_link_2D(model, n_hid, n_layers, out_nodes, n_at, box_l, rc, dim1, dim2, fname, pool='sum')

Function that writes an input file for the coupling with plumed, based on a model that is passed in the parameters. This version of the input file will be faster but requires a Cython package “neighborlist” with the same signature as the one provided in the github repository. This function assumes the following architecture:

- Embedding layer n_at x n_hid
- N_layers GCLs
- Edge layer n_hid x n_hid, ReLU, n_hid x n_hid
- Node layer 2*n_hid x n_hid, ReLU, n_hid x n_hid
- Node decoder n_hid x n_hid, ReLU, n_hid x n_hid
- Graph decoder n_hid x n_hid, ReLU, n_hid x 1

Parameters

model (GNNCV) – The model for which the input file shall be written. (only for graph-based models)
n_hid (int) – Number of dimensions in the model latent space.
n_layers (int) – Number of GCL layers.
out_nodes (int) – Dimensionality of the model output.
n_at (int) – Number of nodes in the graph.
box_l (float) – Size of the simulation box of the system that is used in the MTD simulation.
rc (float) – Cut off radius for the neighbourlist generation
dim1 (str) – Name of the CV in the first dimension.
dim2 (str) – Name of the CV in the second dimension.
fname (str) – Name of file that is created.
pool (str) – Name of the pooling layer that is used. (“sum”, “mean”, “max”)

NNucleate.training module

NNucleate.training.early_stopping_gnn(model_t: GNNCV, train_loader: DataLoader, val_loader: DataLoader, n_at: int, optimizer: Callable, loss: Callable, device: str, test_freq=1) → tuple

Train a graph-based model according to the early-stopping approach. In early stopping a model is trained until the validation error (an approximation for the generalisation error) worsens for the first time, to prevent overfitting. Once an increase in the validation error is detected for the first time, the loop is exited and the model state from the previous validation is returned.

Parameters

model_t (GNNCV) – The graph-based model that is to be optimised.
train_loader (torch.utils.data.Dataloader) – Wrapper around the training set for the model optimisation.
val_loader (torch.utils.data.Dataloader) – Wrapper around the validation set for the model optimisation.
n_at (int) – Number of nodes in the graph (Number of atoms or molecules).
optimizer (torch.optim) – Optimizer to be used for the optimisation.
loss (torch.nn._Loss) – Loss function to be used for the optimisation.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
test_freq (int, optional) – The number of epochs after which the model should be evaluated. A lower number is more accurate and costs more but is recommended for big datasets, defaults to 1.

Returns

This function returns the optimised model and the history of test and training errors over the course of the convergence.

Return type

GNNCV, list of float, list of float
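
The early-stopping loop described above can be sketched independently of PyTorch. Everything in this block is illustrative: the callbacks train_epoch, validate and model_state are hypothetical stand-ins for one training epoch, one validation pass and a snapshot of the model, and the order of the returned histories is an assumption.

```python
import copy


def early_stopping(train_epoch, validate, model_state, test_freq=1, max_epochs=1000):
    """Train until the validation error rises for the first time (sketch)."""
    train_hist, val_hist = [], []
    best_state = copy.deepcopy(model_state())
    prev_val = float("inf")
    for epoch in range(1, max_epochs + 1):
        train_hist.append(train_epoch())
        if epoch % test_freq:
            continue  # only validate every test_freq epochs
        val = validate()
        val_hist.append(val)
        if val > prev_val:
            break  # validation error worsened: keep the previous snapshot
        prev_val = val
        best_state = copy.deepcopy(model_state())
    return best_state, val_hist, train_hist


# Toy "model": a dict whose single weight grows by one per epoch.
model = {"w": 0}
val_losses = iter([3.0, 2.0, 2.5, 1.0])


def train_epoch():
    model["w"] += 1
    return 1.0 / model["w"]


state, val_hist, train_hist = early_stopping(
    train_epoch, lambda: next(val_losses), lambda: model
)
```

The validation error rises in the third epoch, so the sketch returns the snapshot taken after the second one even though a later epoch would have scored better. This is the trade-off of stopping at the first increase.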

NNucleate.training.evaluate_model_gnn(model: GNNCV, dataloader: DataLoader, n_mol: int, device: str, n_at=1) → tuple

Helper function that evaluates a model on a training set and calculates some properties for the generation of performance scatter plots.

Parameters

model (GNNCV) – The model that is to be evaluated.
dataloader (torch.utils.data.Dataloader) – Wrapper around the dataset that the model is supposed to be evaluated on.
n_mol (int) – Number of nodes in the graph of each frame. (Number of atoms or molecules)
device (str) – Device that the training is performed on. (Required for GPU compatibility)
n_at (int, optional) – Number of atoms per molecule.

Returns

Returns the prediction of the model on each frame, the corresponding true values, the root mean square error of the predictions and the r2 correlation coefficient.

Return type

List of float, List of float, float, float
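
The two scalar metrics can be sketched in a few lines. The documentation does not say which definition of r2 is used; this sketch assumes the coefficient of determination, 1 - SS_res / SS_tot, and the helper name rmse_and_r2 is hypothetical.

```python
import math


def rmse_and_r2(pred, true):
    """Root mean square error and coefficient of determination (sketch)."""
    n = len(true)
    mean_true = sum(true) / n
    # Residual and total sums of squares.
    ss_res = sum((p - t) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return math.sqrt(ss_res / n), 1.0 - ss_res / ss_tot


rmse, r2 = rmse_and_r2([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A perfect prediction gives an RMSE of 0 and an r2 of 1 under this definition.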

NNucleate.training.evaluate_model_gnn_mult(model: GNNCV, dataloader: DataLoader, n_mol: int, device: str, cols: list, n_at=1) → tuple

Helper function that evaluates a model on a training set and calculates some properties for the generation of performance scatter plots.

Parameters

model (GNNCV) – The model that is to be evaluated.
dataloader (torch.utils.data.Dataloader) – Wrapper around the dataset that the model is supposed to be evaluated on.
n_mol (int) – Number of nodes in the graph of each frame. (Number of atoms or molecules)
device (str) – Device that the training is performed on. (Required for GPU compatibility)
cols (list) – List of column indices representing the CVs the model is learning from the dataset.
n_at (int, optional) – Number of atoms per molecule.

Returns

Returns the prediction of the model on each frame, the corresponding true values, the root mean square errors of the predictions and the r2 correlation coefficients.

Return type

List of List of float, List of List of float, List of float, List of float

NNucleate.training.test_gnn(model: GNNCV, loader: DataLoader, n_mol: int, loss_l1: Callable, device: str, n_at=1) → float

Evaluate the test/validation error of a graph-based model on a validation set.

Parameters

model (GNNCV) – Graph-based model to be evaluated.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
loss_l1 (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
n_at (int, optional) – Number of atoms per molecule.

Returns

Return the average loss over the epoch.

Return type

float

NNucleate.training.test_gnn_mult(model: GNNCV, loader: DataLoader, n_mol: int, loss_l1, device: str, cols: list, n_at=1) → float

Evaluate the test/validation error of a graph based model with multidimensional output on a test set.

Parameters

model (GNNCV) – Graph-based model to be evaluated.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
loss_l1 (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
cols (list) – List of column indices representing the CVs the model is learning from the dataset.
n_at (int, optional) – Number of atoms per molecule.

Returns

Return the average loss over the epoch.

Return type

float

NNucleate.training.test_linear(model_t: NNCV, dataloader: DataLoader, loss_fn: Callable, device: str) → float

Calculates the current average test set loss.

Parameters

model_t (NNCV) – Model that is being trained.
dataloader (torch.utils.data.Dataloader) – Dataloader loading the test set.
loss_fn (torch.nn._Loss) – Pytorch loss function.
device (str) – Device that the training is performed on. (Required for GPU compatibility)

Returns

Return the validation loss.

Return type

float

NNucleate.training.train_gnn(model: GNNCV, loader: DataLoader, n_mol: int, optimizer: Callable, loss: Callable, device: str, n_at=1) → float

Function to perform one epoch of a GNN training.

Parameters

model (GNNCV) – Graph-based model to be trained.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
optimizer (torch.optim) – The optimizer object for the training.
loss (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
n_at (int, optional) – Number of atoms per molecule, defaults to 1.

Returns

Return the average loss over the epoch.

Return type

float

NNucleate.training.train_gnn_mult(model: GNNCV, loader: DataLoader, n_mol: int, optimizer, loss, device: str, cols: list, n_at=1) → float

Function to perform one training epoch of a GNN with multidimensional output.

Parameters

model (GNNCV) – Graph-based model to be trained.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
optimizer (torch.optim) – The optimizer object for the training.
loss (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
cols (list) – List of column indices representing the CVs the model is learning from the dataset.
n_at (int, optional) – Number of atoms per molecule, defaults to 1.

Returns

Return the average loss over the epoch.

Return type

float

NNucleate.training.train_linear(model_t: NNCV, dataloader: DataLoader, loss_fn: Callable, optimizer: Callable, device: str, print_batch=1000000) → float

Performs one training epoch for a NNCV.

Parameters

model_t (NNCV) – The network to be trained.
dataloader (torch.utils.data.Dataloader) – Wrapper for the training set.
loss_fn (torch.nn._Loss) – Pytorch loss to be used during training.
optimizer (torch.optim) – Pytorch optimizer to be used during training.
device (str) – Pytorch device to run the calculation on. Supports CPU and GPU (cuda).
print_batch (int, optional) – Set to receive printed updates on the loss every print_batch batches, defaults to 1000000.

Returns

Returns the last loss item. For easy learning curve recording. Alternatively one can use a Tensorboard.

Return type

float

NNucleate.training.train_perm(model_t: NNCV, dataloader: DataLoader, optimizer: Callable, loss_fn: Callable, n_trans: int, device: str, print_batch=1000000) → float

Performs one training epoch for a NNCV but the loss for each batch is not just calculated on one reference structure but a set of n_trans permutated versions of that structure.

Parameters

dataloader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
optimizer (torch.optim) – The optimizer object for the training.
loss_fn (torch.nn._Loss) – Loss function for the training.
n_trans (int) – Number of permutated structures used for the loss calculations.
device (str) – Pytorch device to run the calculations on. Supports CPU and GPU (cuda).
print_batch (int, optional) – Set to receive printed updates on the loss every print_batch batches, defaults to 1000000.

Returns

Returns the last loss item. For easy learning curve recording. Alternatively one can use a Tensorboard.

Return type

float
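
Generating permuted versions of a reference structure can be sketched with NumPy. This only illustrates the data side of the idea and makes no claim about how train_perm builds its batches; permuted_copies and its seed argument are hypothetical.

```python
import numpy as np


def permuted_copies(config, n_trans, seed=0):
    """n_trans copies of a configuration with shuffled atom order (sketch)."""
    rng = np.random.default_rng(seed)
    config = np.asarray(config, dtype=float)
    # Each copy holds the same atoms in a different row order.
    return np.stack([config[rng.permutation(len(config))] for _ in range(n_trans)])


config = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
copies = permuted_copies(config, n_trans=4)
```

All copies describe the same physical structure, so a permutation-invariant CV should take the same value on each of them.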

NNucleate.training.train_rot(model_t: NNCV, dataloader: DataLoader, optimizer: Callable, loss_fn: Callable, n_trans: int, device: str, print_batch=1000000) → float

Performs one training epoch for a NNCV but the loss for each batch is not just calculated on one reference structure but a set of n_trans rotated versions of that structure.

Parameters

dataloader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
optimizer (torch.optim) – The optimizer object for the training.
loss_fn (torch.nn._Loss) – Loss function for the training.
n_trans (int) – Number of rotated structures used for the loss calculations.
device (str) – Pytorch device to run the calculations on. Supports CPU and GPU (cuda).
print_batch (int, optional) – Set to receive printed updates on the loss every print_batch batches, defaults to 1000000.

Returns

Returns the last loss item. For easy learning curve recording. Alternatively one can use a Tensorboard.

Return type

float

NNucleate.utils module

class NNucleate.utils.PeriodicCKDTree(bounds: ndarray, data: ndarray, leafsize=10)

Bases: cKDTree

A wrapper around scipy.spatial.kdtree to implement periodic boundary conditions.

Written by Patrick Varilly, 6 Jul 2012 (https://github.com/patvarilly/periodic_kdtree). Released under the scipy license.

Cython kd-tree for quick nearest-neighbor lookup with periodic boundaries. See scipy.spatial.ckdtree for details on kd-trees.

Searches with periodic boundaries are implemented by mapping all initial data points to one canonical periodic image, building an ordinary kd-tree with these points, then querying this kd-tree multiple times, if necessary, with all the relevant periodic images of the query point.

Note that to ensure that no two distinct images of the same point appear in the results, it is essential to restrict the maximum distance between a query point and a data point to half the smallest box dimension.

Construct a kd-tree.

Parameters

bounds (array_like, shape (k,)) – Size of the periodic box along each spatial dimension. A negative or zero size for dimension k means that space is not periodic along k.
data (array-like, shape (n,m)) – The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles, and so modifying this data will result in bogus results.
leafsize (int, optional) – The number of points at which the algorithm switches over to brute-force, defaults to 10.
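
The idea of querying with "all the relevant periodic images of the query point" can be sketched as follows. This is an illustration of the principle under stated assumptions, not the class's code: relevant_images is a hypothetical helper, the box is taken to span [0, bounds) along each periodic axis, and the search radius r is assumed to be at most half the smallest box dimension.

```python
import itertools

import numpy as np


def relevant_images(x, bounds, r):
    """Periodic images of x that can have neighbours within r in the box (sketch)."""
    x = np.asarray(x, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    periodic = bounds > 0
    # Map the query point to its canonical image along periodic axes only.
    safe = np.where(periodic, bounds, 1.0)
    x = np.where(periodic, np.mod(x, safe), x)
    per_axis = []
    for xi, length, is_periodic in zip(x, bounds, periodic):
        shifts = [0.0]
        if is_periodic:
            if xi - r < 0.0:
                shifts.append(length)  # search sphere pokes out of the lower face
            if xi + r > length:
                shifts.append(-length)  # search sphere pokes out of the upper face
        per_axis.append([xi + s for s in shifts])
    return np.array(list(itertools.product(*per_axis)))


near_face = relevant_images([0.05, 0.5], bounds=[1.0, 1.0], r=0.1)
central = relevant_images([0.5, 0.5], bounds=[1.0, 1.0], r=0.1)
```

A point in the middle of the box needs a single query, while a point close to a face needs one extra query per face it is close to.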

query(x: ndarray, k=1, eps=0, p=2, distance_upper_bound=inf) → ndarray

Query the kd-tree for nearest neighbors.

Parameters

x (array_like, last dimension self.m) – An array of points to query.
k (int, optional.) – The number of nearest neighbors to return, defaults to 1
eps (int, optional) – Return approximate nearest neighbors; the kth returned value is guaranteed to be no further than (1+eps) times the distance to the real k-th nearest neighbor, defaults to 0.
p (int, optional) – Which Minkowski p-norm to use: 1 is the sum-of-absolute-values “Manhattan” distance, 2 is the usual Euclidean distance, and infinity is the maximum-coordinate-difference distance. Defaults to 2.
distance_upper_bound (float, optional) – Return only neighbors within this distance. This is used to prune tree searches, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point, defaults to np.inf.

Returns

The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). Missing neighbors are indicated with infinite distances.

Return type

array of floats

Returns

The locations of the neighbors in self.data. If x has shape tuple+(self.m,), then i has shape tuple+(k,). Missing neighbors are indicated with self.n.

Return type

ndarray of ints

query_ball_point(x: ndarray, r: float, p=2.0, eps=0) → ndarray

Find all points within distance r of point(s) x. Notes: If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a PeriodicCKDTree and using query_ball_tree.

Parameters

x (array_like, shape tuple + (self.m,)) – The point or points to search for neighbors of.
r (float) – The radius of points to return.
p (float, optional) – Which Minkowski p-norm to use. Should be in the range [1, inf], defaults to 2.0.
eps (int, optional) – Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps), defaults to 0.

Returns

If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Return type

list or array of lists

NNucleate.utils.com(xyz: ndarray) → list

Calculates the centre of mass of a set of coordinates.

Parameters: xyz (np.ndarray) – Array containing the list of 3-dimensional coordinates.
Returns: A list of the calculated centres of mass.
Return type: list of float

NNucleate.utils.get_mol_edges(rc: float, traj: Trajectory, n_mol: int, n_at: int, box: float) → list

Generate the edges for a neighbourlist graph based on the COMs of the given molecules.

Parameters

rc (float) – Cut off radius for the neighbourlist graph.
traj (md.Trajectory) – The Trajectory containing the frames.
n_mol (int) – Number of molecules per frame.
n_at (int) – Number of atoms per molecule.
box (float) – Length of the cubic box

Returns

A list containing two tensors which represent the adjacency matrix of the graph.

Return type

list of torch.tensor
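
The reduction from atoms to molecular centres, which precedes the edge construction, can be sketched with a reshape. The sketch assumes that the atoms of each molecule are stored contiguously, that all atoms have equal mass, and that no molecule is split across the periodic boundary; molecule_coms is a hypothetical helper.

```python
import numpy as np


def molecule_coms(frame, n_mol, n_at):
    """Centre of each molecule for one frame of shape (n_mol * n_at, 3) (sketch)."""
    frame = np.asarray(frame, dtype=float)
    # Group atoms by molecule, then average over the atoms of each molecule.
    return frame.reshape(n_mol, n_at, 3).mean(axis=1)


frame = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
coms = molecule_coms(frame, n_mol=2, n_at=2)
```

The resulting n_mol points then play the role of graph nodes in the neighbour list.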

NNucleate.utils.get_rc_edges(rc: float, traj: Trajectory) → list

Returns the edges of the graph constructed by interpreting the atoms in the trajectory as nodes that are connected to all other nodes within a distance of rc.

Parameters

rc (float) – Cut-off radius for the graph construction.
traj (md.trajectory) – The trajectory for which the graphs shall be constructed.

Returns

A list containing two tensors which represent the adjacency matrix of the graph.

Return type

list of torch.tensor
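
A cut-off graph in the "two index arrays" form can be sketched with NumPy arrays in place of tensors. The helper rc_edges and its optional box_length argument are hypothetical; the documented function takes an mdtraj trajectory instead.

```python
import numpy as np


def rc_edges(frame, rc, box_length=None):
    """Source and target indices of all atom pairs closer than rc (sketch)."""
    frame = np.asarray(frame, dtype=float)
    diff = frame[:, None, :] - frame[None, :, :]
    if box_length is not None:
        # Minimum image convention for a cubic box.
        diff -= box_length * np.round(diff / box_length)
    dist = np.linalg.norm(diff, axis=-1)
    # Connect every pair within the cut-off, excluding self-loops.
    mask = (dist < rc) & ~np.eye(len(frame), dtype=bool)
    rows, cols = np.nonzero(mask)
    return rows, cols


frame = [[0.1, 0.0, 0.0], [0.9, 0.0, 0.0]]
rows_open, cols_open = rc_edges(frame, rc=0.3)
rows_pbc, cols_pbc = rc_edges(frame, rc=0.3, box_length=1.0)
```

Each undirected connection appears twice, once per direction, which is the usual input format for message-passing layers.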

NNucleate.utils.pbc(trajectory: Trajectory, box_length: float) → Trajectory

Centers an mdtraj Trajectory around the centre of a cubic box with the given box length and wraps all atoms into the box.

Parameters

trajectory (mdtraj.trajectory) – The trajectory that is to be modified, i.e. contains the configurations that shall be wrapped back into the simulation box.
box_length (float) – Length of the cubic box which shall contain all the positions.

Returns

Returns a trajectory object obeying PBC according to the given box length.

Return type

mdtraj.trajectory

NNucleate.utils.pbc_config(config: ndarray, box_length: float) → Trajectory

Wraps all atoms in a given configuration into the box.

Parameters

config (np.ndarray) – The configuration that is to be modified, i.e. contains the positions that shall be wrapped back into the simulation box.
box_length (float) – Length of the cubic box which shall contain all the positions.

Returns

Returns the configuration obeying PBC according to the given box length.

Return type

np.ndarray

NNucleate.utils.rotate_trajs(trajectories: ndarray) → ndarray

Rotates each frame in the given trajectories according to a random quaternion.

Parameters: trajectories (list of md.trajectory) – A list of mdtraj.trajectory objects to be modified.
Returns: Returns a list of trajectories, the frames of which have been randomly rotated and wrapped back into the box.
Return type: list of md.trajectory
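
Rotating a frame with a random quaternion can be sketched as below. The sketch rotates about the origin and does not wrap atoms back into the box; whether the package rotates about the box centre is not stated here. The helper names are hypothetical.

```python
import numpy as np


def random_rotation_matrix(rng):
    """Rotation matrix built from a random unit quaternion (sketch)."""
    q = rng.normal(size=4)
    # Normalising four Gaussian samples gives a uniformly random unit quaternion.
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def rotate_frame(frame, rng):
    """Apply one random rotation to all atoms of a frame of shape (n_atoms, 3)."""
    return np.asarray(frame, dtype=float) @ random_rotation_matrix(rng).T


rng = np.random.default_rng(1)
rotation = random_rotation_matrix(rng)
frame = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rotated = rotate_frame(frame, rng)
```

A proper rotation leaves all distances unchanged, which is why rotated frames are degenerate with respect to a rotation-invariant CV.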

NNucleate.utils.unsorted_segment_sum(data: Tensor, segment_ids: Tensor, num_segments: int) → Tensor

Function that sums the segments of a matrix. Each row has a non-unique ID and all rows with the same ID are summed such that a matrix with the number of rows equal to the number of unique IDs is obtained.

Parameters

data (torch.tensor) – A tensor that contains the data that is to be summed.
segment_ids (torch.tensor) – An array that has the same number of entries as data has rows which indicates which rows shall be summed.
num_segments (int) – This is the number of unique IDs, i.e. the dimensionality of the resulting tensor.

Returns

Returns a tensor shaped num_segments x data.size(1) containing all the segment sums.

Return type

torch.Tensor
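
The segment sum can be illustrated with NumPy. This sketch mirrors the described behaviour with arrays in place of tensors; segment_sum is a hypothetical stand-in, not the package function.

```python
import numpy as np


def segment_sum(data, segment_ids, num_segments):
    """Sum the rows of data that share the same segment ID (sketch)."""
    data = np.asarray(data, dtype=float)
    out = np.zeros((num_segments, data.shape[1]))
    # np.add.at accumulates correctly even when IDs repeat.
    np.add.at(out, np.asarray(segment_ids), data)
    return out


summed = segment_sum([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 1, 0], num_segments=2)
```

Rows 0 and 2 share ID 0 and are added together, while row 1 is copied into segment 1. In a graph network the IDs are typically the receiving node of each edge.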
