NNucleate package
Submodules
NNucleate.data_augmentation module
- NNucleate.data_augmentation.augment_evenly(n: int, trajname: str, topology: str, cvname: str, savename: str, box: float, n_min=0, col=3, bins=25, n_max=inf)
- Takes in a trajectory and adds degenerate rotated frames such that the resulting trajectory represents an even histogram.
Writes a new trajectory and CV file.
- Parameters
n (int) – The height of the target histogram.
trajname (str) – Path to the trajectory file (.xtc or .xyz).
topology (str) – Path to the topology file (.pdb).
cvname (str) – Path to the CV file. Text file with CVs organised in columns.
savename (str) – String without file ending under which the final CV and traj will be saved.
box (float) – Box length for applying PBC.
n_min (int, optional) – The minimum number of frames to add per frame, defaults to 0.
col (int, optional) – The column in the CV file from which to read the CV (0 indexing), defaults to 3.
bins (int, optional) – Number of bins in the target histogram, defaults to 25.
n_max (int, optional) – Maximal height of a histogram column, defaults to math.inf.
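As a rough illustration of the histogram-flattening idea, the sketch below bins the CV values and computes how many extra (rotated) copies each frame would need so that its bin approaches height n. It is plain Python and not NNucleate's implementation: the helper name copies_per_frame and the integer-division rule are assumptions, and n_max is ignored.

```python
def copies_per_frame(cvs, n, bins=25, n_min=0):
    """Return, for each frame, how many extra copies would lift its bin towards height n."""
    lo, hi = min(cvs), max(cvs)
    width = (hi - lo) / bins or 1.0  # guard against all-equal CV values
    # Bin index of every frame; the maximum value falls into the last bin.
    idx = [min(int((c - lo) / width), bins - 1) for c in cvs]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    # Frames in the same bin share the missing height equally (integer division),
    # and every frame receives at least n_min copies.
    return [max(n_min, (n - counts[i]) // counts[i]) for i in idx]
```

Frames in sparsely populated bins receive many copies, while frames in bins that already reach n receive only n_min.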
- NNucleate.data_augmentation.transform_frame_to_knn_list(k: int, traj: ndarray, box_length: float) ndarray
Transforms the cartesian representation of a given trajectory frame to a list of sorted distances including the distance of each atom to its k nearest neighbours. This guarantees symmetry invariances but at significant cost and risk of kinks in the CV space.
- Parameters
k (int) – Number of neighbours to consider for each atom.
traj (ndarray of float) – List of coordinates to be transformed.
box_length (float) – Length of the cubic box.
- Returns
Returns an array of shape n_atoms x k*n_atoms/2.
- Return type
ndarray of float
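The k-nearest-neighbour representation can be sketched in plain Python as follows. This is a minimal illustration under the minimum-image convention for a cubic box, not the library code; the helper names are hypothetical, and the sketch returns one list of k distances per atom rather than reproducing the output layout documented above.

```python
import math

def min_image_distance(a, b, box_length):
    """Distance between two points under the minimum-image convention (cubic box)."""
    total = 0.0
    for x, y in zip(a, b):
        delta = x - y
        delta -= box_length * round(delta / box_length)  # nearest periodic image
        total += delta * delta
    return math.sqrt(total)

def knn_distances(frame, k, box_length):
    """For every atom, the sorted distances to its k nearest neighbours."""
    result = []
    for i, a in enumerate(frame):
        dists = sorted(
            min_image_distance(a, b, box_length)
            for j, b in enumerate(frame)
            if j != i
        )
        result.append(dists[:k])
    return result
```

Because only sorted distances enter the representation, it does not change when atoms are relabelled or the frame is rotated or translated.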
- NNucleate.data_augmentation.transform_frame_to_ndist_list(n_dist: int, traj: ndarray, box_length: float) ndarray
Transform the cartesian coordinates of a given trajectory frame into a sorted list of the n_dist shortest distances in the system.
- Parameters
n_dist (int) – Number of distances to include (max: n*(n-1)/2).
traj (ndarray of float) – List of list of coordinates to transform.
box_length (float) – Length of the cubic box.
- Returns
Array of shape n_atoms x n_dists.
- Return type
ndarray of float
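A dependency-free sketch of the "n_dist shortest distances" idea is given below. It enumerates all n*(n-1)/2 atom pairs, applies the minimum-image convention for a cubic box, and keeps the shortest n_dist values. The function names are hypothetical and the flat return list is an assumption; the library function may organise its output differently.

```python
import itertools
import math

def min_image_distance(a, b, box_length):
    """Distance between two points under the minimum-image convention (cubic box)."""
    total = 0.0
    for x, y in zip(a, b):
        delta = x - y
        delta -= box_length * round(delta / box_length)  # nearest periodic image
        total += delta * delta
    return math.sqrt(total)

def n_shortest_distances(frame, n_dist, box_length):
    """Sorted list of the n_dist shortest pair distances in one frame."""
    n = len(frame)
    max_pairs = n * (n - 1) // 2
    if n_dist > max_pairs:
        raise ValueError(f"n_dist={n_dist} exceeds the {max_pairs} available pairs")
    dists = sorted(
        min_image_distance(a, b, box_length)
        for a, b in itertools.combinations(frame, 2)
    )
    return dists[:n_dist]
```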
- NNucleate.data_augmentation.transform_traj_to_knn_list(k: int, traj: ndarray, box_length: float) ndarray
Transforms the cartesian representation of a given trajectory to a list of sorted distances including the distance of each atom to its k nearest neighbours. This guarantees symmetry invariances but at significant cost and risk of kinks in the CV space.
- Parameters
k (int) – Number of neighbours to consider for each atom.
traj (ndarray of ndarray of float) – List of coordinates to be transformed.
box_length (float) – Length of the cubic box.
- Returns
Returns an array of shape n_frames x n_atoms x k*n_atoms/2.
- Return type
ndarray of ndarray of float
- NNucleate.data_augmentation.transform_traj_to_ndist_list(n_dist: int, traj: ndarray, box_length: float) ndarray
Transform the cartesian coordinates of a given trajectory into a sorted list of the n_dist shortest distances in the system.
- Parameters
n_dist (int) – Number of distances to include (max: n*(n-1)/2).
traj (ndarray of ndarray of float) – Trajectory that is to be transformed.
box_length (float) – Length of the cubic box.
- Returns
Array of shape n_frames x n_atoms x n_dists.
- Return type
ndarray of ndarray of float
NNucleate.dataset module
- class NNucleate.dataset.CVTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, transform=None, start=0, stop=-1, stride=1, root=1)
Bases:
Dataset
Instantiates a dataset from a trajectory file in .xtc/.xyz format and a text file containing the nucleation CVs. Assumes a cubic cell.
Warning
For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.
- Parameters
cv_file (str) – Path to text file structured in columns containing the CVs.
traj_name (str) – Path to the trajectory in .xtc or .xyz file format.
top_file (str) – Path to the topology file in .pdb file format.
cv_col (int) – Indicates the column in which the desired CV is written in the CV file (0 indexing).
box_length (float) – Length of the cubic cell.
transform (function, optional) – A function to be applied to the configuration before returning e.g. to_dist(), defaults to None.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
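The effect of the root parameter can be pictured with a small plain-Python sketch. The helper name apply_root is hypothetical and the code only illustrates the n-th root compression described above for non-negative CV values; how the dataset class applies it internally is not shown in this reference.

```python
def apply_root(cv_values, root=1):
    """Take the root-th root of every CV value; root=1 leaves the data unchanged."""
    return [value ** (1.0 / root) for value in cv_values]
```

With root=3, CV values spanning 1 to 1000 are compressed to roughly 1 to 10, which narrows the numerical range the network has to fit. Predictions then need to be raised to the same power to recover the original CV scale.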
- class NNucleate.dataset.GNNMolecularTrajectory(cv_file, traj_name, top_file, cv_col, box_length, rc, n_mol, n_at, start=0, stop=-1, stride=1, root=1)
Bases:
Dataset
Generates a dataset from a trajectory in .xtc/.xyz format for the training of a GNN. The edges are generated from the neighbourlist graph between the COMs of the molecules.
- Parameters
cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
rc (float) – Cut-off radius for the construction of the graph.
n_mol (int) – Number of molecules in the system
n_at (int) – Number of atoms per molecule
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
- class NNucleate.dataset.GNNTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, rc: float, start=0, stop=-1, stride=1, root=1)
Bases:
Dataset
Generates a dataset from a trajectory in .xtc/.xyz format for the training of a GNN.
Warning
For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.
- Parameters
cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
rc (float) – Cut-off radius for the construction of the graph.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
- class NNucleate.dataset.GNNTrajectory_mult(cv_file: str, traj_name: str, top_file: str, box_length: float, rc: float, start=0, stop=-1, stride=1, root=1)
Bases:
Dataset
Generates a dataset from a trajectory in .xtc/.xyz format for the training of a GNN with a multidimensional output. This object loads all the columns in the provided CV file. Make sure to only use it with other functions that can account for that.
Warning
For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.
- Parameters
cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
box_length (float) – Length of the cubic box.
rc (float) – Cut-off radius for the construction of the graph.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
- class NNucleate.dataset.KNNTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, k: int, start=0, stop=-1, stride=1, root=1)
Bases:
Dataset
- Generates a dataset from a trajectory in .xtc/xyz format.
The trajectory frames are represented via the sorted distances of all atoms to their k nearest neighbours.
Warning
For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.
- Parameters
cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
k (int) – Number of neighbours to consider.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
- class NNucleate.dataset.NdistTrajectory(cv_file: str, traj_name: str, top_file: str, cv_col: int, box_length: float, n_dist: int, start=0, stop=-1, stride=1, root=1)
Bases:
Dataset
- Generates a dataset from a trajectory in .xtc/xyz format.
The trajectory frames are represented via the n_dist sorted distances.
Warning
For .xtc give the boxlength in nm and for .xyz give the boxlength in Å.
- Parameters
cv_file (str) – Path to the cv file.
traj_name (str) – Path to the trajectory file (.xtc/.xyz).
top_file (str) – Path to the topology file (.pdb).
cv_col (int) – Gives the column in which the CV of interest is stored.
box_length (float) – Length of the cubic box.
n_dist (int) – Number of distances to consider.
start (int, optional) – Starting frame of the trajectory, defaults to 0.
stop (int, optional) – The last frame of the trajectory that is read, defaults to -1.
stride (int, optional) – The stride with which the trajectory frames are read, defaults to 1.
root (int, optional) – Allows for the loading of the n-th root of the CV data (to compress the numerical range), defaults to 1.
NNucleate.models module
- class NNucleate.models.GCL(hidden_nf: int, act_fn=ReLU())
Bases:
Module
The graph convolutional layer for the graph-based model. Do not instantiate this directly.
- Parameters
hidden_nf (int) – Hidden dimensionality of the latent node representation.
act_fn (torch.nn.modules.activation, optional) – PyTorch activation function to be used in the multi-layer perceptrons, defaults to nn.ReLU()
- edge_model(source, target)
- forward(h, edge_index)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- node_model(x, edge_index, edge_attr)
- training: bool
- class NNucleate.models.GNNCV(in_node_nf=3, hidden_nf=3, device='cpu', act_fn=ReLU(), pool_fn=<built-in method sum of type object>, n_layers=1)
Bases:
Module
Graph neural network class for approximating nucleation CVs.
- Parameters
in_node_nf (int, optional) – Dimensionality of the data in the graph nodes, defaults to 3.
hidden_nf (int, optional) – Hidden dimensionality of the latent node representation, defaults to 3.
device (str, optional) – Device the model should be stored on (For GPU support), defaults to “cpu”.
act_fn (torch.nn.modules.activation, optional) – PyTorch activation function to be used in the multi-layer perceptrons, defaults to nn.ReLU().
pool_fn (function, optional) – Pooling function used in the final layer. Should behave analogously to torch.sum(), defaults to torch.sum
n_layers (int, optional) – The number of graph convolutional layers, defaults to 1.
- forward(x, edges, n_nodes)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class NNucleate.models.GNNCV_mult(out_nodes, in_node_nf=3, hidden_nf=3, device='cpu', act_fn=ReLU(), pool_fn=<built-in method sum of type object>, n_layers=1)
Bases:
Module
Graph neural network class for approximating multiple nucleation CVs at once.
- Parameters
out_nodes (int) – Dimensionality of the prediction.
in_node_nf (int, optional) – Dimensionality of the data in the graph nodes, defaults to 3.
hidden_nf (int, optional) – Hidden dimensionality of the latent node representation, defaults to 3.
device (str, optional) – Device the model should be stored on (For GPU support), defaults to “cpu”.
act_fn (torch.nn.modules.activation, optional) – PyTorch activation function to be used in the multi-layer perceptrons, defaults to nn.ReLU().
pool_fn (function, optional) – Pooling function used in the final layer. Should behave analogously to torch.sum(), defaults to torch.sum
n_layers (int, optional) – The number of graph convolutional layers, defaults to 1.
- forward(x, edges, n_nodes)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class NNucleate.models.NNCV(insize: int, l1: int, l2=0, l3=0)
Bases:
Module
Instantiates an NN for approximating CVs. Supported are architectures with up to 3 layers.
- Parameters
insize (int) – Size of the input layer.
l1 (int) – Size of dense layer 1.
l2 (int, optional) – Size of dense layer 2, defaults to 0.
l3 (int, optional) – Size of dense layer 3, defaults to 0.
- forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- NNucleate.models.initialise_weights(model: Module)
Initialises the weights of a custom model using the globally set seed. Usage: model.apply(initialise_weights)
- Parameters
model (nn.Module) – Model that is to be initialised
NNucleate.pycv_link module
- NNucleate.pycv_link.write_cv_link(model, n_hid, n_layers, n_at, box_l, rc, fname)
Function that writes an input file for the coupling with plumed, based on a model that is passed in the parameters. This function assumes the following architecture:
  - Embedding layer n_at x n_hid
  - N_layers GCLs
  - Edge layer n_hid x n_hid, ReLU, n_hid x n_hid
  - Node layer 2*n_hid x n_hid, ReLU, n_hid x n_hid
  - Node decoder n_hid x n_hid, ReLU, n_hid x n_hid
  - Graph decoder n_hid x n_hid, ReLU, n_hid x 1
- Parameters
model (GNNCV) – The model for which the input file shall be written. (only for graph-based models)
n_hid (int) – Number of dimensions in the model latent space.
n_layers (int) – Number of GCL layers.
n_at (int) – Number of nodes in the graph.
box_l (float) – Size of the simulation box of the system that is used in the MTD simulation.
rc (float) – Cut off radius for the neighbourlist generation
fname (str) – Name of file that is created.
- NNucleate.pycv_link.write_fast_link(model, n_hid, n_layers, n_at, box_l, rc, fname)
Function that writes an input file for the coupling with plumed, based on a model that is passed in the parameters. This version of the input file will be faster but requires a Cython package “neighborlist” with the same signature as the one provided in the github repository. This function assumes the following architecture:
  - Embedding layer n_at x n_hid
  - N_layers GCLs
  - Edge layer n_hid x n_hid, ReLU, n_hid x n_hid
  - Node layer 2*n_hid x n_hid, ReLU, n_hid x n_hid
  - Node decoder n_hid x n_hid, ReLU, n_hid x n_hid
  - Graph decoder n_hid x n_hid, ReLU, n_hid x 1
- Parameters
model (GNNCV) – The model for which the input file shall be written. (only for graph-based models)
n_hid (int) – Number of dimensions in the model latent space.
n_layers (int) – Number of GCL layers.
n_at (int) – Number of nodes in the graph.
box_l (float) – Size of the simulation box of the system that is used in the MTD simulation.
rc (float) – Cut off radius for the neighbourlist generation
fname (str) – Name of file that is created.
- NNucleate.pycv_link.write_fast_link_2D(model, n_hid, n_layers, out_nodes, n_at, box_l, rc, dim1, dim2, fname, pool='sum')
Function that writes an input file for the coupling with plumed, based on a model that is passed in the parameters. This version of the input file will be faster but requires a Cython package “neighborlist” with the same signature as the one provided in the github repository. This function assumes the following architecture:
  - Embedding layer n_at x n_hid
  - N_layers GCLs
  - Edge layer n_hid x n_hid, ReLU, n_hid x n_hid
  - Node layer 2*n_hid x n_hid, ReLU, n_hid x n_hid
  - Node decoder n_hid x n_hid, ReLU, n_hid x n_hid
  - Graph decoder n_hid x n_hid, ReLU, n_hid x 1
- Parameters
model (GNNCV) – The model for which the input file shall be written. (only for graph-based models)
n_hid (int) – Number of dimensions in the model latent space.
n_layers (int) – Number of GCL layers.
out_nodes (int) – Dimensionality of the model output.
n_at (int) – Number of nodes in the graph.
box_l (float) – Size of the simulation box of the system that is used in the MTD simulation.
rc (float) – Cut off radius for the neighbourlist generation
dim1 (str) – Name of the CV in the first dimension.
dim2 (str) – Name of the CV in the second dimension.
fname (str) – Name of file that is created.
pool (str) – Name of the pooling layer that is used. (“sum”, “mean”, “max”)
NNucleate.training module
- NNucleate.training.early_stopping_gnn(model_t: GNNCV, train_loader: DataLoader, val_loader: DataLoader, n_at: int, optimizer: Callable, loss: Callable, device: str, test_freq=1) tuple
Train a graph-based model according to the early-stopping approach. In early stopping a model is trained until the validation error (approximation for the generalisation error) worsens for the first time to prevent overfitting. Once an increase in the validation error is detected for the first time the loop is exited and the model-state from the previous validation is returned.
- Parameters
model_t (GNNCV) – The graph-based model that is to be optimised.
train_loader (torch.utils.data.Dataloader) – Wrapper around the training set for the model optimisation.
val_loader (torch.utils.data.Dataloader) – Wrapper around the validation set for the model optimisation.
n_at (int) – Number of nodes in the graph (Number of atoms or molecules).
optimizer (torch.optim) – Optimizer to be used for the optimisation.
loss (torch.nn._Loss) – Loss function to be used for the optimisation.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
test_freq (int, optional) – The number of epochs after which the model should be evaluated. A lower number is more accurate and costs more but is recommended for big datasets, defaults to 1
- Returns
This function returns the optimised model and the history of test and training errors over the course of the convergence.
- Return type
GNNCV, list of float, list of float
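The early-stopping control flow described above can be sketched independently of PyTorch. In the sketch below, state, train_epoch and validate are hypothetical stand-ins for the model, one training epoch and one validation pass; max_epochs is an added safety limit that does not appear in the documented signature. It is an illustration of the approach, not the library's implementation.

```python
import copy

def early_stopping(state, train_epoch, validate, test_freq=1, max_epochs=1000):
    """Train until the validation error rises for the first time.

    Returns the state saved at the previous validation, the validation
    history and the training history.
    """
    train_hist, val_hist = [], []
    best_state = copy.deepcopy(state)
    prev_val = float("inf")
    for epoch in range(1, max_epochs + 1):
        train_hist.append(train_epoch(state))  # one pass over the training set
        if epoch % test_freq == 0:
            val = validate(state)
            val_hist.append(val)
            if val > prev_val:
                # Validation error worsened: return the previous snapshot.
                return best_state, val_hist, train_hist
            prev_val = val
            best_state = copy.deepcopy(state)  # snapshot of the current state
    return best_state, val_hist, train_hist
```

A larger test_freq saves validation passes but makes the returned snapshot coarser, since the state can only be rolled back to the last epoch at which it was validated.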
- NNucleate.training.evaluate_model_gnn(model: GNNCV, dataloader: DataLoader, n_mol: int, device: str, n_at=1) tuple
Helper function that evaluates a model on a training set and calculates some properties for the generation of performance scatter plots.
- Parameters
model (GNNCV) – The model that is to be evaluated.
dataloader (torch.utils.data.Dataloader) – Wrapper around the dataset that the model is supposed to be evaluated on.
n_mol (int) – Number of nodes in the graph of each frame. (Number of atoms or molecules)
device (str) – Device that the training is performed on. (Required for GPU compatibility)
n_at (int, optional) – Number of atoms per molecule.
- Returns
Returns the prediction of the model on each frame, the corresponding true values, the root mean square error of the predictions and the r2 correlation coefficient.
- Return type
List of float, List of float, float, float
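The two scalar metrics returned here can be illustrated with a short plain-Python sketch. The helper name is hypothetical, and r2 is computed as the coefficient of determination, which is an assumption: the library may instead use the squared Pearson correlation.

```python
import math

def rmse_and_r2(predictions, targets):
    """Root mean square error and coefficient of determination of a prediction."""
    n = len(targets)
    ss_res = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    mean_t = sum(targets) / n
    ss_tot = sum((t - mean_t) ** 2 for t in targets)  # zero if all targets are equal
    return math.sqrt(ss_res / n), 1.0 - ss_res / ss_tot
```

Plotting predictions against targets together with these two numbers gives the performance scatter plot mentioned above.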
- NNucleate.training.evaluate_model_gnn_mult(model: GNNCV, dataloader: DataLoader, n_mol: int, device: str, cols: list, n_at=1) tuple
Helper function that evaluates a model on a training set and calculates some properties for the generation of performance scatter plots.
- Parameters
model (GNNCV) – The model that is to be evaluated.
dataloader (torch.utils.data.Dataloader) – Wrapper around the dataset that the model is supposed to be evaluated on.
n_mol (int) – Number of nodes in the graph of each frame. (Number of atoms or molecules)
device (str) – Device that the training is performed on. (Required for GPU compatibility)
cols (list) – List of column indices representing the CVs the model is learning from the dataset.
n_at (int, optional) – Number of atoms per molecule.
- Returns
Returns the prediction of the model on each frame, the corresponding true values, the root mean square errors of the predictions and the r2 correlation coefficients.
- Return type
List of List of float, List of List of float, List of float, List of float
- NNucleate.training.test_gnn(model: GNNCV, loader: DataLoader, n_mol: int, loss_l1: Callable, device: str, n_at=1) float
Evaluate the test/validation error of a graph-based model on a validation set.
- Parameters
model (GNNCV) – Graph-based model to be evaluated.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
loss_l1 (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
n_at (int, optional) – Number of atoms per molecule.
- Returns
Return the average loss over the epoch.
- Return type
float
- NNucleate.training.test_gnn_mult(model: GNNCV, loader: DataLoader, n_mol: int, loss_l1, device: str, cols: list, n_at=1) float
Evaluate the test/validation error of a graph based model with multidimensional output on a test set.
- Parameters
model (GNNCV) – Graph-based model to be evaluated.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
loss_l1 (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
cols (list) – List of column indices representing the CVs the model is learning from the dataset.
n_at (int, optional) – Number of atoms per molecule.
- Returns
Return the average loss over the epoch.
- Return type
float
- NNucleate.training.test_linear(model_t: NNCV, dataloader: DataLoader, loss_fn: Callable, device: str) float
Calculates the current average test set loss.
- Parameters
model_t (NNCV) – Model that is being trained.
dataloader (torch.utils.data.Dataloader) – Dataloader loading the test set.
loss_fn (torch.nn._Loss) – Pytorch loss function.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
- Returns
Return the validation loss.
- Return type
float
- NNucleate.training.train_gnn(model: GNNCV, loader: DataLoader, n_mol: int, optimizer: Callable, loss: Callable, device: str, n_at=1) float
Function to perform one epoch of a GNN training.
- Parameters
model (GNNCV) – Graph-based model to be trained.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
optimizer (torch.optim) – The optimizer object for the training.
loss (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
n_at (int, optional) – Number of atoms per molecule.
- Returns
Return the average loss over the epoch.
- Return type
float
- NNucleate.training.train_gnn_mult(model: GNNCV, loader: DataLoader, n_mol: int, optimizer, loss, device: str, cols: list, n_at=1) float
Function to perform one epoch of a GNN with multidimensional output training.
- Parameters
model (GNNCV) – Graph-based model to be trained.
loader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
n_mol (int) – Number of nodes per frame.
optimizer (torch.optim) – The optimizer object for the training.
loss (torch.nn._Loss) – Loss function for the training.
device (str) – Device that the training is performed on. (Required for GPU compatibility)
cols (list) – List of column indices representing the CVs the model is learning from the dataset.
n_at (int, optional) – Number of atoms per molecule.
- Returns
Return the average loss over the epoch.
- Return type
float
- NNucleate.training.train_linear(model_t: NNCV, dataloader: DataLoader, loss_fn: Callable, optimizer: Callable, device: str, print_batch=1000000) float
Performs one training epoch for a NNCV.
- Parameters
model_t (NNCV) – The network to be trained.
dataloader (torch.utils.data.Dataloader) – Wrapper for the training set.
loss_fn (torch.nn._Loss) – Pytorch loss to be used during training.
optimizer (torch.optim) – Pytorch optimizer to be used during training.
device (str) – Pytorch device to run the calculation on. Supports CPU and GPU (cuda).
print_batch (int, optional) – Set to receive printed updates on the loss every print_batch batches, defaults to 1000000.
- Returns
Returns the last loss item. For easy learning curve recording. Alternatively one can use a Tensorboard.
- Return type
float
- NNucleate.training.train_perm(model_t: NNCV, dataloader: DataLoader, optimizer: Callable, loss_fn: Callable, n_trans: int, device: str, print_batch=1000000) float
Performs one training epoch for a NNCV but the loss for each batch is not just calculated on one reference structure but a set of n_trans permutated versions of that structure.
- Parameters
dataloader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
optimizer (torch.optim) – The optimizer object for the training.
loss_fn (torch.nn._Loss) – Loss function for the training.
n_trans (int) – Number of permutated structures used for the loss calculations.
device (str) – Pytorch device to run the calculations on. Supports CPU and GPU (cuda).
print_batch (int, optional) – Set to receive printed updates on the loss every print_batch batches, defaults to 1000000.
- Returns
Returns the last loss item. For easy learning curve recording. Alternatively one can use a Tensorboard.
- Return type
float
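The permutation augmentation behind this training mode can be pictured with the following plain-Python sketch. All names are hypothetical, and averaging the loss over the n_trans copies is an assumption about how the per-batch loss is combined; the sketch is not the library code.

```python
import random

def permuted_copies(frame, n_trans, seed=0):
    """Return n_trans copies of a frame with the atom order randomly shuffled."""
    rng = random.Random(seed)
    return [rng.sample(frame, len(frame)) for _ in range(n_trans)]

def augmented_loss(frame, target, predict, loss_fn, n_trans, seed=0):
    """Average the loss of a predictor over n_trans permuted copies of one frame."""
    copies = permuted_copies(frame, n_trans, seed)
    return sum(loss_fn(predict(c), target) for c in copies) / n_trans
```

Since relabelling atoms does not change the physical configuration, every permuted copy shares the same target CV, which pushes the network towards permutation-invariant predictions.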
- NNucleate.training.train_rot(model_t: NNCV, dataloader: DataLoader, optimizer: Callable, loss_fn: Callable, n_trans: int, device: str, print_batch=1000000) float
Performs one training epoch for a NNCV but the loss for each batch is not just calculated on one reference structure but a set of n_trans rotated versions of that structure.
- Parameters
dataloader (torch.utils.data.Dataloader) – Wrapper around a GNNTrajectory dataset.
optimizer (torch.optim) – The optimizer object for the training.
loss_fn (torch.nn._Loss) – Loss function for the training.
n_trans (int) – Number of rotated structures used for the loss calculations.
device (str) – Pytorch device to run the calculations on. Supports CPU and GPU (cuda).
print_batch (int, optional) – Set to receive printed updates on the loss every print_batch batches, defaults to 1000000.
- Returns
Returns the last loss item. For easy learning curve recording. Alternatively one can use a Tensorboard.
- Return type
float
NNucleate.utils module
- class NNucleate.utils.PeriodicCKDTree(bounds: ndarray, data: ndarray, leafsize=10)
Bases:
cKDTree
A wrapper around scipy.spatial.kdtree to implement periodic boundary conditions
Written by Patrick Varilly, 6 Jul 2012 (https://github.com/patvarilly/periodic_kdtree). Released under the scipy license.
Cython kd-tree for quick nearest-neighbor lookup with periodic boundaries. See scipy.spatial.ckdtree for details on kd-trees.
Searches with periodic boundaries are implemented by mapping all initial data points to one canonical periodic image, building an ordinary kd-tree with these points, then querying this kd-tree multiple times, if necessary, with all the relevant periodic images of the query point.
Note that to ensure that no two distinct images of the same point appear in the results, it is essential to restrict the maximum distance between a query point and a data point to half the smallest box dimension.
Construct a kd-tree.
- Parameters
bounds (array_like, shape (k,)) – Size of the periodic box along each spatial dimension. A negative or zero size for dimension k means that space is not periodic along k.
data (array-like, shape (n,m)) – The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles, and so modifying this data will result in bogus results.
leafsize (int, optional) – The number of points at which the algorithm switches over to brute-force, defaults to 10.
- query(x: ndarray, k=1, eps=0, p=2, distance_upper_bound=inf) ndarray
Query the kd-tree for nearest neighbors.
- Parameters
x (array_like, last dimension self.m) – An array of points to query.
k (int, optional.) – The number of nearest neighbors to return, defaults to 1
eps (int, optional) – Return approximate nearest neighbors; the kth returned value is guaranteed to be no further than (1+eps) times the distance to the real k-th nearest neighbor, defaults to 0.
p (int, optional) – Which Minkowski p-norm to use: 1 is the sum-of-absolute-values “Manhattan” distance, 2 is the usual Euclidean distance, and infinity is the maximum-coordinate-difference distance; defaults to 2.
distance_upper_bound (float, optional) – Return only neighbors within this distance. This is used to prune tree searches, so if you are doing a series of nearest-neighbor queries, it may help to supply the distance to the nearest neighbor of the most recent point, defaults to np.inf.
- Returns
The distances to the nearest neighbors. If x has shape tuple+(self.m,), then d has shape tuple+(k,). Missing neighbors are indicated with infinite distances.
- Return type
array of floats
- Returns
The locations of the neighbors in self.data. If x has shape tuple+(self.m,), then i has shape tuple+(k,). Missing neighbors are indicated with self.n.
- Return type
ndarray of ints
- query_ball_point(x: ndarray, r: float, p=2.0, eps=0) ndarray
Find all points within distance r of point(s) x. Notes: If you have many points whose neighbors you want to find, you may save substantial amounts of time by putting them in a PeriodicCKDTree and using query_ball_tree.
- Parameters
x (array_like, shape tuple + (self.m,)) – The point or points to search for neighbors of.
r (float) – The radius of points to return.
p (float, optional) – Which Minkowski p-norm to use. Should be in the range [1, inf], defaults to 2.0.
eps (int, optional) – Approximate search. Branches of the tree are not explored if their nearest points are further than r / (1 + eps), and branches are added in bulk if their furthest points are nearer than r * (1 + eps), defaults to 0.
- Returns
If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.
- Return type
list or array of lists
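Two of the rules stated in the class description, mapping points to one canonical periodic image and limiting the search distance to half the smallest box dimension, can be written down in a few lines. The sketch below is plain Python with hypothetical helper names; it follows the documented convention that a zero or negative bound marks a non-periodic dimension, and it is not taken from the PeriodicCKDTree source.

```python
import math

def wrap_to_canonical_image(point, bounds):
    """Map a point into [0, bound) along every periodic dimension."""
    return [
        x - b * math.floor(x / b) if b > 0 else x  # b <= 0: not periodic, keep x
        for x, b in zip(point, bounds)
    ]

def max_search_radius(bounds):
    """Largest query distance that cannot return two images of the same point."""
    periodic = [b for b in bounds if b > 0]
    return 0.5 * min(periodic) if periodic else math.inf
```

For a box of 2.0 x 1.0 that is open along the third axis, queries would therefore be capped at a distance of 0.5.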
- NNucleate.utils.com(xyz: ndarray) list
Calculates the centre of mass of a set of coordinates.
- Parameters
xyz (np.ndarray) – Array containing the list of 3-dimensional coordinates.
- Returns
A list of the calculated centres of mass.
- Return type
list of float
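For orientation, a centre of mass over a list of 3-dimensional coordinates can be computed as below. The function name and the optional masses argument are hypothetical; the documented com() takes only coordinates, so the equal-mass default (the geometric centre) is an assumption about its behaviour.

```python
def centre_of_mass(xyz, masses=None):
    """Mass-weighted mean position; with no masses given, all atoms weigh the same."""
    if masses is None:
        masses = [1.0] * len(xyz)
    total = sum(masses)
    return [
        sum(m * p[d] for m, p in zip(masses, xyz)) / total
        for d in range(3)
    ]
```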
- NNucleate.utils.get_mol_edges(rc: float, traj: Trajectory, n_mol: int, n_at: int, box: float) list
Generate the edges for a neighbourlist graph based on the COMs of the given molecules.
- Parameters
rc (float) – Cut off radius for the neighbourlist graph.
traj (md.Trajectory) – The Trajectory containing the frames.
n_mol (int) – Number of molecules per frame.
n_at (int) – Number of atoms per molecule.
box (float) – Length of the cubic box
- Returns
A list containing two tensors which represent the adjacency matrix of the graph.
- Return type
list of torch.tensor
- NNucleate.utils.get_rc_edges(rc: float, traj: Trajectory) list
Returns the edges of the graph constructed by interpreting the atoms in the trajectory as nodes that are connected to all other nodes within a distance of rc.
- Parameters
rc (float) – Cut-off radius for the graph construction.
traj (md.trajectory) – The trajectory for which the graphs shall be constructed.
- Returns
A list containing two tensors which represent the adjacency matrix of the graph.
- Return type
list of torch.tensor
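The cut-off graph construction can be sketched without mdtraj or torch as follows. The function name is hypothetical, periodic boundaries are ignored for brevity, and returning two index lists (source nodes and target nodes) is an assumption about how the "two tensors" encode the adjacency.

```python
import math

def rc_edges(frame, rc):
    """Directed edge list connecting every pair of atoms closer than rc."""
    rows, cols = [], []
    for i, a in enumerate(frame):
        for j, b in enumerate(frame):
            if i != j and math.dist(a, b) < rc:
                rows.append(i)  # source node
                cols.append(j)  # target node
    return [rows, cols]
```

Each undirected neighbour pair appears twice, once per direction, which is the usual layout for message passing between graph nodes.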
- NNucleate.utils.pbc(trajectory: Trajectory, box_length: float) Trajectory
Centers an mdtraj Trajectory around the centre of a cubic box with the given box length and wraps all atoms into the box.
- Parameters
trajectory (mdtraj.trajectory) – The trajectory that is to be modified, i.e. contains the configurations that shall be wrapped back into the simulation box.
box_length (float) – Length of the cubic box which shall contain all the positions.
- Returns
Returns a trajectory object obeying PBC according to the given box length.
- Return type
mdtraj.trajectory
- NNucleate.utils.pbc_config(config: ndarray, box_length: float) Trajectory
Wraps all atoms in a given configuration into the box.
- Parameters
config (np.ndarray) – The configuration that is to be modified, i.e. contains the positions that shall be wrapped back into the simulation box.
box_length (float) – Length of the cubic box which shall contain all the positions.
- Returns
Returns the configuration obeying PBC according to the given box length.
- Return type
np.ndarray
- NNucleate.utils.rotate_trajs(trajectories: ndarray) ndarray
Rotates each frame in the given trajectories according to a random quaternion.
- Parameters
trajectories (list of md.trajectory) – A list of mdtraj.trajectory objects to be modified.
- Returns
Returns a list of trajectories, the frames of which have been randomly rotated and wrapped back into the box.
- Return type
list of md.trajectory
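A random rotation driven by a quaternion can be sketched in plain Python as shown below. The helper names are hypothetical; drawing four Gaussian numbers and normalising them is one standard way to obtain a uniformly distributed unit quaternion and is not necessarily what rotate_trajs does. The sketch also omits wrapping the rotated atoms back into the box.

```python
import math
import random

def random_unit_quaternion(rng):
    """Uniformly distributed unit quaternion (w, x, y, z)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def quaternion_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def rotate_frame(frame, rng):
    """Apply one random rotation to every position in a frame."""
    rot = quaternion_to_matrix(random_unit_quaternion(rng))
    return [[sum(rot[i][j] * p[j] for j in range(3)) for i in range(3)] for p in frame]
```

A rotation leaves all interatomic distances unchanged, so the rotated frame carries the same CV value as the original.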
- NNucleate.utils.unsorted_segment_sum(data: Tensor, segment_ids: Tensor, num_segments: int) Tensor
Function that sums the segments of a matrix. Each row has a non-unique ID and all rows with the same ID are summed such that a matrix with the number of rows equal to the number of unique IDs is obtained.
- Parameters
data (torch.tensor) – A tensor that contains the data that is to be summed.
segment_ids (torch.tensor) – An array that has the same number of entries as data has rows which indicates which rows shall be summed.
num_segments (int) – This is the number of unique IDs, i.e. the dimensionality of the resulting tensor.
- Returns
Returns a tensor shaped num_segments x data.size(1) containing all the segment sums.
- Return type
torch.Tensor
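The segment-sum operation can be illustrated on nested lists. This is a plain-Python sketch of the behaviour described above, using a hypothetical function name and assuming a non-empty data matrix; it is not the tensor implementation.

```python
def segment_sum(data, segment_ids, num_segments):
    """Sum the rows of data that share the same segment ID."""
    n_cols = len(data[0])
    result = [[0.0] * n_cols for _ in range(num_segments)]
    for row, seg in zip(data, segment_ids):
        for c, value in enumerate(row):
            result[seg][c] += value  # accumulate the row into its segment
    return result
```

In a graph network this is typically how per-edge messages are aggregated onto the node they point to, with the node index serving as the segment ID.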