ladder.scripts.workflows.BaseWorkflow#

class ladder.scripts.workflows.BaseWorkflow(anndata, config='cross-condition', verbose=False, random_seed=None)#

Base class for all workflows.

Offers a high-level API so that the same blocks of code do not have to be rerun for every dataset, since the process is largely the same each time. This class must not be instantiated or used directly. All parameters passed to workflow methods can later be accessed as attributes of the same name.

Parameters:
  • anndata (AnnData) – The dataset object to be used throughout the analyses.

  • config (Literal["cross-condition", "interpretable"], default: “cross-condition”) – Defines the workflow to be used. Affects model structure.

  • verbose (bool, default: False) – If True, prints progress messages for various methods within the module.

  • random_seed (int, optional) – If given, seeds the internal modules with the value.
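The call order implied by this page can be sketched with a stand-in class. `StubWorkflow` below is hypothetical: it only mirrors a subset of the signatures documented here, trains nothing, and is not part of ladder. The factor and batch names (`treatment`, `sex`, `donor`) are placeholders as well.

```python
class StubWorkflow:
    """Hypothetical stand-in mirroring some documented BaseWorkflow signatures."""

    def __init__(self, anndata, config="cross-condition", verbose=False, random_seed=None):
        if config not in ("cross-condition", "interpretable"):
            raise ValueError(f"unknown config: {config!r}")
        self.anndata = anndata
        self.config = config
        self.verbose = verbose
        self.random_seed = random_seed
        self.calls = []  # records the order of method calls
        self._prepared = False

    def prep_model(self, factors, batch_key=None, model_type="Patches"):
        # Arguments are kept as same-named attributes, as the docs describe.
        self.factors = factors
        self.batch_key = batch_key
        self.model_type = model_type
        self._prepared = True
        self.calls.append("prep_model")

    def run_model(self, max_epochs=1500):
        # Assumption: training also needs a prepared model.
        if not self._prepared:
            raise RuntimeError("call prep_model() before run_model()")
        self.calls.append("run_model")

    def load_model(self, params_load_path):
        # Documented requirement: prep_model() must run first.
        if not self._prepared:
            raise RuntimeError("call prep_model() before load_model()")
        self.calls.append("load_model")

    def write_embeddings(self):
        self.calls.append("write_embeddings")


wf = StubWorkflow(anndata=None, random_seed=0)  # a real AnnData would go here
wf.prep_model(factors=["treatment", "sex"], batch_key="donor")
wf.run_model(max_epochs=10)
wf.write_embeddings()
print(wf.calls)
```

With a real workflow subclass, the same sequence would be followed by the evaluation methods listed further down.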

anndata#

The attached AnnData object.

Type:

AnnData

batch_key#

Optional batch key in obs for correction.

Type:

str, optional

batch_mapping#

Mapping of batch literals to encodings. Only present if a batch key is provided in the workflow.

Type:

dict

cell_type_label_key#

Optional cell type labels in obs, required if cell-type specific evaluation is desired.

Type:

str, optional

config#

The config string provided during construction.

Type:

Literal["cross-condition", "interpretable"], default: “cross-condition”

converter#

Low-level converter class for the attached Dataset. See ladder.data.real_data.distrib_dataset() for details.

Type:

AnndataConverter

dataset#

Low-level Dataset object passed to the model. See ladder.data.real_data.distrib_dataset() for details.

Type:

Dataset

factors#

List of factors to register to the model.

Type:

list

verbose#

If True, prints progress messages for various methods within the module.

Type:

bool, optional

random_seed#

If given, seeds the internal modules with the value.

Type:

int, optional

label_style#

Defines the conditional encoding style to use depending on the model.

Type:

str

latent_dim#

Size of the latent dimension for the model. For Patches, this is the size of the common latent.

Type:

int

len_attrs#

Specifies the number of attributes per condition class.

Type:

list

levels#

Mapping of condition literals to encodings.

Type:

dict

l_mean#

If batch_key is provided in workflow, the empirical library size log-mean for each batch (1-D Array-like of float). A single value otherwise.

Type:

float or array_like

l_scale#

If batch_key is provided in workflow, then the empirical library size log-variance for each batch (1-D Array-like of float). A single value otherwise.

Type:

float or array_like
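As a rough illustration of what l_mean and l_scale hold, the sketch below computes the log-mean and log-variance of per-cell library sizes, either globally or per batch. It assumes that library size means the total count per cell, that every cell has at least one count, and that the population variance is used; the helper `library_log_stats` is hypothetical and the library's own computation may differ.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


def library_log_stats(counts, batches=None):
    """Return (log-mean, log-variance) of per-cell library sizes.

    counts: list of per-cell count vectors.
    batches: optional per-cell batch labels; if given, one value per batch
    is returned, ordered by sorted batch label.
    """
    log_lib = [math.log(sum(cell)) for cell in counts]
    if batches is None:
        return fmean(log_lib), pvariance(log_lib)

    grouped = defaultdict(list)
    for value, batch in zip(log_lib, batches):
        grouped[batch].append(value)
    order = sorted(grouped)
    means = [fmean(grouped[b]) for b in order]
    variances = [pvariance(grouped[b]) for b in order]
    return means, variances


counts = [[1, 0, 3], [2, 2, 4], [10, 5, 5], [0, 1, 1]]

# Without a batch key: a single value each.
l_mean, l_scale = library_log_stats(counts)

# With a batch key: one value per batch.
per_batch_mean, per_batch_scale = library_log_stats(counts, batches=["a", "a", "b", "b"])
print(l_mean, l_scale)
print(per_batch_mean, per_batch_scale)
```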

minibatch_size#

Size of the minibatch to be provided during training.

Type:

int

model#

The model object attached to the workflow.

Type:

Module

model_type#

Specifies the model attached to the current workflow.

Type:

str

optim_args#

Optimizer arguments passed to low-level trainer. See training for details.

Type:

dict

predictive#

Low-level generator to be used for tasks after training.

Type:

Predictive

reconstruction#

Defines the decoder to be used.

Type:

str

train_loss#

ndarray of losses recorded on the training set during training.

Type:

ndarray

train_set#

Low-level training Dataset passed to the model. See ladder.data.real_data.distrib_dataset() for details.

Type:

Dataset

test_loss#

ndarray of losses recorded on the test set during training.

Type:

ndarray

test_set#

Low-level test Dataset passed to the model. See ladder.data.real_data.distrib_dataset() for details.

Type:

torch.utils.data.Dataset

w_dim#

Size of conditional latents, only defined for Patches.

Type:

int, optional

prep_model(factors, batch_key=None, cell_type_label_key=None, minibatch_size=128, model_type='Patches', model_args=None, optim_args=None)#

Prepares the model to be run.

run_model(max_epochs=1500, convergence_threshold=0.0001, convergence_window=100, classifier_warmup=0, classifier_aggression=0, params_save_path=None)#

Runs the model on the attached data object.

save_model(params_save_path)#

Saves the attached model.

load_model(params_load_path)#

Loads parameters for the attached model. Needs prep_model() to be run first.

plot_loss(save_loss_path=None)#

Simple plotter for loss functions.

write_embeddings()#

Places the calculated cell embeddings from the trained model under the corresponding obsm field.

evaluate_reconstruction(subset=None, cell_type=None, n_iter=5)#

Evaluates the quality of reconstructions with generative metrics.

evaluate_separability(factor=None)#

Evaluates the separability of the latent encodings with respect to conditional effects.

Methods table#

  • evaluate_reconstruction([subset, cell_type, ...]) – Evaluates the quality of reconstructions with generative metrics.

  • evaluate_separability([factor]) – Evaluates the separability of latent embeddings for conditions.

  • load_model(params_load_path) – Loads parameters for the attached model.

  • plot_loss([save_loss_path]) – Simple plotter for loss functions.

  • prep_model(factors[, batch_key, ...]) – Prepares the model to be run.

  • run_model([max_epochs, ...]) – Runs the model on the attached data object.

  • save_model(params_save_path) – Saves the attached model.

  • write_embeddings() – Places the calculated cell embeddings from the trained model under the corresponding obsm field.

Attributes#

BaseWorkflow.METRICS_REG = {'chamfer': 'Chamfer Discrepancy', 'corr': 'Profile Correlation', 'rmse': 'RMSE', 'swd': '2-Sliced Wasserstein'}#
BaseWorkflow.OPT_CLASS1 = ['SCVI', 'SCANVI']#
BaseWorkflow.OPT_CLASS2 = ['Patches']#
BaseWorkflow.OPT_DEFAULTS = {'betas': (0.9, 0.999), 'eps': 0.01, 'gamma': 1, 'lr': 0.001, 'milestones': [10000000000.0]}#
BaseWorkflow.OPT_LIST = ['optimizer', 'optim_args', 'gamma', 'milestones', 'lr', 'eps', 'betas']#
BaseWorkflow.SEP_METRICS_REG = {'calc_asw': 'Average Silhouette Width', 'kmeans_ari': 'K-Means ARI', 'kmeans_nmi': 'K-Means NMI', 'knn_error': 'kNN Classifier Accuracy'}#
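The keys in OPT_DEFAULTS (lr, betas, eps, gamma, milestones) resemble an Adam-style optimizer paired with a multi-step learning-rate schedule, although this page does not state which optimizer or scheduler is used. Under that assumption, the sketch below shows how user-supplied optim_args might be overlaid on the defaults and what learning rate results at a given epoch. Both helpers are hypothetical.

```python
# Copied from the documented class attribute.
OPT_DEFAULTS = {
    "betas": (0.9, 0.999),
    "eps": 0.01,
    "gamma": 1,
    "lr": 0.001,
    "milestones": [10000000000.0],
}


def resolve_optim_args(user_args=None):
    """Overlay user-supplied values on the defaults (assumed merge rule)."""
    return {**OPT_DEFAULTS, **(user_args or {})}


def lr_at_epoch(args, epoch):
    """Multi-step decay: multiply lr by gamma once per milestone already reached."""
    passed = sum(1 for m in args["milestones"] if epoch >= m)
    return args["lr"] * args["gamma"] ** passed


defaults = resolve_optim_args()
custom = resolve_optim_args({"lr": 0.01, "gamma": 0.1, "milestones": [100, 200]})

print(lr_at_epoch(defaults, 1499))
print([lr_at_epoch(custom, e) for e in (0, 150, 250)])
```

Under this reading, the default gamma of 1 together with a single milestone at 1e10 keeps the learning rate constant for any realistic number of epochs.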

Methods#

BaseWorkflow.evaluate_reconstruction(subset=None, cell_type=None, n_iter=5)#

Evaluates the quality of reconstructions with generative metrics.

Parameters:
  • subset (str, optional) – Key from levels to subset cells for a specific condition before evaluating reconstruction.

  • cell_type (str, optional) – Subsets cells to a single type before evaluating reconstruction. Requires the cell_type_label_key attribute to be set.

  • n_iter (int, default: 5) – Number of times to repeat the generative process.
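Two of the metrics named in METRICS_REG, RMSE and profile correlation, can be illustrated on toy data. The sketch below assumes that both are computed between mean expression profiles and averaged over n_iter generated samples; this page does not confirm either detail. `noisy_copy` is a stand-in for the model's generative process, and all function names are hypothetical.

```python
import math
import random


def mean_profile(cells):
    """Average expression per gene across cells."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def rmse(a, b):
    """Root mean squared error between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def evaluate(observed, generate, n_iter=5):
    """Average each metric over n_iter generated samples."""
    target = mean_profile(observed)
    scores = {"rmse": 0.0, "corr": 0.0}
    for _ in range(n_iter):
        sample = mean_profile(generate())
        scores["rmse"] += rmse(target, sample) / n_iter
        scores["corr"] += pearson(target, sample) / n_iter
    return scores


rng = random.Random(0)
observed = [[1.0, 4.0, 9.0], [3.0, 6.0, 11.0]]


def noisy_copy():
    # Stand-in for a trained model: the observed data plus small Gaussian noise.
    return [[v + rng.gauss(0, 0.1) for v in cell] for cell in observed]


scores = evaluate(observed, noisy_copy, n_iter=5)
print(scores)
```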

BaseWorkflow.evaluate_separability(factor=None)#

Evaluates the separability of latent embeddings for conditions.

Parameters:

factor (str, optional) – Item listed in BaseWorkflow.factors. If not provided, the metrics will be evaluated on the combinations of factors.
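The difference between evaluating a single factor and evaluating the combination of factors can be sketched on toy data. The example below joins factor values into one label per cell and scores them with a leave-one-out 1-nearest-neighbour accuracy, a simplified stand-in for the kNN metric listed in SEP_METRICS_REG. The label format, the helper names, and the factor names are assumptions, not the library's implementation.

```python
import math


def combined_labels(obs, factors):
    """Join the values of several factors into one label per cell."""
    return ["_".join(str(row[f]) for f in factors) for row in obs]


def loo_1nn_accuracy(embeddings, labels):
    """Leave-one-out 1-nearest-neighbour accuracy in the latent space."""
    hits = 0
    for i, point in enumerate(embeddings):
        # Nearest other cell by Euclidean distance.
        nearest = min(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: math.dist(point, embeddings[j]),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(embeddings)


# Toy obs table and 2-D latent embeddings (placeholders).
obs = [
    {"treatment": "ctrl", "sex": "F"},
    {"treatment": "ctrl", "sex": "F"},
    {"treatment": "drug", "sex": "M"},
    {"treatment": "drug", "sex": "M"},
]
latent = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]

single = loo_1nn_accuracy(latent, combined_labels(obs, ["treatment"]))
combined = loo_1nn_accuracy(latent, combined_labels(obs, ["treatment", "sex"]))
print(single, combined)
```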

BaseWorkflow.load_model(params_load_path)#

Loads parameters for the attached model. Needs prep_model() to be run first.

Parameters:

params_load_path (str) – Path to find model parameters. Expects only the shared prefix, and not the trailing “_torch.pth” or “_pyro.pth”.
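Since only the shared prefix is expected, the two parameter files can be derived from it as in the following sketch. The helper `param_files` and the path `checkpoints/run1` are hypothetical; only the two suffixes come from the description above.

```python
from pathlib import Path


def param_files(prefix):
    """Return the two file paths implied by a shared prefix."""
    return Path(f"{prefix}_torch.pth"), Path(f"{prefix}_pyro.pth")


# Pass "checkpoints/run1", not "checkpoints/run1_torch.pth".
torch_file, pyro_file = param_files("checkpoints/run1")
print(torch_file.name, pyro_file.name)
```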

BaseWorkflow.plot_loss(save_loss_path=None)#

Simple plotter for loss functions.

Parameters:

save_loss_path (str, optional) – If provided, saves the figure to the specified location. Requires the full file name including its extension (e.g. fig.png).

BaseWorkflow.prep_model(factors, batch_key=None, cell_type_label_key=None, minibatch_size=128, model_type='Patches', model_args=None, optim_args=None)#

Prepares the model to be run.

The choice of model implicitly determines the kind of condition encodings to use, so no separate data preparation step is needed.

Parameters:
  • factors (list) – Factors from obs to register to the model.

  • batch_key (str, optional) – Optional batch key in obs for correction. Can later be accessed with the same named attribute.

  • cell_type_label_key (str, optional) – Optional cell type labels in obs, required if cell-type specific evaluation is desired.

  • minibatch_size (int, default: 128) – Size of the minibatch to be provided during training.

  • model_type (Literal["SCVI", "SCANVI", "Patches"], default: “Patches”) – Specifies the model attached to the current workflow.

  • model_args (dict, optional) – Model arguments passed to the low-level model constructor. See models for details.

  • optim_args (dict, optional) – Optimizer arguments passed to the low-level trainer. See training for details.

BaseWorkflow.run_model(max_epochs=1500, convergence_threshold=0.0001, convergence_window=100, classifier_warmup=0, classifier_aggression=0, params_save_path=None)#

Runs the model on the attached data object.

Parameters:
  • max_epochs (int, default: 1500) – Maximum number of epochs to run.

  • convergence_threshold (float, default: 0.0001) – Minimum improvement required to continue training.

  • convergence_window (int, default: 100) – Number of epochs to wait for a new minimum before stopping.

  • classifier_warmup (int, default: 0) – Number of epochs to run the classifier before running the entire model.

  • classifier_aggression (int, default: 0) – Number of epochs the classifier takes independently between jointly trained epochs. Used for Patches.

  • params_save_path (str, optional) – If provided, saves the model to the specified path.
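One way to read convergence_threshold and convergence_window together is as an early-stopping rule: training stops once a full window of epochs passes without the loss improving on the best value by more than the threshold. The sketch below implements that reading on a fixed list of losses. It assumes an absolute (not relative) improvement and is not taken from the library's trainer; `epochs_until_stop` is a hypothetical helper.

```python
def epochs_until_stop(losses, convergence_threshold, convergence_window, max_epochs):
    """Return the number of epochs run before training stops.

    Stops once `convergence_window` consecutive epochs pass without the loss
    dropping more than `convergence_threshold` below the best value so far.
    """
    best = float("inf")
    stale = 0  # epochs since the last sufficient improvement
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if best - loss > convergence_threshold:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= convergence_window:
                return epoch
    return min(len(losses), max_epochs)


losses = [10.0, 8.0, 7.0, 6.9995, 6.9993, 6.9992, 6.9991, 5.0]

short_window = epochs_until_stop(losses, 1e-3, convergence_window=3, max_epochs=1500)
long_window = epochs_until_stop(losses, 1e-3, convergence_window=10, max_epochs=1500)
print(short_window, long_window)
```

A short window stops during the plateau, while a longer window lets training reach the later drop in loss, which is the trade-off the two parameters control.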

BaseWorkflow.save_model(params_save_path)#

Saves the attached model.

Parameters:

params_save_path (str) – Path to save model parameters. Expects only the name without extensions.

BaseWorkflow.write_embeddings()#

Places the calculated cell embeddings from the trained model under the corresponding obsm field.

Each model stores its latent under a separate name, so multiple workflows running on the same AnnData object do not overwrite each other's embeddings.
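The effect of per-model names can be sketched with a plain dictionary standing in for obsm. The key scheme `X_<model>` used below is hypothetical; the actual field names are not listed on this page.

```python
def write_embeddings(obsm, model_type, embeddings):
    """Store embeddings under a key derived from the model name."""
    key = f"X_{model_type.lower()}"  # hypothetical naming scheme
    obsm[key] = embeddings
    return key


obsm = {}  # stand-in for the dict-like AnnData.obsm

# Two workflows writing to the same object keep separate entries.
write_embeddings(obsm, "Patches", [[0.1, 0.2]])
write_embeddings(obsm, "SCVI", [[0.3, 0.4]])
print(sorted(obsm))
```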