ladder.data.real_data.distrib_dataset

ladder.data.real_data.distrib_dataset(dataset, levels, split_pcts=None, batch_size=128, keep_train=None, keep_test=None, batch_key=None, **kwargs)

Splits the TensorDataset generated by construct_labels into training and test sets and builds a DataLoader for each.

Parameters:
  • dataset (TensorDataset) – The dataset output from construct_labels.

  • levels (dict) – The levels output from construct_labels.

  • split_pcts (array_like, optional) – Length-2 list of floats giving the training and test proportions, in that order. Ignored if both keep_train and keep_test are provided (not None).

  • batch_size (int) – Mini-batch size for the models to train on.

  • keep_train (array_like, optional) – 1D Array-like of str. Specifies the levels to keep in the training dataset. Elements must be from levels.keys().

  • keep_test (array_like, optional) – 1D Array-like of str. Specifies the levels to keep in the test dataset. Elements must be from levels.keys().

  • batch_key (str, optional) – Must not be None if batch_key was previously provided to construct_labels. The actual value is unimportant in this function.

  • **kwargs (dict, optional) – Keyword arguments passed to utils.DataLoader.
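
The interaction between split_pcts, keep_train and keep_test can be illustrated with a minimal sketch. This is not the library's implementation: split_indices is a hypothetical helper, the per-sample list of level names stands in for the labels stored in dataset, and the (0.8, 0.2) default, the seed argument and the shuffling strategy are all assumptions made for illustration.

```python
import random


def split_indices(sample_levels, split_pcts=(0.8, 0.2),
                  keep_train=None, keep_test=None, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) for a list of
    per-sample level names. Not part of ladder."""
    if keep_train is not None and keep_test is not None:
        # Level-based split: split_pcts is ignored, as the parameter
        # description states.
        train_levels, test_levels = set(keep_train), set(keep_test)
        train_idx = [i for i, lv in enumerate(sample_levels) if lv in train_levels]
        test_idx = [i for i, lv in enumerate(sample_levels) if lv in test_levels]
        return train_idx, test_idx

    # Proportional split (assumed behavior): shuffle once, then cut at the
    # training proportion.
    idx = list(range(len(sample_levels)))
    random.Random(seed).shuffle(idx)
    n_train = round(split_pcts[0] * len(idx))
    return sorted(idx[:n_train]), sorted(idx[n_train:])


# Made-up level names, one per sample.
sample_levels = ["ctrl", "ctrl", "drugA", "drugA", "drugB"]

by_level = split_indices(sample_levels,
                         keep_train=["ctrl", "drugA"], keep_test=["drugB"])
by_pct = split_indices(sample_levels, split_pcts=(0.8, 0.2))
```

With both keep lists given, the test set contains only samples from levels that the training set never sees, which is the typical reason to choose a level-based split over a proportional one.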

Return type:

tuple

Returns:

  • train_set (TensorDataset or ConcatTensorDataset) – The full training set to be used downstream.

  • test_set (TensorDataset or ConcatTensorDataset) – The full test set to be used downstream.

  • train_loader (DataLoader) – The corresponding loader for train_set.

  • test_loader (DataLoader) – The corresponding loader for test_set.

  • l_mean (float or array_like) – If batch_key is provided, the empirical library size log-mean for each batch (1-D Array-like of float). A single value otherwise.

  • l_scale (float or array_like) – If batch_key is provided, the empirical library size log-variance for each batch (1-D Array-like of float). A single value otherwise.
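
As a rough illustration of what l_mean and l_scale describe, the sketch below computes the mean and variance of the log library size, either over all samples or per batch. It rests on several assumptions that the documentation does not confirm: that "library size" means the total count per sample, that the variance is the population variance, and that batches are ordered by sorted label. library_stats is a hypothetical helper, not a ladder function.

```python
import math
from statistics import mean, pvariance


def library_stats(counts, batches=None):
    """Hypothetical helper: log-mean and log-variance of library sizes.

    counts  -- list of per-sample count vectors (rows with a positive total)
    batches -- optional list of batch labels, one per sample
    """
    # Library size is assumed to be the total count of each sample.
    log_lib = [math.log(sum(row)) for row in counts]

    if batches is None:
        # No batch information: a single value for each statistic.
        return mean(log_lib), pvariance(log_lib)

    # Group the log library sizes by batch label.
    groups = {}
    for value, label in zip(log_lib, batches):
        groups.setdefault(label, []).append(value)

    # One value per batch, ordered by sorted label (an assumed ordering).
    labels = sorted(groups)
    return ([mean(groups[b]) for b in labels],
            [pvariance(groups[b]) for b in labels])


# Made-up counts: library sizes are 2, 4 and 8.
counts = [[1, 1], [2, 2], [4, 4]]

l_mean, l_scale = library_stats(counts)
l_mean_b, l_scale_b = library_stats(counts, batches=["a", "a", "b"])
```

Without batches the helper returns two floats; with batches it returns two lists with one entry per batch, mirroring the "float or array_like" return types listed above.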