Overview

The dataset in use is MNIST, which is divided into three parts that are used by the different training options:

Labeled trainset: a smaller set of training images with known target labels, used for semi-supervised training.
Unlabeled trainset: a large set of training images, assumed to be stored without their target labels, used for unsupervised training or as the unsupervised part of semi-supervised training.
Validation set: used to measure the model's classification accuracy in semi-supervised training.
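To illustrate how the parts differ, the sketch below models a labeled set as image/label pairs and an unlabeled set as images only. The class names LabeledSet and UnlabeledSet, the helper drop_labels, and the representation of an image as a flat list of floats are all hypothetical; the storage format actually used by this package is not described here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical representation: an image is a flat sequence of pixel values.
Image = Sequence[float]


@dataclass
class LabeledSet:
    """Images paired with known digit labels (like the labeled trainset or validation set)."""
    images: List[Image]
    labels: List[int]

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, i: int) -> Tuple[Image, int]:
        # Each item is an (image, label) pair.
        return self.images[i], self.labels[i]


@dataclass
class UnlabeledSet:
    """Images stored without labels (like the unlabeled trainset)."""
    images: List[Image]

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, i: int) -> Image:
        # Each item is the image alone; no label is available.
        return self.images[i]


def drop_labels(labeled: LabeledSet) -> UnlabeledSet:
    """Hypothetical helper: discard the labels, keeping only the images."""
    return UnlabeledSet(images=list(labeled.images))
```

In semi-supervised training both kinds of set would be consumed together, the labeled items driving a supervised loss and the unlabeled items an unsupervised one.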

The sizes chosen for the datasets are: 3000 (labeled trainset), 47000 (unlabeled trainset) and 10000 (validation set).
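The three sizes add up to 60000, which equals the size of the standard MNIST training split. Assuming (this document does not confirm it) that all three parts are drawn from that split, a reproducible random partition of the image indices could be sketched as follows. The names split_indices and SPLIT_SIZES are hypothetical.

```python
import random
from typing import Dict, List

# Part sizes taken from the text above.
SPLIT_SIZES = {"labeled": 3000, "unlabeled": 47000, "validation": 10000}


def split_indices(n_total: int, sizes: Dict[str, int], seed: int = 0) -> Dict[str, List[int]]:
    """Randomly partition range(n_total) into disjoint index lists of the given sizes."""
    needed = sum(sizes.values())
    if needed > n_total:
        raise ValueError(f"requested {needed} samples but only {n_total} are available")
    indices = list(range(n_total))
    # A fixed seed makes the split identical on every run.
    random.Random(seed).shuffle(indices)
    splits: Dict[str, List[int]] = {}
    start = 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


if __name__ == "__main__":
    parts = split_indices(60000, SPLIT_SIZES)
    for name, idx in parts.items():
        print(name, len(idx))
```

Running the module prints the size of each part (3000, 47000 and 10000). Fixing the seed matters here because the split is meant to be created once and reused, so every training run should see the same labeled and unlabeled images.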

Initialize the Datasets

Initializing the datasets requires downloading MNIST and splitting it into the parts described above. Because this only needs to be done once, it is handled by a separate entry point, init_datasets, which becomes available after installing the package:

$ python setup.py install --user
$ init_datasets --dir-path <path-to-data-dir>
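The implementation of init_datasets is not shown in this document. As a rough sketch only, assuming the command is built on Python's standard argparse module, an entry point accepting the --dir-path flag could look like the following. The functions parse_args, init_datasets and main here are hypothetical, and the download and split steps are left as a placeholder comment.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; argparse exposes --dir-path as args.dir_path."""
    parser = argparse.ArgumentParser(
        prog="init_datasets",
        description="Download MNIST and split it into the dataset parts.",
    )
    parser.add_argument(
        "--dir-path",
        type=Path,
        required=True,
        help="directory in which the dataset parts are stored",
    )
    return parser.parse_args(argv)


def init_datasets(dir_path: Path) -> Path:
    """Hypothetical body: prepare the target directory for the dataset files."""
    dir_path.mkdir(parents=True, exist_ok=True)
    # Placeholder: download MNIST and write the labeled, unlabeled and
    # validation parts into dir_path here.
    return dir_path


def main(argv: Optional[List[str]] = None) -> int:
    """Console-script entry point; returns 0 so sys.exit(main()) reports success."""
    args = parse_args(argv)
    init_datasets(args.dir_path)
    return 0
```

With setuptools, a function like main is commonly exposed as a shell command through an entry_points={"console_scripts": ["init_datasets = yourpackage.cli:main"]} argument to setup(), where yourpackage.cli is a placeholder module path rather than this project's actual layout. Returning an integer from main matters because the generated script passes the return value to sys.exit.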