How to use SemiPy?

Description of different ways to use SemiPy

SemiPy is versatile: there are several ways to use the library.

Configuration file

A first simple way to use SemiPy is to write a configuration file that sets the parameters you want to change. The library ships with a default configuration file. The tables below list every parameter with its description and default value.

Parameter	Description	Default value
USE_LIGHTNING	Whether to use PyTorch Lightning.	True
EPOCHS	Total number of epochs. One epoch is finished when the model has seen all labelled items.	1
BALANCING_WEIGHT	Corresponds to the 'lambda' parameter found in various SSL papers. Weights the unlabelled loss relative to the labelled one.	0.5
DEBIASED	Enables safe SSL via the debiased loss (Schmutz et al.).	False
SELECTION_THRESHOLD	Probability threshold for pseudo-labels.	0.95
BATCH_SIZE	Size of each batch.	64
LABELLED_PROPORTION	Proportion of labelled items compared to unlabelled ones. Needed by JointSampler.	0.5
SAVE_PATH	Path to save models at the end of training and best model during training.	'./saves'
OPTIMIZER	Optimizer name and parameters	NAME: 'SGD'; PARAMS: {lr: 1.0e-3, momentum: 0.9}
SCHEDULER	Learning rate scheduler	null
NET	Model to train.	'resnet18'
METHOD	SSL method to use.	'pseudolabel'
NUM_WARMUP_EPOCHS	Number of warmup epochs. Used for PiModel.	null
DATA	Dataset settings.	See default values in DATA details below
USE_MULTIGPU	To enable multi-GPU training.	False
NUM_GPU	Number of GPUs to use.	null
MULTIGPU_STRATEGY	Strategy for multi-GPU training. For now only 'ddp' is supported.	null
EMA	Exponential Moving Average coefficient for EMA on model's parameters.	null
METRICS	Metrics settings.	See default values in METRICS details below
EARLYSTOPPING	EarlyStopping settings.	See default values in EARLYSTOPPING details below
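
To make the roles of BALANCING_WEIGHT and SELECTION_THRESHOLD concrete, here is a minimal pure-Python sketch of a pseudo-label objective: the total loss is the labelled loss plus BALANCING_WEIGHT times the unlabelled loss, and only unlabelled items whose top predicted probability reaches SELECTION_THRESHOLD receive a pseudo-label. The function names are hypothetical and this is not SemiPy's own loss code, which is not shown on this page. Averaging over the whole unlabelled batch is also an assumption.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class given class probabilities."""
    return -math.log(probs[target])


def pseudo_label_loss(labelled, unlabelled, balancing_weight=0.5,
                      selection_threshold=0.95):
    """Hypothetical helper, not part of SemiPy.

    labelled: list of (probs, target) pairs.
    unlabelled: list of probs (one list of class probabilities per item).
    """
    # Supervised term: mean cross-entropy over the labelled items.
    supervised = sum(cross_entropy(p, y) for p, y in labelled) / len(labelled)

    # Unsupervised term: keep only confident predictions and use the
    # most likely class as the pseudo-label.
    kept = []
    for probs in unlabelled:
        confidence = max(probs)
        if confidence >= selection_threshold:
            pseudo_label = probs.index(confidence)
            kept.append(cross_entropy(probs, pseudo_label))

    # Assumption: average over the whole unlabelled batch, so rejected
    # items contribute zero to the loss.
    unsupervised = sum(kept) / len(unlabelled) if unlabelled else 0.0

    return supervised + balancing_weight * unsupervised


labelled_batch = [([0.5, 0.5], 0)]
unlabelled_batch = [[0.99, 0.01], [0.6, 0.4]]  # only the first item passes 0.95
total = pseudo_label_loss(labelled_batch, unlabelled_batch)
```

Raising the threshold makes the unlabelled term vanish for this batch, leaving only the supervised loss.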

DATA parameters details

Parameter	Description	Default value
NAME	Name of the desired dataset. Use 'custom' for your own dataset.	null
VALIDATION_PROPORTION	Size of the validation set, as a proportion of the labelled items.	null
TEST_PROPORTION	Size of the test set, as a proportion of the whole dataset. Only used when the dataset does not already have a test set.	null
LABELLED_SAMPLES	Number of labelled samples in training set.	null
UNLABELLED_SAMPLES	Number of unlabelled samples in training set.	null
INCLUDE_LABELLED	Whether to also include the labelled items (without their labels) in the unlabelled set, to add information.	True
USE_EXTRA	Used by SVHN dataset	False
DATA/SPLITS	Subsection for defining different data splits.	See below for parameters
SPLITS/TRAIN	Subsection example for Train split.
SPLITS/PATH	Path to train set.	'data'
SPLITS/NAME_UNLABELLED	When using your own dataset: name of the folder containing unlabelled items.	'nodata'
SPLITS/TRANSFORMS	To add transformations to the dataset.	[]

METRICS parameters details

Parameter	Description	Default value
METRICS/VALIDATION	Subsection for validation metrics.	See below for parameters.
VALIDATION/NAME	Name of the first validation metric.	Accuracy
VALIDATION/PARAMS	Parameters for the corresponding metric.	{task: multiclass}
METRICS/TEST	Subsection for test metrics.	See below for parameters.
TEST/NAME	Name of the first test metric.	Accuracy
TEST/PARAMS	Parameters for the corresponding metric.	{task: multiclass}

Tip

You can add as many metrics as you want: simply add a new item to the list. The names should be taken from the torchmetrics list of metrics. For example:

METRICS:
    VALIDATION:
        - NAME: MulticlassAccuracy
          PARAMS:
            num_classes: 10
        - NAME: MulticlassAveragePrecision
          PARAMS:
            num_classes: 10
            average: 'weighted'
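
How SemiPy turns these names into metric objects is not shown here. One plausible approach, sketched below under that assumption, is to look each NAME up as an attribute of a module and instantiate it with PARAMS. The `build_metrics` function and the stand-in registry are hypothetical. The stand-in only lets the example run without torchmetrics installed.

```python
from types import SimpleNamespace


def build_metrics(metric_configs, registry):
    """Instantiate one metric per config entry (hypothetical helper).

    metric_configs: list of {"NAME": str, "PARAMS": dict} items,
        mirroring the YAML list above.
    registry: any object exposing metric classes as attributes.
    """
    metrics = {}
    for entry in metric_configs:
        name = entry["NAME"]
        params = entry.get("PARAMS") or {}
        metric_cls = getattr(registry, name, None)
        if metric_cls is None:
            raise ValueError(f"Unknown metric name: {name!r}")
        metrics[name] = metric_cls(**params)
    return metrics


class FakeMetric:
    """Stand-in for a metric class; it only records its parameters."""

    def __init__(self, **params):
        self.params = params


# Stand-in registry exposing the two names used in the YAML example.
fake_registry = SimpleNamespace(
    MulticlassAccuracy=FakeMetric,
    MulticlassAveragePrecision=FakeMetric,
)

validation_metrics = [
    {"NAME": "MulticlassAccuracy", "PARAMS": {"num_classes": 10}},
    {"NAME": "MulticlassAveragePrecision",
     "PARAMS": {"num_classes": 10, "average": "weighted"}},
]
built = build_metrics(validation_metrics, fake_registry)
```

With torchmetrics installed, passing the `torchmetrics.classification` module as `registry` should resolve both names, since both classes live there.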

EARLYSTOPPING parameters details

Parameter	Description	Default value
EARLYSTOPPING/NAME	Name of the monitored metric.	VALIDATION/Loss
EARLYSTOPPING/PARAMS	Dictionary of parameters for EarlyStopping.	{'mode': 'min', 'patience': 10}
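
The 'mode' and 'patience' parameters follow the usual early-stopping rule: training stops once the monitored metric has failed to improve for 'patience' consecutive checks. The sketch below illustrates that generic rule with a hypothetical `EarlyStopper` class. It is not the callback SemiPy or PyTorch Lightning actually uses, which offers more options.

```python
class EarlyStopper:
    """Generic patience-based early stopping (illustrative only)."""

    def __init__(self, mode="min", patience=10):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.patience = patience
        self.best = None       # best metric value seen so far
        self.bad_checks = 0    # consecutive checks without improvement

    def step(self, value):
        """Record one metric value; return True when training should stop."""
        improved = (
            self.best is None
            or (self.mode == "min" and value < self.best)
            or (self.mode == "max" and value > self.best)
        )
        if improved:
            self.best = value
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Monitor a validation loss with a short patience for demonstration.
stopper = EarlyStopper(mode="min", patience=2)
decisions = [stopper.step(loss) for loss in [1.0, 0.8, 0.9, 0.85]]
```

With the default 'mode': 'min', a lower VALIDATION/Loss counts as an improvement. A metric such as accuracy would need 'mode': 'max'.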

Once you have your configuration file ready, you have multiple choices: use a notebook or use a script.

Notebook usage

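If you want to use a notebook, simply pass the configuration file's path to your trainer. The first snippet uses SemiPy's own SSLTrainer; the second uses a PyTorch Lightning Trainer with a SemiPy Lightning module (LitFixMatch here).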

import semipy as smp
trainer = smp.tools.SSLTrainer(config='config.yaml')
trainer.fit()

from pytorch_lightning import Trainer
import semipy as smp
trainer = Trainer(max_epochs=100, accelerator='gpu')
lightning_module = smp.pl.LitFixMatch(config='config.yaml')
trainer.fit(lightning_module)

Script usage

You can also use a custom script, or the main.py script located in the root folder of the library. Pass your configuration file with the --config "path_to_config_file" option.

python main.py --config config.yaml
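
If you prefer to write your own script, it could look like the sketch below. It is not the content of the library's main.py, which is not reproduced here. The `build_parser` and `main` names are hypothetical, and only the SSLTrainer calls come from the notebook example above.

```python
import argparse


def build_parser():
    """Command-line parser exposing a single --config option."""
    parser = argparse.ArgumentParser(description="Train a model with SemiPy.")
    parser.add_argument(
        "--config",
        required=True,
        help="Path to the YAML configuration file.",
    )
    return parser


def main(argv=None):
    """Parse the command line, then train with SemiPy's own trainer."""
    args = build_parser().parse_args(argv)

    # Imported here so the parser can be exercised without SemiPy installed.
    import semipy as smp

    trainer = smp.tools.SSLTrainer(config=args.config)
    trainer.fit()


# Parsing alone does not require SemiPy.
parsed = build_parser().parse_args(["--config", "config.yaml"])
```

In a real script you would call main() under the usual `if __name__ == "__main__":` guard.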

Without a configuration file?

It is still possible to use SemiPy without a YAML configuration file: you can pass a dictionary of parameters instead. Note that when building a trainer (with or without PyTorch Lightning), the library still uses the default configuration file described above to fill in any parameters you have not specified. For example, with a dictionary:

import semipy as smp
args = {'EPOCHS': 100, 'BALANCING_WEIGHT': 0.12}
trainer = smp.tools.SSLTrainer(config=args)
trainer.fit()
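
The default-filling behaviour described above can be pictured as a merge of your dictionary over the defaults. The sketch below is only an illustration: `fill_defaults` is a hypothetical helper, DEFAULTS is a small subset of the values from the parameter table, and merging nested sections key by key (rather than replacing them wholesale) is an assumption about how SemiPy behaves.

```python
import copy


def fill_defaults(user_config, defaults):
    """Return defaults overridden by user_config, merging nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested sections are merged key by key.
            merged[key] = fill_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A small subset of the defaults listed in the parameter table.
DEFAULTS = {
    "EPOCHS": 1,
    "BALANCING_WEIGHT": 0.5,
    "BATCH_SIZE": 64,
    "OPTIMIZER": {"NAME": "SGD", "PARAMS": {"lr": 1.0e-3, "momentum": 0.9}},
}

args = {"EPOCHS": 100, "OPTIMIZER": {"PARAMS": {"lr": 0.01}}}
config = fill_defaults(args, DEFAULTS)
```

Here EPOCHS and the learning rate come from the user dictionary, while BATCH_SIZE, the optimizer name and the momentum keep their default values.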

Finally, if you don't want to use the provided trainers, you still have access to all the useful functions that come with SemiPy, especially the loss functions specific to each semi-supervised learning algorithm (written in a PyTorch style) and, of course, the JointSampler. It's up to you to build your own code on top of these lower-level functions. Note that most of the code is PyTorch Lightning compatible, so you can also build your own Lightning module.