How to use SemiPy?

Description of different ways to use SemiPy

SemiPy has the advantage of being versatile: different ways exist to use the library.

Configuration file

A first, simple way to use SemiPy is to write a configuration file that sets all the parameters the user can change. Here you can find the default configuration file used by the library. Below is a list of all parameters with their descriptions.

| Parameter | Description | Default value |
| --- | --- | --- |
| USE_LIGHTNING | Whether to use PyTorch Lightning. | True |
| EPOCHS | Total number of epochs. One epoch is finished when the model has seen all labelled items. | 1 |
| BALANCING_WEIGHT | Corresponds to the 'lambda' parameter found in various SSL papers. Weights the unlabelled loss relative to the labelled one. | 0.5 |
| DEBIASED | Enables safe SSL via the debiased loss (Schmutz et al.). | False |
| SELECTION_THRESHOLD | Probability threshold for pseudo-labels. | 0.95 |
| BATCH_SIZE | Size of each batch. | 64 |
| LABELLED_PROPORTION | Proportion of labelled items compared to unlabelled ones. Needed for JointSampler. | 0.5 |
| SAVE_PATH | Path where the final model (at the end of training) and the best model (during training) are saved. | './saves' |
| OPTIMIZER | Optimizer name and parameters. | NAME: 'SGD'; PARAMS: {lr: 1.0e-3, momentum: 0.9} |
| SCHEDULER | Learning rate scheduler. | null |
| NET | Model to train. | 'resnet18' |
| METHOD | SSL method to use. | 'pseudolabel' |
| NUM_WARMUP_EPOCHS | Number of warmup epochs. Used for PiModel. | null |
| DATA | Data information. | See default values in DATA details below |
| USE_MULTIGPU | Enables multi-GPU training. | False |
| NUM_GPU | Number of GPUs used. | null |
| MULTIGPU_STRATEGY | Strategy for multi-GPU training. For now only 'ddp' is supported. | null |
| EMA | Exponential Moving Average (EMA) coefficient applied to the model's parameters. | null |
| METRICS | Metrics information. | See default values in METRICS details below |
| EARLYSTOPPING | EarlyStopping information. | See default values in EARLYSTOPPING details below |
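
To illustrate how BALANCING_WEIGHT and SELECTION_THRESHOLD typically interact in pseudo-labelling, here is a minimal pure-Python sketch. It is not SemiPy's implementation: the function names `cross_entropy` and `pseudolabel_loss` are hypothetical, and it assumes the unlabelled loss is averaged over the whole unlabelled batch, so that rejected items contribute zero.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def pseudolabel_loss(labelled, unlabelled, balancing_weight=0.5,
                     selection_threshold=0.95):
    """Combine a labelled loss with a thresholded pseudo-label loss.

    labelled:   list of (probs, target) pairs.
    unlabelled: list of probability vectors predicted for unlabelled items.
    """
    # Supervised term: mean cross-entropy over the labelled items.
    sup = sum(cross_entropy(p, y) for p, y in labelled) / len(labelled)

    # Unsupervised term: keep only confident predictions and use the
    # most likely class as the pseudo-label.
    unsup_terms = []
    for probs in unlabelled:
        confidence = max(probs)
        if confidence >= selection_threshold:
            pseudo = probs.index(confidence)
            unsup_terms.append(cross_entropy(probs, pseudo))

    # Assumption: average over the whole unlabelled batch.
    unsup = sum(unsup_terms) / len(unlabelled) if unlabelled else 0.0

    # BALANCING_WEIGHT plays the role of 'lambda'.
    return sup + balancing_weight * unsup
```

With the default threshold of 0.95, an unlabelled prediction such as `[0.6, 0.4]` is ignored, while `[0.96, 0.04]` adds a small weighted term to the total loss.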

DATA parameters details

| Parameter | Description | Default value |
| --- | --- | --- |
| NAME | Name of the desired dataset. Use 'custom' for your own dataset. | null |
| VALIDATION_PROPORTION | Size of the validation set as a proportion of the labelled items. | null |
| TEST_PROPORTION | (Only if the current dataset does not have a test set) Size of the test set as a proportion of the whole dataset. | null |
| LABELLED_SAMPLES | Number of labelled samples in the training set. | null |
| UNLABELLED_SAMPLES | Number of unlabelled samples in the training set. | null |
| INCLUDE_LABELLED | Whether to include labelled items (without their labels) in the unlabelled set, to add information. | True |
| USE_EXTRA | Used by the SVHN dataset. | False |
| DATA/SPLITS | Subsection for defining the different data splits. See below for parameters. | |
| SPLITS/TRAIN | Example subsection for the train split. | |
| SPLITS/PATH | Path to the train set. | 'data' |
| SPLITS/NAME_UNLABELLED | When using your own dataset: name of the folder containing unlabelled items. | 'nodata' |
| SPLITS/TRANSFORMS | Transformations to apply to the dataset. | [] |

METRICS parameters details

| Parameter | Description | Default value |
| --- | --- | --- |
| METRICS/VALIDATION | Subsection for validation metrics. See below for parameters. | |
| VALIDATION/NAME | Name of the first validation metric. | Accuracy |
| VALIDATION/PARAMS | Parameters for the corresponding metric. | {task: multiclass} |
| METRICS/TEST | Subsection for test metrics. See below for parameters. | |
| TEST/NAME | Name of the first test metric. | Accuracy |
| TEST/PARAMS | Parameters for the corresponding metric. | {task: multiclass} |

Tip

You can add as many metrics as you want. Simply add a new item to the list. The names should be taken from the torchmetrics list of metrics. For example:

METRICS:
    VALIDATION:
        - NAME: MulticlassAccuracy
          PARAMS:
            num_classes: 10
        - NAME: MulticlassAveragePrecision
          PARAMS:
            num_classes: 10
            average: 'weighted'

EARLYSTOPPING parameters details

| Parameter | Description | Default value |
| --- | --- | --- |
| EARLYSTOPPING/NAME | Name of the monitored metric. | VALIDATION/Loss |
| EARLYSTOPPING/PARAMS | Dictionary of parameters for EarlyStopping. | {'mode': 'min', 'patience': 10} |
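
As a rough illustration of what the `mode` and `patience` parameters mean, the sketch below implements a generic early-stopping rule in plain Python. The `EarlyStopper` class is hypothetical and is not the callback SemiPy uses; it only assumes the usual semantics, where training stops once the monitored value has failed to improve for `patience` consecutive checks.

```python
class EarlyStopper:
    """Stop when the monitored value has not improved for `patience` checks."""

    def __init__(self, mode="min", patience=10):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.patience = patience
        self.best = None       # best value seen so far
        self.bad_checks = 0    # consecutive checks without improvement

    def step(self, value):
        """Record a new value; return True if training should stop."""
        improved = (
            self.best is None
            or (self.mode == "min" and value < self.best)
            or (self.mode == "max" and value > self.best)
        )
        if improved:
            self.best = value
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

With the defaults above (`mode='min'`, `patience=10`), a validation loss that stays above its best value for ten consecutive checks would trigger the stop.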

Once your configuration file is ready, you have two options: use a notebook or use a script.

Notebook Usage

If you want to use a notebook, simply feed the configuration file's path to your trainer.

# Without PyTorch Lightning: use SemiPy's own trainer.
import semipy as smp
trainer = smp.tools.SSLTrainer(config='config.yaml')
trainer.fit()

# With PyTorch Lightning: wrap the configuration in a Lightning module.
from pytorch_lightning import Trainer
import semipy as smp
trainer = Trainer(max_epochs=100, accelerator='gpu')
lightning_module = smp.pl.LitFixMatch(config='config.yaml')
trainer.fit(lightning_module)

Script usage

You can also use a custom script, or the main.py script in the root folder of the library. Pass the --config "path_to_config_file" option to use your configuration file.

python main.py --config config.yaml
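
If you prefer to write your own script, a minimal sketch could look like the following. It only reuses the `smp.tools.SSLTrainer(config=...)` call shown in the notebook section. The function names `build_parser` and `main`, and the `config.yaml` default, are assumptions made for illustration and may differ from the library's own main.py.

```python
import argparse

def build_parser():
    """Build a command-line parser exposing a --config option."""
    parser = argparse.ArgumentParser(description="Train an SSL model with SemiPy.")
    parser.add_argument(
        "--config",
        default="config.yaml",  # assumed default, for illustration only
        help="Path to the YAML configuration file.",
    )
    return parser

def main(argv=None):
    """Parse the arguments, build the trainer and start training."""
    args = build_parser().parse_args(argv)
    # Imported here so the parser can be used without SemiPy installed.
    import semipy as smp
    trainer = smp.tools.SSLTrainer(config=args.config)
    trainer.fit()
```

To run it as a script, call `main()` at the bottom of the file under the usual `if __name__ == "__main__":` guard.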

Without a configuration file?

It is still possible to use SemiPy without a YAML configuration file: you can pass a dictionary of parameters instead. Note that when building a trainer (with or without PyTorch Lightning), the library still uses the default configuration file described above to fill in any parameters the user has not specified. For example, with a dictionary:

import semipy as smp
args = {'EPOCHS': 100, 'BALANCING_WEIGHT': 0.12}
trainer = smp.tools.SSLTrainer(config=args)
trainer.fit()
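
The fill-in behaviour described above can be pictured as a recursive merge of the user's dictionary over the defaults. The sketch below shows one way this could work. The `fill_defaults` helper is hypothetical, and SemiPy's actual merging logic may differ, in particular for nested sections and lists.

```python
import copy

def fill_defaults(user, defaults):
    """Return a new dict: `defaults` overridden by `user`, merged recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section: merge key by key instead of replacing it.
            merged[key] = fill_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# A small excerpt of the default values listed in the tables above.
DEFAULTS = {
    "EPOCHS": 1,
    "BALANCING_WEIGHT": 0.5,
    "BATCH_SIZE": 64,
    "OPTIMIZER": {"NAME": "SGD", "PARAMS": {"lr": 1.0e-3, "momentum": 0.9}},
}

args = {"EPOCHS": 100, "BALANCING_WEIGHT": 0.12}
config = fill_defaults(args, DEFAULTS)
```

Here `config` keeps the user's EPOCHS and BALANCING_WEIGHT, while BATCH_SIZE and OPTIMIZER come from the defaults.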

Finally, if you don't want to use the provided trainers, you still have access to all the useful functions that come with SemiPy, especially the loss functions specific to each semi-supervised learning algorithm (written in a PyTorch style), and of course the JointSampler. It's up to you to build your own code with these lower-level functions. Note that most of the code is PyTorch Lightning compatible, so you can also build your own Lightning module.
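
To give an idea of what joint sampling involves when you write your own training loop, here is a conceptual sketch in plain Python. It is not SemiPy's JointSampler and does not reproduce its API: the `joint_batches` function is hypothetical, and it assumes LABELLED_PROPORTION is the fraction of each batch drawn from the labelled pool. It does follow the epoch definition given earlier, where an epoch ends once all labelled items have been seen.

```python
import random

def joint_batches(num_labelled, num_unlabelled, batch_size=64,
                  labelled_proportion=0.5, seed=0):
    """Yield (labelled_indices, unlabelled_indices) pairs for one epoch."""
    rng = random.Random(seed)
    # Split each batch between labelled and unlabelled items.
    n_lab = max(1, round(batch_size * labelled_proportion))
    n_unlab = batch_size - n_lab

    labelled = list(range(num_labelled))
    rng.shuffle(labelled)

    # One epoch ends once every labelled index has been served.
    for start in range(0, num_labelled, n_lab):
        lab_batch = labelled[start:start + n_lab]
        # Draw distinct unlabelled indices for this batch.
        unlab_batch = rng.sample(range(num_unlabelled),
                                 min(n_unlab, num_unlabelled))
        yield lab_batch, unlab_batch
```

Each yielded pair could then be used to index a labelled and an unlabelled dataset inside a custom training loop.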