How to use SemiPy?
SemiPy is versatile: the library can be used in several ways.
Configuration file
The simplest way to use SemiPy is to write a configuration file that sets every parameter the user can change. Here you can find the default configuration file used by the library. The tables below list all parameters and their descriptions.
Parameter | Description | Default value |
---|---|---|
USE_LIGHTNING | Whether to use PyTorch Lightning. | True |
EPOCHS | Total number of epochs. One epoch is finished when the model has seen all labelled items. | 1 |
BALANCING_WEIGHT | Corresponds to the 'lambda' parameter found in various SSL papers. Weights the unlabelled loss relative to the labelled one. | 0.5 |
DEBIASED | Enables safe SSL via the debiased loss (Schmutz et al.). | False |
SELECTION_THRESHOLD | Probability threshold for pseudo-labels. | 0.95 |
BATCH_SIZE | Size of each batch. | 64 |
LABELLED_PROPORTION | Proportion of labelled items compared to unlabelled ones. Needed for JointSampler | 0.5 |
SAVE_PATH | Path to save models at the end of training and best model during training. | './saves' |
OPTIMIZER | Optimizer name and parameters | NAME: 'SGD'; PARAMS: {lr: 1.0e-3, momentum: 0.9} |
SCHEDULER | Learning rate scheduler | null |
NET | Model to train. | 'resnet18' |
METHOD | SSL method to use. | 'pseudolabel' |
NUM_WARMUP_EPOCHS | Number of warmup epochs. Used for PiModel. | null |
DATA | Data information | See default values in DATA details below |
USE_MULTIGPU | To enable multi-GPU training. | False |
NUM_GPU | Number of GPUs to use. | null |
MULTIGPU_STRATEGY | Strategy for multi-GPU training. For now only 'ddp' is supported. | null |
EMA | Exponential Moving Average coefficient for EMA on model's parameters. | null |
METRICS | Metrics information | See default values in METRICS details below |
EARLYSTOPPING | EarlyStopping information | See default values in EARLYSTOPPING details below |
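To make the roles of SELECTION_THRESHOLD and BALANCING_WEIGHT concrete, here is a minimal, framework-free sketch of a pseudo-label style objective. It is an illustration under assumptions, not SemiPy's implementation: the function names `pseudo_label_loss` and `total_loss` are hypothetical, and averaging the unlabelled loss over the whole unlabelled batch is one common convention among several.

```python
import math

def pseudo_label_loss(probs, threshold=0.95):
    """Mean cross-entropy on confident pseudo-labels (hypothetical helper).

    probs: list of per-class probability lists, one per unlabelled item.
    Items whose top probability is below `threshold` contribute zero.
    """
    if not probs:
        return 0.0
    loss = 0.0
    for p in probs:
        confidence = max(p)
        if confidence >= threshold:
            # The pseudo-label is the argmax class, so the cross-entropy
            # reduces to -log of the top probability.
            loss += -math.log(confidence)
    return loss / len(probs)

def total_loss(labelled_loss, unlabelled_loss, balancing_weight=0.5):
    """Combine both terms: labelled + lambda * unlabelled."""
    return labelled_loss + balancing_weight * unlabelled_loss

unlabelled_probs = [[0.97, 0.02, 0.01],   # confident: kept
                    [0.50, 0.30, 0.20]]   # below 0.95: ignored
u_loss = pseudo_label_loss(unlabelled_probs, threshold=0.95)
print(total_loss(labelled_loss=0.8, unlabelled_loss=u_loss))
```

With a lower threshold more pseudo-labels pass the filter, and with a larger balancing weight the unlabelled term has more influence on the total.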
DATA parameters details
Parameter | Description | Default value |
---|---|---|
NAME | Name of the desired dataset. Use 'custom' for your own dataset. | null |
VALIDATION_PROPORTION | Size of the validation set as a proportion of the labelled items. | null |
TEST_PROPORTION | (Only when the dataset has no test set) Size of the test set as a proportion of the whole dataset. | null |
LABELLED_SAMPLES | Number of labelled samples in training set. | null |
UNLABELLED_SAMPLES | Number of unlabelled samples in training set. | null |
INCLUDE_LABELLED | Whether to also include the labelled items (without their labels) in the unlabelled set, to add information. | True |
USE_EXTRA | Used by SVHN dataset | False |
DATA/SPLITS | Subsection for defining different data splits. | See below for parameters |
SPLITS/TRAIN | Subsection example for Train split. | |
SPLITS/PATH | Path to train set. | 'data' |
SPLITS/NAME_UNLABELLED | When using your own dataset: name of the folder containing unlabelled items. | 'nodata' |
SPLITS/TRANSFORMS | To add transformations to the dataset. | [] |
METRICS parameters details
Parameter | Description | Default value |
---|---|---|
METRICS/VALIDATION | Subsection for validation metrics. | See below for parameters. |
VALIDATION/NAME | Name of the first validation metric. | Accuracy |
VALIDATION/PARAMS | Parameters for the corresponding metric. | {task: multiclass} |
METRICS/TEST | Subsection for test metrics. | See below for parameters. |
TEST/NAME | Name of the first test metric. | Accuracy |
TEST/PARAMS | Parameters for the corresponding metric. | {task: multiclass} |
Tip
You can add as many metrics as you want: simply add a new item to the list. The names should be taken from the torchmetrics list of metrics.
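As a sketch of the list structure, here is a METRICS section written as a Python dictionary with two validation metrics and one test metric. The exact nesting is an assumption inferred from the tables above (a list of items, each with NAME and PARAMS). `Accuracy` and `F1Score` are class names from torchmetrics, and `num_classes: 10` is only an illustrative value.

```python
# Hypothetical METRICS section: two validation metrics and one test metric.
metrics_config = {
    'METRICS': {
        'VALIDATION': [
            {'NAME': 'Accuracy', 'PARAMS': {'task': 'multiclass', 'num_classes': 10}},
            {'NAME': 'F1Score', 'PARAMS': {'task': 'multiclass', 'num_classes': 10}},
        ],
        'TEST': [
            {'NAME': 'Accuracy', 'PARAMS': {'task': 'multiclass', 'num_classes': 10}},
        ],
    }
}

# List the names of the configured validation metrics.
validation_names = [m['NAME'] for m in metrics_config['METRICS']['VALIDATION']]
print(validation_names)
```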
EARLYSTOPPING parameters details
Parameter | Description | Default value |
---|---|---|
EARLYSTOPPING/NAME | Name of the monitored metric. | VALIDATION/Loss |
EARLYSTOPPING/PARAMS | Dictionary of parameters for EarlyStopping. | {'mode': 'min', 'patience': 10} |
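To illustrate what the 'mode' and 'patience' parameters usually mean, here is a small patience-based stopping rule in plain Python. The class name `PatienceStopper` is hypothetical and this is not SemiPy's or PyTorch Lightning's implementation; it only sketches the common behaviour, where training stops once the monitored metric has failed to improve for `patience` consecutive epochs.

```python
class PatienceStopper:
    """Minimal patience-based early stopping (illustrative only)."""

    def __init__(self, mode='min', patience=10):
        if mode not in ('min', 'max'):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.patience = patience
        self.best = None
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's metric; return True when training should stop."""
        improved = (
            self.best is None
            or (self.mode == 'min' and value < self.best)
            or (self.mode == 'max' and value > self.best)
        )
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Monitor a validation loss with a short patience for demonstration.
stopper = PatienceStopper(mode='min', patience=2)
val_losses = [0.9, 0.7, 0.75, 0.8, 0.6]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)
```

In this run the last value (0.6) is never seen, which shows the usual trade-off: a small patience saves time but can stop before a late improvement.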
Once your configuration file is ready, you have two options: use a notebook or use a script.
Notebook usage
If you want to use a notebook, simply feed the configuration file's path to your trainer.
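from pytorch_lightning import Trainer
import semipy as smp

# Standard PyTorch Lightning trainer, here set to run on GPU.
trainer = Trainer(max_epochs=100, accelerator='gpu')
# Build the FixMatch Lightning module from the configuration file's path.
lightning_module = smp.pl.LitFixMatch(config='config.yaml')
trainer.fit(lightning_module)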
Script usage
You can also write your own script, or use the main.py script in the root folder of the library. Pass your configuration file with the parser's -config "path_to_config_file" option.
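If you write your own script, a command-line parser with the same `-config` option could look like the sketch below. This is a hypothetical example built on the standard `argparse` module, not the actual content of main.py; the function name `parse_args` and the help text are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the command line of a hypothetical training script."""
    parser = argparse.ArgumentParser(description='Train a model with SemiPy.')
    # Single-dash long option, matching the `-config` flag mentioned above.
    parser.add_argument('-config', type=str, required=True,
                        help='Path to the YAML configuration file.')
    return parser.parse_args(argv)

# An explicit argument list is passed here so the example runs anywhere;
# in a real script, call parse_args() to read sys.argv instead.
args = parse_args(['-config', 'config.yaml'])
print(args.config)
```

The resulting path in `args.config` can then be handed to a trainer in the same way as in the notebook example.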
Without a configuration file?
It is still possible to use SemiPy without a YAML configuration file: you can pass a dictionary of parameters instead. Note that when building a trainer (with or without PyTorch Lightning), the library still uses the default configuration file mentioned above to fill in any parameters the user has not specified. For example, with a dictionary:
import semipy as smp
args = {'EPOCHS': 100, 'BALANCING_WEIGHT': 0.12}
trainer = smp.tools.SSLTrainer(config=args)
trainer.fit()
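The "fill in with defaults" behaviour described above can be pictured as a dictionary merge. The sketch below is an assumption about how such a merge might work, not SemiPy's actual code: `merge_config` is a hypothetical helper, `DEFAULTS` is a small subset of the default values from the tables above, and whether the library merges nested sections recursively is not confirmed here.

```python
import copy

# Hypothetical subset of the default configuration, taken from the tables above.
DEFAULTS = {
    'EPOCHS': 1,
    'BALANCING_WEIGHT': 0.5,
    'BATCH_SIZE': 64,
    'OPTIMIZER': {'NAME': 'SGD', 'PARAMS': {'lr': 1.0e-3, 'momentum': 0.9}},
}

def merge_config(defaults, overrides):
    """Return a new dict where `overrides` take precedence over `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

user_args = {'EPOCHS': 100, 'BALANCING_WEIGHT': 0.12,
             'OPTIMIZER': {'PARAMS': {'lr': 0.01}}}
config = merge_config(DEFAULTS, user_args)
print(config['EPOCHS'], config['BATCH_SIZE'], config['OPTIMIZER'])
```

Under this assumption, only the keys given by the user change; everything else, including untouched nested values such as the momentum, keeps its default.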
Finally, if you don't want to use the provided trainers, you still have access to all the useful functions that come with SemiPy, especially the loss functions specific to each semi-supervised learning algorithm (written in a PyTorch style) and, of course, the JointSampler. It's up to you to build your own code with these lower-level functions. Note that most of the code is PyTorch Lightning compatible, so you can also build your own Lightning module.