semipy.sampler.DistributedJointSampler

Warning

This section is under construction.

    class semipy.sampler.DistributedJointSampler(dataset, batch_size, proportion, num_replicas=None, rank=None, shuffle=True, seed=0, drop_last=False)

This class is the distributed version of semipy.sampler.JointSampler. It must be used in place of JointSampler when training on multiple GPUs. It is based on PyTorch's DistributedSampler.

Parameters

  • dataset - A map-style dataset containing both labelled and unlabelled data. Unlabelled samples must carry the label -1.
  • batch_size (int) - Size of the batches of labelled data.
  • proportion (float) - Proportion of labelled to unlabelled data in each batch.
  • num_replicas (int, optional) - Number of processes participating in distributed training. By default, world_size is retrieved from the current distributed group. Default: None.
  • rank (int, optional) - Rank of the current process within num_replicas. By default, rank is retrieved from the current distributed group. Default: None.
  • shuffle (bool) - If True, the sampler shuffles the indices whenever it reaches the end of either the labelled set or the unlabelled set. Default: True.
  • seed (int) - Random seed used to shuffle the sampler when shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
  • drop_last (bool) - If True, the sampler drops the tail of the data to make it evenly divisible across the number of replicas. If False, the sampler adds extra indices to make the data evenly divisible across the replicas. Default: False.
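
Example

A minimal usage sketch under multi-GPU training. The toy dataset, the launch via torchrun, and the way the sampler is handed to the DataLoader (as batch_sampler, assuming it yields whole batches of indices like JointSampler) are assumptions for illustration, not confirmed by this page.

    import torch
    import torch.distributed as dist
    from torch.utils.data import DataLoader, TensorDataset

    from semipy.sampler import DistributedJointSampler

    # One process per GPU, launched e.g. with torchrun, which sets the
    # environment variables that init_process_group needs.
    dist.init_process_group(backend="nccl")

    # Toy map-style dataset: unlabelled samples carry the label -1.
    features = torch.randn(1000, 32)
    labels = torch.full((1000,), -1, dtype=torch.long)
    labels[:200] = torch.randint(0, 10, (200,))  # the first 200 samples are labelled
    dataset = TensorDataset(features, labels)

    # num_replicas and rank are retrieved from the current distributed group.
    sampler = DistributedJointSampler(
        dataset,
        batch_size=64,    # size of the labelled part of each batch
        proportion=0.5,   # labelled/unlabelled mix in each batch
        shuffle=True,
        seed=0,
        drop_last=False,
    )

    # Assumption: the sampler yields batches of indices, so it is passed
    # as batch_sampler rather than sampler.
    loader = DataLoader(dataset, batch_sampler=sampler)

    for epoch in range(10):
        for inputs, targets in loader:
            ...  # forward/backward pass, e.g. under DistributedDataParallel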