model.sync_batchnorm package¶
Submodules¶
model.sync_batchnorm.batchnorm module¶
- class model.sync_batchnorm.batchnorm.SynchronizedBatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)¶
Bases: model.sync_batchnorm.batchnorm._SynchronizedBatchNorm
Applies Synchronized Batch Normalization over a 2d or 3d input that is seen as a mini-batch.
\[y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta\]
This module differs from the built-in PyTorch BatchNorm1d as the mean and standard-deviation are reduced across all devices during training.
For example, when nn.DataParallel is used to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics available on that device, which speeds up the computation and is easy to implement, but the statistics may be inaccurate. In this synchronized version, the statistics are instead computed over all training samples distributed across the multiple devices.
Note that in the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.
The mean and standard-deviation are calculated per-dimension over the mini-batches and gamma and beta are learnable parameter vectors of size C (where C is the input size).
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
Because the BatchNorm is done over the C dimension, computing statistics on (N, L) slices, it is common terminology to call this Temporal BatchNorm.
- Args:
  - num_features: num_features from an expected input of size batch_size x num_features [x width]
  - eps: a value added to the denominator for numerical stability. Default: 1e-5
  - momentum: the value used for the running_mean and running_var computation. Default: 0.1
  - affine: a boolean value that, when set to True, gives the layer learnable affine parameters. Default: True
- Shape:
Input: \((N, C)\) or \((N, C, L)\)
Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
- Examples:
>>> # With Learnable Parameters
>>> m = SynchronizedBatchNorm1d(100)
>>> # Without Learnable Parameters
>>> m = SynchronizedBatchNorm1d(100, affine=False)
>>> input = torch.autograd.Variable(torch.randn(20, 100))
>>> output = m(input)
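The doctest above exercises the single-device path, where the layer matches the built-in BatchNorm1d. The following sketch (assuming at least two visible GPUs; the small network itself is made up for illustration) shows the intended multi-GPU setup, combining the layer with DataParallelWithCallback from model.sync_batchnorm.replicate, documented below, so that batch statistics are reduced across both devices:

import torch
import torch.nn as nn
from model.sync_batchnorm.batchnorm import SynchronizedBatchNorm1d
from model.sync_batchnorm.replicate import DataParallelWithCallback

# Toy network; SynchronizedBatchNorm1d is the only layer that needs the
# replication callback in order to synchronize its statistics.
net = nn.Sequential(
    nn.Linear(32, 64),
    SynchronizedBatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 10),
).cuda()

# DataParallelWithCallback replaces nn.DataParallel so that
# __data_parallel_replicate__ is invoked on every replica.
net = DataParallelWithCallback(net, device_ids=[0, 1])

x = torch.randn(20, 32).cuda()   # the batch is split across the two GPUs
y = net(x)                       # statistics are reduced over the full batch of 20
print(y.shape)                   # torch.Size([20, 10])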
- class model.sync_batchnorm.batchnorm.SynchronizedBatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)¶
Bases: model.sync_batchnorm.batchnorm._SynchronizedBatchNorm
Applies Synchronized Batch Normalization over a 4d input that is seen as a mini-batch of 3d inputs.
\[y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta\]
This module differs from the built-in PyTorch BatchNorm2d as the mean and standard-deviation are reduced across all devices during training.
For example, when nn.DataParallel is used to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics available on that device, which speeds up the computation and is easy to implement, but the statistics may be inaccurate. In this synchronized version, the statistics are instead computed over all training samples distributed across the multiple devices.
Note that in the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.
The mean and standard-deviation are calculated per-dimension over the mini-batches and gamma and beta are learnable parameter vectors of size C (where C is the input size).
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
Because the BatchNorm is done over the C dimension, computing statistics on (N, H, W) slices, it is common terminology to call this Spatial BatchNorm.
- Args:
  - num_features: num_features from an expected input of size batch_size x num_features x height x width
  - eps: a value added to the denominator for numerical stability. Default: 1e-5
  - momentum: the value used for the running_mean and running_var computation. Default: 0.1
  - affine: a boolean value that, when set to True, gives the layer learnable affine parameters. Default: True
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
- Examples:
>>> # With Learnable Parameters
>>> m = SynchronizedBatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = SynchronizedBatchNorm2d(100, affine=False)
>>> input = torch.autograd.Variable(torch.randn(20, 100, 35, 45))
>>> output = m(input)
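Since the constructor arguments mirror nn.BatchNorm2d, the layer is intended as a drop-in replacement. Below is a minimal sketch (the convolutional block and shapes are made up for illustration; on a single device it behaves exactly like the built-in layer, and the synchronization only takes effect once the model is wrapped in DataParallelWithCallback):

import torch
import torch.nn as nn
from model.sync_batchnorm.batchnorm import SynchronizedBatchNorm2d

# A small convolutional block; SynchronizedBatchNorm2d occupies the slot
# where nn.BatchNorm2d(16) would normally go.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    SynchronizedBatchNorm2d(16),   # same constructor arguments as nn.BatchNorm2d
    nn.ReLU(inplace=True),
)

x = torch.randn(4, 3, 32, 32)
y = block(x)
print(y.shape)                     # torch.Size([4, 16, 32, 32])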
- class model.sync_batchnorm.batchnorm.SynchronizedBatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True)¶
Bases: model.sync_batchnorm.batchnorm._SynchronizedBatchNorm
Applies Synchronized Batch Normalization over a 5d input that is seen as a mini-batch of 4d inputs.
\[y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta\]
This module differs from the built-in PyTorch BatchNorm3d as the mean and standard-deviation are reduced across all devices during training.
For example, when nn.DataParallel is used to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics available on that device, which speeds up the computation and is easy to implement, but the statistics may be inaccurate. In this synchronized version, the statistics are instead computed over all training samples distributed across the multiple devices.
Note that in the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.
The mean and standard-deviation are calculated per-dimension over the mini-batches and gamma and beta are learnable parameter vectors of size C (where C is the input size).
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
Because the BatchNorm is done over the C dimension, computing statistics on (N, D, H, W) slices, it is common terminology to call this Volumetric BatchNorm or Spatio-temporal BatchNorm.
- Args:
  - num_features: num_features from an expected input of size batch_size x num_features x depth x height x width
  - eps: a value added to the denominator for numerical stability. Default: 1e-5
  - momentum: the value used for the running_mean and running_var computation. Default: 0.1
  - affine: a boolean value that, when set to True, gives the layer learnable affine parameters. Default: True
- Shape:
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
- Examples:
>>> # With Learnable Parameters
>>> m = SynchronizedBatchNorm3d(100)
>>> # Without Learnable Parameters
>>> m = SynchronizedBatchNorm3d(100, affine=False)
>>> input = torch.autograd.Variable(torch.randn(20, 100, 35, 45, 10))
>>> output = m(input)
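To make the running-statistics behaviour described above concrete, here is a single-device sketch (so the layer matches the built-in implementation; the running_mean/running_var buffer names are assumed to follow the built-in _BatchNorm convention, and the shapes are arbitrary):

import torch
from model.sync_batchnorm.batchnorm import SynchronizedBatchNorm3d

m = SynchronizedBatchNorm3d(8, momentum=0.1)

# Training mode: batch statistics are used, and the running estimates are
# updated with momentum 0.1 on every forward pass.
m.train()
for _ in range(10):
    m(torch.randn(2, 8, 4, 16, 16))

# Evaluation mode: the accumulated running mean/variance are used instead of
# per-batch statistics, so the output is deterministic for a fixed input.
m.eval()
with torch.no_grad():
    out = m(torch.randn(2, 8, 4, 16, 16))

print(m.running_mean.shape, m.running_var.shape)   # assumed buffers: torch.Size([8]) each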
model.sync_batchnorm.batchnorm_reimpl module¶
model.sync_batchnorm.comm module¶
- class model.sync_batchnorm.comm.FutureResult¶
Bases: object
A thread-safe future implementation. Used only as a one-to-one pipe (see the usage sketch after the method list below).
- get()¶
- put(result)¶
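A minimal sketch of the intended one-producer/one-consumer usage (the thread setup and the payload are made up for illustration):

import threading
from model.sync_batchnorm.comm import FutureResult

future = FutureResult()

def producer():
    # Hand exactly one result to whichever thread is blocked in get().
    future.put(42)

threading.Thread(target=producer).start()
print(future.get())   # blocks until put() is called, then prints 42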
- class model.sync_batchnorm.comm.SlavePipe(identifier, queue, result)¶
Bases: model.sync_batchnorm.comm._SlavePipeBase
Pipe for master-slave communication.
- run_slave(msg)¶
- class model.sync_batchnorm.comm.SyncMaster(master_callback)¶
Bases: object
An abstract SyncMaster object.
- During replication, as data parallel triggers a callback on each module, all slave devices should call register_slave(identifier) and obtain a SlavePipe to communicate with the master.
- During the forward pass, the master device invokes run_master; all messages from the slave devices are collected and passed to the registered callback.
- After receiving the messages, the master device gathers the information and determines the message to be passed back to each slave device.
- property nr_slaves¶
- register_slave(identifier)¶
Register a slave device.
- Args:
  - identifier: an identifier, usually the device id.
- Returns: a SlavePipe object that can be used to communicate with the master device.
- run_master(master_msg)¶
Main entry for the master device in each forward pass. The messages are first collected from each device (including the master device), and then the callback is invoked to compute the message to be sent back to each device (including the master device).
- Args:
master_msg: the message that the master wants to send to itself. This will be placed as the first message when calling master_callback. For detailed usage, see _SynchronizedBatchNorm for an example.
Returns: the message to be sent back to the master device.
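The flow between register_slave, run_slave, and run_master is easiest to see with a toy callback. The sketch below assumes the callback contract used by _SynchronizedBatchNorm: the callback receives a list of (identifier, message) pairs with the master's entry (identifier 0) first, and returns a list of (identifier, reply) pairs whose first entry again belongs to the master. The reduction itself (a plain sum) and the thread setup are made up for illustration:

import threading
from model.sync_batchnorm.comm import SyncMaster

def sum_callback(intermediates):
    # intermediates: [(identifier, message), ...], master (identifier 0) first.
    total = sum(msg for _, msg in intermediates)
    # Broadcast the reduced value back to every device, master entry first.
    return [(identifier, total) for identifier, _ in intermediates]

master = SyncMaster(sum_callback)

# Normally done during module replication: each slave registers and keeps its pipe.
pipes = [master.register_slave(i) for i in (1, 2)]

replies = {}

def slave(pipe, value):
    # run_slave blocks until the master has broadcast the reply.
    replies[value] = pipe.run_slave(value)

threads = [threading.Thread(target=slave, args=(p, v))
           for p, v in zip(pipes, (10, 20))]
for t in threads:
    t.start()

print(master.run_master(5))   # 5 + 10 + 20 -> 35
for t in threads:
    t.join()
print(replies)                # {10: 35, 20: 35}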
model.sync_batchnorm.replicate module¶
- class model.sync_batchnorm.replicate.CallbackContext¶
Bases: object
- class model.sync_batchnorm.replicate.DataParallelWithCallback(module, device_ids=None, output_device=None, dim=0)¶
Bases: torch.nn.parallel.data_parallel.DataParallel
Data Parallel with a replication callback.
A replication callback __data_parallel_replicate__ of each module will be invoked after the module is created by the original replicate function. The callback will be invoked with arguments __data_parallel_replicate__(ctx, copy_id).
- Examples:
> sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
> sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])
# sync_bn.__data_parallel_replicate__ will be invoked.
- replicate(module, device_ids)¶
- model.sync_batchnorm.replicate.execute_replication_callbacks(modules)¶
Execute a replication callback __data_parallel_replicate__ on each module created by the original replication.
The callback will be invoked with arguments __data_parallel_replicate__(ctx, copy_id).
Note that, as all copies are isomorphic, we assign each sub-module a context (shared among the multiple copies of this module on different devices). Through this context, different copies can share information.
We guarantee that the callback on the master copy (the first copy) is called before the callbacks of any slave copies.
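A minimal sketch of a module that opts into the callback (the module itself, the attribute stored on the shared context, and the two-GPU device list are made up for illustration):

import torch.nn as nn
from model.sync_batchnorm.replicate import DataParallelWithCallback

class ReplicaAware(nn.Module):
    """Hypothetical module that records which replica it is."""

    def __init__(self):
        super().__init__()
        self.copy_id = None

    def __data_parallel_replicate__(self, ctx, copy_id):
        # ctx is a CallbackContext shared by all copies of this module across
        # devices; copy_id identifies the replica (0 is the master copy).
        self.copy_id = copy_id
        if copy_id == 0:
            ctx.master_seen = True   # arbitrary shared state, for illustration

    def forward(self, x):
        return x

# Wrapping with DataParallelWithCallback ensures the callback runs on every
# copy created during replication, master copy first.
net = DataParallelWithCallback(ReplicaAware().cuda(), device_ids=[0, 1])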
- model.sync_batchnorm.replicate.patch_replication_callback(data_parallel)¶
Monkey-patch an existing DataParallel object to add the replication callback. Useful when you have a customized DataParallel implementation.
- Examples:
> sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
> sync_bn = DataParallel(sync_bn, device_ids=[0, 1])
> patch_replication_callback(sync_bn)
# this is equivalent to
> sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
> sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])