model.sync_batchnorm package

Submodules

model.sync_batchnorm.batchnorm module

class model.sync_batchnorm.batchnorm.SynchronizedBatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)

Bases: model.sync_batchnorm.batchnorm._SynchronizedBatchNorm

Applies Synchronized Batch Normalization over a 2d or 3d input that is seen as a mini-batch.

\[y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

This module differs from the built-in PyTorch BatchNorm1d in that the mean and standard-deviation are reduced across all devices during training.

For example, when one uses nn.DataParallel to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics available on that device, which speeds up the computation and is easy to implement, but the statistics may be inaccurate. In this synchronized version, the statistics are instead computed over all training samples distributed across the multiple devices.

Note that, for the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.

The mean and standard-deviation are calculated per-dimension over the mini-batches and gamma and beta are learnable parameter vectors of size C (where C is the input size).

During training, this layer keeps a running estimate of its computed mean and variance. The running estimates are kept with a default momentum of 0.1.

During evaluation, this running mean/variance is used for normalization.

Because the BatchNorm is done over the C dimension, computing statistics on (N, L) slices, it is common terminology to call this Temporal BatchNorm.

Args:
  num_features: num_features from an expected input of size
    batch_size x num_features [x width]
  eps: a value added to the denominator for numerical stability. Default: 1e-5
  momentum: the value used for the running_mean and running_var computation. Default: 0.1
  affine: a boolean value that, when set to True, gives the layer learnable affine parameters. Default: True

Shape:
  • Input: \((N, C)\) or \((N, C, L)\)

  • Output: \((N, C)\) or \((N, C, L)\) (same shape as input)

Examples:
>>> # With Learnable Parameters
>>> m = SynchronizedBatchNorm1d(100)
>>> # Without Learnable Parameters
>>> m = SynchronizedBatchNorm1d(100, affine=False)
>>> input = torch.randn(20, 100)
>>> output = m(input)
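
The synchronization only takes effect when the network is replicated with the callback-aware wrapper from model.sync_batchnorm.replicate. Below is a minimal sketch of multi-GPU usage, assuming at least two CUDA devices; the small Sequential network is hypothetical and stands in for any model containing synchronized BatchNorm layers:

import torch
import torch.nn as nn
from model.sync_batchnorm.batchnorm import SynchronizedBatchNorm1d
from model.sync_batchnorm.replicate import DataParallelWithCallback

# Hypothetical network: any module containing SynchronizedBatchNorm layers works.
net = nn.Sequential(nn.Linear(100, 100), SynchronizedBatchNorm1d(100))

if torch.cuda.device_count() > 1:
    # DataParallelWithCallback (rather than plain nn.DataParallel) triggers the
    # replication callback, so all copies share one set of batch statistics.
    net = DataParallelWithCallback(net.cuda(), device_ids=[0, 1])
    output = net(torch.randn(20, 100).cuda())
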
class model.sync_batchnorm.batchnorm.SynchronizedBatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)

Bases: model.sync_batchnorm.batchnorm._SynchronizedBatchNorm

Applies Batch Normalization over a 4d input that is seen as a mini-batch of 3d inputs.

\[y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

This module differs from the built-in PyTorch BatchNorm2d in that the mean and standard-deviation are reduced across all devices during training.

For example, when one uses nn.DataParallel to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics available on that device, which speeds up the computation and is easy to implement, but the statistics may be inaccurate. In this synchronized version, the statistics are instead computed over all training samples distributed across the multiple devices.

Note that, for the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.

The mean and standard-deviation are calculated per-dimension over the mini-batches and gamma and beta are learnable parameter vectors of size C (where C is the input size).

During training, this layer keeps a running estimate of its computed mean and variance. The running estimates are kept with a default momentum of 0.1.

During evaluation, this running mean/variance is used for normalization.

Because the BatchNorm is done over the C dimension, computing statistics on (N, H, W) slices, it is common terminology to call this Spatial BatchNorm.

Args:
  num_features: num_features from an expected input of size
    batch_size x num_features x height x width
  eps: a value added to the denominator for numerical stability. Default: 1e-5
  momentum: the value used for the running_mean and running_var computation. Default: 0.1
  affine: a boolean value that, when set to True, gives the layer learnable affine parameters. Default: True

Shape:
  • Input: \((N, C, H, W)\)

  • Output: \((N, C, H, W)\) (same shape as input)

Examples:
>>> # With Learnable Parameters
>>> m = SynchronizedBatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = SynchronizedBatchNorm2d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
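
As noted above, on a single device the layer is intended to behave exactly like the built-in implementation. The following sketch compares it against nn.BatchNorm2d on the CPU; the tolerance is an assumption, chosen only to absorb floating-point round-off:

import torch
import torch.nn as nn
from model.sync_batchnorm.batchnorm import SynchronizedBatchNorm2d

x = torch.randn(8, 3, 16, 16)
sync_bn = SynchronizedBatchNorm2d(3)   # same defaults as nn.BatchNorm2d
ref_bn = nn.BatchNorm2d(3)

# Both layers are in training mode and normalize with the batch statistics of x,
# so their outputs should agree up to floating-point error.
print(torch.allclose(sync_bn(x), ref_bn(x), atol=1e-5))
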
class model.sync_batchnorm.batchnorm.SynchronizedBatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True)

Bases: model.sync_batchnorm.batchnorm._SynchronizedBatchNorm

Applies Batch Normalization over a 5d input that is seen as a mini-batch of 4d inputs.

\[y = \frac{x - \mathrm{mean}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

This module differs from the built-in PyTorch BatchNorm3d in that the mean and standard-deviation are reduced across all devices during training.

For example, when one uses nn.DataParallel to wrap the network during training, PyTorch's implementation normalizes the tensor on each device using only the statistics available on that device, which speeds up the computation and is easy to implement, but the statistics may be inaccurate. In this synchronized version, the statistics are instead computed over all training samples distributed across the multiple devices.

Note that, for the one-GPU or CPU-only case, this module behaves exactly the same as the built-in PyTorch implementation.

The mean and standard-deviation are calculated per-dimension over the mini-batches and gamma and beta are learnable parameter vectors of size C (where C is the input size).

During training, this layer keeps a running estimate of its computed mean and variance. The running estimates are kept with a default momentum of 0.1.

During evaluation, this running mean/variance is used for normalization.

Because the BatchNorm is done over the C dimension, computing statistics on (N, D, H, W) slices, it is common terminology to call this Volumetric BatchNorm or Spatio-temporal BatchNorm.

Args:
  num_features: num_features from an expected input of size
    batch_size x num_features x depth x height x width
  eps: a value added to the denominator for numerical stability. Default: 1e-5
  momentum: the value used for the running_mean and running_var computation. Default: 0.1
  affine: a boolean value that, when set to True, gives the layer learnable affine parameters. Default: True

Shape:
  • Input: \((N, C, D, H, W)\)

  • Output: \((N, C, D, H, W)\) (same shape as input)

Examples:
>>> # With Learnable Parameters
>>> m = SynchronizedBatchNorm3d(100)
>>> # Without Learnable Parameters
>>> m = SynchronizedBatchNorm3d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
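
The training/evaluation behaviour described above can be observed directly on the running buffers. A minimal sketch, assuming the running_mean and running_var buffers behave as in the built-in BatchNorm layers:

import torch
from model.sync_batchnorm.batchnorm import SynchronizedBatchNorm3d

m = SynchronizedBatchNorm3d(4)
x = torch.randn(2, 4, 8, 8, 8)

m.train()
_ = m(x)                  # training: batch statistics are used and the running
print(m.running_mean)     # estimates are updated with the default momentum of 0.1

m.eval()
y = m(x)                  # evaluation: the stored running mean/variance are used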

model.sync_batchnorm.batchnorm_reimpl module

model.sync_batchnorm.comm module

class model.sync_batchnorm.comm.FutureResult

Bases: object

A thread-safe future implementation. Used only as a one-to-one pipe.

get()
put(result)
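
A minimal sketch of the one-to-one handshake, using the documented get()/put() pair between two plain Python threads:

import threading
from model.sync_batchnorm.comm import FutureResult

future = FutureResult()

def producer():
    future.put(42)            # hand exactly one result to the waiting consumer

threading.Thread(target=producer).start()
print(future.get())           # blocks until put() has been called, then prints 42
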
class model.sync_batchnorm.comm.SlavePipe(identifier, queue, result)

Bases: model.sync_batchnorm.comm._SlavePipeBase

Pipe for master-slave communication.

run_slave(msg)
class model.sync_batchnorm.comm.SyncMaster(master_callback)

Bases: object

An abstract SyncMaster object.

  • During the replication, as the data parallel will trigger a callback on each module, all slave devices should call register_slave(id) and obtain a SlavePipe to communicate with the master.

  • During the forward pass, the master device invokes run_master; all messages from the slave devices are collected and passed to the registered callback.

  • After receiving the messages, the master device gathers the information and determines the message to be passed back to each slave device.

property nr_slaves
register_slave(identifier)

Register a slave device.

Args:

identifier: an identifier, usually the device id.

Returns: a SlavePipe object which can be used to communicate with the master device.

run_master(master_msg)

Main entry for the master device in each forward pass. The messages are first collected from each device (including the master device), and then the registered callback is invoked to compute the message to be sent back to each device (including the master device).

Args:

master_msg: the message that the master wants to send to itself. This will be placed as the first message when calling master_callback. For detailed usage, see _SynchronizedBatchNorm for an example.

Returns: the message to be sent back to the master device.
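
The sketch below exercises this master/slave protocol with plain Python threads standing in for devices. The summing callback is hypothetical, and the sketch assumes the callback receives a list of (identifier, message) pairs, master first, and returns a matching list of (identifier, result) pairs; only the placement of master_msg as the first message is stated above.

import threading
from model.sync_batchnorm.comm import SyncMaster

def sum_callback(intermediates):
    # Assumed format: [(identifier, message), ...] with the master's entry first.
    total = sum(msg for _, msg in intermediates)
    return [(identifier, total) for identifier, _ in intermediates]

master = SyncMaster(sum_callback)
pipes = [master.register_slave(i) for i in (1, 2)]   # one SlavePipe per slave device

results = {}

def slave_job(pipe, value):
    # run_slave blocks until the master has collected every message and replied.
    results[pipe.identifier] = pipe.run_slave(value)

threads = [threading.Thread(target=slave_job, args=(p, v))
           for p, v in zip(pipes, (10, 20))]
for t in threads:
    t.start()
results['master'] = master.run_master(1)   # master contributes 1, collects 10 and 20
for t in threads:
    t.join()
print(results)                             # every participant receives the total, 31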

model.sync_batchnorm.replicate module

class model.sync_batchnorm.replicate.CallbackContext

Bases: object

class model.sync_batchnorm.replicate.DataParallelWithCallback(module, device_ids=None, output_device=None, dim=0)

Bases: torch.nn.parallel.data_parallel.DataParallel

Data Parallel with a replication callback.

A replication callback __data_parallel_replicate__ will be invoked on each module after it has been created by the original replicate function. The callback will be invoked with arguments __data_parallel_replicate__(ctx, copy_id).

Examples:

>>> sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
>>> sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])
>>> # sync_bn.__data_parallel_replicate__ will be invoked.

replicate(module, device_ids)
model.sync_batchnorm.replicate.execute_replication_callbacks(modules)

Execute the replication callback __data_parallel_replicate__ on each module created by the original replication.

The callback will be invoked with arguments __data_parallel_replicate__(ctx, copy_id)

Note that, since all modules are isomorphic, we assign each sub-module a context (shared among the multiple copies of this module on different devices). Through this context, different copies can share information.

We guarantee that the callback on the master copy (the first copy) will be called before the callbacks on any slave copies.
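
A module opts into this mechanism simply by defining the hook. The following hypothetical layer illustrates the convention; the names ReplicaAwareLayer and shared_state are illustrative and not part of the package:

import torch.nn as nn

class ReplicaAwareLayer(nn.Module):
    def __data_parallel_replicate__(self, ctx, copy_id):
        # `ctx` is the context shared by all copies of this module across devices;
        # `copy_id` 0 is the master copy, whose hook is guaranteed to run first.
        if copy_id == 0:
            ctx.shared_state = {}      # the master copy initializes the shared state
        self._ctx = ctx                # every replica keeps a handle to the shared context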

model.sync_batchnorm.replicate.patch_replication_callback(data_parallel)

Monkey-patch an existing DataParallel object by adding the replication callback. Useful when you have a customized DataParallel implementation.

Examples:

>>> sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
>>> sync_bn = DataParallel(sync_bn, device_ids=[0, 1])
>>> patch_replication_callback(sync_bn)
>>> # this is equivalent to
>>> sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)
>>> sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])

model.sync_batchnorm.unittest module

class model.sync_batchnorm.unittest.TorchTestCase(methodName='runTest')

Bases: unittest.case.TestCase

assertTensorClose(x, y)
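
A minimal sketch of using this helper in a test case; the test class and tensors are illustrative:

import unittest
import torch
from model.sync_batchnorm.unittest import TorchTestCase

class CloneTest(TorchTestCase):
    def test_clone_is_close(self):
        x = torch.randn(4, 8)
        # assertTensorClose checks that the two tensors agree within a small tolerance.
        self.assertTensorClose(x, x.clone())

if __name__ == '__main__':
    unittest.main()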

Module contents