Thanks to visit codestin.com
Credit goes to docs.pytorch.org

Shortcuts

torch.nn.init

Warning

All the functions in this module are intended to be used to initialize neural network parameters, so they all run in torch.no_grad() mode and will not be taken into account by autograd.

torch.nn.init.calculate_gain(nonlinearity, param=None)[source]

Return the recommended gain value for the given nonlinearity function. The values are as follows:

nonlinearity

gain

Linear / Identity

11

Conv{1,2,3}D

11

Sigmoid

11

Tanh

53\frac{5}{3}

ReLU

2\sqrt{2}

Leaky Relu

21+negative_slope2\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}

SELU

34\frac{3}{4}

Warning

In order to implement Self-Normalizing Neural Networks , you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1 / N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.

Parameters
  • nonlinearity – the non-linear function (nn.functional name)

  • param – optional parameter for the non-linear function

Examples

>>> gain = nn.init.calculate_gain('leaky_relu', 0.2)  # leaky_relu with negative_slope=0.2
torch.nn.init.uniform_(tensor, a=0.0, b=1.0)[source]

Fills the input Tensor with values drawn from the uniform distribution U(a,b)\mathcal{U}(a, b).

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • a (float) – the lower bound of the uniform distribution

  • b (float) – the upper bound of the uniform distribution

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w)
torch.nn.init.normal_(tensor, mean=0.0, std=1.0)[source]

Fills the input Tensor with values drawn from the normal distribution N(mean,std2)\mathcal{N}(\text{mean}, \text{std}^2).

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • mean (float) – the mean of the normal distribution

  • std (float) – the standard deviation of the normal distribution

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)
torch.nn.init.constant_(tensor, val)[source]

Fills the input Tensor with the value val\text{val}.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • val (float) – the value to fill the tensor with

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.constant_(w, 0.3)
torch.nn.init.ones_(tensor)[source]

Fills the input Tensor with the scalar value 1.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.ones_(w)
torch.nn.init.zeros_(tensor)[source]

Fills the input Tensor with the scalar value 0.

Parameters

tensor (Tensor) – an n-dimensional torch.Tensor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.zeros_(w)
torch.nn.init.eye_(tensor)[source]

Fills the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.

Parameters

tensor – a 2-dimensional torch.Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)
torch.nn.init.dirac_(tensor, groups=1)[source]

Fills the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups>1, each group of channels preserves identity

Parameters
  • tensor – a {3, 4, 5}-dimensional torch.Tensor

  • groups (int, optional) – number of groups in the conv layer (default: 1)

Examples

>>> w = torch.empty(3, 16, 5, 5)
>>> nn.init.dirac_(w)
>>> w = torch.empty(3, 24, 5, 5)
>>> nn.init.dirac_(w, 3)
torch.nn.init.xavier_uniform_(tensor, gain=1.0)[source]

Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from U(a,a)\mathcal{U}(-a, a) where

a=gain×6fan_in+fan_outa = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}

Also known as Glorot initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • gain (float) – an optional scaling factor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
torch.nn.init.xavier_normal_(tensor, gain=1.0)[source]

Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from N(0,std2)\mathcal{N}(0, \text{std}^2) where

std=gain×2fan_in+fan_out\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}

Also known as Glorot initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • gain (float) – an optional scaling factor

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w)
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')[source]

Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from U(bound,bound)\mathcal{U}(-\text{bound}, \text{bound}) where

bound=gain×3fan_mode\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

Also known as He initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

  • mode (str) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

  • nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')[source]

Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from N(0,std2)\mathcal{N}(0, \text{std}^2) where

std=gainfan_mode\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}

Also known as He initialization.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

  • mode (str) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

  • nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)[source]

Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution N(mean,std2)\mathcal{N}(\text{mean}, \text{std}^2) with values outside [a,b][a, b] redrawn until they are within the bounds. The method used for generating the random values works best when ameanba \leq \text{mean} \leq b.

Parameters
  • tensor (Tensor) – an n-dimensional torch.Tensor

  • mean (float) – the mean of the normal distribution

  • std (float) – the standard deviation of the normal distribution

  • a (float) – the minimum cutoff value

  • b (float) – the maximum cutoff value

Return type

Tensor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)
torch.nn.init.orthogonal_(tensor, gain=1)[source]

Fills the input Tensor with a (semi) orthogonal matrix, as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.

Parameters
  • tensor – an n-dimensional torch.Tensor, where n2n \geq 2

  • gain – optional scaling factor

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
torch.nn.init.sparse_(tensor, sparsity, std=0.01)[source]

Fills the 2D input Tensor as a sparse matrix, where the non-zero elements will be drawn from the normal distribution N(0,0.01)\mathcal{N}(0, 0.01), as described in Deep learning via Hessian-free optimization - Martens, J. (2010).

Parameters
  • tensor – an n-dimensional torch.Tensor

  • sparsity – The fraction of elements in each column to be set to zero

  • std – the standard deviation of the normal distribution used to generate the non-zero values

Examples

>>> w = torch.empty(3, 5)
>>> nn.init.sparse_(w, sparsity=0.1)

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources