
nn.Linear weight initialization - uniform or kaiming_uniform? #57109

@adrianstaniec

Description


IMHO there is a discrepancy between the docs and the code of nn.Linear when it comes to weight initialization.

The documentation says that the weights are initialized from
uniform(-1/sqrt(in_features), 1/sqrt(in_features)):

weight: the learnable weights of the module of shape
:math:`(\text{out\_features}, \text{in\_features})`. The values are
initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
:math:`k = \frac{1}{\text{in\_features}}`
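
To make the comparison concrete, here is a quick empirical check of the documented bound (a minimal sketch; the layer sizes 64 and 32 are arbitrary):

import math
import torch.nn as nn

lin = nn.Linear(64, 32)
doc_bound = math.sqrt(1.0 / 64)  # sqrt(k) with k = 1/in_features
# all weights fall inside the documented interval
print(lin.weight.abs().max().item() <= doc_bound)  # True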

The code, however, initializes the weights with kaiming_uniform_:

def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))

and that includes a factor of sqrt(3), a gain based on `a`, and the `fan`:

pytorch/torch/nn/init.py

Lines 390 to 395 in 77721ee

fan = _calculate_correct_fan(tensor, mode)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan)
bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
with torch.no_grad():
    return tensor.uniform_(-bound, bound)
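
For what it's worth, plugging a = sqrt(5) into those lines (with the defaults mode='fan_in' and nonlinearity='leaky_relu', so fan = in_features and gain = sqrt(2 / (1 + a^2))) seems to collapse to the same bound the docs state; this is a sketch of the arithmetic only, not a claim about the intent:

import math

fan_in = 64                           # in_features of an example layer
a = math.sqrt(5)
gain = math.sqrt(2.0 / (1 + a ** 2))  # calculate_gain('leaky_relu', a) = 1/sqrt(3)
bound = math.sqrt(3.0) * gain / math.sqrt(fan_in)  # = 1/sqrt(fan_in)
doc_bound = math.sqrt(1.0 / fan_in)   # docs: sqrt(k), k = 1/in_features
print(abs(bound - doc_bound) < 1e-12)  # True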

Is that an error, or am I missing something?

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @brianjo @mruberry @albanD

Metadata

    Labels

    high priority
    module: docs (Related to our documentation, both in docs/ and docblocks)
    module: nn (Related to torch.nn)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
