`torch.nn.functional.cosine_similarity` calls `squeeze()` twice: once implicitly, because `sum()` drops the reduced dimension by default, and once explicitly before the tensor is returned.
https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L1653
The second (explicit) call, in line 1653, causes problems whenever a dimension other than the one over which the cosine similarity is computed has size 1, e.g. with a batch size of 1: that dimension is squeezed away as well.
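A minimal reproduction, assuming the function computes the dot product with `torch.sum` and the norms with `torch.norm` before the final `squeeze()`. The name `cosine_similarity_old` is hypothetical; it only mimics the reported behaviour and is not the actual PyTorch source.

```python
import torch

# Hypothetical reconstruction of the reported implementation (illustrative name).
def cosine_similarity_old(x1, x2, dim=1, eps=1e-8):
    w12 = torch.sum(x1 * x2, dim)  # reduced dim is dropped (the implicit squeeze)
    w1 = torch.norm(x1, 2, dim)
    w2 = torch.norm(x2, 2, dim)
    # Explicit squeeze(): also removes every *other* dimension of size 1.
    return (w12 / (w1 * w2).clamp(min=eps)).squeeze()

# Batch of 4 vectors with 5 features: one similarity per pair, shape (4,).
print(cosine_similarity_old(torch.randn(4, 5), torch.randn(4, 5)).shape)
# Batch of 1: the batch dimension is squeezed away, leaving a 0-d tensor.
print(cosine_similarity_old(torch.randn(1, 5), torch.randn(1, 5)).shape)
```

The second call prints `torch.Size([])` instead of `torch.Size([1])`, so code that indexes or concatenates along the batch dimension breaks only when the batch size happens to be 1.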
This is probably a remnant from before `sum()` was changed to squeeze the reduced dimension by default (#289 and #1563), although even then it should probably have been `squeeze(dim)`, so that only the cosine dimension is squeezed.
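A sketch of the `squeeze(dim)` variant, assuming the older semantics in which reductions keep the reduced dimension (emulated here with `keepdim=True`). The function name `cosine_similarity_fixed` is hypothetical. Squeezing only `dim` removes the reduced axis while a batch dimension of size 1 survives.

```python
import torch

# Hypothetical sketch: emulate keep-dim reductions, then squeeze only `dim`.
def cosine_similarity_fixed(x1, x2, dim=1, eps=1e-8):
    w12 = torch.sum(x1 * x2, dim, keepdim=True)  # reduced dim kept with size 1
    w1 = torch.norm(x1, 2, dim, keepdim=True)
    w2 = torch.norm(x2, 2, dim, keepdim=True)
    # Remove only the reduced dimension; other size-1 dims are left intact.
    return (w12 / (w1 * w2).clamp(min=eps)).squeeze(dim)

a, b = torch.randn(1, 5), torch.randn(1, 5)
print(cosine_similarity_fixed(a, b).shape)  # batch dimension of size 1 preserved
```

With the current default (`keepdim=False`), the equivalent fix is simply to drop the trailing `squeeze()`, since `sum()` and `norm()` already remove the reduced dimension.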