On recent PyTorch master (built from source; 0.4.0a0+4970e73) on a P100 with CUDA 9, torch.triu on an expanded tensor gives the correct result on CPU but returns all ones on CUDA:
>>> torch.triu(torch.FloatTensor([1]).expand(3, 3))
1 1 1
0 1 1
0 0 1
[torch.FloatTensor of size 3x3]
>>> torch.triu(torch.cuda.FloatTensor([1]).expand(3, 3))
1 1 1
1 1 1
1 1 1
[torch.cuda.FloatTensor of size 3x3 (GPU 0)]
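
A minimal sketch of a check script for the discrepancy above. It assumes (this is not confirmed by the output shown) that the problem is tied to the expanded tensor being a non-contiguous, stride-0 view, and it tries calling .contiguous() before torch.triu as a possible workaround. The helper name triu_via_contiguous is hypothetical and is not part of PyTorch.

```python
import torch


def triu_via_contiguous(t):
    # Hypothetical helper (not a PyTorch API): copy the expanded view into
    # real contiguous memory before taking the upper triangle.
    return torch.triu(t.contiguous())


base = torch.FloatTensor([1])
cpu_expanded = base.expand(3, 3)

# Reference result: upper triangle of an ordinary 3x3 tensor of ones.
expected = torch.triu(torch.ones(3, 3))

# CPU path, which the report shows to be correct.
cpu_ok = torch.equal(torch.triu(cpu_expanded), expected)
print("CPU triu on expanded tensor matches reference:", cpu_ok)

if torch.cuda.is_available():
    gpu_expanded = base.cuda().expand(3, 3)
    # Direct call on the expanded CUDA tensor (the failing case in the report).
    gpu_direct_ok = torch.equal(torch.triu(gpu_expanded).cpu(), expected)
    # Same call after materializing the view (assumed workaround).
    gpu_copy_ok = torch.equal(triu_via_contiguous(gpu_expanded).cpu(), expected)
    print("CUDA triu on expanded tensor matches reference:", gpu_direct_ok)
    print("CUDA triu after .contiguous() matches reference:", gpu_copy_ok)
```

If the direct CUDA check fails while the .contiguous() check passes, that would point at the handling of expanded (stride-0) inputs in the CUDA triu path rather than at triu itself.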