CogVideoX autoencoder cannot reconstruct a single image

### Describe the bug

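`AutoencoderKLCogVideoX` cannot reconstruct a single-frame image. Running an encode/decode round trip on an input of shape `(1, 3, 1, 512, 512)` fails with the `torch.cat()` error shown under Logs.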

### Reproduction

```python
import torch
from diffusers import AutoencoderKLCogVideoX

device = "cuda"
dtype = torch.bfloat16

autoencoder = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-5b",
    subfolder="vae",
    torch_dtype=dtype,
    local_files_only=True,
)
autoencoder.to(device)

# Single-frame input laid out as (batch, channels, frames, height, width)
image = torch.randn(1, 3, 1, 512, 512, dtype=dtype, device=device)

with torch.no_grad():
    inputs = image.to(device, dtype)
    print(inputs.shape)
    latent = autoencoder.encode(inputs).latent_dist.mode()
    print(latent.shape)
    rec = autoencoder.decode(latent).sample
    print(rec.shape)
    # Move the frame axis to the front and take the only frame: (channels, height, width)
    rec_image = rec[0].permute(1, 0, 2, 3)[0].cpu().float()
```

### Logs

```shell
torch.cat(): expected a non-empty list of Tensors
```
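
One way an error like this can arise is a frame-batching loop that computes the number of batches by integer division. With a single frame that count is zero, so the list later passed to `torch.cat()` is empty. The sketch below illustrates that pattern in plain Python. It is a guess at the failure mode, not taken from the diffusers source; `frame_batches` and its `at_least_one` flag are hypothetical names used only for illustration.

```python
def frame_batches(num_frames, batch_size, at_least_one=False):
    """Hypothetical helper: split frame indices into consecutive batches."""
    num_batches = num_frames // batch_size
    if at_least_one:
        # Guard: always process at least one batch, even for a single frame
        num_batches = max(num_batches, 1)
    batches = []
    for i in range(num_batches):
        start = i * batch_size
        # The last batch absorbs any leftover frames
        end = num_frames if i == num_batches - 1 else start + batch_size
        batches.append(list(range(start, end)))
    return batches


# Without the guard, one frame yields no batches, so there would be nothing to concatenate
print(frame_batches(1, 2))
# With the guard, the single frame is processed as one batch
print(frame_batches(1, 2, at_least_one=True))
```

If this is indeed the cause, clamping the batch count to at least one would let single-frame inputs through; that would need to be checked against the actual encode and decode paths.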


### System Info

x86, linux 
NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2
A800 80G

### Who can help?

_No response_
