Product Website | 🤗 Hugging Face | Paper | Paper Website
NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.
Cosmos World Foundation Models come in three model types, all of which can be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.
- [October 21, 2025] We added on-the-fly computation support for depth and segmentation, and fixed multicontrol experiments in inference. We also updated the Docker base image version and the Gradio-related documentation.
- [October 13, 2025] We updated the Transfer2.5 Auto Multiview post-training datasets and the setup dependencies to support NVIDIA Blackwell.
- [October 6, 2025] We released Cosmos-Transfer2.5 and Cosmos-Predict2.5 - the next generation of our world simulation models!
- [June 12, 2025] As part of the Cosmos family, we released Cosmos-Transfer1-DiffusionRenderer.
Cosmos-Transfer2.5 is a multi-controlnet designed to accept structured input of multiple video modalities, including RGB, depth, segmentation, and more. Users can configure generation using JSON-based controlnet_specs and run inference with just a few commands. It supports single-video inference, automatic control map generation, and multi-GPU setups.
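As a rough illustration of what a JSON-based multi-control configuration could look like, the sketch below assembles a spec as a Python dict and serializes it. Every key name here (`prompt`, `video_path`, `control_weight`), the modality names, and the `build_spec` helper are assumptions made for this example, not the documented controlnet_specs schema; consult the repository's inference documentation for the real fields.

```python
import json

# Modality names assumed for illustration; the real set may differ.
KNOWN_CONTROLS = {"edge", "depth", "segmentation"}


def build_spec(prompt, video_path, controls):
    """Assemble a hypothetical multi-control spec as a plain dict.

    `controls` maps a modality name to a weight in [0, 1].
    All key names are illustrative assumptions, not the real schema.
    """
    unknown = set(controls) - KNOWN_CONTROLS
    if unknown:
        raise ValueError(f"unknown control modalities: {sorted(unknown)}")
    for name, weight in controls.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {name!r} must be in [0, 1], got {weight}")

    spec = {"prompt": prompt, "video_path": video_path}
    # One entry per control modality, each carrying its own weight.
    for name, weight in controls.items():
        spec[name] = {"control_weight": weight}
    return spec


spec = build_spec(
    prompt="Two robotic arms interact with a piece of blue fabric.",
    video_path="robot_input.mp4",
    controls={"edge": 1.0, "depth": 0.5},
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Validating modality names and weights before serializing keeps mistakes out of the file that is eventually handed to inference, which is usually cheaper than discovering them after a model has been loaded.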
Physical AI trains on data generated by two important data augmentation workflows.
Minimizing the need for achieving high fidelity in 3D simulation.
Input prompt:
The video is a demonstration of robotic manipulation, likely in a laboratory or testing environment. It features two robotic arms interacting with a piece of blue fabric.
The setting is a room with a beige couch in the background, providing a neutral backdrop for the robotic activity. The robotic arms are positioned on either side of the fabric, which is placed on a yellow cushion. The left robotic arm is white with a black gripper, while the right arm is black with a more complex, articulated gripper. At the beginning, the fabric is laid out on the cushion. The left robotic arm approaches the fabric, its gripper opening and closing as it positions itself. The right arm remains stationary initially, poised to assist. As the video progresses, the left arm grips the fabric, lifting it slightly off the cushion. The right arm then moves in, its gripper adjusting to grasp the opposite side of the fabric. Both arms work in coordination, lifting and holding the fabric between them. The fabric is manipulated with precision, showcasing the dexterity and control of the robotic arms. The camera remains static throughout, focusing on the interaction between the robotic arms and the fabric, allowing viewers to observe the detailed movements and coordination involved in the task.
Input Video | Computed Control | Output Video |
---|---|---|
robot_input.mp4 | robot_edge.mp4 | robot_edge_output.mp4 |
Leveraging sensor-captured RGB or ground-truth augmentations.
Input prompt:
The video is a driving scene through a modern urban environment, likely captured from a dashcam or a similar fixed camera setup inside a vehicle.
The scene unfolds on a wide, multi-lane road flanked by tall, modern buildings with glass facades. The road is relatively empty, with only a few cars visible, including a black car directly ahead of the camera, maintaining a steady pace. The camera remains static, providing a consistent view of the road and surroundings as the vehicle moves forward. On the left side of the road, there are several trees lining the sidewalk, providing a touch of greenery amidst the urban setting. Pedestrians are visible on the sidewalks, some walking leisurely, while others stand near the buildings. The buildings are a mix of architectural styles, with some featuring large glass windows and others having more traditional concrete exteriors. A few commercial signs and logos are visible on the buildings, indicating the presence of businesses and offices. Traffic cones are placed on the road ahead, suggesting some form of roadwork or lane closure, guiding the vehicles to merge or change lanes. The road markings are clear, with white arrows indicating the direction of travel. The sky is clear, suggesting a sunny day, which enhances the visibility of the scene. Throughout the video, the vehicle maintains a steady speed, and the camera captures the gradual approach towards the intersection, where the road splits into different directions. The overall atmosphere is calm and orderly, typical of a city during non-peak hours.
Input Video | Computed Control | Output Video |
---|---|---|
car_input.mp4 | car_edge.mp4 | car_edge_output.mp4 |
Cosmos-Transfer supports data generation in multiple industry verticals, outlined below. Please check back as we continue to add more specialized models to the Transfer family!
Cosmos-Transfer2.5-2B: General checkpoints, trained from the ground up for Physical AI and robotics.
Cosmos-Transfer2.5-2B/auto: Specialized checkpoints, post-trained for Autonomous Vehicle applications. These include multiview checkpoints.
We thrive on community collaboration! NVIDIA-Cosmos wouldn't be where it is without contributions from developers like you. Check out our Contributing Guide to get started, and share your feedback through issues.
Big thanks 🙏 to everyone helping us push the boundaries of open-source physical AI!
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.