Script to Capture NVIDIA Bug Report After XID 119 and 120 on Host

Users may encounter XID 119 and XID 120 errors on the host. To troubleshoot them efficiently, capture debug logs with the provided script. Start it after a reboot and before any workload starts, so that it is already running when one of these XIDs occurs and can collect detailed logs at that moment.

XID 119: The GSP (GPU System Processor) has stopped responding (hang).
XID 120: The GSP has crashed.

While running, the script watches for the specified XIDs (for example, 119 and 120) and captures additional debug logs as soon as one of them occurs.
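The detection step can be pictured as scanning kernel log lines for NVIDIA Xid messages and keeping only the monitored XIDs. The sketch below is illustrative only, not the logic of the provided scripts. It assumes Xid lines look like `NVRM: Xid (PCI:<address>): <number>, ...` (the exact text can vary by driver branch and hypervisor), and the function name `find_xid_events` and the sample lines are hypothetical.

```python
import re
from typing import AbstractSet, Iterable, List, Tuple

# Assumed shape of an NVIDIA Xid line in the kernel log, for example:
#   NVRM: Xid (PCI:0000:3b:00): 119, pid=..., name=..., <details>
# This is an assumption; the exact wording may differ between drivers.
XID_PATTERN = re.compile(r"NVRM: Xid \(PCI:([0-9A-Fa-f:.]+)\): (\d+)")


def find_xid_events(lines: Iterable[str], monitored: AbstractSet[int]) -> List[Tuple[str, int]]:
    """Return (PCI address, XID) pairs for lines that report a monitored XID."""
    events = []
    for line in lines:
        match = XID_PATTERN.search(line)
        if match and int(match.group(2)) in monitored:
            events.append((match.group(1), int(match.group(2))))
    return events


if __name__ == "__main__":
    # Hypothetical log lines used only to exercise the parser.
    sample = [
        "NVRM: Xid (PCI:0000:3b:00): 119, pid=1234, name=example, <details>",
        "NVRM: Xid (PCI:0000:af:00): 13, pid=1234, name=example, <details>",
        "unrelated kernel message",
    ]
    print(find_xid_events(sample, {119, 120}))  # [('0000:3b:00', 119)]
```

Filtering on a set of XIDs mirrors the comma-separated argument the scripts accept, so other XIDs (such as 13 in the sample) are ignored.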

Next Steps

Driver Setup: Install vGPU driver version 16.6 or 17.2, or NVIDIA AI Enterprise (NV-AIE) 5.0 or 4.2, on GPUs based on the Hopper or Ada Lovelace architecture.
Run Script:
1. Execute the script after rebooting and before starting workloads.
2. Specify XIDs to monitor (e.g., 119, 120):
  1. Linux: ./nvidia-xid-monitor-linux.sh 119,120
  2. ESXi: ./nvidia-xid-monitor-vmware.sh 119,120
3. When a monitored XID occurs, the script generates a bug report and saves it with a timestamp.
Wait for Logs:
1. After XID 119 or 120 occurs, avoid rebooting, shutting down, or migrating VMs until the script completes log generation.
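The capture step in the list above can be sketched as: parse the comma-separated XID argument, build a timestamped output name, and run the bug report synchronously so it is clear when log generation has finished. This is a hypothetical sketch, not the contents of the NVIDIA scripts. `parse_xid_arg`, `bug_report_command`, `capture_bug_report`, and the file-name pattern are illustrative names; `--output-file` is an existing `nvidia-bug-report.sh` option, and the tool typically needs root.

```python
import subprocess
from datetime import datetime
from typing import FrozenSet, List


def parse_xid_arg(arg: str) -> FrozenSet[int]:
    """Turn a comma-separated argument such as "119,120" into a set of XIDs."""
    xids = frozenset(int(part) for part in arg.split(",") if part.strip())
    if not xids:
        raise ValueError("at least one XID is required")
    return xids


def bug_report_command(now: datetime) -> List[str]:
    """Build an nvidia-bug-report.sh call with a timestamped output file.

    nvidia-bug-report.sh appends ".gz" itself when gzip is available.
    The naming pattern here is a hypothetical choice.
    """
    name = now.strftime("nvidia-bug-report-%Y%m%d-%H%M%S.log")
    return ["nvidia-bug-report.sh", "--output-file", name]


def capture_bug_report(now: datetime) -> int:
    """Run the bug report and block until it finishes.

    Blocking matters: the host should not be rebooted, shut down, or have
    VMs migrated until log generation has completed.
    """
    return subprocess.run(bug_report_command(now), check=False).returncode


if __name__ == "__main__":
    # Same argument shape as the scripts above, e.g. "119,120".
    print(sorted(parse_xid_arg("119,120")))  # [119, 120]
    print(" ".join(bug_report_command(datetime(2024, 1, 2, 3, 4, 5))))
    # nvidia-bug-report.sh --output-file nvidia-bug-report-20240102-030405.log
```

Running the report in the foreground, rather than in the background, makes "the script has completed log generation" observable: the call returns only after the report file is written.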

For more detailed instructions and additional information, visit the full article here.

Generate a Log File for Support

When troubleshooting vGPU-related issues, providing the correct logs and diagnostic information is essential for faster resolution.

Next Steps

Generate an NVIDIA bug report from the host and attach the generated log file when reaching out for support.
Collect system information from the Windows VM with msinfo32 (System Information), save the report as an .nfo file, and attach it when requesting support.
Use nvidia-smi commands on the host for debugging:
1. Monitor GPU utilization by application (process) on each vGPU: nvidia-smi vgpu -p
2. Display frame buffer capture (FBC) sessions: nvidia-smi vgpu -fs
3. Check encoder session usage (should be minimal for vGPU workloads): nvidia-smi vgpu -es
Attach the nvidia-bug-report from the host and the msinfo32 report from the Windows VM when reporting an issue; together they allow a faster and more accurate diagnosis.
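To attach the nvidia-smi output from the steps above alongside the bug report, the commands can be saved to text files in one pass. This is a minimal sketch assuming a Linux host with Python 3.7 or later. It also assumes that `nvidia-smi vgpu -p` keeps printing until interrupted, so every command runs under a timeout and partial output is kept. The file names and the helpers `run_with_timeout` and `collect_snapshots` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List

# Host-side commands from the list above. Output file names are hypothetical.
SNAPSHOT_COMMANDS: Dict[str, List[str]] = {
    "vgpu-process-utilization.txt": ["nvidia-smi", "vgpu", "-p"],
    "vgpu-fbc-sessions.txt": ["nvidia-smi", "vgpu", "-fs"],
    "vgpu-encoder-sessions.txt": ["nvidia-smi", "vgpu", "-es"],
}


def run_with_timeout(cmd: List[str], seconds: float = 10.0) -> str:
    """Run cmd and return its output; keep partial output if it times out."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
        return result.stdout + result.stderr
    except subprocess.TimeoutExpired as exc:
        # TimeoutExpired.stdout is bytes (or None) even when text=True.
        partial = exc.stdout or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return partial + f"\n[stopped after {seconds} s]\n"
    except FileNotFoundError:
        return f"[{cmd[0]} not found on this system]\n"


def collect_snapshots(
    out_dir: Path, runner: Callable[[List[str]], str] = run_with_timeout
) -> List[Path]:
    """Write one text file per command into out_dir and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, cmd in SNAPSHOT_COMMANDS.items():
        path = out_dir / filename
        path.write_text(runner(cmd))
        written.append(path)
    return written
```

The `runner` parameter lets the collection logic be exercised without a GPU. On the Windows VM side, `msinfo32 /nfo <file>` saves the .nfo report from the command line instead of through the GUI.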
