Release Notes for NeMo Microservices#

Check out the latest release notes for the NeMo microservices.

Tip

If you’ve installed one of the previous releases of the NeMo microservices using Helm and want to upgrade, choose one of the following options:

To upgrade to the latest release, follow the steps at Upgrade NeMo Microservices Helm Chart.
To uninstall and reinstall, follow the steps at Uninstall NeMo Microservices Helm Chart and Install NeMo Microservices Helm Chart.

Release 25.10.0#

This release includes the following key features and known issues.

Key Features#

Platform#

Added a new Core microservice to support platform Jobs functionality for Data Designer, Safe Synthesizer, Auditor, and Evaluator.

NeMo Auditor#

Breaking API changes for Python SDK: NeMo Auditor Python SDK now aligns with the platform-wide Jobs API. These changes require code updates:
- Job creation: Creating an audit job requires name, project, and spec fields. The config and target fields now nest under spec. The call returns an AuditJob object instead of AuditJobHandle.
- Job status: Getting job status returns a PlatformJobStatusResponse object instead of AuditJobStatus. Status values are lowercase strings (started, active, completed).
- Job logs: Getting job logs returns a PlatformJobLogPage object with pagination support for large log files.
- Job results: Viewing job results uses client.beta.audit.jobs.results.download() instead of download_result(). The result_id argument is renamed to result_name.
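
The job-creation change in the list above can be sketched without the SDK itself. The helper below is hypothetical (`to_jobs_api_payload` is not part of the NeMo Auditor Python SDK). It only illustrates moving the top-level config and target fields under spec and adding the required name and project fields. All values shown are placeholders.

```python
from typing import Any


def to_jobs_api_payload(old: dict[str, Any], name: str, project: str) -> dict[str, Any]:
    """Reshape an older-style payload so config and target nest under spec.

    Hypothetical helper for illustration only; not part of the NeMo Auditor SDK.
    """
    missing = [key for key in ("config", "target") if key not in old]
    if missing:
        raise ValueError(f"old payload is missing fields: {missing}")
    return {
        "name": name,
        "project": project,
        # config and target are no longer top-level fields.
        "spec": {"config": old["config"], "target": old["target"]},
    }


# Placeholder values; real config and target references depend on your deployment.
legacy = {"config": "my-audit-config", "target": "my-audit-target"}
print(to_jobs_api_payload(legacy, name="nightly-audit", project="demo"))
```

Because status values are now lowercase strings, comparisons written against older uppercase or enum-style values are also worth reviewing during migration.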
garak security scanner updated to v0.13.0: Upgraded security testing framework with enhanced probe coverage and improved reporting.

Breaking change in report format: The report.jsonl file renames these fields:

| Old Name | New Name |
| --- | --- |
| zscore | relative_score |
| zscore_defcon | relative_defcon |
| zscore_comment | relative_comment |

Risk level strings updated to: critical risk, very high risk, elevated risk, medium risk, and low risk.
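
If downstream tooling still reads the old field names, a small conversion step can bridge the gap. The sketch below assumes that the renamed fields appear as top-level keys of each JSON object in report.jsonl; nested occurrences are not handled. The function names are illustrative and not part of garak or NeMo Auditor.

```python
import json

# Field renames listed for report.jsonl in this release.
RENAMED_FIELDS = {
    "zscore": "relative_score",
    "zscore_defcon": "relative_defcon",
    "zscore_comment": "relative_comment",
}


def upgrade_record(record: dict) -> dict:
    """Return a copy of one report record with old field names replaced."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in record.items()}


def upgrade_report_lines(lines):
    """Yield upgraded JSON lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.dumps(upgrade_record(json.loads(line)))
```

To convert a file, open the old report, pass the file object to `upgrade_report_lines`, and write each yielded line to a new file.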

New security probes:

doctor.Bypass: Tests medical roleplay scenarios that attempt to bypass safety guardrails.
doctor.BypassLeet: Tests medical roleplay with leetspeak encoding (for example, “h3ll0”).
doctor.Puppetry: Tests medical roleplay scenarios that elicit unsafe outputs.
encoding.InjectLeet: Tests 31337 (leet) text encoding that bypasses content filters.
encoding.InjectSneakyBits: Tests Unicode variant selector ASCII smuggling with invisible characters.
encoding.InjectUnicodeTagChars (Default): Tests Unicode tag-based ASCII smuggling attacks.
encoding.InjectUnicodeVariantSelectors: Tests Unicode variant selector attacks that hide malicious content.
sata.MLM: Tests simple assistive tasks that manipulate the model to disregard its system prompt.
xss.MarkdownURIImageExfilExtended (Default): Tests Markdown image URIs that perform zero-click data exfiltration.
xss.MarkdownURINonImageExfilExtended (Default): Tests clickable Markdown URLs for data exfiltration.

Specify which probes to run when you create an audit config.

NeMo Customizer#

LoRA adapter weight merging and export: Export LoRA-tuned models with weights merged into the base model, producing a single deployable artifact. Works across Megatron, HuggingFace AutoModel, and embedding models.
Multi-GPU support for AutoModel fine-tuning: AutoModel engines (Phi-4, Qwen, Gemma) now support tensor parallelism greater than 1 through multi-GPU LoRA patch, enabling distributed training across multiple GPUs.
Embedding model configured by default: Embedding model fine-tuning configurations are available by default. Enable the customization target to train the model without additional setup.
Faster feedback when GPUs are unavailable: Training jobs that cannot acquire GPU resources fail after 30 minutes. Adjust this timeout with customizer.env.MAX_PENDING_JOB_MINUTES in Helm values.
In-batch negatives for embedding training: The in_batch_negatives dataset parameter enables contrastive learning where each example in a training batch serves as a negative sample for other examples. Improves embedding training without requiring explicit negative examples.
Custom pod specifications for model downloads: Specify Kubernetes pod configurations for the model downloader through modelDownloader.podSpec in Helm values. Customize resource requests, node selectors, and tolerations for model download jobs.
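
To make the in-batch negatives idea above concrete, here is a minimal sketch of a generic contrastive (InfoNCE-style) loss in plain Python. It is not NeMo Customizer's implementation. The dot-product similarity, the `temperature` parameter, and the function names are assumptions chosen for illustration. For query i, the paired positive i is the correct match, and every other positive in the batch acts as a negative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_negatives_loss(queries, positives, temperature=1.0):
    """Mean cross-entropy where positives[j], j != i, act as negatives for queries[i]."""
    if len(queries) != len(positives) or not queries:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, query in enumerate(queries):
        # Similarity of this query to every positive in the batch.
        logits = [dot(query, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # The matching pair (index i) is the correct "class".
        total += log_norm - logits[i]
    return total / len(queries)


aligned = in_batch_negatives_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = in_batch_negatives_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # correctly paired batches score a lower loss
```

The loss falls when each query is closer to its own positive than to the other positives in the batch, which is why no explicit negative examples are needed.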

NeMo Data Designer#

Kubernetes deployment support: Deploy NeMo Data Designer as a microservice on Kubernetes using Helm charts with support for custom pod specifications, replicas, node selectors, tolerations, and resource management. Integrates with the platform-wide Jobs microservice for improved job lifecycle management, status tracking, and monitoring.
Multi-modal context for vision-enabled generation: Generate synthetic data using vision-enabled LLMs that analyze images alongside text prompts. Supports image context from URLs or base64-encoded data in multiple formats (PNG, JPG, JPEG, GIF, WebP), enabling workflows such as document intelligence from PDFs, image captioning, and visual question-answering.
Enhanced LLM error handling and visibility: Comprehensive error handling provides structured error messages with specific causes and actionable solutions for common LLM issues including authentication failures, rate limits, context window exceeded, connection errors, and unsupported parameters. All errors include operation-specific context (for example, “running generation for column ‘description’”).
Dataset profiling with persistent results: Generated datasets are automatically profiled for quality metrics, statistical distributions, and data characteristics. Profiling results persist after job completion.
Graceful job cancellation: Data generation jobs respond to SIGTERM signals for clean cancellation without data corruption or orphaned resources.
Model provider allow-lists: Configure allowed models for specific providers to enforce organizational policies. Set through model_provider_registry in Helm configuration.
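
The graceful cancellation behavior described above follows a common pattern: the signal handler only records the request, and the work loop checks for it between units of work so that no row is left half-written. The sketch below shows that pattern in general terms. It is not Data Designer's code, and all names in it are illustrative.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: record the request and let the loop finish its current row."""
    stop_requested.set()


def install_sigterm_handler():
    """Register the handler; call once at startup from the main thread."""
    signal.signal(signal.SIGTERM, request_stop)


def generate_rows(total, write_row):
    """Produce up to `total` rows, stopping between rows once a stop is requested."""
    written = 0
    for index in range(total):
        if stop_requested.is_set():
            break  # exit cleanly; everything written so far is complete
        write_row({"row": index})
        written += 1
    return written
```

Checking the flag only between rows is the design choice that avoids partial output: a row is either fully written or not started.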

NeMo Evaluator#

Version 2 (v2) jobs API: Improves the abstraction of job orchestration and management and unifies the high-level job API across NeMo microservices. Live logs are available for jobs scheduled through the v2 jobs API. The API maintains backward compatibility with v1 configurations and targets for gradual migration.
Docker Compose deployment: Improved quickstart experience, with more features supported when deployed with Docker Compose. Evaluator can now run all evaluation flows with the v2 jobs API. The Docker Compose setup now includes optional model deployment.
Live evaluation of RAGAS metrics: Real-time evaluation of RAGAS metrics covering Agentic (tool call accuracy, agent goal accuracy, and topic adherence), RAG (faithfulness, response relevancy, context relevance, and others), and NVIDIA metrics (answer accuracy, response groundedness, and context relevance).
Safety Harness evaluation: Fixed safety evaluation errors caused by models that output a large number of reasoning tokens. The fix adds a new reasoning parameter, include_if_not_finished.
Multi-architecture container support: All evaluation jobs, including EvalFactory containers, support x86_64 and ARM64 architectures.
Custom evaluation metric: Added support for floats and negative numbers for the number check metric.
Prompt Optimization: Fixed serialization error on job failure.
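
As a rough illustration of what a number check has to cope with once floats and negative values are allowed, the sketch below extracts a signed decimal from model output and compares it with an expected value. It is not the Evaluator's implementation. The regular expression, the choice to use the last number in the text, and the tolerance are all assumptions.

```python
import math
import re

# Optional sign, digits with optional fraction (or a bare fraction), optional exponent.
NUMBER_PATTERN = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def extract_last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = NUMBER_PATTERN.findall(text)
    return float(matches[-1]) if matches else None


def number_check(text, expected, tolerance=1e-6):
    """True when the last number in text is within tolerance of expected."""
    value = extract_last_number(text)
    return value is not None and math.isclose(value, expected, abs_tol=tolerance)


print(number_check("The final answer is -3.5.", -3.5))
```

Comparing floats with a tolerance rather than with `==` avoids false failures from rounding in the model's output.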

NeMo Guardrails#

GKE Inference Gateway integration: Add NeMo Guardrails directly into a GKE Inference Gateway to guardrail all traffic for NIMs running in your cluster.
- Supports content safety and topic control NemoGuard models
- Supports streaming and batch mode
- Configurable refusal messages for each NIM
- Export metrics and tracing into Google Cloud Monitoring
OpenTelemetry observability: Comprehensive instrumentation enables monitoring and debugging of guardrail executions with distributed tracing, metrics, and logging. Track performance, identify bottlenecks, and debug policy evaluations. Refer to Observability for NeMo Guardrails.
End-to-end distributed tracing: Trace context propagates across all service boundaries to follow a single user request through guardrails, inference, and downstream services.
Kubernetes ConfigMap support: Load guardrail configuration files from Kubernetes ConfigMaps with support for environment-specific overrides.
Faster service startup: Improved ConfigRegistry cache initialization reduces cold start time.
Deprecated model mutation endpoints: The /v1/guardrail/models mutation endpoints are deprecated and will be removed in a future release. Use platform-wide model configuration mechanisms.
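
The release notes do not name the propagation format used for distributed tracing. The sketch below assumes the W3C Trace Context `traceparent` header, which OpenTelemetry SDKs use by default, and shows how one trace ID can follow a request across hops while each hop gets its own span ID. Real services would normally rely on OpenTelemetry's propagators instead of hand-written code like this, and the function names here are illustrative.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
TRACEPARENT_PATTERN = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def child_traceparent(incoming):
    """Keep the caller's trace ID and flags, but mint a new span ID for this hop."""
    match = TRACEPARENT_PATTERN.match(incoming)
    if match is None:
        # Unparseable header: start a fresh trace rather than propagate it.
        return new_traceparent()
    trace_id, parent_span, flags = match.groups()
    if trace_id == "0" * 32 or parent_span == "0" * 16:
        # All-zero IDs are invalid in W3C Trace Context.
        return new_traceparent()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A service would call `child_traceparent` on the header it receives and attach the result to its outgoing request, so that every span recorded along the way shares the same trace ID.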

NeMo Safe Synthesizer#

NeMo Safe Synthesizer is a new microservice that enables you to create differentially private synthetic versions of sensitive tabular datasets while maintaining statistical properties and protecting individual privacy. The microservice is released in early access and is subject to limited support and potential API changes in future releases. For more information, refer to About Generating Private Synthetic Data.

Known Issues#

NeMo Evaluator#

GPT-OSS models are not supported as LLM-as-a-Judge for Agentic, RAG, and Simple Harness evaluations because of an incompatible response format. GPT-OSS models can be used as a judge for custom evaluations.

NeMo Safe Synthesizer#

ARM64 is not supported.