AI-Q NVIDIA Research Assistant Blueprint

Overview

The AI-Q NVIDIA Research Assistant blueprint allows you to create a deep research assistant that can run on-premise, allowing anyone to create detailed research reports using on-premise data and web search.

Note

To obtain results consistent with the aiq-research-assistant DeepResearch Bench leaderboard results, replace llama-3.3-nemotron-super-49b-v1 with llama-3.3-nemotron-super-49b-v1.5. The updated model weights are available from Hugging Face and as an endpoint on NVIDIA's API Catalog. The updated prompt is part of the develop branch.

Key Features

Deep Research: Given a report topic and desired report structure, an agent (1) creates a report plan, (2) searches data sources for answers, (3) writes a report, (4) reflects on gaps in the report for further queries, (5) finishes a report with a list of sources.
Parallel Search: During the research phase, multiple research questions are searched in parallel. For each query, the RAG service is consulted and an LLM-as-a-judge is used to check the relevancy of the results. If more information is needed, a fallback web search is performed. This search approach ensures internal documents are given preference over generic web results while maintaining accuracy. Performing query search in parallel allows for many data sources to be consulted in an efficient manner.
Human-in-the-loop: Human feedback on the report plan, interactive report edits, and Q&A with the final report.
Data Sources: Integration with the NVIDIA RAG blueprint to search multimodal documents with text, charts, and tables. Optional web search through Tavily.
Demo Web Application: Frontend web application showcasing end-to-end use of the AI-Q Research Assistant.

Target Audience

Research Analysts: This blueprint can be deployed by IT organizations to provide an on-premise deep research application for analysts
Developers: This blueprint serves as a reference architecture for teams to adapt to their own AI research applications

Software Components

The AI-Q Research Assistant blueprint provides these components:

Demo Frontend: A docker container with a fully functional demo web application is provided. This web application is deployed by default if you follow the getting started guides and is the easiest way to quickly experiment with deep research using internal data sources via the NVIDA RAG blueprint. The source code for this demo web application is not distributed.
Backend Service via RESTful API: The main AI-Q Research Assistant code is distributed as the aiq-aira Python package located in the /aira directory. These backend functions are available directly or via a RESTful API.
Middleware Proxy: An nginx proxy is deployed as part of the getting started guides. This proxy enables frontend web applications to interact with a single backend service. In turn, the proxy routes requests between the NVIDIA RAG blueprint services and the AI-Q Research Assistant service.

Additionally, the blueprint uses these components:

NVIDIA NeMo Agent Toolkit Provides a toolkit for managing a LangGraph codebase. Provides observability, API services and documentation, and easy configuration of different LLMs.
NVIDIA RAG Blueprint Provides a solution for querying large sets of on-premise multi-modal documents.
NVIDIA NeMo Retriever Microservices
NVIDIA NIM Microservices Used through the RAG blueprint for multi-modal document ingestion. Provides the foundational LLMs used for report writing and reasoning, including the llama-3_3-nemotron-super-49b-v1_5 reasoning model.
Web search powered by Tavily Supplements on-premise sources with real-time web search.

Technical Diagram

Minimum System Requirements

Disk Space

250 GB minimum

OS Requirements

Ubuntu 22.04

Deploy Options

Docker Compose
NVIDIA AI Workbench
Helm

Drivers

NVIDIA Container ToolKit
GPU Driver - 530.30.02 or later
CUDA version - 12.6 or later

Note: Mixed MIG support in Helm deployment requires GPU operator 25.3.2 or higher and GPU Driver 570.172.08 or higher.

Hardware Requirements

The following are the hardware requiremnts for running all services locally using Docker Compose and Helm Chart deployment.

Docker Compose

Use	Service(s)	Recommended GPU*
Nemo Retriever Microservices for multi-modal document ingest	`graphic-elements`, `table-structure`, `paddle-ocr`, `nv-ingest`, `embedqa`	1 x H100 80GB* 1 x A100 80GB 1 x B200 1 x RTX PRO 6000
Reasoning Model for Report Generation and RAG Q&A Retrieval	`llama-3_3-nemotron-super-49b-v1_5` with a FP8 profile	1 x H100 80 GB* 2 x A100 80GB 2 x B200 1 x RTX PRO 6000
Instruct Model for Report Generation	`llama-3.3-70b-instruct`	2 x H100 80GB* 4 x A100 80GB 2 x B200 2 x RTX PRO 6000
Total	Entire AI-Q Research Blueprint	4 x H100 80GB* 7 x A100 80GB 5 x B200 4 x RTX PRO 6000

*This recommendation is based off of the configuration used to test the blueprint. For alternative configurations, view the RAG blueprint documentation.

Helm

Option	RAG Deployment	AI-Q Research Assistant Deployment	Total Hardware Requirement
Single Node - MIG Sharing	Use MIG sharing	Default Deployment	4 x H100 80GB for RAG 2 x H100 80GB for AI-Q Research Assistant
Multi Node	Default Deployment	Default Deployment	8 x H100 80GB for RAG 2 x H100 80GB for AI-Q Research Assistant --- 9 x A100 80GB for RAG 4 x A100 80GB for AI-Q Research Assistant --- 9 x B200 for RAG 2 x B200 for AI-Q Research Assistant --- 8 x RTX PRO 6000 for RAG 2 x RTX PRO 6000 for AI-Q Research Assistant

Running with hosted NVIDIA NIM Microservices

This blueprint can be run entirely with hosted NVIDIA NIM Microservices, see https://build.nvidia.com/ for details.

API Keys

NVIDIA AI Enterprise developer licence required to local host NVIDIA NIM Microservices.
NVIDIA API catalog or NGC API Keys for container download and access to hosted NVIDIA NIM Microservices
TAVILY API Key for optional web search

Next Steps

Use the Get Started Notebook to deploy the blueprint with Docker and interact with the sample web application
Deploy with Docker Compose or Helm
Customize the research assistant starting with the Local Development Guide

License

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use, found in License-3rd-party.txt.

GOVERNING TERMS: AIQ blueprint software and materials are governed by the Apache License, Version 2.0, and: (a) the models, other than the Llama-3.3-Nemotron-Super-49B-v1 model, are governed by the NVIDIA Community Model License; (b) the Llama-3.3-Nemotron-Super-49B-v1 model is governed by the NVIDIA Open Model License Agreement, and (c) the NeMo Retriever extraction is released under the Apache-2.0 license.

ADDITIONAL INFORMATION: For NVIDIA Retrieval QA Llama 3.2 1B Reranking v2 model, NeMo Retriever Graphic Elements v1 model, and NVIDIA Retrieval QA Llama 3.2 1B Embedding v2: Llama 3.2 Community License Agreement, Built with Llama. For Llama-3.3-70b-Instruct model, Llama 3.3 Community License Agreement, Built with Llama.

Security Considerations

Prompt Content Filtering: The AI-Q Research Assistant includes input validation that detects and blocks user prompts containing suspicious text patterns (e.g., "ignore all instructions", "DROP TABLE", "eval()", etc.). This helps reduce the risk of prompt injection but does not provide protection against SQL injection, code execution, or XSS—those require proper security controls at the database, application, and output layers respectively. See Prompt Content Filtering Tests for basic testing examples.
The AI-Q Research Assistant Blueprint doesn't generate any code that may require sandboxing.
The AI-Q Research Assistant Blueprint is shared as a reference and is provided "as is". The security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats; define the trust boundaries, implement logging and monitoring capabilities, secure the communication channels, integrate AuthN & AuthZ with appropriate access controls, keep the deployment up to date, ensure the containers/source code are secure and free of known vulnerabilities.
A frontend that handles AuthN & AuthZ should be in place as missing AuthN & AuthZ could provide ungated access to customer models if directly exposed to e.g. the internet, resulting in either cost to the customer, resource exhaustion, or denial of service.
The AI-Q Research Assistant doesn't require any privileged access to the system.
The end users are responsible for ensuring the availability of their deployment.
The end users are responsible for building the container images and keeping them up to date.
The end users are responsible for ensuring that OSS packages used by the developer blueprint are current.
The logs from nginx proxy, backend, and demo app are printed to standard out. They can include input prompts and output completions for development purposes. The end users are advised to handle logging securely and avoid information leakage for production use cases.

Name		Name	Last commit message	Last commit date
Latest commit History 148 Commits
.github		.github
.project		.project
aira		aira
configs		configs
data		data
demo		demo
deploy		deploy
docs		docs
notebooks		notebooks
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE-OF-CONDUCT.md		CODE-OF-CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
LICENSE-3rd-party.txt		LICENSE-3rd-party.txt
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI-Q NVIDIA Research Assistant Blueprint

Overview

Table of Contents

Key Features

Target Audience

Software Components

Technical Diagram

Minimum System Requirements

Disk Space

OS Requirements

Deploy Options

Drivers

Hardware Requirements

Docker Compose

Helm

Running with hosted NVIDIA NIM Microservices

API Keys

Next Steps

License

Security Considerations

About

Uh oh!

Releases 9

Packages

Contributors 11

Uh oh!

Languages

License

NVIDIA-AI-Blueprints/aiq-research-assistant

Folders and files

Latest commit

History

Repository files navigation

AI-Q NVIDIA Research Assistant Blueprint

Overview

Table of Contents

Key Features

Target Audience

Software Components

Technical Diagram

Minimum System Requirements

Disk Space

OS Requirements

Deploy Options

Drivers

Hardware Requirements

Docker Compose

Helm

Running with hosted NVIDIA NIM Microservices

API Keys

Next Steps

License

Security Considerations

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Contributors 11

Uh oh!

Languages

Packages