The AI-Q NVIDIA Research Assistant blueprint allows you to create a deep research assistant that can run on-premise, allowing anyone to create detailed research reports using on-premise data and web search.
Note
To obtain results consistent with the aiq-research-assistant DeepResearch Bench leaderboard results, replace llama-3.3-nemotron-super-49b-v1
with llama-3.3-nemotron-super-49b-v1.5
. The updated model weights are available from Hugging Face and as an endpoint on NVIDIA's API Catalog. The updated prompt is part of the develop branch.
- AI-Q NVIDIA Research Assistant Blueprint
- Deep Research: Given a report topic and desired report structure, an agent (1) creates a report plan, (2) searches data sources for answers, (3) writes a report, (4) reflects on gaps in the report for further queries, (5) finishes a report with a list of sources.
- Parallel Search: During the research phase, multiple research questions are searched in parallel. For each query, the RAG service is consulted and an LLM-as-a-judge is used to check the relevancy of the results. If more information is needed, a fallback web search is performed. This search approach ensures internal documents are given preference over generic web results while maintaining accuracy. Performing query search in parallel allows for many data sources to be consulted in an efficient manner.
- Human-in-the-loop: Human feedback on the report plan, interactive report edits, and Q&A with the final report.
- Data Sources: Integration with the NVIDIA RAG blueprint to search multimodal documents with text, charts, and tables. Optional web search through Tavily.
- Demo Web Application: Frontend web application showcasing end-to-end use of the AI-Q Research Assistant.
- Research Analysts: This blueprint can be deployed by IT organizations to provide an on-premise deep research application for analysts
- Developers: This blueprint serves as a reference architecture for teams to adapt to their own AI research applications
The AI-Q Research Assistant blueprint provides these components:
- Demo Frontend: A docker container with a fully functional demo web application is provided. This web application is deployed by default if you follow the getting started guides and is the easiest way to quickly experiment with deep research using internal data sources via the NVIDA RAG blueprint. The source code for this demo web application is not distributed.
- Backend Service via RESTful API: The main AI-Q Research Assistant code is distributed as the
aiq-aira
Python package located in the/aira
directory. These backend functions are available directly or via a RESTful API. - Middleware Proxy: An nginx proxy is deployed as part of the getting started guides. This proxy enables frontend web applications to interact with a single backend service. In turn, the proxy routes requests between the NVIDIA RAG blueprint services and the AI-Q Research Assistant service.
Additionally, the blueprint uses these components:
- NVIDIA NeMo Agent Toolkit Provides a toolkit for managing a LangGraph codebase. Provides observability, API services and documentation, and easy configuration of different LLMs.
- NVIDIA RAG Blueprint Provides a solution for querying large sets of on-premise multi-modal documents.
- NVIDIA NeMo Retriever Microservices
- NVIDIA NIM Microservices Used through the RAG blueprint for multi-modal document ingestion. Provides the foundational LLMs used for report writing and reasoning, including the llama-3_3-nemotron-super-49b-v1_5 reasoning model.
- Web search powered by Tavily Supplements on-premise sources with real-time web search.
250 GB minimum
Ubuntu 22.04
Docker Compose
NVIDIA AI Workbench
Helm
NVIDIA Container ToolKit
GPU Driver - 530.30.02 or later
CUDA version - 12.6 or later
Note: Mixed MIG support in Helm deployment requires GPU operator 25.3.2 or higher and GPU Driver 570.172.08 or higher.
The following are the hardware requiremnts for running all services locally using Docker Compose and Helm Chart deployment.
Use | Service(s) | Recommended GPU* |
---|---|---|
Nemo Retriever Microservices for multi-modal document ingest | graphic-elements , table-structure , paddle-ocr , nv-ingest , embedqa |
1 x H100 80GB* 1 x A100 80GB 1 x B200 1 x RTX PRO 6000 |
Reasoning Model for Report Generation and RAG Q&A Retrieval | llama-3_3-nemotron-super-49b-v1_5 with a FP8 profile |
1 x H100 80 GB* 2 x A100 80GB 2 x B200 1 x RTX PRO 6000 |
Instruct Model for Report Generation | llama-3.3-70b-instruct |
2 x H100 80GB* 4 x A100 80GB 2 x B200 2 x RTX PRO 6000 |
Total | Entire AI-Q Research Blueprint | 4 x H100 80GB* 7 x A100 80GB 5 x B200 4 x RTX PRO 6000 |
*This recommendation is based off of the configuration used to test the blueprint. For alternative configurations, view the RAG blueprint documentation.
Option | RAG Deployment | AI-Q Research Assistant Deployment | Total Hardware Requirement |
---|---|---|---|
Single Node - MIG Sharing | Use MIG sharing | Default Deployment | 4 x H100 80GB for RAG 2 x H100 80GB for AI-Q Research Assistant |
Multi Node | Default Deployment | Default Deployment | 8 x H100 80GB for RAG 2 x H100 80GB for AI-Q Research Assistant --- 9 x A100 80GB for RAG 4 x A100 80GB for AI-Q Research Assistant --- 9 x B200 for RAG 2 x B200 for AI-Q Research Assistant --- 8 x RTX PRO 6000 for RAG 2 x RTX PRO 6000 for AI-Q Research Assistant |
This blueprint can be run entirely with hosted NVIDIA NIM Microservices, see https://build.nvidia.com/ for details.
- NVIDIA AI Enterprise developer licence required to local host NVIDIA NIM Microservices.
- NVIDIA API catalog or NGC API Keys for container download and access to hosted NVIDIA NIM Microservices
- TAVILY API Key for optional web search
- Use the Get Started Notebook to deploy the blueprint with Docker and interact with the sample web application
- Deploy with Docker Compose or Helm
- Customize the research assistant starting with the Local Development Guide
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use, found in License-3rd-party.txt.
GOVERNING TERMS: AIQ blueprint software and materials are governed by the Apache License, Version 2.0, and: (a) the models, other than the Llama-3.3-Nemotron-Super-49B-v1 model, are governed by the NVIDIA Community Model License; (b) the Llama-3.3-Nemotron-Super-49B-v1 model is governed by the NVIDIA Open Model License Agreement, and (c) the NeMo Retriever extraction is released under the Apache-2.0 license.
ADDITIONAL INFORMATION: For NVIDIA Retrieval QA Llama 3.2 1B Reranking v2 model, NeMo Retriever Graphic Elements v1 model, and NVIDIA Retrieval QA Llama 3.2 1B Embedding v2: Llama 3.2 Community License Agreement, Built with Llama. For Llama-3.3-70b-Instruct model, Llama 3.3 Community License Agreement, Built with Llama.
- Prompt Content Filtering: The AI-Q Research Assistant includes input validation that detects and blocks user prompts containing suspicious text patterns (e.g., "ignore all instructions", "DROP TABLE", "eval()", etc.). This helps reduce the risk of prompt injection but does not provide protection against SQL injection, code execution, or XSS—those require proper security controls at the database, application, and output layers respectively. See Prompt Content Filtering Tests for basic testing examples.
- The AI-Q Research Assistant Blueprint doesn't generate any code that may require sandboxing.
- The AI-Q Research Assistant Blueprint is shared as a reference and is provided "as is". The security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats; define the trust boundaries, implement logging and monitoring capabilities, secure the communication channels, integrate AuthN & AuthZ with appropriate access controls, keep the deployment up to date, ensure the containers/source code are secure and free of known vulnerabilities.
- A frontend that handles AuthN & AuthZ should be in place as missing AuthN & AuthZ could provide ungated access to customer models if directly exposed to e.g. the internet, resulting in either cost to the customer, resource exhaustion, or denial of service.
- The AI-Q Research Assistant doesn't require any privileged access to the system.
- The end users are responsible for ensuring the availability of their deployment.
- The end users are responsible for building the container images and keeping them up to date.
- The end users are responsible for ensuring that OSS packages used by the developer blueprint are current.
- The logs from nginx proxy, backend, and demo app are printed to standard out. They can include input prompts and output completions for development purposes. The end users are advised to handle logging securely and avoid information leakage for production use cases.