Audio2X SDK

The Audio2X SDK is a comprehensive toolkit for fast audio-driven animation and emotion detection. It consists of two main components:

  • Audio2Emotion SDK: Analyzes audio (speech) input to detect and classify emotional states
  • Audio2Face SDK: Generates facial animations from audio (speech) input, supporting both regression and diffusion-based approaches

The SDK is designed for high-performance applications requiring fast audio processing and animation generation. It leverages NVIDIA's GPU computing capabilities through CUDA and TensorRT for optimal performance.

Key Features

  • Faster than Real-Time Processing: Generates animation frames at more than 60 FPS
  • Multi-track Support: Process multiple audio streams simultaneously
  • GPU Acceleration: Full CUDA/TensorRT integration
  • Flexible Architecture: Support for both batch and interactive processing modes
  • Cross-platform: Windows and Linux support

Build

Prerequisites

System Requirements

  • Operating System: Windows 10/11 or Linux (Ubuntu 20.04+)
  • GPU: NVIDIA GPU with CUDA 12.8.0+ support
  • Memory: 8GB+ RAM, 4GB+ GPU memory recommended
  • Storage: 10GB+ free space for SDK and models
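
You can sanity-check the GPU and CUDA requirements before building with the standard NVIDIA tools (these are general-purpose utilities, not part of this SDK):

nvidia-smi     # Reports the installed driver and the highest CUDA version it supports
nvcc --version # Reports the installed CUDA Toolkit version (expect 12.8.0 or newer)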

Build Tools

To build the Audio2X SDK, you first need the following software packages.

  1. Windows

    • MSBuild (Visual Studio 2022+)
  2. Linux

    • g++
    • make
  3. Common

    • CMake
    • Ninja (Optional)

    Note: You can use your own dependencies or use the pre-fetched versions, which will be downloaded to _deps\build-deps by running .\fetch_deps.{bat|sh}.

System Packages
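
The build steps below pull large sample files with Git LFS (git lfs pull), so Git and Git LFS should be installed in addition to the build tools above. A minimal sketch for Ubuntu (package names assume the default apt repositories):

sudo apt-get update
sudo apt-get install -y git git-lfs
git lfs install # One-time per-user Git LFS setup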

Building Audio2X SDK

  1. Using the default build script

(Note: Use debug instead of release below for a debug build)

Windows

git clone https://github.com/NVIDIA/Audio2Face-3D-SDK.git
cd Audio2Face-3D-SDK
git lfs pull # Pull large files in sample-data
.\fetch_deps.bat release

$env:TENSORRT_ROOT_DIR="C:\path\to\tensorrt"
$env:CUDA_PATH="C:\path\to\cuda" # Usually not needed if the CUDA Toolkit installer has already set it
.\build.bat clean release # Optional: Remove previous build
.\build.bat all release # Uses CMake and Ninja from `_deps\build-deps` and builds to `_build` by default.

Linux

git clone https://github.com/NVIDIA/Audio2Face-3D-SDK.git
cd Audio2Face-3D-SDK
git lfs pull # Pull large files in sample-data
./fetch_deps.sh release

export TENSORRT_ROOT_DIR="path/to/tensorrt"
./build.sh clean release # Optional: Remove previous build
./build.sh all release # Uses CMake and Ninja from `_deps/build-deps` and builds to `_build` by default.

  2. Using CMake

Windows

$env:TENSORRT_ROOT_DIR="C:\path\to\tensorrt"
$env:CUDA_PATH="C:\path\to\cuda" # Usually not needed if the CUDA Toolkit installer has already set it
cmake -B _build -G "Visual Studio 17 2022" -S . -DTENSORRT_ROOT_DIR="$env:TENSORRT_ROOT_DIR" -DCUDA_PATH="$env:CUDA_PATH" -DCMAKE_GENERATOR_TOOLSET="cuda=$env:CUDA_PATH"
cmake --build _build --target ALL_BUILD --config Release --parallel

Linux

export TENSORRT_ROOT_DIR="path/to/tensorrt"
cmake -B _build -G "Unix Makefiles" -S . -DCMAKE_BUILD_TYPE=Release
cmake --build _build --target all --config Release --parallel

Build Output Structure

After a successful build, you should see a directory structure like this:

_build/
└── release/               # Release (or debug) build artifacts
    ├── audio2emotion-sdk/
    │   ├── bin/           # Audio2Emotion samples and unit test executables
    │   └── lib/           # Audio2Emotion static libraries
    ├── audio2face-sdk/
    │   ├── bin/           # Audio2Face samples and unit test executables
    │   └── lib/           # Audio2Face static libraries
    ├── audio2x-common/
    │   ├── bin/           # Audio2X Common unit test executables
    │   └── lib/           # Audio2X Common static libraries
    └── audio2x-sdk/       # Combines A2E + A2F + A2X Common into a single shared library
        ├── bin/           # audio2x.dll (on Windows)
        ├── include/       # Header files
        └── lib/           # Import libraries (on Windows) or libaudio2x.so (on Linux)

The audio2x-sdk directory contains the unified SDK that combines both Audio2Emotion and Audio2Face functionality into a single shared library for easy integration.
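
For example, a Linux application could build against the unified library roughly as follows (a minimal sketch based on the layout above; the source file name and any extra link flags are assumptions, not taken from this repository):

# Hypothetical compile/link line for an application using the unified SDK
g++ my_app.cpp \
    -I_build/release/audio2x-sdk/include \
    -L_build/release/audio2x-sdk/lib \
    -laudio2x \
    -o my_app
# At runtime, libaudio2x.so plus the CUDA and TensorRT libraries must be on LD_LIBRARY_PATH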

Downloading Models and Generating Test Data

License-protected models (Gated Models)

Audio2Emotion models are gated on Hugging Face and require a license click-through tied to your Hugging Face account. To download them, you must:

  • Accept the model's license on its Hugging Face page (click Agree and access repository).
  • Authenticate the CLI so the script can use your credentials.
    • Generate a user access token from your Hugging Face account.
      • Please ensure the "Read access to contents of all public gated repos you can access" permission is enabled for this token.
    • Log in via the CLI using: hf auth login

Here's a complete example of the whole process:

  1. Windows

# Create venv
python -m venv venv # Requires python >= v3.8, <= v3.10.x
.\venv\Scripts\activate
pip install -r deps\requirements.txt # If this step fails, please verify your Python version (python --version).

# Run these scripts in venv
hf auth login         # One-time setup: when prompted, paste the user access token you generated on Hugging Face
.\download_models.bat # Download all the Audio2Face & Audio2Emotion models

# Generate unit test data
# Convert downloaded models to TensorRT format
.\gen_testdata.bat
  2. Linux

# Create venv
python -m venv venv # Requires python >= v3.8, <= v3.10.x
source ./venv/bin/activate
pip install -r deps/requirements.txt # If this step fails, please verify your Python version (python --version).

# Run these scripts in venv
hf auth login         # One-time setup: when prompted, paste the user access token you generated on Hugging Face
./download_models.sh  # Download all the Audio2Face & Audio2Emotion models

# Generate unit test data
# Convert downloaded models to TensorRT format
./gen_testdata.sh

Running Audio2X SDK Unit Tests and Samples

To verify your setup is correct, run the provided samples and unit tests. This process involves several steps:

  1. Install Python dependencies - Required packages for downloading models from Hugging Face and generating test data
  2. Download models and generate test data - Using the provided scripts
  3. Run samples using the wrapper script - The run_sample.{bat|sh} script is necessary because the SDK is a single shared library (audio2x.dll / libaudio2x.so) that depends on CUDA and TensorRT libraries, which must be locatable by the dynamic loader (PATH on Windows, LD_LIBRARY_PATH on Linux); see the manual-run sketch after the Linux commands below.

Below are the platform-specific instructions for Windows and Linux:

  1. Windows

# Run samples (Please ensure that the environment variables CUDA_PATH and TENSORRT_ROOT_DIR are set)
.\run_sample.bat .\_build\release\audio2face-sdk\bin\audio2face-unit-tests.exe
.\run_sample.bat .\_build\release\audio2face-sdk\bin\sample-a2f-executor.exe

# By default, the script runs a release build. To run a debug build, pass the debug argument.
.\run_sample.bat debug .\_build\debug\audio2face-sdk\bin\sample-a2f-executor.exe

# Run benchmarks
.\run_sample.bat .\_build\release\audio2face-sdk\bin\audio2face-benchmarks.exe --benchmark_filter=<filter>

# Run benchmarks with a default set of filters
.\run_sample.bat .\audio2face-sdk\source\benchmarks\test_benchmark.bat .\_build\release\audio2face-sdk\bin\audio2face-benchmarks.exe
  2. Linux

# Run samples (Please ensure that the environment variables CUDA_PATH and TENSORRT_ROOT_DIR are set)
./run_sample.sh ./_build/release/audio2face-sdk/bin/audio2face-unit-tests
./run_sample.sh ./_build/release/audio2face-sdk/bin/sample-a2f-executor

# By default, the script runs a release build. To run a debug build, pass the debug argument.
./run_sample.sh debug ./_build/debug/audio2face-sdk/bin/sample-a2f-executor

# Run benchmarks
./run_sample.sh ./_build/release/audio2face-sdk/bin/audio2face-benchmarks --benchmark_filter=<filter>

# Run benchmarks with a default set of filters
./run_sample.sh ./audio2face-sdk/source/benchmarks/test_benchmark.sh ./_build/release/audio2face-sdk/bin/audio2face-benchmarks
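
If you prefer to run an executable directly rather than through the wrapper, the equivalent on Linux is to expose the SDK, CUDA, and TensorRT libraries to the dynamic loader yourself. A sketch assuming the build tree above and a standard TensorRT archive layout (the exact lib directory names may differ on your system):

# Make libaudio2x.so plus the CUDA and TensorRT runtime libraries visible to the loader
export LD_LIBRARY_PATH="$PWD/_build/release/audio2x-sdk/lib:$TENSORRT_ROOT_DIR/lib:$CUDA_PATH/lib64:$LD_LIBRARY_PATH"
./_build/release/audio2face-sdk/bin/sample-a2f-executor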

✅ What to Expect After Running the Samples

  • All unit tests should pass

  • Sample executables should complete without any errors

    The sample does not include a GUI, so there is no visualization of the generated vertex positions.
    To view the results, you can either:

    • Export the data to a .bin file and visualize it in a DCC (Digital Content Creation) tool of your choice, or
    • Use Maya-ACE for direct integration with Autodesk Maya.

⚠️ Common Issues and Troubleshooting

If you encounter errors, here are some common causes:

  1. build.bat shows "Visual Studio installation not found"

    Visual Studio with the C++ compiler toolchain was not found. Please install it or set the VS_PATH variable manually in build.bat.

  2. build.bat shows "TENSORRT_ROOT_DIR is not defined"

    Make sure the TENSORRT_ROOT_DIR environment variable points to your TensorRT directory.

  3. Samples printed [A2F SDK] [ERROR] Unable to parse file...

    Make sure to run the download_models and gen_testdata scripts before running the samples or unit tests. These scripts will create the required _data\generated directory.

  4. .\venv\Scripts\Activate.ps1 cannot be loaded because running scripts is disabled on this system.

    You need to allow PowerShell to run local scripts. To fix this, open PowerShell and run:

    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

    Please check about_Execution_Policies - PowerShell | Microsoft Learn for more details.

Getting Started with Development

Now that you've set up the SDK, here are some recommended next steps to explore and get the most out of it:

Read the High-Level Overview

Check out the high-level overview document to understand the architecture, core concepts, and key components in the SDK.

Explore the Samples

Check out the provided samples to see example code and typical use cases for the SDK.

Try the Maya Plugin

Maya-ACE includes a local inference player node that demonstrates direct SDK integration. Setup is more complex than the samples but provides visual results.

Hugging Face Pretrained Models & Custom Training

Browse the Audio2Face-3D Hugging Face Collection for available models compatible with this SDK, or use the Audio2Face-3D Training Framework to customize and train your own models!

Citation

If you use the Audio2Face-3D Training Framework or Audio2Face-3D models in publications or other outputs, please cite them in the following format (BibTeX entry for LaTeX):

@misc{nvidia2025audio2face3d,
      title={Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars},
      author={Chaeyeon Chung and Ilya Fedorov and Michael Huang and Aleksey Karmanov and Dmitry Korobchenko and Roger Ribera and Yeongho Seol},
      year={2025},
      eprint={2508.16401},
      archivePrefix={arXiv},
      primaryClass={cs.GR},
      url={https://arxiv.org/abs/2508.16401},
      note={Authors listed in alphabetical order}
}
