Resources:
- Audio2Face-3D Example Dataset: https://huggingface.co/datasets/nvidia/Audio2Face-3D-Dataset-v1.0.0-claire
- Maya-ACE plugin: https://github.com/NVIDIA/Maya-ACE
- Research Paper: https://arxiv.org/abs/2508.16401
Audio2Face-3D generates high-fidelity facial animations from an audio source. The technology produces detailed, realistic articulation, with precise motion for the skin, jaw, tongue, and eyes, achieving accurate lip-sync and lifelike character expression, including emotions.
Audio2Face-3D Training Framework is the core tool for training high-fidelity facial animation models within the Audio2Face-3D ecosystem. It supports both NVIDIA's prebuilt models and custom models tailored to specific characters, languages, or artistic styles. Training these models requires extensive datasets of synchronized facial animation and corresponding audio, which the framework is designed to leverage efficiently.
- Introduction
- Preparing Animation Data for Training
- Training Framework
- Configurations Guide
- Using Trained Models in Maya-ACE 2.0
- Operating System: Linux or WSL2 (Ubuntu 22.04 recommended)
- Storage: ~1 GB of free space for framework artifacts and the example dataset
- Hardware: CUDA-compatible GPU with at least 6 GB VRAM
- NVIDIA Driver: Use the following supported range:
- Linux: 575.57 - 579.x
- Windows/WSL2: 576.57 - 579.x
- Check your current version:
nvidia-smi
- Docker: Required for running the framework
- NVIDIA Docker: Required for GPU acceleration
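Before building anything, you can optionally confirm the driver and Docker GPU prerequisites listed above. This is only a sketch: the CUDA image tag below is an example (not part of this framework), and any CUDA-enabled image you already have will do.
# Print the installed NVIDIA driver version (should fall within the supported range above)
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Confirm Docker can reach the GPU through the NVIDIA runtime
# (example image tag; substitute any CUDA base image available to you)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi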
This quick start guide provides a comprehensive walkthrough of the Audio2Face-3D Training Framework.
Using a sample dataset available from Hugging Face, you will learn the complete end-to-end workflow, from initial setup to testing a newly trained model.
In this guide, you will learn to:
- Set up the Training Framework environment.
- Train a new model using the sample data.
- Deploy the trained model into a usable format.
- Test the new model by running an inference.
Note: If you are not familiar with Linux and are working on a Windows system, please refer to the Detailed Setup Under Windows (WSL2 / Ubuntu) section in the Training Framework page.
Clone the Audio2Face-3D Training Framework repository:
# Create audio2face directory and navigate to it
mkdir -p ~/audio2face && cd ~/audio2face
# Clone the repository
git clone https://github.com/NVIDIA/Audio2Face-3D-Training-Framework.git
Create new directories to hold datasets and training files:
# Create datasets and workspace directories
mkdir -p ~/audio2face/datasets
mkdir -p ~/audio2face/workspace
# Navigate to the repository directory
cd ~/audio2face/Audio2Face-3D-Training-Framework
# Copy environment file template
cp .env.example .env
Edit the .env file with your actual paths (use absolute paths):
A2F_DATASETS_ROOT="/home/<username>/audio2face/datasets"
A2F_WORKSPACE_ROOT="/home/<username>/audio2face/workspace"
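As an optional sanity check (this assumes the variable names above and that .env uses plain shell-style assignments, as in the template), confirm that both directories exist:
# Load the .env values into the current shell and verify both paths resolve
source .env
ls -d "$A2F_DATASETS_ROOT" "$A2F_WORKSPACE_ROOT"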
We provide the Audio2Face-3D Example Dataset as part of this framework.
- Download the dataset:
- You can download the Claire dataset from: Claire Dataset on Hugging Face
- It needs to be placed under the A2F_DATASETS_ROOT directory as defined in the environment
- Authentication: You may need to authenticate with Hugging Face to access the dataset:
- Using Tokens: Hugging Face Tokens
- Using SSH Key: Hugging Face SSH Keys
- Clone the dataset using the following commands:
# Navigate to the datasets directory
cd ~/audio2face/datasets
# Make sure git LFS is installed
sudo apt-get install -y git-lfs
git lfs install
# Clone Claire dataset in the datasets directory using https
git clone https://huggingface.co/datasets/nvidia/Audio2Face-3D-Dataset-v1.0.0-claire
# Or alternatively clone Claire dataset in the datasets directory using SSH
git clone git@hf.co:datasets/nvidia/Audio2Face-3D-Dataset-v1.0.0-claire
- Verify the dataset structure:
- After download, your dataset directory should look like this:
/home/<username>/audio2face/datasets/
└── Audio2Face-3D-Dataset-v1.0.0-claire/
├── data/
│ └── claire/
│ ├── audio/
│ ├── cache/
│ └── ...
├── docs/
└── ...
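Optionally, confirm that Git LFS actually fetched the large files rather than leaving small pointer files behind (a common pitfall with LFS-hosted datasets):
# List a few LFS-tracked files and check that the data directory holds real content
cd ~/audio2face/datasets/Audio2Face-3D-Dataset-v1.0.0-claire
git lfs ls-files | head -n 5
du -sh data/claire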
# Navigate to the repository directory
cd ~/audio2face/Audio2Face-3D-Training-Framework
# Add executable permissions
chmod +x docker/*.sh
# Build Docker container
./docker/build_docker.sh
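If you want to check that the build finished, list your local images; the exact image name produced by build_docker.sh is not shown in this guide, but the freshly built entry should appear near the top of the listing:
# Recently created images are listed first
docker images | head -n 5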
Note: In the next steps, all python run_*.py commands automatically execute inside Docker containers with pre-configured dependencies.
Python Note: On Ubuntu, the python command may need to be python3; if python is not found, the shell prints a hint with the correct command for your installation.
# Run preprocessing with example config
python run_preproc.py example-diffusion claire
Once this process is completed, the log prints the Preproc Run Name Full (for example, 250909_135508_example).
This name is important for future steps. It needs to be added to the config_train.py file located in the configs/example-diffusion directory. In this file, you need to locate the following section:
PREPROC_RUN_NAME_FULL = {
"claire": "XXXXXX_XXXXXX_example",
}
The value needs to be updated with the name that was provided in the shell log from the preproc script. In the example above, it would be updated as follows:
PREPROC_RUN_NAME_FULL = {
"claire": "250909_135508_example",
}
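If you prefer to patch the file from the shell rather than a text editor, a one-line substitution works. The run name below is just the example value from above; substitute the one printed by your own preproc run:
# Replace the placeholder with your actual Preproc Run Name Full (example value shown)
# Run this from the repository root, like the other commands
sed -i 's/XXXXXX_XXXXXX_example/250909_135508_example/' configs/example-diffusion/config_train.py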
Note: A new sub-directory is also created in the workspace/output_preproc directory containing the artifacts of the preproc process.
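If the name has already scrolled out of your terminal, you can usually recover it from the workspace. This sketch assumes the newest entry under output_preproc is named after the run, mirroring how the training output described below is named:
# Show the most recently created preproc output directory (assumed to carry the run name)
source .env
ls -t "$A2F_WORKSPACE_ROOT/output_preproc" | head -n 1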
# Run training example
python run_train.py example-diffusion
Note: The training process can take some time (between 30 and 40 minutes depending on your hardware). The training log provides guidance on how much time is needed to complete the training.
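While training runs, you can optionally keep an eye on GPU utilization and memory from a second terminal:
# Refresh the GPU status every 10 seconds; press Ctrl+C to stop
watch -n 10 nvidia-smi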
Again, once this process is completed, a new sub-directory is created, this time in the workspace/output_train directory. The name of that directory is reflected in the shell log. You will use this name as <TRAINING_RUN_NAME_FULL> in the next step.
# Run the deploy example
python run_deploy.py example-diffusion <TRAINING_RUN_NAME_FULL>
This process creates a new sub-directory in the workspace/output_deploy directory. The name of that directory is reflected in the shell log.
This new directory contains all the files required to use the trained model for inference.
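To inspect what was produced, list the deploy output directory (the path below assumes the workspace location created earlier in this guide):
# The newest sub-directory holds the deployable model files
ls -lt ~/audio2face/workspace/output_deploy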
Once training is complete, validate your custom model using one of the following methods:
Option 1: Python Inference: Generate animations in .npy format or Maya cache (.mc) format using the built-in inference engine:
python run_inference.py example-diffusion <TRAINING_RUN_NAME_FULL>
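The exact output location is not covered in this guide; assuming the results land under the workspace like the other steps' outputs, a quick search can locate the generated files:
# Search the workspace for animation caches (.npy or Maya .mc files)
source .env
find "$A2F_WORKSPACE_ROOT" -name "*.npy" -o -name "*.mc" | head -n 10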
Option 2: Maya-ACE Integration: Deploy and test your model in a visual production environment using Maya and the Maya-ACE plugin.
The Maya-ACE plugin enables real-time visualization of animation inference. It allows you to see the output from a model directly on a character within the Autodesk Maya 3D environment, providing immediate visual feedback for testing and validation.
- Documentation: Using Trained Models in Maya-ACE 2.0
- Reference Scene:
Audio2Face-3D-Dataset-v1.0.0-claire/data/claire/geom/fullface/a2f_maya_scene.mb
If you use Audio2Face-3D Training Framework in your research, please cite:
@misc{nvidia2025audio2face3d,
title={Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars},
author={Chaeyeon Chung and Ilya Fedorov and Michael Huang and Aleksey Karmanov and Dmitry Korobchenko and Roger Ribera and Yeongho Seol},
year={2025},
eprint={2508.16401},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2508.16401},
note={Authors listed in alphabetical order}
}