A CLI tool to prepare your PyTorch models for efficient inference. The only prerequisite is a model trained and saved with torch.save(model, model_path). See example.py for an example.
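For instance, here is a minimal sketch of saving a torchvision ResNet so torchprep can pick it up (the model choice and output path are only illustrative, chosen to match the examples below):

import torch
from torchvision.models import resnet152

# Build (or load/train) a model as usual
model = resnet152()
model.eval()

# Save the whole model object, not just the state_dict
torch.save(model, "models/resnet152.pt")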
Be warned: torchprep is an experimental tool, so expect bugs, deprecations, and limitations. That said, if you like the project and would like to improve it, please open a GitHub issue!
Create a virtual environment
apt-get install python3-venv
python3 -m venv venv
source venv/bin/activate
Install poetry
sudo python3 -m pip install -U pip
sudo python3 -m pip install -U setuptools
pip install poetry
Install torchprep from source:
cd torchprep
poetry install
Or install it from PyPI:
pip install torchprep
torchprep quantize --help
# Install example dependencies
pip install torchvision transformers
# Download the ResNet and BERT examples
python tests/download_example.py
# quantize a model to int8 for CPU inference
torchprep quantize models/resnet152.pt int8
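For context, int8 quantization of a CPU model is commonly done with PyTorch's dynamic quantization API; the sketch below shows that general technique, not necessarily torchprep's exact implementation, and the output path is illustrative:

import torch

model = torch.load("models/resnet152.pt")
model.eval()

# Dynamic quantization: weights of the listed module types are stored as int8
# and dequantized on the fly at inference time (CPU only)
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

torch.save(quantized, "models/resnet152-int8.pt")  # illustrative output path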
To profile a model you need to create a YAML file describing your model's input shape. The YAML file can accept multiple inputs.
# resnet.yaml
input:
  dtype: "int8"
  device: "cpu"
  shape: [16, 3, 7, 7] # the first element is the batch size
Then you can pass the YAML file to torchprep:
# profile a model for 100 iterations
torchprep profile models/resnet152.pt --iterations 100 --device cpu --input-shape config/resnet.yaml
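If you want a feel for what the profile run measures, a rough manual equivalent is to build a random tensor from the YAML spec and time repeated forward passes (a sketch only, assuming the profiler does something similar; dtype handling is omitted for brevity):

import time
import torch
import yaml

# Read the input spec from the YAML file shown above
spec = yaml.safe_load(open("config/resnet.yaml"))["input"]
x = torch.randn(spec["shape"], device=spec["device"])

model = torch.load("models/resnet152.pt").eval()

with torch.inference_mode():
    latencies = []
    for _ in range(100):
        start = time.perf_counter()
        model(x)
        latencies.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.2f} ms")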
# set OMP threads to 1 to optimize CPU inference
torchprep env --device cpu
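The manual equivalent of pinning CPU inference to a single OpenMP thread looks roughly like this (an assumption based on the comment above; torchprep may set other variables as well):

import os

# Must be set before torch creates its thread pools
os.environ["OMP_NUM_THREADS"] = "1"

import torch

torch.set_num_threads(1)  # limit intra-op parallelism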
# Prune 30% of model weights
torchprep prune models/resnet152.pt --prune-amount 0.3
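Pruning here means zeroing out the smallest weights by L1 norm (see the command list below); PyTorch exposes this as torch.nn.utils.prune, and a minimal sketch of the technique, not necessarily torchprep's exact implementation, looks like this:

import torch
import torch.nn.utils.prune as prune

model = torch.load("models/resnet152.pt")

# Zero out the 30% of weights with the smallest absolute value, layer by layer
for module in model.modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

torch.save(model, "models/resnet152-pruned.pt")  # illustrative output path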
Usage: torchprep [OPTIONS] COMMAND [ARGS]...
Options:
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or
customize the installation.
--help Show this message and exit.
Commands:
distill Create a smaller student model by setting a distillation...
prune Zero out small model weights using l1 norm
env-variables Set environment variables for optimized inference.
fuse Supports optimizations including conv/bn fusion, dropout...
profile Profile model latency
quantize Quantize a saved torch model to a lower precision float...
torchprep <command> --help
Usage: torchprep quantize [OPTIONS] MODEL_PATH PRECISION:{int8|float16}
Quantize a saved torch model to a lower precision float format to reduce its
size and latency
Arguments:
MODEL_PATH [required]
PRECISION:{int8|float16} [required]
Options:
--device [cpu|gpu] [default: Device.cpu]
--input-shape TEXT Comma separated input tensor shape
--help Show this message and exit.
pytest --disable-pytest-warnings
To create binaries, test them out locally, and publish to PyPI:
poetry build
pip install --user /path/to/wheel
poetry config pypi-token.pypi <SECRET_KEY>
poetry publish --build
- Support custom model names and output paths
- Support multiple input tensors for models like BERT that expect a batch size and sequence length
- Support multiple input tensor types
- Print environment variables
- TensorRT
- IPEX
- Integrate into universal benchmark tool serve/benchmarks
- Automatic distillation example: reduce parameter count by 1/3 with torchprep distill model.pt 1/3
- Training aware optimizations
- Get model input shape with type annotations - solution exists in Python 3.11 only
- Automated release with github actions - low priority for now