This repository contains a collection of Python scripts that demonstrate how to use the OpenAI API and Azure AI Evaluation SDK to evaluate the quality and safety of AI-generated content. Most of the scripts can be run for free with GitHub Models in GitHub Codespaces, but the `safety_eval.py` script requires an Azure AI Project. See below for more details.
Check the `samples` directory for the available scripts.
- `chat_error_contentfilter.py`: Makes a chat completion call with the OpenAI package using a violent message and handles the content safety error in the response.
- `chat_error_jailbreak.py`: Makes a chat completion call with the OpenAI package using a jailbreak attempt and handles the content safety error in the response.
- `quality_eval_groundedness.py`: Evaluates the groundedness of a sample answer against its sources using the Azure AI Evaluation SDK.
- `quality_eval_all_builtin_judges.py`: Evaluates the quality of a sample query and answer using all of the built-in GPT-based evaluators in the Azure AI Evaluation SDK.
- `quality_eval_custom.py`: Evaluates the quality of a sample query and answer using the Azure AI Evaluation SDK with a custom evaluator for "friendliness".
- `quality_eval_other_builtins.py`: Evaluates the quality of a sample query and answer using the non-GPT-based evaluators in the Azure AI Evaluation SDK (NLP metrics such as F1, BLEU, ROUGE, etc.).
- `quality_eval_bulk.py`: Evaluates the quality of multiple query/answer pairs using the Azure AI Evaluation SDK.
- `safety_eval.py`: Evaluates the safety of a sample query and answer using the Azure AI Evaluation SDK. This script requires an Azure AI Project.
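Custom evaluators in the Azure AI Evaluation SDK are plain Python callables that return a dict of metric values. As a minimal illustration of that shape (this is a toy token-overlap F1 sketch, not the SDK's built-in F1 evaluator or the GPT-based "friendliness" judge from the samples — the class name and scoring logic here are illustrative assumptions):

```python
from collections import Counter


class TokenF1Evaluator:
    """Toy evaluator in the callable shape the Azure AI Evaluation SDK
    accepts for custom evaluators: __call__ takes keyword arguments and
    returns a dict of metric values. The F1 logic is a simple
    token-overlap sketch for illustration only."""

    def __call__(self, *, response: str, ground_truth: str) -> dict:
        pred = response.lower().split()
        ref = ground_truth.lower().split()
        if not pred or not ref:
            return {"f1_score": 0.0}
        # Count overlapping tokens (multiset intersection).
        overlap = sum((Counter(pred) & Counter(ref)).values())
        if overlap == 0:
            return {"f1_score": 0.0}
        precision = overlap / len(pred)
        recall = overlap / len(ref)
        return {"f1_score": 2 * precision * recall / (precision + recall)}


if __name__ == "__main__":
    evaluator = TokenF1Evaluator()
    print(evaluator(response="the cat sat", ground_truth="the cat sat"))
    # → {'f1_score': 1.0}
```

Because the evaluator is just a callable returning a dict, it can be passed alongside the built-in evaluators when running a bulk evaluation.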
If you open this repository in GitHub Codespaces, you can run the scripts for free using GitHub Models without any additional steps, as your `GITHUB_TOKEN` is already configured in the Codespaces environment.
If you want to run the scripts locally, you need to set the `GITHUB_TOKEN` environment variable to a GitHub personal access token (PAT). You can create a PAT by following these steps:

- Go to your GitHub account settings.
- Click on "Developer settings" in the left sidebar.
- Click on "Personal access tokens" in the left sidebar.
- Click on "Tokens (classic)" or "Fine-grained tokens" depending on your preference.
- Click on "Generate new token".
- Give your token a name and select the scopes you want to grant. For this project, you don't need any specific scopes.
- Click on "Generate token".
- Copy the generated token.
- Set the `GITHUB_TOKEN` environment variable in your terminal or IDE: `export GITHUB_TOKEN=your_personal_access_token`
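A quick way to confirm the token is visible to the scripts before running them is a small check like the following (the helper name is an illustrative assumption, not part of this repository):

```python
import os


def require_github_token() -> str:
    """Return GITHUB_TOKEN from the environment, or fail with a clear
    message. The samples pass this token as the API key to an
    OpenAI-compatible client pointed at GitHub Models."""
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set. Create a GitHub PAT and run:\n"
            "  export GITHUB_TOKEN=your_personal_access_token"
        )
    return token


if __name__ == "__main__":
    try:
        require_github_token()
        print("GITHUB_TOKEN is set.")
    except RuntimeError as err:
        print(err)
```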
This project includes infrastructure as code (IaC) to provision the Azure AI resources needed to run the quality and safety evaluation scripts. The IaC is defined in the `infra` directory and uses the Azure Developer CLI to provision the resources.
- Make sure the Azure Developer CLI (`azd`) is installed.
- Login to Azure: `azd auth login`. For GitHub Codespaces users, if the previous command fails, try `azd auth login --use-device-code`.
- Provision the OpenAI account: `azd provision`. It will prompt you to provide an `azd` environment name (like "ai-evals"), select a subscription from your Azure account, and select a location where the Azure AI safety evaluators are available. Then it will provision the resources in your account.
- Once the resources are provisioned, you should see a local `.env` file with all the environment variables needed to run the scripts.
- To delete the resources, run: `azd down`
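The generated `.env` file is typically loaded with a library such as python-dotenv. As a dependency-free sketch of what that loading amounts to (the function name and simplified parsing — plain `KEY=value` lines only — are assumptions for illustration):

```python
import os


def load_dotenv_minimal(path: str = ".env") -> dict:
    """Parse simple KEY=value lines from a .env file into os.environ.

    A simplified stand-in for python-dotenv: skips blank lines,
    comments, and malformed lines, and strips surrounding quotes.
    Existing environment variables are not overwritten."""
    loaded = {}
    try:
        with open(path) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return loaded
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```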