Generate CodeQL Models-as-Data (MaD) summaries (sources, sinks, summaries) from existing CodeQL databases and export them in multiple formats suitable for:
- Data extensions (YAML) for CodeQL packs
- Customization libraries (`.qll`)
- Bundled packs containing generated customizations
- Raw JSON for further processing

Features:
- Automated download of CodeQL databases via the Code Scanning API (when a token is provided)
- Multiple export formats: `json`, `extensions`, `customizations`, `bundle`
- GitHub Action + GH CLI extension + direct CLI usage
- Automatic language detection from database metadata (fallback to manual selection)
- Caching support (skip with `--disable-cache`)
- Supports (current): `java`, `csharp`

Currently limited to the languages enforced in the code (CODEQL_LANGUAGES):
- Java
- C#
Requests / PRs to add more languages are welcome once the upstream model generator queries support them.

GitHub Action:

    - name: Generate CodeQL Summaries
      uses: advanced-security/codeql-summarize@v0.2.0
      with:
        projects: ./projects.json
        token: ${{ secrets.CODEQL_SUMMARY_GENERATOR_TOKEN }}
        format: extensions
        output: ./generated

GH CLI extension:

    gh extension install advanced-security/gh-codeql-summarize
    gh codeql-summarize --help

Example:

    gh codeql-summarize \
      --format bundle \
      --input examples/projects.json \
      --output ./examples

Direct CLI usage:

    git clone https://github.com/advanced-security/codeql-summarize.git
    cd codeql-summarize
    pipenv install --dev # or pip install -e . if a setup is added later
    pipenv run python -m codeqlsummarize --help

Minimal invocation (using a local database + explicit language):

    python -m codeqlsummarize \
      -db /path/to/codeql-db \
      -l java \
      -f json \
      -o ./out

| Input | Description | Default |
|---|---|---|
| `project` | Single repository (owner/name) to summarize | (none) |
| `projects` | Path to a JSON file mapping language to list of repositories | `./projects.json` |
| `language` | Comma-separated language list (overrides auto-detect) | (auto) |
| `format` | Export format: `json`, `extensions`, `customizations`, `bundle` | `extensions` |
| `output` | Output directory (or file for certain formats) | `./` |
| `repository` | GitHub repository context (fallback for `project`) | `${{ github.repository }}` |
| `token` | GitHub token used to download databases | `${{ github.token }}` |

Notes:
- To download CodeQL databases the token must have appropriate permissions (typically `security_events:read` / `repo`, depending on visibility). A fine‑grained PAT with Code scanning read access is recommended.
- If a database cannot be downloaded it will be skipped.
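
To make the token requirement concrete, here is a minimal sketch of the kind of request involved in fetching a database. It assumes the GitHub REST endpoint for CodeQL databases (`/repos/{owner}/{repo}/code-scanning/codeql/databases/{language}` with `Accept: application/zip`); the helper name `build_database_request` is hypothetical and this is not necessarily how the tool itself performs the download. The request is only built, never sent.

```python
import urllib.request

# Public GitHub API root; a GitHub Enterprise Server instance would differ (assumption).
API_ROOT = "https://api.github.com"


def build_database_request(repo: str, language: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for one repository's CodeQL database.

    Hypothetical helper for illustration only.
    """
    owner, name = repo.split("/", 1)
    url = f"{API_ROOT}/repos/{owner}/{name}/code-scanning/codeql/databases/{language}"
    return urllib.request.Request(
        url,
        headers={
            # Ask for the database archive rather than its JSON metadata.
            "Accept": "application/zip",
            # The token needs code scanning read access to the repository.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_database_request("ESAPI/esapi-java-legacy", "java", "<token>")
print(req.full_url)
```

A token without the permissions listed above would typically cause such a request to be rejected, in which case the repository is skipped.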
Example (examples/projects.json):

    {
      "java": ["ESAPI/esapi-java-legacy"]
    }

Structure: `<language>` → array of `<owner>/<repo>` strings.

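
As an illustration of that structure, the sketch below parses and validates a projects file before it is handed to the tool. The function name `load_projects` and the specific checks are assumptions made for this example; the tool's own loader may behave differently.

```python
import json
from typing import Dict, List


def load_projects(text: str) -> Dict[str, List[str]]:
    """Parse projects JSON and check the <language> -> [<owner>/<repo>] shape.

    Hypothetical helper for illustration only.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be an object keyed by language")
    for language, repos in data.items():
        if not isinstance(repos, list):
            raise ValueError(f"{language}: expected a list of repositories")
        for repo in repos:
            # Each entry must look like "owner/name" with both parts non-empty.
            if (
                not isinstance(repo, str)
                or repo.count("/") != 1
                or not all(repo.split("/"))
            ):
                raise ValueError(f"{language}: {repo!r} is not of the form owner/name")
    return data


projects = load_projects('{"java": ["ESAPI/esapi-java-legacy"]}')
for language, repos in projects.items():
    print(language, repos)
```

Running a check like this in CI before the summarize step can catch a malformed entry early, instead of having it surface as a skipped repository.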
| Format | Description | Output Shape |
|---|---|---|
| `json` | Raw rows per model type | One JSON file per database / summary (future enhancement) |
| `extensions` | Data extensions YAML under a CodeQL pack structure | Writes `.yml` under `generated/` inside the detected pack |
| `customizations` | Single `.qll` customization library aggregating models | Requires `-o <file>.qll` |
| `bundle` | Initializes / updates a CodeQL pack containing generated customizations | Creates / updates pack in output dir |

bundle will (if necessary) create a pack (e.g. java-summarize/) and generate per‑repository .qll files plus a Customizations.qll aggregator.
| Variable | Purpose |
|---|---|
| `GITHUB_TOKEN` | Default token for API calls (Actions) |
| `GITHUB_REPOSITORY` | Default repo context (owner/name) |
| `RUNNER_TEMP` | Temp directory root (Actions) |
| `DEBUG` | If set (non-empty) enables debug logging |

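
The sketch below shows one way these variables could be read, assuming the semantics in the table. The function name `read_environment` and the fallback to the system temp directory when `RUNNER_TEMP` is unset are assumptions for illustration, not the tool's documented behaviour.

```python
import logging
import os
import tempfile


def read_environment(env=None):
    """Collect the settings from the table above (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "token": env.get("GITHUB_TOKEN"),
        "repository": env.get("GITHUB_REPOSITORY"),
        # Assumption: fall back to the system temp dir outside of Actions.
        "temp_root": env.get("RUNNER_TEMP") or tempfile.gettempdir(),
        # Any non-empty value enables debug logging; empty or unset disables it.
        "debug": bool(env.get("DEBUG")),
    }


settings = read_environment()
logging.basicConfig(level=logging.DEBUG if settings["debug"] else logging.INFO)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests.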
The tool skips repositories whose databases cannot be fetched or located, logging warnings rather than stopping the entire run.
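
A minimal sketch of this skip-and-warn pattern follows. The names `collect_databases` and `fetch` are hypothetical: `fetch` stands in for whatever download or lookup routine is used, and is assumed to return a database path, return `None`, or raise on failure.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger("summarize-sketch")


def collect_databases(
    repos: Iterable[str],
    fetch: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return repo -> database path, skipping repos whose database is unavailable."""
    found: Dict[str, str] = {}
    for repo in repos:
        try:
            path = fetch(repo)
        except Exception as err:
            # A failed download is logged and skipped, not fatal for the run.
            logger.warning("Skipping %s: %s", repo, err)
            continue
        if path is None:
            logger.warning("Skipping %s: no database found", repo)
            continue
        found[repo] = path
    return found
```

With this shape, a single unavailable repository reduces the output by one entry while the remaining repositories are still processed.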
- Maintain a `projects.json` file listing target repositories per language.
- Schedule a workflow (e.g. nightly) to regenerate models.
- Commit or publish the generated Data Extensions / Pack as needed.
- Consume generated models in downstream CodeQL analysis.
Run tests:

    pipenv run python -m unittest -v

Lint / format:

    pipenv run black .

See CONTRIBUTING.md. Please open an issue before large changes.
See SECURITY.md.
See SUPPORT.md. For general questions open a GitHub issue.
- Limited language set (Java, C#)
- No parallel download throttling handling yet
- No direct GitHub language detection fallback implemented
- JSON exporter minimal (subject to enhancement)
Licensed under the MIT License – see LICENSE.
- @GeekMasher – Author
- @zbazztian – Major contributor