Advanced usage¶
This page collects power-user workflows: custom payloads, batch experiments, and CI integration.
Custom prompts¶
Multiple prompts¶
The YAML schema supports a list of prompts:
Current behavior:

- The benchmark uses the first prompt (prompts[0]).

If you need "run all prompts", run the benchmark once per prompt (see Batch experiments below), or contribute a feature.
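For reference, a multi-prompt config might look like the sketch below. The prompt texts are illustrative; remember that only the first entry is used today:

```yaml
prompts:
  - "Say hello."
  - "Summarize the following paragraph in one sentence."
```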
Benchmark an OpenAI-compatible gateway¶
CLI¶
```bash
lgb run --provider custom --model my-model \
  --base-url https://my-gateway.example/v1 \
  --prompt "Say hello." \
  --requests 50 --concurrency 10
```
YAML¶
```yaml
providers:
  - name: custom
    model: my-model
    base_url: https://my-gateway.example/v1
    api_key: ${MY_GATEWAY_API_KEY}
```
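The `${MY_GATEWAY_API_KEY}` form suggests environment-variable substitution at config-load time. A minimal sketch of how such `${VAR}` expansion is typically implemented (an illustration only, not the runner's actual code):

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace each ${NAME} with the value of environment variable NAME.

    Missing variables expand to the empty string in this sketch; the real
    runner may instead raise an error.
    """
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["MY_GATEWAY_API_KEY"] = "sk-test"  # hypothetical value for the demo
print(expand_env("${MY_GATEWAY_API_KEY}"))    # sk-test
```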
Batch experiments¶
Matrix run (models × concurrency)¶
In Bash:
```bash
for c in 1 3 10; do
  for m in gpt-4.1-mini gpt-4.1; do
    lgb run --provider openai --model "$m" --concurrency "$c" --requests 30 \
      --output "reports/openai_${m}_c${c}.md"
  done
done
```
In PowerShell:
```powershell
$models = @('gpt-4.1-mini','gpt-4.1')
$concurrency = @(1,3,10)

foreach ($c in $concurrency) {
  foreach ($m in $models) {
    lgb run --provider openai --model $m --concurrency $c --requests 30 `
      --output ("reports/openai_{0}_c{1}.md" -f $m,$c)
  }
}
```
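If you prefer to generate the matrix programmatically, the same six commands can be built in Python. This is a sketch; the flags simply mirror the shell examples above:

```python
from itertools import product

models = ["gpt-4.1-mini", "gpt-4.1"]
concurrency = [1, 3, 10]

# One command per (concurrency, model) pair, matching the Bash loop order
commands = [
    f"lgb run --provider openai --model {m} --concurrency {c} --requests 30 "
    f"--output reports/openai_{m}_c{c}.md"
    for c, m in product(concurrency, models)
]
print(len(commands))  # 6
```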
Making results comparable¶
To reduce noise:

- Keep the prompt text identical across runs
- Hold requests and concurrency constant
- Run from the same region/network
- Repeat runs and compare medians / p95 rather than single samples
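Comparing medians and p95 across repeated runs can be done with the standard library alone. A small sketch, using made-up per-request latencies (the numbers below are hypothetical, not real benchmark output):

```python
import math
import statistics

def p95(latencies):
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical per-request latencies (ms) from three repeated runs
runs = [
    [310, 295, 330, 402, 288],
    [305, 299, 341, 390, 292],
    [312, 301, 338, 410, 290],
]

print("medians:", [statistics.median(r) for r in runs])  # [310, 305, 312]
print("p95s:", [p95(r) for r in runs])                   # [402, 390, 410]
```

With only a handful of samples per run, p95 collapses to the maximum; the comparison becomes meaningful at higher request counts.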
CI integration (GitHub Actions)¶
You can run a small benchmark suite on a schedule, or on every release.
⚠️ Be careful with cost: benchmarking sends real requests.
Example workflow (.github/workflows/bench.yml):
```yaml
name: Bench

on:
  workflow_dispatch:
  schedule:
    - cron: "0 3 * * *" # daily

jobs:
  bench:
    runs-on: ubuntu-latest
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install
        run: |
          python -m pip install -U pip
          python -m pip install llm-gateway-bench
      - name: Run suite
        run: |
          lgb compare example-bench.yaml --output bench-report.md
      - name: Upload report artifact
        uses: actions/upload-artifact@v4
        with:
          name: bench-report
          path: bench-report.md
```
CI integration (pull requests)¶
For PRs, consider a smoke benchmark with low request count:
```yaml
requests: 3
concurrency: 1
```
This catches obvious failures without burning budget.
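Put together, a smoke suite config might look like the fragment below. The filename and prompt are illustrative; the field layout follows the examples earlier on this page:

```yaml
# smoke-bench.yaml (hypothetical name)
requests: 3
concurrency: 1
prompts:
  - "Say hello."
providers:
  - name: openai
    model: gpt-4.1-mini
    api_key: ${OPENAI_API_KEY}
```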
Custom provider fields¶
Provider entries allow extra fields in YAML (future-proofing). For example:
```yaml
providers:
  - name: openrouter
    model: meta-llama/llama-3.3-70b-instruct
    api_key: ${OPENROUTER_API_KEY}
    # extra fields (ignored by the current runner)
    headers:
      HTTP-Referer: https://example.com
      X-Title: llm-gateway-bench
```
Note: currently only name, model, base_url, and api_key are used by the runner. Extra fields are preserved during config validation but ignored during execution.