Usage and Cost
OpenSquilla records token usage and estimated cost from the running gateway. Use the cost view after routed, tool-heavy, channel, or long-context work to understand where model spend is going.
Requirements
Cost data comes from the running gateway, so check its status first:
opensquilla gateway status
If the gateway is not running, start it:
opensquilla gateway run
Show Cost
opensquilla cost
The default view lists session/model rows with input tokens, output tokens, and estimated cost.
Group by Model
opensquilla cost --by-model
Use this when SquillaRouter is enabled and you want to see which models carried the recent workload.
Use JSON Output
opensquilla cost --json
opensquilla cost --by-model --json
JSON output is useful for local dashboards, regression checks, and automated reports.
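As one way to feed a regression check, the sketch below sums estimated cost per model and compares the total against a budget. The JSON shape is an assumption: this page does not document the schema, so the field names `model`, `input_tokens`, `output_tokens`, and `estimated_cost`, the helper names, and the sample rows are all hypothetical. Adjust them to match what `opensquilla cost --json` actually emits.

```python
import json
from collections import defaultdict

# Hypothetical sample standing in for `opensquilla cost --json` output.
# The real schema is not documented here; these field names are assumptions.
SAMPLE_JSON = """
[
  {"session": "s1", "model": "small", "input_tokens": 1200, "output_tokens": 300, "estimated_cost": 0.25},
  {"session": "s1", "model": "large", "input_tokens": 8000, "output_tokens": 900, "estimated_cost": 1.5},
  {"session": "s2", "model": "small", "input_tokens": 400, "output_tokens": 100, "estimated_cost": 0.25}
]
"""


def cost_by_model(rows):
    """Sum the (assumed) estimated_cost field per model."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["model"]] += row["estimated_cost"]
    return dict(totals)


def over_budget(rows, budget):
    """Return True when the summed estimated cost exceeds the budget."""
    return sum(row["estimated_cost"] for row in rows) > budget


rows = json.loads(SAMPLE_JSON)
totals = cost_by_model(rows)
print(totals)
print("over budget:", over_budget(rows, budget=1.0))
```

In a real check, the sample string would be replaced by captured output, for example by redirecting `opensquilla cost --json` to a file and loading it with `json.load`.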
What to Check First
| Signal | What it can mean |
|---|---|
| Many rows for premium models | Router policy or task shape may be escalating more often than expected. |
| High input tokens | Long history, large tool results, or large prompt/tool schema surfaces may dominate cost. |
| High output tokens | The task may need tighter instructions or a smaller response format. |
| Cost concentrated in one session | Inspect that session before changing global configuration. |
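The signals in the table can be turned into a rough first-pass check. This sketch computes each session's share of total estimated cost and the overall input-to-output token ratio. The row fields, the function names, and both thresholds are hypothetical placeholders, not values taken from OpenSquilla.

```python
from collections import defaultdict

# Hypothetical rows; field names are assumptions, not a documented schema.
ROWS = [
    {"session": "s1", "model": "large", "input_tokens": 9000, "output_tokens": 500, "estimated_cost": 3.0},
    {"session": "s2", "model": "small", "input_tokens": 500, "output_tokens": 500, "estimated_cost": 0.5},
    {"session": "s3", "model": "small", "input_tokens": 500, "output_tokens": 1000, "estimated_cost": 0.5},
]


def session_cost_shares(rows):
    """Return each session's fraction of total estimated cost."""
    per_session = defaultdict(float)
    for row in rows:
        per_session[row["session"]] += row["estimated_cost"]
    total = sum(per_session.values())
    if total == 0:
        return {}
    return {session: cost / total for session, cost in per_session.items()}


def first_checks(rows, share_threshold=0.5, input_ratio_threshold=5.0):
    """List human-readable hints; both thresholds are arbitrary placeholders."""
    hints = []
    for session, share in session_cost_shares(rows).items():
        if share > share_threshold:
            hints.append(f"cost concentrated in session {session}")
    total_in = sum(row["input_tokens"] for row in rows)
    total_out = sum(row["output_tokens"] for row in rows)
    if total_out and total_in / total_out > input_ratio_threshold:
        hints.append("input tokens dominate")
    return hints


print(first_checks(ROWS))
```

A hint from a check like this is only a prompt to look closer at the session or model in question, not a diagnosis.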
Lower Cost Safely
Start with the recommended router setting and diagnostics, then recheck cost by model:
opensquilla configure router --router recommended
opensquilla diagnostics on
opensquilla cost --by-model
For large tool results, read:
For simple one-shot automation, bound the run:
opensquilla agent --max-iterations 20 --timeout 600 -m "Bounded task"
Notes and Limits
- Cost is an estimate based on recorded runtime usage and configured pricing.
- Provider bills remain the source of truth for actual charges.
- Tool compression and routing can reduce model context cost, but they should be checked against task success, not only token totals.
- Diagnostics can explain why a turn routed, compacted, retried, or produced unusually large outputs.
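To make the word "estimate" in the first note concrete, a common way to price usage is token count times a per-million-token rate, summed over input and output. Whether OpenSquilla computes its estimate exactly this way is an assumption; the function name and the prices below are made up for illustration.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Estimate cost from token counts and per-million-token prices.

    Assumed formula; OpenSquilla's actual pricing logic may differ.
    """
    return (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )


# Illustrative prices only, not any provider's real rates.
cost = estimate_cost(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_mtok=2.0,
    output_price_per_mtok=8.0,
)
print(f"estimated cost: {cost:.2f}")
```

An estimate of this kind drifts from the provider bill whenever the configured prices are stale or the provider applies discounts or rounding not captured in recorded usage.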