v0.5.1 · Apache License 2.0 · Python 3.12+

Token-Efficient AI Agent Intelligence

A microkernel AI Agent: on the same budget, your Agent does more and does it better.
Smart routing, persistent memory, secure sandbox, plus built-in search and local embeddings.

60-80%¹ · Token Cost Savings
N+ · Meta-skills
› Fable 5 · Smart routing ahead
10+ · Channels Built-in

See It in Action

Quick demos showing how OpenSquilla solves real workflows

Windows portable install walkthrough

Short drama meta-skill

Paper writing meta-skill

Quickstart

Four paths to get started — pick the one that fits you

Desktop app

The clearest path on macOS and Windows: a signed desktop app that bundles the control UI and the gateway runtime. No Python, no terminal.

Download for macOS: Apple Silicon · .dmg
Download for Windows: Windows 10/11 x64 · .exe

v0.5.1 · signed & notarized.

⚡ Faster in Mainland China: macOS · Windows

Download and install

Pick your platform above. On macOS, open the .dmg and drag OpenSquilla into Applications. On Windows, run the .exe installer.

Open OpenSquilla

Launch the app. The bundled gateway starts automatically — there is nothing else to install.

Complete first-run setup

On first launch, onboarding walks you through choosing a provider and pasting the requested keys. The control UI then opens right inside the app.

Windows builds are signed and macOS builds are notarized, so no SmartScreen or Gatekeeper override is needed. On Linux, use Quick terminal install.

Quick terminal install

The recommended path on Windows, macOS, and Linux. uv installs OpenSquilla into its own isolated environment and manages its own Python, so no system Python is required. This path installs published releases only.

Install uv

Skip if uv --version already works.

$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ . "$HOME/.local/bin/env"

Install OpenSquilla

The same command on every platform.

$ uv tool install --python 3.12 "opensquilla[recommended] @ https://github.com/opensquilla/opensquilla/releases/download/v0.5.1/opensquilla-0.5.1-py3-none-any.whl"

Installs the OpenSquilla wheel from the release URL, then lets uv download the dependencies declared by the selected extras. The default recommended extra includes SquillaRouter runtime dependencies (ONNX Runtime, LightGBM, NumPy, tokenizers).

Configure and run

# Interactive onboarding wizard
$ opensquilla onboard

# Start ASGI server
$ opensquilla gateway run

If opensquilla is not found right after a fresh uv install, open a new terminal or re-run the PATH line from step 1.

Wheel URLs are versioned by design — installers validate the version in the filename. The command above pins to v0.5.1.
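
As a rough illustration of what validating the version in the filename could look like, the sketch below parses a wheel filename using the standard wheel naming layout and compares the result with the release tag in the URL. The helper names wheel_version and url_matches_tag are hypothetical, and the assumption that the tag is the second-to-last path segment comes only from the URL shown above. This is not OpenSquilla's actual installer logic.

```python
import re

# Standard wheel filename layout:
# {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def wheel_version(url: str) -> str:
    """Return the version component of the wheel filename at the end of url."""
    filename = url.rsplit("/", 1)[-1]
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a wheel filename: {filename!r}")
    return match.group("version")


def url_matches_tag(url: str) -> bool:
    """Hypothetical check: the release tag in the path must equal the wheel version."""
    tag = url.split("/")[-2]  # assumed layout: .../download/<tag>/<wheel>
    return tag.removeprefix("v") == wheel_version(url)


URL = (
    "https://github.com/opensquilla/opensquilla/releases/download/"
    "v0.5.1/opensquilla-0.5.1-py3-none-any.whl"
)
print(wheel_version(URL), url_matches_tag(URL))
```

A URL whose tag and filename disagree would fail the check, which is one reason to pin both to the same version.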

Install from source

Prerequisites: Git · Git LFS · uv

If uv is unavailable, the installer falls back to Python 3.12+ with pip ≥ 23.

Optional: install prerequisites from a terminal

Windows PowerShell

winget install --id Git.Git -e
winget install --id GitHub.GitLFS -e
powershell -ExecutionPolicy Bypass -c "irm https://astral.sh/uv/install.ps1 | iex"
git lfs install

macOS (Homebrew)

brew install git git-lfs uv
git lfs install

Debian / Ubuntu

sudo apt update
sudo apt install -y git git-lfs
curl -LsSf https://astral.sh/uv/install.sh | sh
git lfs install

Fedora

sudo dnf install -y git git-lfs
curl -LsSf https://astral.sh/uv/install.sh | sh
git lfs install

Arch

sudo pacman -S --needed git git-lfs
curl -LsSf https://astral.sh/uv/install.sh | sh
git lfs install

Clone with LFS

$ git lfs install
$ git clone https://github.com/opensquilla/opensquilla.git
$ cd opensquilla
$ git lfs pull --include="src/opensquilla/squilla_router/models/**"

Git LFS pulls bundled ML routing models. The pull is idempotent — it fetches missing assets and exits quietly when the checkout is already complete.

Run the installer

# Installs .[recommended] via uv tool install (falls back to pip --user)
$ bash scripts/install_source.sh

Most channels work from the base install. Optional extras: matrix, matrix-e2e, document-extras — pass via OPENSQUILLA_INSTALL_EXTRAS=matrix (bash) or -Extras matrix (PowerShell).

Configure (interactive wizard)

$ opensquilla onboard

Walks you through model providers, channels, and security policies. Use the installed opensquilla command — do not prefix with uv run unless you chose Develop from source.

Run the Gateway

# Start ASGI server (default 127.0.0.1:18791)
$ opensquilla gateway run

Then visit http://127.0.0.1:18791/control/ to open the control panel.

On Windows without the Visual C++ Redistributable, the gateway still starts; the bundled router falls back to a safe direct route.

Develop from source

Use this path only to modify, test, or debug the current checkout. Unlike Install from source, this path requires uv: uv sync creates a checkout-local .venv and uv run executes against the live source tree.

# Create the checkout-local .venv with recommended + dev extras
$ uv sync --extra recommended --extra dev

# Verify the install
$ uv run opensquilla --help

The recommended extra includes SquillaRouter for development too; the dev extra installs the test, lint, and typecheck tools.

Install additional extras into the same environment: uv sync --extra recommended --extra dev --extra matrix

In this mode, prefix every opensquilla command with uv run. Do not debug a development checkout through a user-local opensquilla command — that command runs in a different Python environment.

Open the contributor guide on GitHub

For advanced usage, visit the GitHub repo.

Deploy Once, Reach Everywhere ³

Configure one Agent, serve users across multiple channels

Terminal Web Slack Discord Telegram MS Teams Matrix Lark DingTalk WeCom QQ

Every Token Spent Where It Matters

OpenSquilla makes your Agent spend less, remember more, and run safer.

💰

Cost Optimization

Multiple strategies coordinated to maximize every Token

Smart Routing ²

Like ride-sharing — simple questions take the bus (cheap models), complex ones get the premium ride (top models). The system decides.

Hybrid Feature Analysis

Combines hand-crafted features (length, language, code blocks, keywords) with embedding-based semantic features to assess complexity and pick the right model.
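
To make the idea concrete, here is a minimal sketch of the hand-crafted half of such an analysis, blended with a placeholder semantic score. Everything in it is an assumption for illustration: the feature set, the keyword list, the weights, the 0.5 threshold, and the names extract_features, complexity_score, and pick_tier. SquillaRouter itself uses trained models, which this toy linear blend does not reproduce.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker
# Illustrative keyword list, not taken from OpenSquilla.
HARD_KEYWORDS = ("prove", "refactor", "optimize", "debug", "architecture")


@dataclass(frozen=True)
class HandFeatures:
    length: int
    has_code_block: bool
    non_ascii_ratio: float  # crude stand-in for a language signal
    keyword_hits: int


def extract_features(text: str) -> HandFeatures:
    """Compute cheap surface features of a prompt."""
    lowered = text.lower()
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return HandFeatures(
        length=len(text),
        has_code_block=FENCE in text,
        non_ascii_ratio=non_ascii / len(text) if text else 0.0,
        keyword_hits=sum(1 for kw in HARD_KEYWORDS if kw in lowered),
    )


def complexity_score(f: HandFeatures, semantic: float = 0.0) -> float:
    """Toy blend; semantic stands in for an embedding-based score in [0, 1]."""
    score = min(f.length / 2000, 1.0) * 0.3
    score += 0.2 if f.has_code_block else 0.0
    score += min(f.keyword_hits, 3) / 3 * 0.2
    score += semantic * 0.3
    return score


def pick_tier(score: float) -> str:
    """Map a score to a model tier; the threshold is arbitrary."""
    return "premium" if score >= 0.5 else "cheap"


print(pick_tier(complexity_score(extract_features("hello"))))
```

In this sketch a short greeting scores near zero and lands on the cheap tier, while a prompt with a code block, several hard keywords, and a high semantic score crosses the threshold.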

Reasoning Depth Tiers

Turns off billed reasoning for simple queries and enables deep reasoning only for complex ones, so a "hello" never costs reasoning Tokens.

Adaptive Prompts

Auto-tunes the prompt based on task complexity — telling the model how deeply to think. Light for simple, full power for complex.

On-Demand Skills

Capabilities are not all dumped into context. Only the Skills needed for the current task are loaded, which avoids Token waste.

🎯

Quality Boost

Multi-model ensemble routing lifts answer quality on the hardest turns — beyond any single model

Model Ensemble Routing

Fans a hard turn across multiple candidate models and synthesizes their answers, lifting accuracy beyond what any single model reaches alone.

Beats Fable 5

In internal hard-turn evals, ensemble routing outperformed every single-model baseline — including Fable 5.

Progressive Results

Useful partial answers surface while the ensemble settles, so you are never stuck waiting on the slowest candidate.

Presets & Custom Lineups

Pick a clear preset (static OpenRouter B5 / TokenRhythm B5) or build a custom lineup — with timeout tuning so slow candidates never hold up a turn.

Smart Fallback

When a single model is the better fit, routing takes the direct path automatically — accuracy where it helps, no waste where it does not.

🧩

More Core Capabilities

Beyond cost and accuracy — MetaSkills, human-like memory, and a security sandbox, all in place

MetaSkills Protocol

A meta-protocol for skills at scale: the Agent autonomously retrieves, filters, and composes Skills, then distills patterns from replayable execution traces to draft new MetaSkills — capability grows in the background.

Human-Like Memory

Four tiers — working / episodic / semantic / raw — with hybrid vector + keyword search and local ONNX embeddings (data stays on your machine). Hot memories bubble up, dated ones decay, and a 24-hour "dream" consolidates it all — smarter the more you use it.
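
One way to picture how hybrid search, decay, and hot-memory promotion might combine is a single ranking score, sketched below. The formula, the weights, the half-life, and the names Memory and score are all assumptions made for illustration. The embeddings are tiny hand-written vectors rather than ONNX model output.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    age_hours: float
    hits: int = 0  # how often this memory has been recalled


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query words that also appear in the memory text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def score(
    query: str,
    query_emb: list[float],
    m: Memory,
    half_life_hours: float = 72.0,  # hypothetical decay half-life
    alpha: float = 0.7,  # hypothetical vector-vs-keyword weight
) -> float:
    relevance = alpha * cosine(query_emb, m.embedding)
    relevance += (1 - alpha) * keyword_overlap(query, m.text)
    decay = 0.5 ** (m.age_hours / half_life_hours)  # dated memories fade
    boost = 1.0 + 0.1 * math.log1p(m.hits)  # hot memories bubble up
    return relevance * decay * boost


fresh = Memory("deploy gateway on port 18791", [1.0, 0.0], age_hours=0.0)
old = Memory("deploy gateway on port 18791", [1.0, 0.0], age_hours=72.0)
hot = Memory("deploy gateway on port 18791", [1.0, 0.0], age_hours=72.0, hits=5)
for m in (fresh, old, hot):
    print(round(score("gateway port", [1.0, 0.0], m), 3))
```

With these toy inputs, the fresh memory ranks highest, and the frequently recalled old memory outranks the equally old but untouched one.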

Security Sandbox

Three-tier policy plus real Bubblewrap / Seatbelt isolation, a denial ledger, stale-output protection, and prompt-injection defense — let the Agent act without fearing what it might do.

Microkernel: Tiny Core, Vast Ecosystem

Inspired by OS microkernels — the core engine does the minimum: orchestration and state management. Everything else runs as plugins in "user space". Switch LLM providers? Implement a Protocol. Add new tools? 5 lines of code. Plugin crashes don't affect the core; core upgrades don't break plugins.
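
The "implement a Protocol" idea can be illustrated with Python's typing.Protocol, as in the sketch below. The Tool protocol, the Registry class, and both example plugins are hypothetical. OpenSquilla's real plugin interfaces may have different names and shapes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Tool(Protocol):
    """Hypothetical plugin shape: any object with a name and a run method."""

    name: str

    def run(self, arg: str) -> str: ...


class Echo:  # no base class: matching the shape is enough
    name = "echo"

    def run(self, arg: str) -> str:
        return arg


class Crashy:
    name = "crashy"

    def run(self, arg: str) -> str:
        raise RuntimeError("boom")


class Registry:
    """Toy core: holds plugins and isolates their failures."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if not isinstance(tool, Tool):
            raise TypeError(f"{tool!r} does not look like a Tool")
        self._tools[tool.name] = tool

    def call(self, name: str, arg: str) -> str:
        try:
            return self._tools[name].run(arg)
        except Exception as exc:  # a failing plugin must not take down the core
            return f"[{name} failed: {exc}]"


registry = Registry()
registry.register(Echo())
registry.register(Crashy())
print(registry.call("echo", "hi"))
print(registry.call("crashy", "hi"))
```

The Echo plugin is five lines and inherits from nothing, and the crash in Crashy is reported as a value instead of propagating into the registry.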

OpenSquilla Core Engine

Compact pipeline orchestrator · State machine · Fully async · Auto-rollback on errors

⚙️ engine/ · State Machine
🤖 provider/ · Multi-LLM Provider
🌐 gateway/ · ASGI RPC Gateway
🧠 memory/ · Multi-Tier Memory
📡 channels/ · Channel Adapters
🔧 tools/ + mcp/ · MCP-First Tools
🛡️ sandbox/ · Security Sandbox
⏰ scheduler/ · Task Scheduler
🧩 skills/ · Skill Plugins
🎭 identity/ · Identity & Prompts

Built-in

🔍 Search: Brave / DuckDuckGo
🧬 Local Embeddings: ONNX local inference (offline · data stays on-device)
🔌 Optional Embeddings: OpenAI / Ollama

Same Budget, Higher Intelligence Density

Side-by-side comparison with peer open-source Agent frameworks⁴

Dimension	OpenSquilla	OpenClaw	Hermes Agent
🏗️Architecture	✅ Microkernel with 5-layer separation, ultra-compact core orchestrator (~100 lines), all capabilities pluggable, auto-skip + rollback on errors	⚠️ Mature plugin ecosystem (dozens of extensions), clean boundaries but more layers	❌ Massive monolithic sync main loop (thousands of lines), all logic tightly coupled
💰Cost Optimization	✅ ML routing + reasoning depth tiers + prompt cache isolation + on-demand skills — multi-strategy savings of 60-80%	⚠️ Config-pinned primary + fallback chain, no content-aware selection	⚠️ Crude keyword + length heuristics, single routing strategy only
🎯Quality Boost	✅ Multi-model ensemble routing — dispatches hard problems to several candidates and aggregates their answers, beating any single model (including Fable 5); results stream progressively, and cost-aware fallback skips the ensemble when one model suffices	❌ Single-model direct calls, no ensemble aggregation — accuracy is capped by the one model you pick	⚠️ Mixture-of-Agents preset aggregates multiple models — strong quality, but no cost-aware fallback, so the heavy ensemble runs even when one model would do
🪄MetaSkills Protocol	✅ Composable workflows + meta-skill-creator for self-authoring + 10+ bundled & N+ community Skills auto-retrieved + Dream Mode distills usage into new candidates while idle	⚠️ Prompt-driven skill chains, no meta-protocol layer, no self-evolution; new workflows live as docs, not first-class runtime objects	❌ No reusable workflow abstraction — multi-step work is re-prompted from scratch every session
💾Memory System	✅ Vector + keyword + dedup + temporal decay + hot memory promotion + auto schema migration	⚠️ Has decay / promotion / diversity reranking, but lacks four-tier cognitive structure & Memory Dream consolidation	⚠️ Keyword-only search, no vector semantics, requires external integration for semantic memory
🛡️Security Sandbox	✅ No Docker dependency — syscall-level CPU/memory/time isolation + 3-tier network control. Fits in serverless	⚠️ Docker optional with OpenShell as a lighter alternative, still heavier than syscall-level isolation	✅ Dangerous command approval + 6 execution environments (local/Docker/SSH etc)
💰Cost Tracking	✅ Actual cost per call out of the box, quota hooks for auto-throttling on overspend	✅ Built-in pricing table, cost written to session metadata	✅ Input/output/cache-read/cache-write/reasoning tokens tracked separately
📊Observability	✅ Decision logs store hashes, not raw text (compliance-friendly), every pipeline stage instrumented	✅ Native OpenTelemetry (as plugin), plug-and-play with Prometheus/Grafana	⚠️ SQLite session table + call counter, basic level
🧩Extension DX	✅ 5-line duck-typed class is a valid plugin — no base class, no SDK package, no manifest	⚠️ Implement interface in plugin-sdk + write manifest file	⚠️ Tools auto-register on import (implicit side effects)
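
The Observability row mentions decision logs that store hashes rather than raw text. A minimal sketch of that idea follows. The record fields, the optional salt, and the function name log_decision are assumptions for illustration, not OpenSquilla's log schema.

```python
import hashlib
import json
import time


def log_decision(prompt: str, model: str, reason: str, salt: bytes = b"") -> str:
    """Serialize a routing decision without including the raw prompt."""
    digest = hashlib.sha256(salt + prompt.encode("utf-8")).hexdigest()
    record = {
        "ts": time.time(),
        "prompt_sha256": digest,  # lets auditors match prompts without reading them
        "model": model,
        "reason": reason,
    }
    return json.dumps(record, sort_keys=True)


print(log_decision("my secret prompt", "cheap-model", "short greeting"))
```

The same prompt always yields the same digest, so entries can be correlated across the log while the text itself stays out of it.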

Who Benefits Most from OpenSquilla?

These scenarios get the highest ROI

🏢

On-Prem Deployment

Fully offline, data never leaves your network, ML routing runs locally

📋

Compliance & Audit

Three-tier policies + hashed decision logs + human approval gates

💸

Tight Budget, High Bar

Run more tasks for the same cost — smart routing picks the most cost-effective model

🧠

Agent That Gets You

Four-tier human-like memory accumulates context — never start from zero again

Pilot Program · Open Now

Less burn. Less drama. Real delivery.

Join the OpenSquilla Pilot Group: 70 CNY in OpenRouter Tokens on entry, a daily ChatGPT Plus draw for 30 days, and a path into the Founders Group with weekly Pro draws and rewards for adopted suggestions.
Help us turn OpenSquilla into a living open-source project.

Join the Pilot Program

Want to dive into the source first? View the project on GitHub

70 CNY OpenRouter Tokens

Daily ChatGPT Plus draw

Top reward ChatGPT Pro
