# Synrouter

> Session-aware inference API that cuts agent API costs by up to 85% by transparently injecting Anthropic prompt cache_control. Built for Claude Code, Codex CLI, Hermes, and any LLM-powered agent.

Synrouter is a session-aware inference proxy. Clients switch by changing only their base URL and API key — no SDK changes, no agent logic changes. Synrouter injects Anthropic `cache_control` breakpoints, trims `tool_result` blocks, aggregates per-session fingerprints, and reports turn-level savings.

## Key facts

- Compatible clients: Claude Code, Codex CLI, Hermes, OpenClaw, Kilo Code, any OpenAI- or Anthropic-compatible client.
- Switch method: change base URL + API key only. No code changes inside the agent.
- Two session modes:
  - **coding** — for coding agents (Claude Code / Codex CLI / Factory Droid). Target cache hit rate 85–95%, session-lifetime TTL.
  - **general** — for general agents (Hermes / OpenClaw). Layered tools (core / extended / platform / MCP), cross-session shared cache.
- Core mechanism: auto-injected Anthropic `cache_control` (4 breakpoints), intelligent `tool_result` truncation, session fingerprint aggregation, turn-level savings metering.
- API key format: `sk-sr-` prefixed keys sent in the `Authorization` header.

## Base URLs

- OpenAI-compatible clients: `https://synrouter.ai/api/v1`
- Anthropic-compatible clients: `https://synrouter.ai/api/anthropic`

## Client integration (one-line switch)

```sh
# Claude Code (Anthropic-compatible)
export ANTHROPIC_BASE_URL="https://synrouter.ai/api/anthropic"
export ANTHROPIC_API_KEY="sk-sr-..."

# Codex CLI / OpenAI-compatible
export OPENAI_BASE_URL="https://synrouter.ai/api/v1"
export OPENAI_API_KEY="sk-sr-..."

# Hermes / Kilo Code
base_url = "https://synrouter.ai/api/v1"
```

## Supported models

Models are routed via the gateway; use the full ID (e.g. `anthropic/claude-opus-4.8`) as the `model` field.

### Anthropic
- `anthropic/claude-opus-4.8` — Claude Opus 4.8 (1000K context): Maximum-quality outputs, frontier research, and the hardest reasoning tasks
- `anthropic/claude-opus-4.8-fast` — Claude Opus 4.8 Fast (1000K context): Latency-sensitive frontier workloads, interactive coding agents
- `anthropic/claude-opus-4.7` — Claude Opus 4.7 (1000K context): Cutting-edge research, safety-critical reasoning, and maximum-quality outputs
- `anthropic/claude-opus-4.7-fast` — Claude Opus 4.7 Fast (1000K context): Interactive coding agents and real-time reasoning tasks
- `anthropic/claude-sonnet-4.6` — Claude Sonnet 4.6 (1000K context): Daily coding workflows, code review, and multi-file refactoring
- `anthropic/claude-opus-4.6` — Claude Opus 4.6 (1000K context): Complex architecture design, hard debugging, and research-grade analysis
- `anthropic/claude-opus-4.6-fast` — Claude Opus 4.6 Fast (1000K context): Latency-sensitive agent workloads that still need deep reasoning
- `anthropic/claude-haiku-4.5` — Claude Haiku 4.5 (1000K context): Simple completions, classification, and cost-sensitive high-volume tasks

### DeepSeek
- `deepseek/deepseek-v4-pro` — DeepSeek V4 Pro (1000K context): Cost-effective deep reasoning, coding, and mathematical problem-solving
- `deepseek/deepseek-v4-flash` — DeepSeek V4 Flash (1000K context): Everyday agent tasks, quick coding assistance, and high-volume inference
- `deepseek/deepseek-v4-flash-free` — DeepSeek V4 Flash (Free) (1000K context): Prototyping, testing, and non-critical background tasks

### Google
- `google/gemini-3.5-flash` — Gemini 3.5 Flash (1000K context): Multimodal agent tasks, fast coding, and general-purpose inference
- `google/gemini-3.1-flash-lite-preview` — Gemini 3.1 Flash Lite (1000K context): Cost-sensitive workloads, simple Q&A, and batch processing
- `google/gemini-3.1-pro-preview` — Gemini 3.1 Pro (1000K context): Complex reasoning, long-document analysis, and research tasks
- `google/gemini-3.1-flash-image-preview` — Gemini 3.1 Flash Image (1000K context): Image generation, visual analysis, and multimodal creative workflows

### MiniMax
- `minimax/minimax-m3` — MiniMax M3 (1000K context): Long-horizon agentic workflows, multimodal tasks, and multilingual coding
- `minimax/minimax-m2.7` — MiniMax M2.7 (1000K context): Multilingual applications, content generation, and general-purpose chat

### Moonshot AI
- `moonshotai/kimi-k2.6` — Kimi K2.6 (1000K context): Long-document processing, bilingual workflows, and research assistance

### OpenAI
- `openai/gpt-5.5-pro` — GPT-5.5 Pro (1000K context): Maximum-quality outputs, frontier research, and mission-critical agent tasks
- `openai/gpt-5.5` — GPT-5.5 (1000K context): General-purpose agent tasks, balanced performance and cost
- `openai/gpt-5.4-image-2` — GPT-5.4 Image 2 (1000K context): Image generation, visual design tasks, and multimodal creative projects

### Qwen
- `qwen/qwen3.7-max` — Qwen 3.7 Max (256K context): High-quality bilingual reasoning, long documents, and enterprise Dashscope workflows
- `qwen/qwen3.6-max-preview` — Qwen 3.6 Max (1000K context): Bilingual coding, enterprise applications, and complex reasoning tasks
- `qwen/qwen3.6-flash` — Qwen 3.6 Flash (1000K context): Everyday coding, quick Q&A, and cost-effective agent pipelines

### xAI
- `x-ai/grok-4.3` — Grok 4.3 (1000K context): Real-time research, creative brainstorming, and unconventional problem-solving

### Z.ai
- `z-ai/glm-5.2` — GLM 5.2 (1000K context): Chinese-language applications, general chat, and content generation
- `z-ai/glm-5.1` — GLM 5.1 (1000K context): Chinese-language applications, general chat, and content generation


## Important links

- [Quickstart](https://synrouter.ai/docs/quickstart)
- [Agent clients guide](https://synrouter.ai/docs/agent-clients)
- [Models](https://synrouter.ai/docs/models)
- [API reference](https://synrouter.ai/docs/api-reference)
- [Pricing & usage](https://synrouter.ai/docs/pricing-and-usage)
- [Blog](https://synrouter.ai/blog)
- [Full spec for AI search engines](https://synrouter.ai/llms-full.txt)

## Notes for agents

- Prefer `/api/anthropic` for Anthropic-native clients to get full `cache_control` benefit.
- Coding-mode sessions are scoped by a session fingerprint; keep the same system prompt + tool list stable across turns to maximize cache hits.
- New models are added to the registry above automatically; no need to hardcode model IDs in client code.