# Synrouter > Session-aware inference API that cuts agent API costs by up to 85% by transparently injecting Anthropic prompt cache_control. Built for Claude Code, Codex CLI, Hermes, and any LLM-powered agent. Synrouter is a session-aware inference proxy. Clients switch by changing only their base URL and API key — no SDK changes, no agent logic changes. Synrouter injects Anthropic `cache_control` breakpoints, trims `tool_result` blocks, aggregates per-session fingerprints, and reports turn-level savings. ## Key facts - Compatible clients: Claude Code, Codex CLI, Hermes, OpenClaw, Kilo Code, any OpenAI- or Anthropic-compatible client. - Switch method: change base URL + API key only. No code changes inside the agent. - Two session modes: - **coding** — for coding agents (Claude Code / Codex CLI / Factory Droid). Target cache hit rate 85–95%, session-lifetime TTL. - **general** — for general agents (Hermes / OpenClaw). Layered tools (core / extended / platform / MCP), cross-session shared cache. - Core mechanism: auto-injected Anthropic `cache_control` (4 breakpoints), intelligent `tool_result` truncation, session fingerprint aggregation, turn-level savings metering. - API key format: `sk-sr-` prefixed keys sent in the `Authorization` header. ## Base URLs - OpenAI-compatible clients: `https://synrouter.ai/api/v1` - Anthropic-compatible clients: `https://synrouter.ai/api/anthropic` ## Client integration (one-line switch) ```sh # Claude Code (Anthropic-compatible) export ANTHROPIC_BASE_URL="https://synrouter.ai/api/anthropic" export ANTHROPIC_API_KEY="sk-sr-..." # Codex CLI / OpenAI-compatible export OPENAI_BASE_URL="https://synrouter.ai/api/v1" export OPENAI_API_KEY="sk-sr-..." # Hermes / Kilo Code base_url = "https://synrouter.ai/api/v1" ``` ## Supported models Models are routed via the gateway; use the full ID (e.g. `anthropic/claude-opus-4.8`) as the `model` field. ### Anthropic - `anthropic/claude-opus-4.8` — Claude Opus 4.8 (1000K context): Maximum-quality outputs, frontier research, and the hardest reasoning tasks - `anthropic/claude-opus-4.8-fast` — Claude Opus 4.8 Fast (1000K context): Latency-sensitive frontier workloads, interactive coding agents - `anthropic/claude-opus-4.7` — Claude Opus 4.7 (1000K context): Cutting-edge research, safety-critical reasoning, and maximum-quality outputs - `anthropic/claude-opus-4.7-fast` — Claude Opus 4.7 Fast (1000K context): Interactive coding agents and real-time reasoning tasks - `anthropic/claude-sonnet-4.6` — Claude Sonnet 4.6 (1000K context): Daily coding workflows, code review, and multi-file refactoring - `anthropic/claude-opus-4.6` — Claude Opus 4.6 (1000K context): Complex architecture design, hard debugging, and research-grade analysis - `anthropic/claude-opus-4.6-fast` — Claude Opus 4.6 Fast (1000K context): Latency-sensitive agent workloads that still need deep reasoning - `anthropic/claude-haiku-4.5` — Claude Haiku 4.5 (1000K context): Simple completions, classification, and cost-sensitive high-volume tasks ### DeepSeek - `deepseek/deepseek-v4-pro` — DeepSeek V4 Pro (1000K context): Cost-effective deep reasoning, coding, and mathematical problem-solving - `deepseek/deepseek-v4-flash` — DeepSeek V4 Flash (1000K context): Everyday agent tasks, quick coding assistance, and high-volume inference - `deepseek/deepseek-v4-flash-free` — DeepSeek V4 Flash (Free) (1000K context): Prototyping, testing, and non-critical background tasks ### Google - `google/gemini-3.5-flash` — Gemini 3.5 Flash (1000K context): Multimodal agent tasks, fast coding, and general-purpose inference - `google/gemini-3.1-flash-lite-preview` — Gemini 3.1 Flash Lite (1000K context): Cost-sensitive workloads, simple Q&A, and batch processing - `google/gemini-3.1-pro-preview` — Gemini 3.1 Pro (1000K context): Complex reasoning, long-document analysis, and research tasks - `google/gemini-3.1-flash-image-preview` — Gemini 3.1 Flash Image (1000K context): Image generation, visual analysis, and multimodal creative workflows ### MiniMax - `minimax/minimax-m3` — MiniMax M3 (1000K context): Long-horizon agentic workflows, multimodal tasks, and multilingual coding - `minimax/minimax-m2.7` — MiniMax M2.7 (1000K context): Multilingual applications, content generation, and general-purpose chat ### Moonshot AI - `moonshotai/kimi-k2.6` — Kimi K2.6 (1000K context): Long-document processing, bilingual workflows, and research assistance ### OpenAI - `openai/gpt-5.5-pro` — GPT-5.5 Pro (1000K context): Maximum-quality outputs, frontier research, and mission-critical agent tasks - `openai/gpt-5.5` — GPT-5.5 (1000K context): General-purpose agent tasks, balanced performance and cost - `openai/gpt-5.4-image-2` — GPT-5.4 Image 2 (1000K context): Image generation, visual design tasks, and multimodal creative projects ### Qwen - `qwen/qwen3.7-max` — Qwen 3.7 Max (256K context): High-quality bilingual reasoning, long documents, and enterprise Dashscope workflows - `qwen/qwen3.6-max-preview` — Qwen 3.6 Max (1000K context): Bilingual coding, enterprise applications, and complex reasoning tasks - `qwen/qwen3.6-flash` — Qwen 3.6 Flash (1000K context): Everyday coding, quick Q&A, and cost-effective agent pipelines ### xAI - `x-ai/grok-4.3` — Grok 4.3 (1000K context): Real-time research, creative brainstorming, and unconventional problem-solving ### Z.ai - `z-ai/glm-5.2` — GLM 5.2 (1000K context): Chinese-language applications, general chat, and content generation - `z-ai/glm-5.1` — GLM 5.1 (1000K context): Chinese-language applications, general chat, and content generation ## Important links - [Quickstart](https://synrouter.ai/docs/quickstart) - [Agent clients guide](https://synrouter.ai/docs/agent-clients) - [Models](https://synrouter.ai/docs/models) - [API reference](https://synrouter.ai/docs/api-reference) - [Pricing & usage](https://synrouter.ai/docs/pricing-and-usage) - [Blog](https://synrouter.ai/blog) - [Full spec for AI search engines](https://synrouter.ai/llms-full.txt) ## Notes for agents - Prefer `/api/anthropic` for Anthropic-native clients to get full `cache_control` benefit. - Coding-mode sessions are scoped by a session fingerprint; keep the same system prompt + tool list stable across turns to maximize cache hits. - New models are added to the registry above automatically; no need to hardcode model IDs in client code.