Run AnyModel with GPT, Gemini, DeepSeek, Llama, Qwen, or any of 300+ models via OpenRouter. Run fully offline with Ollama. You bring your own key — free models cost $0.
No cloning repos. No building from source. No dependency installs. npx anymodel — that's it.
OpenRouter (300+ cloud models), Ollama (local/offline), or any OpenAI-compatible API. Switch with a flag.
Nothing stored server-side. Your API key goes directly to the provider. Open source — verify it yourself.
Pure Node.js built-ins. No node_modules, no supply chain risk, no bloat. ~8KB published.
Skills, MCP servers, hooks, slash commands — the entire Claude Code ecosystem works out of the box. No compromises.
Get a free OpenRouter API key (no credit card for free models), then:
The model is set on the proxy. AnyModel just connects to it. Works offline with Ollama too.
Use a short name — the proxy resolves the full OpenRouter model ID automatically.
Or use any of 300+ models: npx anymodel proxy --model mistralai/codestral-2508
Popular picks below. AnyModel works with every model on OpenRouter — switch with --model.
Claude Opus 4.6, Sonnet 4.6, Haiku 4.5. Anthropic's flagship models — accessible through OpenRouter with a single key.
anthropic/claude-opus-4.6
Gemini 3.1 Pro & Flash Lite. Enhanced software engineering, 1M context. Best price-to-performance for coding.
google/gemini-3.1-flash-lite-preview
GPT-5.4 and Codex 5.3. Latest flagship with 1M context. Industry-leading reasoning with broad tool support.
openai/gpt-5.4
Llama 4 Maverick, Llama 3.3, CodeLlama. Run via OpenRouter or locally with Ollama. Fully open weights.
meta-llama/llama-4-maverick
Chain-of-thought reasoning model. Shows its thinking process step by step. Exceptional at complex code analysis and multi-step problem solving.
deepseek/deepseek-r1
NVIDIA's reasoning model with chain-of-thought. Built for complex technical tasks — code generation, architecture decisions, and multi-step debugging.
nvidia/llama-3.1-nemotron-70b-instruct
Devstral 2 (256K, agentic coding), Codestral 2508 (fast code), Devstral Small (budget). Europe's leading AI for code.
mistralai/devstral-2512
Gemma 4 31B by Google DeepMind. Dense multimodal with 256K context, reasoning mode, native function calling.
google/gemma-4-31b-it
Run any GGUF model locally. Zero cloud dependency. Fully private. No API key required.
proxy ollama --model gemma3n
Qwen, Cohere, Phi, Yi, StableLM, Nous, WizardLM, and every model on OpenRouter.
Browse all models →
AnyModel works with any model on OpenRouter — not just the ones listed above. If OpenRouter supports it, AnyModel routes it.
Built for reliability. Handles the translation so your tools just work.
npx anymodel runs instantly. No clone, no build, no global install needed.
Pure Node.js for local mode. No bloat, no supply chain risk, no node_modules.
29 free models via OpenRouter. $0 cost. Use --free-only to restrict.
Exponential backoff, 3 attempts. Handles 429 and 5xx errors gracefully.
Secure your proxy with --token. Protect shared deployments from unauthorized use.
60 req/min default, configurable with --rpm. Protects shared deployments.
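For instance, a shared deployment might combine the two flags above. This is a sketch, not a tested invocation: it assumes --token takes the shared secret as its argument and uses a placeholder key and token.

```shell
# Token-protected, rate-limited proxy on the default port (placeholder values)
OPENROUTER_API_KEY=sk-or-v1-your-key-here \
npx anymodel proxy deepseek --token my-shared-secret --rpm 120
```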
Complete reference for installation, providers, configuration, and API.
anymodel [command] [options]
Commands:
(none) Connect to running proxy
proxy <preset> Start proxy with a preset
proxy --model <id> Start proxy with any model
proxy ollama --model X Proxy with local Ollama
claude Run with native Claude (no proxy)
Presets (use with proxy):
gpt openai/gpt-5.4
codex openai/gpt-5.3-codex
gemini google/gemini-3.1-flash-lite-preview
deepseek deepseek/deepseek-r1-0528
mistral mistralai/devstral-2512
gemma google/gemma-4-31b-it
qwen qwen/qwen3-coder:free
nemotron nvidia/nemotron-3-super-120b-a12b:free
llama meta-llama/llama-3.3-70b-instruct:free
Options:
--model, -m Model ID
--port, -p Port (default: 9090)
--free-only Only allow free models
--help, -h Show help
OPENROUTER_API_KEY
Your OpenRouter API key
OPENROUTER_MODEL
Default model override
PROXY_PORT
Proxy listen port (default: 9090)
AnyModel auto-loads .env from the current directory.
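For example, a minimal .env using the three variables above (placeholder key):

```shell
# .env — auto-loaded by AnyModel from the current directory
OPENROUTER_API_KEY=sk-or-v1-your-key-here
OPENROUTER_MODEL=deepseek/deepseek-r1-0528
PROXY_PORT=9090
```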
GET /health
Response:
{
"status": "ok",
"version": "1.6.12",
"provider": "openrouter",
"model": "deepseek/deepseek-r1-0528",
"uptime": 3600.5,
"timestamp": "2026-04-02T10:30:00Z"
}
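As a quick sketch, the endpoint can be queried with curl (assuming the proxy is running on the default port 9090); offline, the sample response above parses the same way, e.g. to extract the active model:

```shell
# Live check (requires a running proxy):
#   curl -s http://localhost:9090/health
# Parsing the sample response above to pull out the active model:
response='{"status":"ok","version":"1.6.12","provider":"openrouter","model":"deepseek/deepseek-r1-0528","uptime":3600.5,"timestamp":"2026-04-02T10:30:00Z"}'
model=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["model"])')
echo "$model"
```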
/v1/messages
→ Routed to your chosen provider
/v1/*
→ Passed through to the model provider
/health
→ Returns proxy status JSON
The proxy sanitizes request bodies: it strips the betas, metadata, thinking, and cache_control fields, and normalizes tool_choice for cross-provider compatibility.
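As an illustrative sketch (not AnyModel's actual implementation), the stripping step amounts to deleting those keys before forwarding; the request body below is hypothetical:

```shell
# Hypothetical Anthropic-style request body (field names from the note above)
request='{"model":"claude-test","betas":["b1"],"metadata":{"user_id":"u1"},"thinking":{"type":"enabled"},"messages":[]}'
sanitized=$(printf '%s' "$request" | python3 -c '
import json, sys
body = json.load(sys.stdin)
# Drop the Anthropic-specific fields the proxy strips
for field in ("betas", "metadata", "thinking", "cache_control"):
    body.pop(field, None)
print(json.dumps(body, sort_keys=True))
')
echo "$sanitized"
```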
Copy-paste examples for common scenarios.
# Terminal 1 — proxy with preset:
OPENROUTER_API_KEY=sk-or-v1-... \
npx anymodel proxy deepseek
# Terminal 2 — connect:
npx anymodel
# Terminal 1 — any OpenRouter model:
OPENROUTER_API_KEY=sk-or-v1-... \
npx anymodel proxy \
--model mistralai/codestral-2508
# Terminal 2 — connect:
npx anymodel
# Pull a model (once):
ollama pull gemma3n
# Terminal 1 — proxy:
npx anymodel proxy ollama \
--model gemma3n
# Terminal 2 — connect:
npx anymodel
# Add your OpenRouter key (starts with sk-or-v1-) to a .env file
OPENROUTER_API_KEY=sk-or-v1-your-key-here
Free tier: no credit card needed. 29 free models at $0. Add credit ($5+) for paid models and higher rate limits.
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download from ollama.com
ollama pull qwen3-coder:30b
Other options: llama3, deepseek-r1:32b, qwen3.5
# Terminal 1 — start proxy:
npx anymodel proxy ollama --model qwen3-coder:30b
# Terminal 2 — use it:
npx anymodel
Fully offline. Nothing leaves your machine. Requires 16GB+ RAM for 30B models, 8GB for smaller ones.
AnyModel itself is free and open source (MIT). OpenRouter offers free models (marked FREE in the preset table) at $0 cost — no credit card needed. Paid models like GPT-5.4 and Gemini 3.1 cost per-token through your own OpenRouter account.
Yes. Your OpenRouter key stays on your machine. The proxy runs locally — nothing is stored, logged, or sent to AnyModel servers. The code is open source so you can verify this yourself.
Best paid: openai/gpt-5.3-codex — OpenAI's frontier coding model. Use preset codex.
Best free: qwen/qwen3-coder:free — 480B MoE, excellent at code. Use preset qwen.
Best reasoning: deepseek/deepseek-r1-0528 — chain-of-thought. Use preset deepseek.
Best local: google/gemma-4-31b-it — 256K context, runs via Ollama.
Presets are short names for popular models. Instead of typing --model deepseek/deepseek-r1-0528, just use npx anymodel proxy deepseek. See the preset table above for the full list.
Yes. Start each proxy on a different port: npx anymodel proxy --port 9090 --model openai/gpt-5.4 and npx anymodel proxy --port 9091 --model deepseek/deepseek-r1-0528. Then connect to either with npx anymodel --port 9090.
Yes. Use Ollama for local models: ollama pull gemma3n, then npx anymodel proxy ollama --model gemma3n. No internet, no API key — everything stays on your machine.
npx anymodel works without installing — npm downloads it on the fly. Or install globally: npm i -g anymodel. Both work the same way.
AnyModel strips Anthropic-specific fields (cache_control, betas, thinking), normalizes tool_choice, handles retries with exponential backoff, and forwards the cleaned request to OpenRouter or Ollama. The response streams back unchanged.