Run AnyModel with GPT, Gemini, DeepSeek, Llama, Qwen, or any of 300+ models via OpenRouter. Run fully offline with Ollama. You bring your own key — free models cost $0.
No cloning repos. No building from source. No dependency installs. npx anymodel — that's it.
OpenRouter (300+ cloud models), Ollama (local/offline), or any OpenAI-compatible API. Switch with a flag.
Nothing stored server-side. Your API key goes directly to the provider. Open source — verify it yourself.
Pure Node.js built-ins. No node_modules, no supply chain risk, no bloat. ~8KB published.
Skills, MCP servers, hooks, slash commands — the entire Claude Code ecosystem works out of the box. No compromises.
Get a free OpenRouter API key (no credit card for free models), then:
The model is set on the proxy. AnyModel just connects to it. Works offline with Ollama too.
Use a short name — the proxy resolves the full OpenRouter model ID automatically.
Or use any of 300+ models: npx anymodel proxy --model mistralai/codestral-2508
Popular picks below. AnyModel works with every model on OpenRouter — switch with --model.
Claude Opus 4.6, Sonnet 4.6, Haiku 4.5. Anthropic's flagship models — accessible through OpenRouter with a single key.
anthropic/claude-opus-4.6
Gemini 3.1 Pro & Flash Lite. Enhanced software engineering, 1M context. Best price-to-performance for coding.
google/gemini-3.1-flash-lite-preview
GPT-5.4 and Codex 5.3. Latest flagship with 1M context. Industry-leading reasoning with broad tool support.
openai/gpt-5.4
Llama 4 Maverick, Llama 3.3, CodeLlama. Run via OpenRouter or locally with Ollama. Fully open weights.
meta-llama/llama-4-maverick
Chain-of-thought reasoning model. Shows its thinking process step by step. Exceptional at complex code analysis and multi-step problem solving.
deepseek/deepseek-r1
NVIDIA's reasoning model with chain-of-thought. Built for complex technical tasks — code generation, architecture decisions, and multi-step debugging.
nvidia/llama-3.1-nemotron-70b-instruct
Devstral 2 (256K, agentic coding), Codestral 2508 (fast code), Devstral Small (budget). Europe's leading AI for code.
mistralai/devstral-2512
Gemma 4 31B by Google DeepMind. Dense multimodal with 256K context, reasoning mode, native function calling.
google/gemma-4-31b-it
Run any GGUF model locally. Zero cloud dependency. Fully private. No API key required.
proxy ollama --model gemma3n
Qwen, Cohere, Phi, Yi, StableLM, Nous, WizardLM, and every model on OpenRouter.
Browse all models →
AnyModel works with any model on OpenRouter — not just the ones listed above. If OpenRouter supports it, AnyModel routes it.
Built for reliability. Handles the translation so your tools just work.
npx anymodel runs instantly. No clone, no build, no global install needed.
Pure Node.js for local mode. No bloat, no supply chain risk, no node_modules.
29 free models via OpenRouter. $0 cost. Use --free-only to restrict.
Exponential backoff, 3 attempts. Handles 429 and 5xx errors gracefully.
Secure your proxy with --token. Protect shared deployments from unauthorized use.
60 req/min default, configurable with --rpm. Protects shared deployments.
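For instance, a shared deployment might combine the two flags above. This is a sketch, not a tested invocation: it assumes --token takes the shared secret as its argument and uses a placeholder key and token.

```shell
# Token-protected, rate-limited proxy on the default port (placeholder values)
OPENROUTER_API_KEY=sk-or-v1-your-key-here \
npx anymodel proxy deepseek --token my-shared-secret --rpm 120
```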
Complete reference for installation, providers, configuration, and API.
anymodel [command] [options]
Commands:
(none) Connect to running proxy
proxy <preset> Start proxy with a preset
proxy --model <id> Start proxy with any model
proxy ollama --model X Proxy with local Ollama
claude Run with native Claude (no proxy)
Presets (use with proxy):
gpt openai/gpt-5.4
codex openai/gpt-5.3-codex
gemini google/gemini-3.1-flash-lite-preview
deepseek deepseek/deepseek-r1-0528
mistral mistralai/devstral-2512
gemma google/gemma-4-31b-it
qwen qwen/qwen3-coder:free
nemotron nvidia/nemotron-3-super-120b-a12b:free
llama meta-llama/llama-3.3-70b-instruct:free
Options:
--model, -m Model ID
--port, -p Port (default: 9090)
--free-only Only allow free models
--help, -h Show help
OPENROUTER_API_KEY
Your OpenRouter API key
OPENROUTER_MODEL
Default model override
PROXY_PORT
Proxy listen port (default: 9090)
AnyModel auto-loads .env from the current directory.
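For example, a minimal .env using the three variables above (placeholder key):

```shell
# .env — auto-loaded by AnyModel from the current directory
OPENROUTER_API_KEY=sk-or-v1-your-key-here
OPENROUTER_MODEL=deepseek/deepseek-r1-0528
PROXY_PORT=9090
```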
GET /health
Response:
{
"status": "ok",
"version": "1.6.12",
"provider": "openrouter",
"model": "deepseek/deepseek-r1-0528",
"uptime": 3600.5,
"timestamp": "2026-04-02T10:30:00Z"
}
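As a quick sketch, the endpoint can be queried with curl (assuming the proxy is running on the default port 9090); offline, the sample response above parses the same way, e.g. to extract the active model:

```shell
# Live check (requires a running proxy):
#   curl -s http://localhost:9090/health
# Parsing the sample response above to pull out the active model:
response='{"status":"ok","version":"1.6.12","provider":"openrouter","model":"deepseek/deepseek-r1-0528","uptime":3600.5,"timestamp":"2026-04-02T10:30:00Z"}'
model=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["model"])')
echo "$model"
```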
/v1/messages
→ Routed to your chosen provider
/v1/*
→ Passed through to the model provider
/health
→ Returns proxy status JSON
The proxy sanitizes request bodies: it strips the betas, metadata, thinking, and cache_control fields, and normalizes tool_choice for cross-provider compatibility.
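As an illustrative sketch (not AnyModel's actual implementation), the stripping step amounts to deleting those keys before forwarding; the request body below is hypothetical:

```shell
# Hypothetical Anthropic-style request body (field names from the note above)
request='{"model":"claude-test","betas":["b1"],"metadata":{"user_id":"u1"},"thinking":{"type":"enabled"},"messages":[]}'
sanitized=$(printf '%s' "$request" | python3 -c '
import json, sys
body = json.load(sys.stdin)
# Drop the Anthropic-specific fields the proxy strips
for field in ("betas", "metadata", "thinking", "cache_control"):
    body.pop(field, None)
print(json.dumps(body, sort_keys=True))
')
echo "$sanitized"
```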
Copy-paste examples for common scenarios.
# Terminal 1 — proxy with preset:
OPENROUTER_API_KEY=sk-or-v1-... \
npx anymodel proxy deepseek
# Terminal 2 — connect:
npx anymodel
# Terminal 1 — any OpenRouter model:
OPENROUTER_API_KEY=sk-or-v1-... \
npx anymodel proxy \
--model mistralai/codestral-2508
# Terminal 2 — connect:
npx anymodel
# Pull a model (once):
ollama pull gemma3n
# Terminal 1 — proxy:
npx anymodel proxy ollama \
--model gemma3n
# Terminal 2 — connect:
npx anymodel
# Add your OpenRouter key (starts with sk-or-v1-) to a .env file
OPENROUTER_API_KEY=sk-or-v1-your-key-here
Free tier: no credit card needed. 29 free models at $0. Add credit ($5+) for paid models and higher rate limits.
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download from ollama.com
ollama pull qwen3-coder:30b
Other options: llama3, deepseek-r1:32b, qwen3.5
# Terminal 1 — start proxy:
npx anymodel proxy ollama --model qwen3-coder:30b
# Terminal 2 — use it:
npx anymodel
Fully offline. Nothing leaves your machine. Requires 16GB+ RAM for 30B models, 8GB for smaller ones.
AnyModel itself is free and open source (MIT). OpenRouter offers free models (marked FREE in the preset table) at $0 cost — no credit card needed. Paid models like GPT-5.4 and Gemini 3.1 cost per-token through your own OpenRouter account.
Yes. Your OpenRouter key stays on your machine. The proxy runs locally — nothing is stored, logged, or sent to AnyModel servers. The code is open source so you can verify this yourself.
Best paid: openai/gpt-5.3-codex — OpenAI's frontier coding model. Use preset codex.
Best free: qwen/qwen3-coder:free — 480B MoE, excellent at code. Use preset qwen.
Best reasoning: deepseek/deepseek-r1-0528 — chain-of-thought. Use preset deepseek.
Best local: google/gemma-4-31b-it — 256K context, runs via Ollama.
Presets are short names for popular models. Instead of typing --model deepseek/deepseek-r1-0528, just use npx anymodel proxy deepseek. See the preset table above for the full list.
Yes. Start each proxy on a different port: npx anymodel proxy --port 9090 --model openai/gpt-5.4 and npx anymodel proxy --port 9091 --model deepseek/deepseek-r1-0528. Then connect to either with npx anymodel --port 9090.
Yes. Use Ollama for local models: ollama pull gemma3n, then npx anymodel proxy ollama --model gemma3n. No internet, no API key — everything stays on your machine.
npx anymodel works without installing — npm downloads it on the fly. Or install globally: npm i -g anymodel. Both work the same way.
AnyModel strips Anthropic-specific fields (cache_control, betas, thinking), normalizes tool_choice, handles retries with exponential backoff, and forwards the cleaned request to OpenRouter or Ollama. The response streams back unchanged.