Here's a simple way for Claude Code users to switch from the costly Claude models to Qwen3-Coder, the newly released state-of-the-art open-weights coding model, served via OpenRouter through a local LiteLLM proxy.

This process is quite universal and can easily be adapted to your needs. Feel free to explore other models (including local ones) as well as different providers and coding agents.

I'm sharing what works for me. This guide is set up so you can simply copy and paste the commands into your terminal.
1\. Clone the official LiteLLM repo:

```sh
git clone https://github.com/BerriAI/litellm.git
cd litellm
```
2\. Create an `.env` file with your OpenRouter API key (make sure to insert your own API key!):

```sh
cat <<\EOF >.env
LITELLM_MASTER_KEY = "sk-1234"

# OpenRouter
OPENROUTER_API_KEY = "sk-or-v1-…" # 🚩
EOF
```
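A note on the `<<\EOF` syntax used throughout this guide: escaping the delimiter quotes the entire heredoc, so the file is written exactly as shown, without the shell expanding anything. A minimal demonstration (`demo.txt` is just a throwaway file for illustration):

```shell
# The backslash in <<\EOF quotes the delimiter, so the heredoc body is written
# verbatim -- $VARIABLES and `backticks` are NOT expanded by the shell.
cat <<\EOF >demo.txt
$HOME stays literal here
EOF
cat demo.txt   # prints: $HOME stays literal here
```

Without the backslash, `$HOME` would be replaced by your home directory, which would silently corrupt config files that contain `$` characters.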
3\. Create a `config.yaml` file that replaces Anthropic models with Qwen3-Coder (with all the recommended inference parameters):

```sh
cat <<\EOF >config.yaml
model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/qwen/qwen3-coder" # Qwen/Qwen3-Coder-480B-A35B-Instruct
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
EOF
```
4\. Create a `docker-compose.yml` file that loads `config.yaml` (it's easier to just create a finished file with all the required changes than to edit the original one):

```sh
cat <<\EOF >docker-compose.yml
services:
  litellm:
    build:
      context: .
      args:
        target: runtime
    ############################################################################
    command:
      - "--config=/app/config.yaml"
    container_name: litellm
    hostname: litellm
    image: ghcr.io/berriai/litellm:main-stable
    restart: unless-stopped
    volumes:
      - ./config.yaml:/app/config.yaml
    ############################################################################
    ports:
      - "4000:4000" # Map the container port to the host; change the host port if necessary
    environment:
      DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
      STORE_MODEL_IN_DB: "True" # allows adding models to the proxy via the UI
    env_file:
      - .env # Load the local .env file
    depends_on:
      - db # Ensure the 'db' service starts first
    healthcheck: # Health check configuration for the container
      test: [ "CMD-SHELL", "wget --no-verbose --tries=1 http://localhost:4000/health/liveliness || exit 1" ]
      interval: 30s # Perform the health check every 30 seconds
      timeout: 10s # The health check command times out after 10 seconds
      retries: 3 # Retry up to 3 times if the health check fails
      start_period: 40s # Wait 40 seconds after container start before beginning health checks

  db:
    image: postgres:16
    restart: always
    container_name: litellm_db
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data # Persist Postgres data across container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10

volumes:
  postgres_data:
    name: litellm_postgres_data # Named volume for Postgres data persistence
EOF
```
5\. Build and run LiteLLM (this is important, as some required fixes are not yet in the published image as of 2025-07-23):

```sh
docker compose up -d --build
```
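Before moving on, you can confirm the proxy is alive. This polls the same liveness endpoint the compose healthcheck uses (it assumes the stack from the previous step is running and port 4000 is mapped as above; give it up to ~40 seconds after startup):

```shell
# The proxy should answer on the liveness endpoint once the containers are up:
curl -s http://localhost:4000/health/liveliness
```

If this returns nothing or a connection error, check `docker compose logs litellm` before proceeding.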
6\. Export the environment variables that make Claude Code use Qwen3-Coder via LiteLLM (remember to execute this before starting Claude Code, or include it in your shell profile (`.zshrc`, `.bashrc`, etc.) for persistence):

```sh
export ANTHROPIC_AUTH_TOKEN=sk-1234
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_MODEL=openrouter/qwen/qwen3-coder
export ANTHROPIC_SMALL_FAST_MODEL=openrouter/qwen/qwen3-coder
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 # Optional: disables telemetry, error reporting, and auto-updates
```
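If you want to verify the end-to-end routing before launching Claude Code, you can send a single message through the proxy's Anthropic-compatible endpoint. A sketch, assuming the stack is running on localhost:4000 with the master key from step 2 (the exact headers follow Anthropic's Messages API; the reply should come back from Qwen3-Coder):

```shell
# Send one test message through the proxy, mimicking what Claude Code does:
curl -s http://localhost:4000/v1/messages \
  -H "x-api-key: sk-1234" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "openrouter/qwen/qwen3-coder", "max_tokens": 64, "messages": [{"role": "user", "content": "Say hi"}]}'
```

The model name matches what `ANTHROPIC_MODEL` is set to above, so a successful response here means Claude Code's requests will be routed the same way.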
7\. Start Claude Code; it'll now use Qwen3-Coder via OpenRouter instead of the expensive Claude models (you can verify with the `/model` command that a custom model is in use):

```sh
claude
```
8\. Optional: Add an alias to your shell profile (`.zshrc`, `.bashrc`, etc.) to make this easier to use (e.g. `qlaude` for "Claude with Qwen"):

```sh
alias qlaude='ANTHROPIC_AUTH_TOKEN=sk-1234 ANTHROPIC_BASE_URL=http://localhost:4000 ANTHROPIC_MODEL=openrouter/qwen/qwen3-coder ANTHROPIC_SMALL_FAST_MODEL=openrouter/qwen/qwen3-coder claude'
```
Have fun and happy coding!

PS: There are other ways to do this using dedicated Claude Code proxies, of which there are quite a few on GitHub. Before implementing this with LiteLLM, I reviewed some of them, but they all had issues, such as not handling the recommended inference parameters. I prefer established projects with a solid track record and a large user base, which is why I chose LiteLLM. Open source offers many options, so feel free to explore other projects and find what works best for you.