Cheap-model routing - Claude Code & AI Coding

If subtopic 1.1 asked "Should you be using Claude Code?", subtopic 1.2 is the follow-up every paying user hits the moment the first invoice arrives: can you keep using Claude Code without paying Opus prices for every keystroke? The channel's answer is yes, but only if you understand three things: what the token-plan math actually buys you, how to plug a non-Anthropic model into Claude Code without breaking the config file, and which model to swap to in the first place. Three videos cover the territory. The first is a step-by-step install guide; the second explains why the same logic works inside OpenClaw; the third runs an open-source benchmark to pick the model itself.

This article walks through all three, then zooms out to the routing rules the channel's coverage implies.

What you'll learn

The Minimax token plan gives you 4,500 model requests every 5 hours on Plus (15,000 on Max) on a rolling window, which is effectively unlimited for overnight Claude Code runs on a VPS — but only if you point at api.minimax.io, never the China endpoint.
You route a third-party model into Claude Code with three env vars in settings.json (ANTHROPIC_AUTH_TOKEN, the token-plan API key, and the base URL); the swap also works for Kilo Code, Open Claude, and Grok CLI.
Minimax 2.7 is "near Opus, not Opus" — keep Sonnet or Opus reserved for the final review pass on security-sensitive or money-handling code, and always diff the overnight build before merging.
A 300-line soul.md puts Minimax into a "dumb zone" where it starts messaging the wrong target; compressing the soul to 15–30 lines and trimming agents.md brought it back. The fix for a polluted agent directory is a full SSH reinstall, not in-place repair.
WildClaw is the channel's open-source Dockerized OpenClaw benchmark; Claude Opus tops the leaderboard at 51% but costs $80 per run, while GPT-5.4 runs at roughly a quarter of the cost and Grok completes the suite in 94 minutes vs ~500 minutes.

Claude Code + Minimax 2.7: the step-by-step

This is the canonical "how do I actually do this" video for the course. The pitch is simple: take Claude Code (the local Node CLI from subtopic 1.1) and point it at a Minimax token plan instead of Anthropic's API. You keep Claude Code's interface, the repo-awareness, and the headless overnight workflow; you swap who you pay.

What the token plan actually gives you

Two tiers are covered in the video:

Plus plan — 4,500 model requests every 5 hours
Max plan — 15,000 model requests every 5 hours

The 5-hour window is a rolling counter, not a daily reset, which is what makes the workflow viable. The creator frames it as effectively unlimited for overnight builds: "just run the tap, let it flow, let it develop, let it work." For the boxminingai.com site rebuild, the creator scheduled the work on a VPS that "runs 24 hours for me" and woke up to a finished build, calling this the normal workflow, not a stretch goal.
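To see why a rolling window behaves differently from a daily reset, here is a minimal sliding-window request counter. It is a sketch under an assumption: the video only says the 5-hour window is rolling, so treating it as a true sliding window over request timestamps is our guess, not Minimax's documented accounting. The class name and method are hypothetical.

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window_s`-second window.

    Assumption: Minimax counts requests over a true sliding window; the video
    only says the 5-hour window is rolling, not how it is implemented.
    """

    def __init__(self, limit: int = 4_500, window_s: float = 5 * 3600) -> None:
        self.limit = limit          # 4,500 on Plus; 15,000 on Max
        self.window_s = window_s    # 5 hours, in seconds
        self._stamps: deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if there is room; return whether it was allowed."""
        # Forget requests that have aged out of the trailing window.
        while self._stamps and self._stamps[0] <= now - self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False            # rejected requests are not recorded
        self._stamps.append(now)
        return True
```

Capacity comes back gradually as each request ages past the five-hour mark, instead of all at once at a fixed daily reset, which is what lets a long overnight job keep drawing on the plan.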

The config change

Three env vars in Claude Code's settings.json do the swap:

Your Minimax token-plan API key — specifically the one that "resets," not a pay-as-you-go key. The reset behaviour is what keeps overnight runs from triggering overage charges.
ANTHROPIC_AUTH_TOKEN, set to that same key, which Claude Code sends as the bearer token.
The base URL: https://api.minimax.io/anthropic.

Use that international endpoint. The other portal, minimax.com, is the slow China endpoint and will degrade your experience if you're outside China. The creator flagged this as a recurring gotcha.

The mechanical gotcha: if your settings.json is non-empty, paste only the inner JSON object from the snippet into the existing top-level object, with a comma between it and the entry before it. JSON does not allow a trailing comma after the last entry, and pasting a fresh top-level object into a file that already has entries breaks the JSON parser; either way, Claude Code will fail to start.
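One way to sidestep hand-editing is to let a JSON library do the merge. The sketch below assumes the base URL and key live under Claude Code's "env" object as ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN, which are the variable names Claude Code documents for a custom endpoint; if Minimax's own setup snippet sets any further keys (such as the separate token-plan key entry above), copy those in as well. The function names are hypothetical.

```python
import json
import shutil
from pathlib import Path

MINIMAX_BASE_URL = "https://api.minimax.io/anthropic"  # international endpoint


def merge_env(settings: dict, api_key: str) -> dict:
    """Return a copy of settings with the Minimax values merged into its "env" object."""
    merged = dict(settings)
    env = dict(merged.get("env", {}))       # keep any env vars already present
    env["ANTHROPIC_BASE_URL"] = MINIMAX_BASE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key   # the token-plan key (the one that resets)
    merged["env"] = env
    return merged


def update_settings_file(path: Path, api_key: str) -> None:
    """Back up settings.json, merge the Minimax values, and write valid JSON back."""
    settings: dict = {}
    if path.exists() and path.read_text(encoding="utf-8").strip():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        # A file that already fails to parse raises here instead of being overwritten.
        settings = json.loads(path.read_text(encoding="utf-8"))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_env(settings, api_key), indent=2) + "\n", encoding="utf-8")
```

Call it as update_settings_file(Path.home() / ".claude" / "settings.json", key), assuming the user-level settings file sits in that location on your install. Because the file is re-serialised, comma placement is never in question.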

Cost reality

On the token plan you pay a fixed subscription, not per-token Anthropic pricing, so heavy overnight sessions don't rack up extra bills. Off-plan, Minimax 2.7 is still cheaper than Claude Sonnet/Opus, and the creator claims output is "near Opus level intelligence for programming." That's the working hypothesis of the whole article — but it isn't a guarantee, which is why the creator keeps Sonnet or Opus in reserve for the final review pass on security-sensitive or money-handling code.

What transfers to other clients

The same env-var swap works for Kilo Code, Open Claude, Grok CLI, and other Anthropic-compatible clients. You don't have to reinstall Claude Code just to test whether the model is the right pick — point any of them at the same base URL and token.

Is Minimax the best model for OpenClaw? (cross-listed)

This video is cross-listed under OpenClaw, but the routing logic is identical: you're paying for a model backend, not for the client. The relevant details here are the failure modes of running a cheap model on a long-running agent, which the previous video glosses over.

The core trade-off

Opus starts at $5 per million tokens
Minimax sits at roughly $0.30 per million tokens — about 1/16th the price
The Minimax coding plan goes for under $10 (starter tier) and gives you 100 prompts every 5 hours — a fixed package, not pay-per-use
The creator's own Opus run burned $30 in a single hour, which is the exact problem this pricing structure solves
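The arithmetic behind these numbers, as a quick sketch. It assumes a single blended per-million-token rate for each model, using the prices quoted in the video; real price sheets split input and output tokens, so treat the result as an order-of-magnitude check rather than a bill estimate.

```python
# Prices as quoted in the video, treated as one blended rate per million tokens.
# Assumption: real price sheets split input and output tokens; this ignores that.
OPUS_USD_PER_MTOK = 5.00
MINIMAX_USD_PER_MTOK = 0.30


def price_ratio(expensive: float = OPUS_USD_PER_MTOK, cheap: float = MINIMAX_USD_PER_MTOK) -> float:
    """How many times cheaper the cheap model is per token."""
    return expensive / cheap


def same_tokens_on_minimax(opus_spend_usd: float) -> float:
    """What the tokens behind an Opus bill would cost at the Minimax rate."""
    tokens_millions = opus_spend_usd / OPUS_USD_PER_MTOK  # $30 -> 6 million tokens
    return tokens_millions * MINIMAX_USD_PER_MTOK


print(f"Minimax is about {price_ratio():.1f}x cheaper per token")
print(f"$30 of Opus is about ${same_tokens_on_minimax(30):.2f} on Minimax")
```

The ratio comes out at roughly 16.7, which is where the "about 1/16th" figure comes from, and the $30 Opus hour shrinks to under $2 at the Minimax rate.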

The "dumb zone" failure mode

This is the most important operational detail in the video. Once the soul.md file swells, the model degrades sharply. The creator's original soul was 300 lines. At that size, the agent "starts messaging your girlfriend instead of building a presentation." Compressing soul.md to 15–30 lines and trimming agents.md (the bootstrap file) brought it back into the "smart zone."

This is a model-side failure, not an OpenClaw failure — the same compression rule applies when Minimax is routing through Claude Code.
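A small guard you could run before restarting an agent, so soul.md (or agents.md) never drifts back past the budget. This is a hypothetical helper, not part of OpenClaw; counting only non-blank lines is our assumption about what matters for the budget, and the file's location depends on your install.

```python
from pathlib import Path

# Upper end of the 15-30 line range the creator compressed soul.md to.
SOUL_MAX_LINES = 30


def content_lines(text: str) -> int:
    """Count non-blank lines (assumption: blank lines don't count toward the budget)."""
    return sum(1 for line in text.splitlines() if line.strip())


def soul_within_budget(path: Path, max_lines: int = SOUL_MAX_LINES) -> bool:
    """Return False, with a message, if the file exceeds the line budget."""
    n = content_lines(path.read_text(encoding="utf-8"))
    if n > max_lines:
        print(f"{path}: {n} lines, over the {max_lines}-line budget; compress it")
        return False
    return True
```

Because the rule is model-side, the same check applies whether Minimax is driving an OpenClaw agent or sitting behind Claude Code.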

M2.1 and M2.5 both failed out of the box

The creator ran M2.1 first, "didn't do so well," upgraded to M2.5, "still didn't do so well." The actual fix wasn't a model upgrade — it was a full reinstall from scratch (requires SSH access to the server). Random files scattered across directories were "muddying up" the context window, confirmed independently by Cursor and an orchestrator agent.

The lesson: if your agent directory is polluted, do a full reinstall from scratch via SSH. Don't try to repair it in place.
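To notice pollution before it degrades the agent, one option is to record the file list right after a fresh install and diff against it later. This is a hypothetical diagnostic, not an OpenClaw feature: legitimate runtime files will show up too, so a long list is a prompt to look closer (and, per the lesson above, to reinstall), not proof of a problem.

```python
from pathlib import Path


def snapshot(agent_dir: Path) -> set[str]:
    """Relative paths of every file under agent_dir; take this right after a fresh install."""
    return {p.relative_to(agent_dir).as_posix() for p in agent_dir.rglob("*") if p.is_file()}


def stray_files(agent_dir: Path, baseline: set[str]) -> list[str]:
    """Files present now that were not in the fresh-install baseline, sorted."""
    return sorted(snapshot(agent_dir) - baseline)
```

Persist the baseline with json.dump(sorted(snapshot(agent_dir)), f) so it survives across sessions, then compare whenever the agent starts acting strangely.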

The Claude Code hack, restated

You can route Minimax's coding plan through Claude Code by swapping the Anthropic base URL to the Minimax Anthropic endpoint and using the coding-plan key as the token. You get Claude Code's interface at Minimax's pricing. This is the same swap from the first video, framed as the bridge between the two course sections.

For complex multi-step workflows, write the roadmap yourself

For the daily news report, Opus was used to walk through the workflow (research → plan → filter → present). Minimax can't guide that process on its own. The workaround: scrape an existing open-source GitHub repo, set a cron job, and let it run overnight using the 5-hour refresh window. Skip building an aggregator from scratch.
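Writing the roadmap yourself can be as simple as freezing the steps into an explicit, numbered prompt, so the cheaper model executes a plan instead of inventing one. The step wording below is a hypothetical example for the news-report workflow (research, plan, filter, present); in the video, Opus was used to work the steps out in the first place.

```python
# Hypothetical roadmap for the daily news report, written once by you.
ROADMAP = [
    ("research", "Read today's output from the scraped news repo."),
    ("plan", "List the stories worth covering and why, one line each."),
    ("filter", "Drop duplicates and anything older than 24 hours."),
    ("present", "Write the report in the existing template."),
]


def build_prompt(roadmap: list[tuple[str, str]]) -> str:
    """Turn a fixed roadmap into one explicit, numbered instruction block."""
    lines = ["Follow these steps in order. Do not add, skip, or reorder steps."]
    for i, (name, instruction) in enumerate(roadmap, start=1):
        lines.append(f"{i}. {name.upper()}: {instruction}")
    return "\n".join(lines)
```

The same prompt is then reused by whatever job cron triggers each night, so the plan stays fixed while the cheaper model does the execution.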

Best Model for OpenClaw: the WildClaw benchmark

The third video answers the question the first two assume: which model should you actually route to? The channel ran an open-source benchmark called WildClaw — a Dockerized OpenClaw suite that runs real agentic tasks (reading emails, launching tasks) instead of pure software-engineering coding tests.

The benchmark setup

The suite is open source — clone it, modify tests, rerun on your own models
It tests agent workflows, not just code generation
The creator's caveat: "the reliability of these tests might not be super good in the future as companies optimize specifically for this benchmark"

That last point matters. WildClaw is now public, so vendor gaming is a real risk. Don't read the score as a long-term signal — treat it as a snapshot.

Top scores and costs

Claude Opus: 51% overall, $80 to run the full suite — "the cost is very very high"
GPT-5.4: close second at roughly a quarter of Opus's cost, faster too
Mimo V2 (Xiaomi): scored high at a $26 run cost; free extended access for ~6 more days via Kilo Code and partner providers at time of filming
Minimax 2.7: used internally on Loki/Gambit agents for two months; real-world drop-off vs Opus is visible but cost is "really really cheap"
Grok: full suite completed in 94 minutes vs ~500 minutes — "almost five times faster"
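When you rerun the suite yourself, normalising by cost and time makes the trade-offs easier to compare. A minimal sketch: Opus is the only entry above with both a score and a cost, so it is the only full worked number; for GPT-5.4 (about $20 if "a quarter of Opus" is taken literally) and Mimo V2 ($26), plug in the scores from your own run.

```python
def cost_per_point(run_cost_usd: float, score_pct: float) -> float:
    """Dollars per percentage point of benchmark score for one full-suite run."""
    return run_cost_usd / score_pct


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the fast run completed the same suite."""
    return slow_minutes / fast_minutes


print(f"Opus: ${cost_per_point(80, 51):.2f} per point")
print(f"Grok: {speedup(500, 94):.1f}x faster")
```

That works out to about $1.57 per point for Opus, and roughly 5.3x for Grok using the rounded ~500-minute figure, in line with the "almost five times faster" quote.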

Why a coding plan beats a token plan

The creator runs his agents on a Minimax coding plan rather than token plans: "I don't want to fix and play with my open claw all the time." Generous flat-rate limits beat per-token optimization when the model behaves inconsistently: fewer heartbeats to tune, fewer overage bills to chase.

GLM 5.1 was still running

GLM 5 was released two days before the video, and the creator was mid-test on GLM 5.1 because "they tune themselves for agentic use case," with the vendor claiming 90% of Opus. He couldn't finish in time for the video because GLM 5 is "a bit slow." Worth waiting for the next benchmark pass.

Try it yourself

The hands-on goal for this subtopic: get Claude Code running against a non-Anthropic model end-to-end, then run the WildClaw benchmark against a model of your own.

Subscribe to a token plan. Sign up for the Minimax Plus or Max plan and confirm you have the token-plan API key (the one that resets, not a pay-as-you-go key). The Plus tier's 4,500 requests per 5 hours is enough to feel the workflow; the Max tier's 15,000 is what you want for genuine overnight builds.
Locate Claude Code's settings.json. On a fresh install it's empty or has a single object. Back the file up before editing.
Drop in the three env vars. The token-plan API key, ANTHROPIC_AUTH_TOKEN, and the base URL https://api.minimax.io/anthropic. If the file is non-empty, paste only the inner object, put a comma between it and the existing entries, and leave no trailing comma after the last entry.
Smoke-test with a one-line task. Ask Claude Code to add a comment to an existing file. If the change lands, your config is correct. If Claude Code fails to start, you have a JSON parse error: re-check the commas around the pasted object.
Run a real overnight build on a VPS. Pick a low-stakes feature — a new settings page, a content-inventory script, an integration test suite — and queue it before bed. Review the diff in the morning.
Compress soul.md to 15–30 lines if you're routing through an OpenClaw agent. Trim agents.md to the minimum bootstrap. If the agent starts behaving erratically after months of use, nuke the agent directory and reinstall from scratch via SSH.
Clone WildClaw and run the suite against any model you're considering for production. Don't trust the published leaderboard — run it on your own tasks before committing budget.
Compare against Sonnet. If a Minimax 2.7 overnight run matches a Sonnet run on the same prompt at one-tenth the cost, you've reproduced the channel's working hypothesis. If it doesn't, escalate to Sonnet for the specific task class and keep Minimax for the rest.
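For the overnight step, a small wrapper that cron can call is enough. This sketch assumes the claude binary is on the VPS's PATH with settings.json already pointing at Minimax, and uses Claude Code's -p (print) mode to run one prompt non-interactively. Unattended file edits usually also need a permission mode configured; check claude --help for your version rather than trusting any flag we might add here. The repo path, log directory, and prompt are hypothetical.

```python
import datetime as dt
import subprocess
from pathlib import Path


def build_command(prompt: str) -> list[str]:
    """Claude Code's non-interactive print mode: run one prompt, then exit."""
    return ["claude", "-p", prompt]


def run_overnight(repo: Path, prompt: str, log_dir: Path) -> Path:
    """Run one headless Claude Code task inside `repo` and save its output to a timestamped log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"build-{dt.datetime.now():%Y%m%d-%H%M%S}.log"
    result = subprocess.run(build_command(prompt), cwd=repo, capture_output=True, text=True)
    log_path.write_text(
        f"exit code: {result.returncode}\n"
        f"--- stdout ---\n{result.stdout}\n"
        f"--- stderr ---\n{result.stderr}\n",
        encoding="utf-8",
    )
    return log_path
```

In the morning, read the log and then review the changes with git diff in the repo before merging anything.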

Common pitfalls

Pointing at minimax.com instead of api.minimax.io. The minimax.com endpoint is the slow China base URL. International users must use the international portal. The creator flagged this as a recurring failure mode in support questions.
Pasting a fresh top-level object into a non-empty settings.json. Claude Code will fail to parse the file and refuse to start. Paste only the inner object, separate it from the existing entries with a comma, and leave no trailing comma after the last entry.
Using a pay-as-you-go key instead of the token-plan key. Overnight runs will rack up overage charges. Use the key that resets.
Trusting "near Opus" output without diffing. Minimax 2.7 is near-Opus, not Opus. Always review the diff before merging, especially for security-sensitive or money-handling code.
Letting soul.md bloat past 30 lines. The model enters the "dumb zone" — wrong targets, ignored instructions, off-topic replies. Cap it.
Trying to repair a polluted agent directory in place. If the agent has been running for months and behaviour is degrading, nuke the directory and reinstall from scratch via SSH. Don't try to fix it incrementally.
Reading the WildClaw leaderboard as a long-term signal. The suite is open source, so vendors will start gaming it. Run your own version against your own tasks.
Benchmarking Minimax 2.7 only. The channel's actual ranking is GPT-5.4 for OpenClaw (cheap, near-Opus), Mimo V2 if you catch the free window, Grok if latency matters, and Opus only if you have an uncapped coding plan.
Micro-optimising tokens. Flat-rate limits are the point of these plans. If you find yourself tuning heartbeats and trimming prompts to save tokens, you're on the wrong plan; move to a flat-rate coding plan.
Merging an ambitious overnight build without review. The channel's overnight pattern assumes you review the diff in the morning rather than watching the build in real time. If the build is high-stakes, supervise it while it runs.
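The first two pitfalls can be caught mechanically before you start Claude Code. A minimal pre-flight sketch, assuming the base URL and key are stored under Claude Code's "env" object as ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN; it cannot tell a pay-as-you-go key from a token-plan key, so that check stays manual. The function name is hypothetical.

```python
import json
from urllib.parse import urlparse

EXPECTED_HOST = "api.minimax.io"  # international endpoint; minimax.com is the slow China one


def preflight(settings_text: str) -> list[str]:
    """Return problems found in a settings.json routed to Minimax (an empty list means OK)."""
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError as exc:
        # Catches trailing commas and two pasted top-level objects ("Extra data").
        return [f"not valid JSON (check commas and pasted objects): {exc}"]
    if not isinstance(settings, dict):
        return ["top level must be a single JSON object"]
    env = settings.get("env", {})
    if not isinstance(env, dict):
        return ['"env" must be a JSON object']
    problems = []
    host = urlparse(str(env.get("ANTHROPIC_BASE_URL", ""))).hostname
    if host != EXPECTED_HOST:
        problems.append(f"ANTHROPIC_BASE_URL host is {host!r}, expected {EXPECTED_HOST!r}")
    if not env.get("ANTHROPIC_AUTH_TOKEN"):
        problems.append("ANTHROPIC_AUTH_TOKEN is missing or empty")
    return problems
```

Run it on the file's contents after every edit; an empty list means the JSON parses and the endpoint and token are in place.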

Sources

Claude Code + Minimax 2.7: Unlimited AI Coding on a Budget — 6,532 views · video_id: dURSH_Fwu6s
Is Minimax the Best AI Model for OpenClaw? — 3,219 views · video_id: 258R3kzDRAQ
Best Model for Openclaw (WildClaw Benchmarks!) — 4,574 views · video_id: 31Ij4Cum5tg
Supabase query — SELECT video_id, title, views, summary_content, summary_key_takeaways FROM public.videos WHERE video_id = ANY(ARRAY['dURSH_Fwu6s','258R3kzDRAQ','31Ij4Cum5tg']); against project ttxdssgydwyogq.