Why Minimax is the channel's cheap default - Minimax: The Cheap Executor

This is the framing article. The pitch in one sentence: Opus costs $5 per million input tokens; Minimax costs roughly $0.30 per million input tokens — about 1/16th the price. The pitch in one paragraph: the channel's own Opus run burned $30 in a single hour, and the same workflow on a Minimax coding plan runs at a fixed monthly subscription. That asymmetry is the entire reason the channel routes around Claude for executor work, and it's why Minimax — not GLM, not Kimi, not Qwen — is the model the channel actually runs on production agents.

The article walks through three videos: the M2.7 review (the highest-viewed Minimax video on the channel at 31,049 views), the OpenClaw-specific verdict (which introduces the "dumb zone" failure mode), and the Hermes Agent tier list (which places M2.7 in the executor slot, not the orchestrator slot). The point of all three is the same: Minimax is a slot, not a replacement. The slot is the executor — the hands that call tools reliably, cheaply, and fast. The orchestrator — the brain that plans and reasons across many turns — is still on Opus 4.6, GPT 5.4, or Gemini 3.1 Pro for the channel's most ambitious builds. If you try to use Minimax as the planner, the dumb-zone failures from §5.4 show up the moment the model has to do high-level reasoning instead of executing an explicit plan.

What you'll learn

The 1/16th cost ratio: Opus at $5 per million input tokens, Minimax at roughly $0.30 per million — and the channel's own $30/hour Opus burn as the receipt for why the ratio matters.
The "executor, not orchestrator" framing from the Hermes tier list, and why the channel's actual stack is Opus 4.6 (orchestrator) → Minimax 2.7 (executor), not Minimax → Minimax.
The M2.5-to-M2.7 jump as continued post-training, not a new base model — 230B MoE / 10B active per token, the same architecture with explicit "recursive self-improvement" framing.
The BFCL floor for M2.7 (76.8%, the score M2.5 already posted) and the agent-behaviour delta: parallel sub-agents, fewer hallucinated slide transitions, a one-shot presentation in 2–3 minutes versus 10–15 on Claude.
The "slot-machine" quality delta the channel actually experiences in production, and the routing rule it implies: use Minimax for iterative coding, multi-file refactors, long agentic loops, and Go/Rust/TypeScript/Java work; use Opus for deep reasoning, terminal ops, and architecture setup.

The 1/16th cost ratio — and the $30/hour burn

The M2.7 review is the channel's most-viewed model review (31,049 views) and the video that put the "you don't need Opus" framing on the page. The architecture call is the headline: M2.7 is not a new base model. It keeps M2.5's structure — 230B total parameters, mixture-of-experts, 10B active per token. The jump comes from continued post-training that Minimax describes as "beginning the journey of recursive self-improvement."

The cost math is the entire pitch. Opus starts at $5 per million input tokens. Minimax sits at roughly $0.30 per million, a 1/16th ratio. The channel's own Opus run burned $30 in a single hour on a multi-agent flow. The same workflow on a Minimax coding plan — under $10 for 100 prompts per 5-hour window — is a fixed package rather than pay-per-use. That asymmetry is the reason the channel routes around Claude for executor work, and it's the reason this course exists separately from Course 2 §2.3: the cost gap is structural, not marginal.
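
To make the ratio concrete, here is a minimal sketch of the arithmetic using only the two input prices quoted above. The function name and the 10M-token workload are illustrative assumptions, and output-token pricing is left out because this paragraph quotes input prices only.

```python
# Input prices quoted in this article, in USD per million input tokens.
OPUS_INPUT_PER_M = 5.00
MINIMAX_INPUT_PER_M = 0.30  # "roughly $0.30" per the article


def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload: 10M input tokens (an assumption for illustration).
workload = 10_000_000
opus = input_cost(workload, OPUS_INPUT_PER_M)
minimax = input_cost(workload, MINIMAX_INPUT_PER_M)

print(f"Opus:    ${opus:.2f}")
print(f"Minimax: ${minimax:.2f}")
print(f"Ratio:   {opus / minimax:.1f}x")
```

At these list prices the exact ratio works out a little above 16, which is what the "about 1/16th" shorthand rounds to. The comparison with the fixed-price coding plan is a separate question, since that plan is billed per prompt window rather than per token.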

The migration story is short. On the channel's Discord, the seven OpenClaw agents running Minimax were switched to M2.7 by updating the model parameter on the existing API key and endpoint — no new keys, no new billing line for pay-as-you-go or plus-plan users. Run /status in Discord or OpenClaw status in the terminal to confirm the switch actually took; a stale config can keep you on the older build.

The numbers worth internalising

M2.5 already scored 76.8% on BFCL (Berkeley Function-Calling Leaderboard), and 2.7 is positioned to push that further. M2.5 already beat Claude Opus 4.6 and Sonnet 4.6 on multi-SWE-bench and BFCL multi-turn, so treat those as the floor.
A one-shot presentation finished in 2–3 minutes for M2.7 versus 10–15 minutes for Claude on the same task. That delta is the operational case for routing presentation-style jobs to Minimax.
Less overthinking on complex tasks, tighter instruction following, improved tool calling — the three areas Minimax officially targeted for 2.7 (per their public framing, repeated in the video).
The $40/month wall. The high-speed variant of M2.7 requires that tier. The standard M2.7 is cheap enough on the plus plan to do most agent work — don't pay the high-speed premium for ordinary overnight builds.

Two caveats. M2.7 is explicitly an agentic coding model, not a general chatbot, so keep Claude or GPT for raw knowledge Q&A. And M2.7 is a strong executor, weak orchestrator — a distinction the channel returns to in every Minimax video, and the distinction the Hermes tier list hard-codes into the routing rule.

The agent-behaviour delta

The channel ran a one-shot presentation test that made the M2.5-to-M2.7 delta visible. M2.7's agent (named Gambit in the channel's setup) spawned parallel sub-agents for research, presentation, and self-audit. M2.5's agent did everything solo and produced a slide that flashed OpenClaw 3.7 on every transition — the kind of artefact-hallucination that happens when a single context window is asked to plan, research, draft, and self-audit in one pass.

The pattern matters because the channel's actual production stack is Opus 4.6 (orchestrator) → Minimax 2.7 (executor), not Minimax → Minimax. The orchestrator slot uses Opus 4.6 for deep reasoning and autonomous terminal work; the executor slot uses Minimax 2.7 for research, presentation, and code generation. The plan is generated by Opus 4.6, GPT 5.4, or Notebook LM ("really good for planning compared to building from scratch with Claude code" per the video); the implementation is handed to M2.7. If you try to use M2.7 as the planner, you get the dumb-zone failures from §5.4 the moment the model has to do high-level reasoning instead of executing an explicit plan.
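
The two-slot split described above can be sketched as a small lookup from task kind to slot. This is only an illustration of the routing idea: the model labels and task-kind names are hypothetical strings chosen for readability, not real API model identifiers or anything the channel's setup is confirmed to use.

```python
from enum import Enum


class Slot(Enum):
    ORCHESTRATOR = "orchestrator"  # plans and reasons across many turns
    EXECUTOR = "executor"          # carries out an explicit plan


# Illustrative labels only; these are not real API model identifiers.
MODEL_FOR_SLOT = {
    Slot.ORCHESTRATOR: "opus-4.6",
    Slot.EXECUTOR: "minimax-m2.7",
}

# Task kinds are hypothetical names for the work described in this section.
PLANNING_TASKS = {"plan", "deep_reasoning", "terminal_ops"}
EXECUTION_TASKS = {"research", "presentation", "code_generation"}


def slot_for(task_kind: str) -> Slot:
    """Pick a slot for a task kind.

    Unknown task kinds fall back to the orchestrator, because the failure
    mode described above is a cheap model being asked to plan, not an
    expensive model being asked to execute.
    """
    if task_kind in EXECUTION_TASKS:
        return Slot.EXECUTOR
    if task_kind in PLANNING_TASKS:
        return Slot.ORCHESTRATOR
    return Slot.ORCHESTRATOR


def route(task_kind: str) -> str:
    """Return the model label that should handle this task kind."""
    return MODEL_FOR_SLOT[slot_for(task_kind)]


print(route("plan"))
print(route("code_generation"))
```

Defaulting unknown work to the orchestrator costs more per call, but it keeps unplanned tasks away from the slot where the dumb-zone failures appear.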

The OpenClaw verdict — same ratio, same routing

The OpenClaw-specific video is shorter but the routing logic is identical: you're paying for a model backend, not for a client. The cost framing is restated with the channel's own burn number for contrast — the creator's Opus run burned $30 in a single hour, and the MiniMax coding plan starter tier is under $10 for 100 prompts per 5-hour window.

The relevant details here are the failure modes of running a cheap model on a long-running agent, which the M2.7 review glosses over. The most important is the dumb zone: the model enters a "dumb zone" once soul.md swells. The original soul.md was 300 lines. At that size, the agent "starts messaging your girlfriend instead of building a presentation." Compressing soul.md to 15–30 lines and trimming agents.md (the bootstrap file) brought it back into the "smart zone." That's a model-side failure, not an OpenClaw failure, so the same compression rule applies when Minimax is routed through Claude Code.
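
A guard along the following lines could flag a swollen soul.md before it causes trouble. It is a sketch under stated assumptions: the 30-line limit is the upper end of the 15–30 line target above, while the function names, the choice to count only non-blank lines, and the file location are hypothetical.

```python
from pathlib import Path

MAX_SOUL_LINES = 30  # upper end of the 15-30 line target described above


def count_content_lines(text: str) -> int:
    """Count non-blank lines (assumption: blank lines do not count)."""
    return sum(1 for line in text.splitlines() if line.strip())


def check_soul(path: Path, limit: int = MAX_SOUL_LINES) -> bool:
    """Return True if the file is within the limit, False if it needs trimming."""
    n = count_content_lines(path.read_text(encoding="utf-8"))
    print(f"{path}: {n} non-blank lines (limit {limit})")
    return n <= limit


if __name__ == "__main__":
    soul = Path("soul.md")  # hypothetical location; point this at your agent directory
    if soul.exists():
        check_soul(soul)
```

The same check could be pointed at agents.md, though the article gives no line target for that file, so any limit there would be your own choice.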

M2.1 and M2.5 both failed out of the box for the host. The real fix wasn't a model upgrade — it was a full reinstall from scratch, which requires SSH access to your server. Random files scattered across directories were "muddying up" the context window, confirmed independently by Cursor and an orchestrator agent. The lesson: if your agent directory is polluted, do a full reinstall from scratch via SSH. Don't try to repair it in place.

For complex multi-step workflows, the creator was explicit: write the roadmap yourself. Minimax will not guide you through planning the way Opus does. The workaround for the daily news report was to scrape an existing open-source GitHub repo, set a cron job, and let it run overnight using the 5-hour refresh window. Skip building an aggregator from scratch.

The Hermes tier list — executor slot hard-coded

The Hermes tier list (8,107 views) is the most systematic ranking the channel has published, and the conceptual move worth internalising is the two-slot model:

Orchestrator (the brain): plans multi-step work, holds state across many turns, decides which executor to call and when.
Executor (the hands): reliably calls tools, follows formatting instructions, doesn't get clever.

Minimax M2.7 is a strong executor (not orchestrator) because it was "trained on the OpenClaw Agent Harness framework," the same lineage as Hermes. Xiaomi is an official partner of the Nous Research team. The same slot also includes GLM 5.1 (the standout: it one-shotted a space shooter benchmark that Opus 4.7 choked on, and it survives Hermes' 85% auto-compaction events with strong context recovery) and DeepSeek 3.2 (which can "think inside tool calls," eliminating redundant reasoning passes in Hermes' 40+ tool environments and slashing cron job errors). On the orchestrator side, GPT 5.4 is the new king, Gemini 3.1 Pro ties as top orchestrator, Qwen 3.6 Plus earns the third slot for active chain-of-thought, and Kimi 2.5 rounds it out as a swarm-capable model.

The point for this course: the channel's actual production stack is a hybrid, with Minimax in the executor slot and Opus / GPT / Gemini in the orchestrator slot. If you try to use Minimax as the planner, the dumb-zone failures show up. If you use Minimax as the hands, you get the 1/16th cost ratio and the 2–3-minute presentation win.

The M3 follow-up — and the verdict

M3 is the next-generation release — the first open-weight model from this family to ship 1M-token context, agentic coding, and native multimodal training in a single drop. Open weights drop in "the next 10 days" from the time of the launch video, so most of the channel's tests are inside the desktop app rather than a real harness.

The benchmark numbers worth remembering:

SWE Bench Pro: 59%, which is 0.4 points above GPT 5.5 but roughly 10 points behind Claude Opus 4.8's 69.2%. The channel calls SWE Bench Pro "the most reliable indicator of how strong the model is for coding" and explicitly notes it "is not cherry-picked."
OmniDock document understanding: above Gemini 3.1 Pro.
MSA pre-fill: 9.7× faster at 1M tokens versus M2.7.
MSA decode: 15.6× faster at 1M tokens versus M2.7.
API pricing: $0.30 per 1M input tokens, $1.20 per 1M output tokens.

The engine is MiniMax Sparse Attention (MSA), which drops per-token compute to "roughly about 1/20 of what the previous generation needed." The speed wins are measured specifically at the "worst-case long end, exactly where dense models typically crawl," so don't expect a 10× speedup on a 200K chat — use the advantage on tasks that actually need 1M.
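
Because pre-fill and decode speed up by different factors, the overall gain on a job depends on how its time splits between the two phases. The sketch below applies the quoted 9.7× and 15.6× figures to a pair of baseline timings. Those baseline numbers are invented for illustration, and the quoted factors apply only at 1M tokens, so this says nothing about shorter contexts.

```python
# Speedups quoted for M3 versus M2.7 at 1M tokens.
PREFILL_SPEEDUP = 9.7
DECODE_SPEEDUP = 15.6


def estimated_m3_seconds(m27_prefill_s: float, m27_decode_s: float) -> float:
    """Scale measured M2.7 phase times by the quoted 1M-token speedups."""
    return m27_prefill_s / PREFILL_SPEEDUP + m27_decode_s / DECODE_SPEEDUP


# Hypothetical M2.7 measurements for one 1M-token job (not from the videos).
m27_prefill_s, m27_decode_s = 97.0, 156.0
m3_total = estimated_m3_seconds(m27_prefill_s, m27_decode_s)
overall = (m27_prefill_s + m27_decode_s) / m3_total
print(f"estimated M3 time: {m3_total:.1f}s, overall speedup: {overall:.1f}x")
```

The overall figure always lands between the two quoted factors, closer to 9.7× for prompt-heavy jobs and closer to 15.6× for generation-heavy ones.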

The follow-up M3 review (hTkxebQdtH8, 856 views) lands the verdict: M3 is the cheapest daily-driver execution model in the creator's stack, but only after a smarter model has planned the work. The error-rate delta is the headline. M2.7 "made a lot of errors" and required constant babysitting; M3, over a two-week test, is making "much fewer errors," and on refactors it "cleaned up a lot of logs, files, refactored code" with much less "gaslighting" (the creator's word for hallucinated output). M3 also "doesn't tend to overthink things anymore": there is no visible chain-of-thought sprawl compared to the previous build.

The Chinese-alternatives comparison is a useful reality check. While testing Chinese models after "Fable 5 was banned" (Anthropic's export-controlled release), the creator ran Kimi 2.7 and GLM. GLM's quality was "pretty good," but the cost "went up quite aggressively" and he "burnt through my 5-hour allowance rather quickly" — in one hour he used ~20% of his MiniMax weekly quota on GLM, "which is the wrong way around." If you're moving from Minimax to GLM hoping to save money, you'll likely pay more.

Try it yourself

Subscribe to the Minimax token plan. Start with the Plus tier (4,500 requests / 5-hour rolling window) and confirm you have the token-plan API key — the one that resets — not a pay-as-you-go key. The rolling reset is the contract that makes overnight runs viable.
Verify the model is actually M2.7 (not M2.5). Run /status in Discord or OpenClaw status in the terminal. Don't trust the model parameter alone — a stale config can keep you on the older build.
Set the routing rule explicitly. Opus / GPT / Gemini = orchestrator; Minimax = executor. Don't ask M2.7 to plan a "large function" from scratch. Ask it to execute a plan another model produced. That's the slot where M2.7 is cheapest and most reliable.
Run a one-shot presentation test. Time a 10-slide deck end-to-end. On M2.7 you should see 2–3 minutes; on Claude you should see 10–15. The delta is the operational case for the routing rule.
Audit your Opus burn for a week. If you're paying Opus prices for executor work, you have a budget case for swapping to Minimax. The channel's own burn was $30 in a single hour on a multi-agent run — that's the headline cost the 1/16th ratio solves.
When M3 ships on the Kilo Code token plan, batch a 1M-token task and measure pre-fill / decode latency. Use the 9.7× pre-fill / 15.6× decode advantage over M2.7 on a real long-context workload — the speedup doesn't show up on short chats.
Cap your subs. Don't pay for the high-speed M2.7 variant ($40/month tier) on ordinary overnight builds. The standard M2.7 is enough for most agent work; the high-speed variant is reserved for latency-sensitive interactive use.
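
The one-shot presentation test in the list above needs nothing more than a wall-clock timer around whatever triggers your agent. The sketch below shows one way to do that. `fake_deck_job` is a hypothetical placeholder that only sleeps; it does not call any real agent or API and must be replaced with your own trigger.

```python
import time
from typing import Callable


def time_run(label: str, job: Callable[[], object]) -> float:
    """Run `job` once and return wall-clock minutes, printing a one-line report."""
    start = time.perf_counter()
    job()
    minutes = (time.perf_counter() - start) / 60
    print(f"{label}: {minutes:.2f} min")
    return minutes


def fake_deck_job() -> None:
    """Stand-in for 'ask the agent for a 10-slide deck and wait for it'.

    Hypothetical placeholder: replace with whatever starts your agent run
    and blocks until the deck is finished.
    """
    time.sleep(0.01)


elapsed = time_run("stand-in run", fake_deck_job)
```

Running the same job once per model and comparing the two printed figures gives you your own version of the 2–3 minute versus 10–15 minute comparison.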

Common pitfalls

Using Minimax as the planner. The channel's data is consistent across videos: M2.7 / M3 don't guide multi-step workflows the way Opus does. The dumb zone shows up the moment Minimax has to do high-level reasoning instead of executing an explicit plan. Reserve Opus / GPT / Gemini for planning; route M2.7 or M3 for execution.
Reading "1/16th the cost" as a flat replacement for Opus. The channel's framing is "near Opus, not Opus" — keep Sonnet or Opus in reserve for the final review pass on security-sensitive or money-handling code. The 1/16th ratio is an executor-slot win, not a free Opus license.
Paying pay-as-you-go pricing for an overnight run. The 5-hour rolling limit on the token plan is the right shape for batched work. Pay-as-you-go on the same model will spike costs without giving you more capability.
Treating slot-machine quality as a prompt bug. The same prompt produces outputs of different quality on Minimax, and the channel's rule is to run it 2–3 times and pick the best. That variability comes from the model, not your prompt. Don't try to fix a slot-machine output with another prompt; run it again.
Trusting M2.7 as a general chatbot. The channel's framing is explicit: M2.7 is an agentic coding model, not a general chatbot. Raw knowledge Q&A is still a Claude / GPT strength. Use Minimax for the executor slot.
Benching on M2.7 only. Treat M2.7 as the executor default; M3 is meaningfully better on long-context work (MSA, 1M context, fewer errors) but Kilo Code's token plan still shows M2.7. Pick by task.
Hitting the 5-hour limit on GLM and assuming Minimax will scale the same way. The creator burned ~20% of his weekly MiniMax quota in a single hour on GLM. The pricing shapes aren't equivalent — the same workflow can cost 5× more on the "comparable" Chinese model.
Trusting the hourly token counter on the Minimax plan. It doesn't tell you exactly how many millions of tokens you used. Log your own usage if precise cost tracking matters.
Letting the high-speed premium leak into ordinary builds. The $40/month high-speed M2.7 tier is reserved for latency-sensitive interactive use. Don't pay the premium for overnight cron work.
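
The "run it 2–3 times and pick the best" rule from the slot-machine pitfall above is a best-of-n loop. The sketch below shows the shape of it. The `generate` and `score` callables are hypothetical: the channel picks the winner by eye, so the length-based score and the canned outputs here are stand-ins that only make the example runnable.

```python
import itertools
from typing import Callable


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str], float],
    prompt: str,
    n: int = 3,
) -> str:
    """Run the same prompt n times and return the highest-scoring output."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)


# Stand-in generator that cycles through canned outputs instead of calling a model.
_outputs = itertools.cycle(["short", "a much longer draft", "medium one"])


def fake_generate(prompt: str) -> str:
    """Hypothetical placeholder for a real model call."""
    return next(_outputs)


best = best_of_n(fake_generate, score=len, prompt="same prompt", n=3)
print(best)
```

The prompt stays fixed across all runs, which is the point of the rule: the variation being sampled is the model's, not the prompt's.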

Sources

Minimax M2.7 is INSANELY GOOD! (Full Review) · 31,049 views · video_id: --uxieT5J9Y
Is Minimax the Best AI Model for OpenClaw? · 3,219 views · video_id: 258R3kzDRAQ
Top AI Models for Hermes Agent (Tier List) · 8,107 views · video_id: Af7Fg1m7hRw
MiniMax M3 is HERE! (Real Tests and Review) · 4,398 views · video_id: -Qf3bvFTIzY
Minimax M3 Review: Is This Cheap AI Model Actually Worth It? · 856 views · video_id: hTkxebQdtH8
Supabase query — SELECT video_id, title, views, summary_content, summary_key_takeaways FROM public.videos WHERE video_id = ANY(ARRAY['--uxieT5J9Y','258R3kzDRAQ','Af7Fg1m7hRw','-Qf3bvFTIzY','hTkxebQdtH8']); against project ttxdssgydwyurwwnjogq.