M2.7: The Agentic Coding Model (Minimax: The Cheap Executor)

Subtopic 5.1 established why Minimax is the channel's cheap default — the 1/16th cost ratio, the executor-not-orchestrator framing, the slot-machine quality, and the routing rule. Subtopic 5.2 zooms in on the model the channel actually runs on production agents today: M2.7. The architecture call is the headline — M2.7 is not a new base model — and the agentic-coding specialisation is what makes M2.7 cheap-but-usable in a slot where a generic chatbot would not be.

The M2.7 story is also a story about a specific Chinese model family that was explicitly trained on the OpenClaw Agent Harness framework, the same lineage as Hermes. That's why M2.7 is a strong executor out of the box: the post-training saw the harness before the harness saw the model. M2.5 already beat Claude Opus 4.6 and Sonnet 4.6 on multi-SWE-bench and BFCL multi-turn, so the floor is high. M2.7 is positioned to push the BFCL score (76.8% on M2.5) further through continued post-training that Minimax describes as "beginning the journey of recursive self-improvement."

This article walks through the architecture, the agent-behaviour delta, the BFCL floor, the post-training self-auditing pattern, the multi-agent angle, and the operational caveats. The point is to make the routing rule concrete: use Minimax 2.7 for iterative coding, multi-file refactors, long agentic loops, and Go/Rust/TypeScript/Java work; use Opus for deep reasoning, terminal ops, and architecture setup. If you read this and conclude "I should use Minimax for everything," you have missed the executor-not-orchestrator framing from §5.1.
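The routing rule above reduces to a small lookup. This is a minimal sketch: the task-kind labels and the model identifiers `minimax-m2.7` and `claude-opus` are hypothetical placeholders rather than real API model names. Sending unclassified work to the orchestrator is this sketch's own conservative choice, not a rule stated in the source.

```python
# Hypothetical model identifiers: replace with the names your harness expects.
EXECUTOR_MODEL = "minimax-m2.7"
ORCHESTRATOR_MODEL = "claude-opus"

# Task kinds taken from the routing rule in this article.
EXECUTOR_TASKS = {"iterative_coding", "multi_file_refactor", "long_agentic_loop"}
ORCHESTRATOR_TASKS = {"deep_reasoning", "terminal_ops", "architecture_setup"}
EXECUTOR_LANGUAGES = {"go", "rust", "typescript", "java"}


def route(task_kind: str, language: str = "") -> str:
    """Return the model for a task under the executor-vs-orchestrator rule."""
    if task_kind in ORCHESTRATOR_TASKS:
        # Planning-heavy work stays on the orchestrator, whatever the language.
        return ORCHESTRATOR_MODEL
    if task_kind in EXECUTOR_TASKS or language.lower() in EXECUTOR_LANGUAGES:
        return EXECUTOR_MODEL
    # Unclassified work (e.g. knowledge Q&A) defaults to the orchestrator.
    return ORCHESTRATOR_MODEL


print(route("multi_file_refactor"))     # minimax-m2.7
print(route("deep_reasoning", "rust"))  # claude-opus
print(route("bug_fix", "Go"))           # minimax-m2.7
```

Checking orchestrator tasks first is what keeps a Rust architecture question off the cheap model even though Rust is on the executor list.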

What you'll learn

The M2.7 architecture is the same as M2.5 — 230B total parameters, mixture-of-experts, 10B active per token — with continued post-training targeting real-world engineering, professional office delivery, and "character-rich interaction."
The BFCL floor (76.8% on M2.5, which already beat Opus 4.6 and Sonnet 4.6 on multi-SWE-bench and BFCL multi-turn) and the agent-behaviour delta: parallel sub-agents, fewer hallucinated slide transitions, and a one-shot presentation in 2–3 minutes versus 10–15 on Claude.
The post-training self-auditing pattern: M2.7 "spawns parallel sub-agents to handle research simultaneously, assigns a separate sub-agent to handle the presentation, and then audits its own output" from a single prompt.
The M2.7 multi-agent specialisation (the "MiniMax M2.7's Best Feature Nobody's Using" video) and how the executor slot is structured for parallel work.
The migration story: the same API key, the same endpoint, just a model parameter change. No new keys, no new billing line for pay-as-you-go or plus-plan users.
The $40/month high-speed wall, the agentic-coding-not-general-chatbot caveat, and the M2.7 → M3 trade-off on long-context work.

The architecture call — same as M2.5, better post-training

The M2.7 review (31,049 views) is the canonical source. The architecture call is the headline: M2.7 is not a new base model. It keeps M2.5's structure — 230B total parameters, mixture-of-experts, 10B active per token. The jump comes from continued post-training that Minimax describes as "beginning the journey of recursive self-improvement."

That framing matters for the executor slot. The post-training target is the harness, not the benchmark. Three areas Minimax officially targeted for 2.7:

Real-world engineering — day-to-day coding, not just benchmarks.
Professional office delivery — documents, analysis, business tasks.
"Character-rich interaction" — more personality in conversation.

The agent-behaviour delta the channel observed in the one-shot presentation test is the operational case. M2.7's agent (Gambit) spawned parallel sub-agents for research, presentation, and self-audit. M2.5's agent did everything solo and produced a slide that flashed OpenClaw 3.7 on every transition, the kind of artefact hallucination that tends to appear when a single context window is asked to plan, research, draft, and self-audit in one pass. Routing those sub-tasks to separate workers is the most plausible reason M2.7 avoided the slide-flasher failure.

The numbers worth internalising:

M2.5 already scored 76.8% on BFCL. M2.7 is positioned to push that further. M2.5 already beat Claude Opus 4.6 and Sonnet 4.6 on multi-SWE-bench and BFCL multi-turn, so treat those as the floor.
Less overthinking on complex tasks, tighter instruction following, improved tool calling.
A one-shot presentation finished in 2–3 minutes for M2.7 versus 10–15 minutes for Claude on the same task.

The migration story is short. On the channel's Discord, the seven OpenClaw agents running Minimax were switched to M2.7 by updating the model parameter on the existing API key and endpoint — no new keys, no new billing line. Run /status in Discord or OpenClaw status in the terminal to confirm the switch actually took; a stale config can keep you on the older build.
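The switch-and-verify step is small enough to script. This is a minimal sketch, assuming a hypothetical JSON config with `api_key`, `endpoint`, and `model` fields (your harness's real layout will differ) and assuming you copy the model name that /status reports by hand. It does not call OpenClaw or the Minimax API.

```python
import json


def switch_model(config_text: str, new_model: str) -> str:
    """Change only the model parameter; every other field is carried over."""
    config = json.loads(config_text)
    config["model"] = new_model
    return json.dumps(config, indent=2)


def confirm_switch(reported_model: str, expected_model: str) -> bool:
    """True only if the running model matches the one you configured."""
    return reported_model.strip().lower() == expected_model.strip().lower()


# Hypothetical config layout and values, for illustration only.
old_config = json.dumps({
    "api_key": "YOUR_EXISTING_KEY",
    "endpoint": "https://api.example.invalid/v1",
    "model": "minimax-m2.5",
})
new_config = json.loads(switch_model(old_config, "minimax-m2.7"))

# Paste in the model name your status check actually reports.
reported = "minimax-m2.5"  # e.g. a stale config still serving the old build
if not confirm_switch(reported, new_config["model"]):
    print("Switch did not take: still running", reported)
```

The key and endpoint pass through untouched, which is the whole migration; the second function exists because editing the file and running the new build are not the same event.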

Two caveats. The high-speed variant of M2.7 requires the $40/month tier — the rest can stay on the plus plan. And M2.7 is explicitly an agentic coding model, not a general chatbot, so keep Claude or GPT for raw knowledge Q&A.

The post-training self-auditing pattern

The post-training self-auditing pattern is the second load-bearing detail in the M2.7 story. From a single prompt, the model itself "spawns parallel sub-agents to handle research simultaneously, assigns a separate sub-agent to handle the presentation, and then audits its own output." That is why M2.7 is a strong executor: the post-training baked the executor pattern into the model without changing the architecture.

The implication for your routing: when you ask M2.7 to "build a presentation on X" or "research Y and write a summary," the model is going to fan out the work across parallel sub-agents by default, then audit the result. You don't have to prompt for the task split ("use sub agents", "force the task split") the way you do with Claude Code or older OpenClaw builds — the model does it for you.
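For comparison, here is the fan-out, draft, audit sequence written out as explicit harness code, which is the orchestration you would otherwise have to prompt for. This is a minimal sketch, assuming a hypothetical async `call_model(role, prompt)` that stands in for a real client. It illustrates the control flow only and says nothing about how M2.7 implements the pattern internally.

```python
import asyncio


async def call_model(role: str, prompt: str) -> str:
    """Hypothetical stand-in for one model call; swap in your client here."""
    await asyncio.sleep(0)  # yield to the event loop like a network call would
    return f"[{role}] {prompt}"


async def fan_out_and_audit(topic: str, questions: list[str]) -> dict:
    # 1. Research sub-tasks run concurrently.
    research = await asyncio.gather(
        *(call_model("researcher", f"{topic}: {q}") for q in questions)
    )
    # 2. A separate worker drafts the deliverable from the combined research.
    draft = await call_model("presenter", " | ".join(research))
    # 3. A final pass audits the draft before it is returned.
    audit = await call_model("auditor", f"check: {draft}")
    return {"research": list(research), "draft": draft, "audit": audit}


result = asyncio.run(fan_out_and_audit("M2.7", ["architecture", "benchmarks"]))
print(result["audit"])
```

`asyncio.gather` returns results in the order the coroutines were passed, so the draft sees the research in a stable order even though the calls overlap.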

The constraint is the same one spelled out in the channel's coverage of sub-agents in Course 6 §6.4: a sub-agent is a prompted persona on top of the same underlying model, not a separate process. M2.7's parallel sub-agents share the model and the context window; the split is a prompt-structure trick layered on top of one model. If the model's context fills up, the sub-agents fill up with it. The "dumb zone" failure mode from §5.4 applies to M2.7's sub-agents the same way it applies to a single Minimax run.
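The shared-window point is easiest to see with numbers. This is a minimal sketch, assuming a hypothetical 200,000-token window and made-up per-agent token counts: no single sub-agent comes near the ~40% line, but together they cross it, because they all draw on one budget.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """One context window that every sub-agent persona writes into."""

    window_tokens: int
    threshold: float = 0.40  # the ~40% "dumb zone" line from §5.4
    used_tokens: int = 0
    usage_by_agent: dict[str, int] = field(default_factory=dict)

    def record(self, agent: str, tokens: int) -> None:
        # Every sub-agent draws down the same budget; there is no per-agent window.
        self.used_tokens += tokens
        self.usage_by_agent[agent] = self.usage_by_agent.get(agent, 0) + tokens

    @property
    def fill_ratio(self) -> float:
        return self.used_tokens / self.window_tokens

    def in_dumb_zone(self) -> bool:
        return self.fill_ratio >= self.threshold


# Hypothetical 200k-token window; check your model's real limit.
ctx = SharedContext(window_tokens=200_000)
for agent, tokens in [("researcher-1", 30_000), ("researcher-2", 30_000), ("presenter", 25_000)]:
    ctx.record(agent, tokens)
print(f"{ctx.fill_ratio:.1%}", ctx.in_dumb_zone())  # 42.5% True
```

Each agent sits at 15% or less of the window on its own; the 42.5% total is what the model actually carries.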

The multi-agent angle — "M2.7's best feature nobody's using"

The M2.7 multi-agent angle is the same self-auditing pattern, scaled up to a team. M2.7's multi-agent mode structures parallel work into a small set of named roles — researcher, writer, QA — with the model itself coordinating the handoffs. The "best feature nobody's using" framing in the video title is honest: most users ask M2.7 to do everything in one prompt, then complain about the slot-machine quality, when the multi-agent mode is the actual answer for a long-horizon task.
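The researcher, writer, QA handoff can be written as plain Python to make the control flow visible. This is a minimal sketch: the role functions and the QA rule are hypothetical toy stand-ins. In M2.7's multi-agent mode the model coordinates these handoffs itself, so this illustrates the shape of the team, not Minimax's implementation.

```python
from typing import Callable


def run_team(task: str, researcher: Callable[[str], str], writer: Callable[[str], str],
             qa: Callable[[str], bool], max_rounds: int = 3) -> tuple[str, int]:
    """Researcher -> writer -> QA, with QA able to send the draft back."""
    notes = researcher(task)
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        draft = writer(notes + feedback)
        if qa(draft):
            return draft, round_no
        # QA rejected the draft: hand it back to the writer with a note.
        feedback = "\nQA: revise and cite sources"
    raise RuntimeError(f"QA rejected every draft in {max_rounds} rounds")


# Toy roles standing in for model calls.
def toy_researcher(task: str) -> str:
    return f"notes on {task}"


def toy_writer(notes: str) -> str:
    draft = f"draft({notes.splitlines()[0]})"
    return draft + " [sources]" if "cite sources" in notes else draft


def toy_qa(draft: str) -> bool:
    return "[sources]" in draft


draft, rounds = run_team("M2.7 launch", toy_researcher, toy_writer, toy_qa)
print(rounds, draft)  # 2 draft(notes on M2.7 launch) [sources]
```

The round cap matters: a QA role that never approves should fail loudly rather than loop, which is the "verify before declaring victory" advice in code form.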

The Hermes tier list (Af7Fg1m7hRw) reinforces the pattern: M2.7 is a strong executor because it was trained on the OpenClaw Agent Harness framework, the same lineage as Hermes. The multi-agent mode is the harness's executor pattern, baked into the model. If you're already running Hermes, M2.7 in the executor slot is the natural pairing.

The OpenClaw verdict — and the migration story

The OpenClaw-specific video provides the operational receipts. It restates the cost framing with the channel's own burn number for contrast: the creator's Opus run burned $30 in a single hour, while the MiniMax coding plan's starter tier is under $10 for 100 prompts per 5-hour window. The migration is a model-parameter change on the same API key.

The relevant details for the M2.7 architecture:

M2.1 and M2.5 both failed out of the box for the host. The real fix wasn't a model upgrade; it was a full reinstall from scratch (SSH access required). Random files scattered across directories were "muddying up" the context window. The M2.7 migration is a clean reinstall with the new model parameter, not a half-step on top of the M2.5 pollution.
The dumb zone still applies. M2.7 isn't immune to the soul.md-bloat failure mode. The same 15–30 line cap on soul.md and the same agents.md trim rule from §5.4 hold. Compress and reinstall, don't patch.
The starter coding plan is the right entry point. Under $10 for 100 prompts per 5-hour window. The Plus tier (4,500 requests / 5 hours) is the production tier for overnight cron work. The Max tier (15,000 requests / 5 hours) is for heavy multi-agent fan-out.
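A pre-reinstall check for the line cap takes a few lines of Python. This is a minimal sketch, assuming soul.md and agents.md sit at the workspace root and using 30 (the top of the 15–30 range) as the limit. It only reports; it does not trim anything.

```python
from pathlib import Path

MAX_LINES = 30  # upper end of the 15-30 line cap from §5.4


def count_lines(text: str) -> int:
    """Count non-blank lines (blank spacer lines are not counted; a sketch choice)."""
    return sum(1 for line in text.splitlines() if line.strip())


def oversized(files: dict[str, str], limit: int = MAX_LINES) -> dict[str, int]:
    """Return {name: line_count} for every file above the cap."""
    counts = {name: count_lines(text) for name, text in files.items()}
    return {name: n for name, n in counts.items() if n > limit}


def check_workspace(workspace: Path, limit: int = MAX_LINES) -> dict[str, int]:
    # Assumes soul.md and agents.md live at the workspace root; adjust for your layout.
    files = {}
    for name in ("soul.md", "agents.md"):
        path = workspace / name
        if path.exists():
            files[name] = path.read_text(encoding="utf-8")
    return oversized(files, limit)


if __name__ == "__main__":
    for name, n in check_workspace(Path(".")).items():
        print(f"{name}: {n} lines, over the {MAX_LINES}-line cap; compress and reinstall")
```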

For complex multi-step workflows, the creator was explicit: write the roadmap yourself. Minimax will not guide you through planning the way Opus does. The workaround for the daily news report was to scrape an existing open-source GitHub repo, set a cron job, and let it run overnight using the 5-hour refresh window. Skip building an aggregator from scratch.

The M3 trade-off — and when to skip M2.7

The M3 follow-up review (856 views) is the verdict for users deciding between M2.7 and M3. The M3 numbers from §5.1 (SWE Bench Pro 59%, MSA 9.7× pre-fill / 15.6× decode at 1M tokens, $0.30 / $1.20 per million input/output tokens) are the headline. The error-rate delta is the practical case: M2.7 "made a lot of errors" and required constant babysitting, while M3, over a two-week test, is making "much fewer errors". On refactors it "cleaned up a lot of logs, files, refactored code" with noticeably less of what the creator calls "gaslighting" (hallucinated output). M3 "doesn't tend to overthink things anymore": there is no visible chain-of-thought sprawl.
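Before committing a long-context job to pay-per-use, the quoted M3 rates reduce to simple arithmetic. This is a minimal sketch, assuming the $0.30 / $1.20 per-million list prices above still hold; it ignores any other pricing factors your gateway may apply, so check current pricing before relying on it.

```python
# M3 pay-per-use list prices quoted above, in USD per million tokens.
M3_INPUT_PER_M = 0.30
M3_OUTPUT_PER_M = 1.20


def m3_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one M3 call at the quoted per-million rates."""
    return (input_tokens * M3_INPUT_PER_M + output_tokens * M3_OUTPUT_PER_M) / 1_000_000


# A full 1M-token prompt with a 20k-token answer:
print(f"${m3_cost(1_000_000, 20_000):.3f}")  # $0.324
```

Input dominates on long-context work: the 1M-token prompt costs $0.30 of that total, the answer only $0.024.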

The trade-off: M2.7 is the model Kilo Code's token plan still shows today. M3 is on Kilo's pay-per-use gateway only, with a 5-hour token reset on MiniMax's side. The "10-day open-weight drop" mentioned in the M3 launch video is the unlock event — when M3 lands on the token plan, Kilo Code and Hermes Agent integrations should add M3 to the default executor slot, and that's when to re-test properly inside a real coding harness.

Until M3 is on the token plan, the routing rule is M2.7 for executor work, M3 if you have a 1M-token long-context task and can stomach pay-per-use. Don't pay-per-use for ordinary overnight cron work — the token plan's 5-hour rolling reset is the contract that makes that shape viable.
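A harness running overnight cron work against the token plan can track that contract client-side. This is a minimal sketch, assuming the quota is counted as a sliding 5-hour window over request timestamps; if the provider resets in fixed 5-hour blocks instead, the eviction rule changes. The tier quotas are the figures quoted in this article; treat the provider's dashboard as authoritative.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window

# Per-window quotas quoted in this article.
TIER_QUOTAS = {"starter": 100, "plus": 4_500, "max": 15_000}


class RollingWindowBudget:
    """Client-side tracker for a rolling 5-hour request quota (an assumption)."""

    def __init__(self, quota: int, window: float = WINDOW_SECONDS):
        self.quota = quota
        self.window = window
        self._stamps: deque = deque()

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the window still has room."""
        self._evict(now)
        if len(self._stamps) < self.quota:
            self._stamps.append(now)
            return True
        return False

    def seconds_until_free(self, now: float) -> float:
        """How long a cron job should sleep before the next slot opens."""
        self._evict(now)
        if len(self._stamps) < self.quota:
            return 0.0
        return self._stamps[0] + self.window - now


budget = RollingWindowBudget(TIER_QUOTAS["starter"])
granted = sum(budget.try_acquire(now=float(t)) for t in range(150))  # one request per second
print(granted)                                # 100
print(budget.seconds_until_free(now=150.0))  # 17850.0
```

A batch job that sleeps for `seconds_until_free` instead of retrying blindly keeps overnight work inside the plan rather than spilling onto pay-as-you-go.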

Try it yourself

Subscribe to the Minimax Plus or Max tier. Confirm the token-plan key is the rolling key, not a pay-as-you-go key. The reset is the contract.
Edit your model parameter to minimax-m2.7 (or the equivalent in your harness). Run /status to confirm the switch took.
Pick a one-shot presentation task and time it end-to-end. On M2.7 you should see 2–3 minutes; on Claude you should see 10–15. The delta is the operational case.
Try the multi-agent mode. Ask M2.7 to "build a presentation on X" or "research Y and write a summary" and watch the parallel sub-agents fire. Compare to running the same prompt on M2.5 — M2.5 did everything solo, M2.7 fans out by default.
Run a multi-file refactor through M2.7. Pick a Go / Rust / TypeScript / Java project. Time the run. Watch for the slot-machine quality — same prompt, different quality, run 2-3x and pick the best.
Test the 40% context threshold. Run your routine workload for 2-3 days and watch the context meter. The first time it crosses ~40%, check whether tool calls and recall have started to degrade. If they have, the dumb zone is the variable, not the model.
Reserve Opus for the final review pass on security-sensitive or money-handling code. The "near Opus, not Opus" framing is the safe default.
If you have access to M3 on Kilo's pay-per-use gateway, batch a 1M-token task and measure pre-fill / decode latency. Check the claimed 9.7× pre-fill / 15.6× decode advantage over M2.7 on a real long-context workload; the speedup doesn't show up on short chats.
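For the timing steps above, a wall-clock wrapper plus a ratio is all you need, and the same ratio turns the reported ranges into a speedup band. This is a minimal sketch: `time_run` wraps whatever zero-argument callable starts the task in your harness (you supply it). The figures below are the article's reported times, not new measurements.

```python
import time
from typing import Callable


def time_run(task: Callable[[], object]) -> float:
    """Wall-clock seconds for one end-to-end run of `task`."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate finished than the baseline."""
    return baseline_seconds / candidate_seconds


# The article's reported ranges: 10-15 min on Claude vs 2-3 min on M2.7.
low = speedup(10 * 60, 3 * 60)   # slowest M2.7 vs fastest Claude
high = speedup(15 * 60, 2 * 60)  # fastest M2.7 vs slowest Claude
print(f"{low:.1f}x to {high:.1f}x")  # 3.3x to 7.5x
```

Given the slot-machine variance, time each model over several runs before comparing; a single run can mislead.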

Common pitfalls

Treating M2.7 as a flat Opus replacement. M2.7 is an executor. Don't ask it to plan a "large function" from scratch. Use Opus / GPT / Gemini for the orchestrator slot.
Trusting "near Opus" output without diffing. The "near Opus, not Opus" framing is the safe default. Always review the diff before merging, especially for security-sensitive or money-handling code.
Paying pay-as-you-go for an overnight run. The 5-hour rolling limit on the token plan is the right shape for batched work. Pay-as-you-go will spike costs without giving you more capability.
Paying the high-speed M2.7 premium on ordinary builds. The $40/month tier is reserved for latency-sensitive interactive use. Don't pay the premium for overnight cron work.
Trusting a single multi-agent run from M2.7. The "self-auditing" pattern is a model-level default, but the audit isn't always right. Verify the deliverable against your spec before declaring victory.
Confusing "post-training self-auditing" with "fresh context per sub-agent". The model's sub-agents share the same model and the same context window. If the model's context fills up, the sub-agents fill up with it. The 40% threshold applies to all of them.
Skipping the migration verification. Run /status in Discord or OpenClaw status in the terminal. A stale config can keep you on M2.5 even after you changed the model parameter.
Letting soul.md and agents.md grow past 15-30 lines. The dumb-zone threshold applies to M2.7 the same way it applied to M2.5. Compress and reinstall — don't patch.
Running M2.7 in cron / Kanban without a persistent gateway on a VPS. A tmux-wrapped gateway survives logout only ~50% of the time. Use systemd.
Benchmarking M2.7 alone. M3 is meaningfully better on long-context work (MSA, 1M context, fewer errors), but M2.7 is the model Kilo Code's token plan still shows. Pick by task.
Treating the slot-machine quality as a model bug. The same prompt produces outputs of varying quality on Minimax. Run 2-3x and pick the best. Don't try to fix a slot-machine output with another prompt.
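When the score is something objective, such as the number of tests a patch passes, "run 2-3x and pick the best" is easy to automate. This is a minimal sketch: `generate` and `score` are hypothetical hooks you supply (one model run per call, and your own grader); nothing here is a Minimax API. It re-runs the same prompt rather than rewriting it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[int], T], score: Callable[[T], float],
              n: int = 3) -> tuple[T, float]:
    """Run the same prompt n times and keep the highest-scoring output."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(attempt) for attempt in range(n)]
    best = max(candidates, key=score)
    return best, score(best)


# Toy stand-ins: each "run" returns how many of 10 tests the patch passed.
fake_runs = [6, 9, 7]
best, best_score = best_of_n(lambda i: fake_runs[i], score=float, n=3)
print(best, best_score)  # 9 9.0
```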

Sources

Minimax M2.7 is INSANELY GOOD! (Full Review) · 31,049 views · video_id: --uxieT5J9Y
Is Minimax the Best AI Model for OpenClaw? · 3,219 views · video_id: 258R3kzDRAQ
Why MINIMAX M 2.7 WINS! Parallel Subagents and Self Auditing · 669 views · video_id: ocjfBoM_eTM
MiniMax M2.7's Best Feature Nobody's Using (Multi-Agent Teams) · video_id: Ttb_Tw6-YBA
Top AI Models for Hermes Agent (Tier List) · 8,107 views · video_id: Af7Fg1m7hRw
Minimax M3 Review: Is This Cheap AI Model Actually Worth It? · 856 views · video_id: hTkxebQdtH8
Supabase query — SELECT video_id, title, views, summary_content, summary_key_takeaways FROM public.videos WHERE video_id = ANY(ARRAY['--uxieT5J9Y','258R3kzDRAQ','ocjfBoM_eTM','Ttb_Tw6-YBA','Af7Fg1m7hRw','hTkxebQdtH8']); against project ttxdssgydwyurwwnjogq.