This is the article every §6 model card leads into. The four Chinese labs the channel covers are not interchangeable. Each has a structural reason it wins on a particular workload, and the routing rule that combines them into a single agent stack has been the same since the channel's Top AI Models for Hermes Agent (Tier List) review: default executor goes to DeepSeek V4 Flash or Kimi 2.7, default orchestrator goes to Qwen 3.6 Plus (with GPT 5.4 / Gemini 3.1 Pro above it), and GLM 5.1 sits in the executor slot for one-shot coding builds at the new $72/mo Pro plan (or $7–10/mo Light plan). The $30→$72/mo Z.AI price move from §6.4 is the most important update in mid-2026; the KimiClaw "skip the wrapper" rule from §6.1 is the second; the "Qwen 3.6 Plus free on Hermes" framing correction from §6.2 is the third. Together, those three updates reshape the routing table the channel has been publishing.

This article is a routing decision, not a comparison shopping list. For each common agent workload, you'll get the channel's pick among the four labs, the plan tier, the "default tier" foot-gun to flip first, and the failure mode that pushes the work to a non-Chinese fallback. The Sources section at the end aggregates every video in §6 into a single list.

What you'll learn

  • The four-axis decision tree (speed vs quality, context length, cost-per-task, willingness to follow instructions) applied to the four Chinese labs side by side.
  • The "use the model, not the wrapper" rule for Kimi, the "free period ended" rule for Qwen 3.6 Plus, the "default tier" foot-gun for DeepSeek V4 Flash, and the $30→$72/mo Z.AI price move for GLM 5.1 — and how they reshape the channel's routing table.
  • The orchestrator vs executor split, with the four labs assigned to specific slots per the channel's tier list: Kimi 2.5 / Qwen 3.6 Plus in the orchestrator slot, DeepSeek V4 Flash / GLM 5.1 / Kimi 2.7 in the executor slot.
  • A load-bearing workload-by-workload routing table: refactor, repo map, long-doc summary, one-shot creative, vision, cron, long-horizon autonomy, privacy-sensitive, GPU-bound.
  • The non-Chinese fallbacks: GPT 5.4, Gemini 3.1 Pro, Claude Fable 5 — and when the routing decision should bounce off the Chinese stack entirely.

The four-axis decision tree, applied to the four labs

The channel's model-choice framework from Course 2 §2.1 scores every model on four axes: speed vs quality, context length, cost-per-task, and willingness to follow instructions. Scoring the four Chinese labs on each axis gives you the routing map for the rest of the article.

1. Speed vs quality. GLM 5.1 wins on the 47.9 vs Opus 4.5's 45.3 coding-eval inversion (§6.4) — and loses on the "one-shot complex build" test, where the broken projectile physics is the canary. Qwen 3.7 Max wins on raw speed (8m 53s for the ancient Chinese 3D building, §6.2) but costs $7.50 per 1M output tokens, which forces the decision toward Qwen 3.7 Plus for chat-shaped work. DeepSeek V4 Flash wins on the cost-adjusted quality axis (35% of real tasks at ~4x less than V4 Pro, §6.3) — for "boring but expensive" work, Flash is the right call. Kimi 2.7 is the closest thing to a "boring but cheap" entry in the orchestrator slot, with the ~30% thinking-token cut from 2.6 and the agent-swarm fallback for the multi-step research that Flash loses on.

2. Context length. DeepSeek V4 Flash wins with a 1M-token context window and hybrid attention (§6.3) — the cache trick that makes "free" actually free. Qwen 3.7 Max also runs long-horizon autonomy and the channel's "agent first design" framing puts it on hour-long sessions (§6.2). Kimi 2.7 and GLM 5.1 don't make the 1M headline; their context windows are competitive but not the load-bearing feature. The decision rule: if your workload is a 1M-token repo map, V4 Flash is the only one of the four that hits it cleanly with the cache trick.

3. Cost-per-task. This is the axis where the §6.4 Z.AI price move matters most. GLM 5.1's $30→$72/mo Pro plan change moves it from "cheapest credible one-shot coder" to "priced above GPT-5.4." The Light plan at $7–10/mo is the budget tier; Pro at $72/mo is the catch-up-after-Opus-failed tier. Qwen 3.7 Max is $7.50/M output tokens, second only to Opus. Qwen 3.7 Plus and Qwen 3.6 Plus on Alibaba Model Studio are roughly $0.40 input / $1.60 output per million tokens (per the current Qwen API platform rate from §6.2's audience fact-check). DeepSeek V4 Flash is free on Nous Portal with the cache mechanic; Kimi 2.5 is $0.50 input / $2 output per million tokens on OpenRouter; Kimi 2.7 isn't priced separately because it's an efficiency drop, not a new model tier.

4. Willingness to follow instructions. Qwen 3.6 Plus wins on the "preserved thinking across turns" framing (§6.2) — the always-on reasoning trace reduces contradictions on long-horizon tasks. Kimi 2.7's short-prompt rule (§6.1) is a different shape of "follow instructions": over-specifying actively hurts output, so the model expects minimal briefs and plans from there. GLM 5.1's "thinking inside tool calls" pattern (§6.4) is the structural reason it self-corrects mid-execution — a kind of willingness to follow the tool-result feedback, not just the prompt. DeepSeek V4 Flash is in the "you have to set the TUI reasoning to xhigh" camp (§6.3) — it will follow instructions, but only if the platform defaults aren't benchmarking it below its real level.
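
The cost-per-task axis comes down to simple arithmetic once you know a workload's token counts. The sketch below is a minimal illustration that uses only the per-million-token rates quoted above for Qwen 3.6 Plus and Kimi 2.5. The function name `cost_per_task`, the dictionary keys, and the example token counts are hypothetical, and the rates are time-stamped, so re-check them before relying on the result.

```python
# Per-million-token rates in USD, as quoted in this article (time-stamped).
# The keys are illustrative labels, not official model identifiers.
RATES_USD_PER_M = {
    "qwen-3.6-plus": {"input": 0.40, "output": 1.60},
    "kimi-2.5-openrouter": {"input": 0.50, "output": 2.00},
}


def cost_per_task(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one task on a pay-per-use model."""
    rate = RATES_USD_PER_M[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Hypothetical task: 200k tokens in, 50k tokens out.
for name in RATES_USD_PER_M:
    print(f"{name}: ${cost_per_task(name, 200_000, 50_000):.2f}")
```

Running the same function over your own measured token counts is what turns the cost axis from an impression into a number you can compare across labs.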

The orchestrator vs executor split

The channel's Top AI Models for Hermes Agent (Tier List) splits models into two roles: orchestrator (the brain that plans and reasons across many turns) and executor (the hands that call tools reliably, cheaply, and fast). Each of the four Chinese labs gets a slot:

  • Kimi 2.5 — orchestrator-capable, specifically because the agent swarm can "self-direct a swarm of like about 100 sub agents" and coordinate up to 1,500 tool calls without a predefined workflow. The §6.1 rule: use Kimi in the orchestrator slot with /swarm enabled, not as a hosted wrapper.
  • Qwen 3.6 Plus — orchestrator, with the "always-on reasoning trace, preserved thinking across sessions" framing that the public.ai_models row grounds. The §6.2 rule: the "free on Hermes" framing is retired, but the orchestrator slot is intact. Plan around the Qwen API platform rate (~$0.40 / $1.60 per million tokens) for the 3.6 Plus workload.
  • DeepSeek V4 Flash — executor, the highest-consumed token on Hermes Agent this month. The §6.3 rule: set the TUI reasoning to xhigh and verbose to verbose first; route to V4 Pro for codebases you don't understand and one-shot creative work; Flash handles everything else.
  • GLM 5.1 — executor, with the §6.4 caveat that the $30→$72/mo Z.AI price move is the most important mid-2026 update. The routing rule post-hike: GLM 5.1 stays in the executor slot for one-shot coding builds; for chat-shaped work and "boring but expensive" refactors, route to V4 Flash (cheaper, cached) or Qwen 3.7 Plus (multimodal, ~$0.40/M input).
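
One way to make the split enforceable is to record it as data and check every assignment against it. The following is a minimal sketch under that assumption: the `Role` enum, the `SLOTS` mapping, and both helper functions are hypothetical names, and the model labels are plain strings taken from the list above.

```python
from enum import Enum


class Role(Enum):
    ORCHESTRATOR = "orchestrator"  # plans and reasons across many turns
    EXECUTOR = "executor"          # calls tools reliably, cheaply, and fast


# Slot assignments as described in this section.
SLOTS = {
    "Kimi 2.5": Role.ORCHESTRATOR,
    "Qwen 3.6 Plus": Role.ORCHESTRATOR,
    "DeepSeek V4 Flash": Role.EXECUTOR,
    "GLM 5.1": Role.EXECUTOR,
}


def models_for(role: Role) -> list[str]:
    """List the models assigned to a role, in declaration order."""
    return [model for model, slot in SLOTS.items() if slot is role]


def check_assignment(model: str, role: Role) -> bool:
    """True if the model is being used in the slot listed for it."""
    return SLOTS.get(model) is role
```

A call such as `check_assignment("GLM 5.1", Role.ORCHESTRATOR)` returns `False`, which is the kind of mismatch worth catching before a config ships.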

The routing table

For each common agent workload, the channel's pick among the four Chinese labs and the non-Chinese fallback:

  • Python refactor on a 200–500-line file. DeepSeek V4 Flash first (cache trick, ~4x cheaper than Pro, identical answers on the channel's five-test hands-on). Escalate to V4 Pro only if Flash misses something specific. Don't route to Qwen Max or GLM 5.1 here — both are overkill on price.
  • Repo map on a FastAPI / local 50-file project. DeepSeek V4 Flash first (~1 minute on a local 50-file project, ~12 minutes on a 50-file FastAPI repo). Escalate to GPT 5.4 or Gemini 3.1 Pro for "show me how this codebase fits together" tasks where reasoning quality matters more than cache.
  • One-shot space shooter / Warhammer Invaders build. GLM 5.1 first — the §6.4 "one-shotted a space shooter that Opus 4.7 choked on" framing is the executor-slot call. Light plan at $7–10/mo, not Pro at $72/mo. Disable web search inside Claude Code.
  • Long-doc summarization (3+ academic papers). DeepSeek V4 Flash first (~2 minutes, 1M context, hybrid attention, cache trick). Save papers as PDF/HTML into a dedicated folder; don't paste Google Scholar URLs.
  • 3D Chinese architecture / Figma screenshot → working code. Qwen 3.7 Max first if you can pay $7.50/M output tokens and the build is terminal-only; Qwen 3.7 Plus for hybrid GUI+CLI reasoning. Local Qwen 3.5 only for privacy-sensitive heartbeat tasks.
  • Long-horizon autonomy (overnight refactor, agent swarm). Qwen 3.7 Max first (8m 53s on the ancient Chinese 3D building) or Kimi 2.7 with /swarm enabled (9 of 100 agents on the poem-to-game test). Run on a local machine with a GPU for graphics-intensive builds.
  • Cron / scheduled tasks. DeepSeek V4 Flash first (the channel's framing: reliable scheduled task execution, fewer errors than alternatives). Kimi 2.7 with /swarm for multi-step research cron jobs. The 5-hour rolling counter on each lab's plan tier is the structural rate limiter; pick a plan with a window that fits your cron schedule.
  • Privacy-sensitive workload (PII, healthcare, legal). Local Qwen 3.5 on LM Studio first. Cap at the 9B / 6 GB build on a 3060; ~37 tok/s. Disable thinking for short factual prompts. The car-wash sanity check is the canary for reasoning degradation.
  • GPU-bound graphics-intensive build. Local machine with a GPU, not a VPS. Kimi 2.7 with /swarm is the channel's "playable" test result; the same prompt on a VPS produced unplayable output. The model isn't the bottleneck — the host is.
  • Orchestrator planning (multi-step, multi-hour). GPT 5.4 first, then Gemini 3.1 Pro, then Qwen 3.6 Plus. Kimi 2.5 with /swarm for research-heavy planning. Don't put GLM 5.1 in the orchestrator slot.
  • Vision / image / Figma-to-code / dense OCR. Qwen 3.7 Plus first (native vision, GUI+CLI hybrid reasoning, 64.7 Baby Vision). Don't use DeepSeek V4 Flash — the "pretty inconsistent" vision call from §6.3 is structural, not a benchmark miss.
  • One-shot creative / writing. GPT 5.4 or Claude Fable 5 (the channel's "genius level" framing). Qwen 3.7 Max can do it; the "weird, very Chinese" framing from §6.2's audience comments is the canary. Avoid Kimi 2.7 on heavy one-shot creative.
  • Q&A / chat / explanation. GPT 5.4, Gemini 3.1 Pro, or Claude Sonnet 4.6. Don't route chat to GLM 5.1 (regressed vs GLM 5), DeepSeek V4 Flash (skip the cache, no advantage), or Qwen 3.7 Max (overkill on cost).
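
The list above can be carried into a config as an ordered lookup: first pick, then escalations. This is a sketch rather than the channel's own implementation. The workload keys, the `ROUTES` name, and the `route` function are hypothetical, and only a subset of the workloads is encoded.

```python
# Workload -> ordered picks (first choice, then escalations), per the list above.
# Keys are hypothetical identifiers; extend with the remaining workloads.
ROUTES = {
    "python_refactor": ["DeepSeek V4 Flash", "DeepSeek V4 Pro"],
    "repo_map": ["DeepSeek V4 Flash", "GPT 5.4", "Gemini 3.1 Pro"],
    "one_shot_coding_build": ["GLM 5.1"],
    "long_doc_summary": ["DeepSeek V4 Flash"],
    "vision": ["Qwen 3.7 Plus"],
    "privacy_sensitive": ["Local Qwen 3.5"],
    "orchestrator_planning": ["GPT 5.4", "Gemini 3.1 Pro", "Qwen 3.6 Plus"],
}


def route(workload: str, unavailable: frozenset[str] = frozenset()) -> str:
    """Return the first pick for a workload that is not marked unavailable."""
    for model in ROUTES[workload]:
        if model not in unavailable:
            return model
    raise LookupError(f"no available model for workload {workload!r}")
```

Keeping the escalation order in the data, not in branching code, means a monthly re-run of the table is an edit to one dictionary.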

The non-Chinese fallbacks

The channel's routing table does not pretend the four Chinese labs cover every workload. Three non-Chinese fallbacks are load-bearing:

  • GPT 5.4 — top orchestrator in the channel's tier list, "designed with native agentic workflows in mind." Use for one-shot complex builds and any task where reasoning quality dominates cost.
  • Gemini 3.1 Pro — ties as top orchestrator, adds native video and audio input, the go-to for screen recordings and structured dashboard extraction.
  • Claude Fable 5 with loop syntax — the channel's "genius level" model, scoring 29.3% on Frontier Coding Diamond vs GPT 5.5's 5.7%. Use it inside Cursor, Claude Code, or another coding IDE, with the harness command "loop until it's done" plus an explicit validation rule. Cap spend at ~$15 per project; finish before the cheap window closes June 21–22.

The decision rule: if the four Chinese labs don't have a clean fit, route to a non-Chinese fallback rather than forcing a square peg into a round hole. The "use MiniMax M2.7 because it's cheap" argument applies to Chinese labs too — pick the model that wins the workload at the lowest cost, not the model with the highest benchmark score.
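
The decision rule can be written as a two-step filter: keep the candidates that pass your own representative test, then take the cheapest. The sketch below assumes you have already measured a pass/fail result and a cost per task for each candidate. The `Candidate` type, the `pick` function, and the placeholder names and numbers in the usage lines are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    passes_workload: bool    # result of your own representative test
    cost_per_task_usd: float
    benchmark_score: float   # recorded, but deliberately ignored by pick()


def pick(candidates: list[Candidate]) -> Optional[Candidate]:
    """Cheapest candidate that passes the workload; None means use a fallback."""
    passing = [c for c in candidates if c.passes_workload]
    return min(passing, key=lambda c: c.cost_per_task_usd, default=None)


# Placeholder data only: names and numbers are made up for illustration.
examples = [
    Candidate("model-a", passes_workload=True, cost_per_task_usd=0.30, benchmark_score=70.0),
    Candidate("model-b", passes_workload=True, cost_per_task_usd=0.10, benchmark_score=55.0),
    Candidate("model-c", passes_workload=False, cost_per_task_usd=0.01, benchmark_score=90.0),
]
chosen = pick(examples)
```

Here `chosen` is `model-b`: it passes and costs least, even though `model-c` has the higher benchmark score and the lower price.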

The four pattern innovations, side by side

Each of the four Chinese labs in §6 has a structural pattern innovation that defines its slot in the agent stack. Looking at them side by side is the cleanest way to see why the routing table assigns each one a different job:

  • Kimi's "agent swarm" — a single orchestrator that can "self-direct a swarm of like about 100 sub agents" and coordinate up to 1,500 tool calls without a predefined workflow. The §6.1 framing: a research-heavy multi-step prompt is where Kimi earns its slot. The order-of-operations rule (/swarm on, then prompt, then plan) is the load-bearing setup.
  • Qwen's "preserved thinking" — Qwen 3.6 Plus retains its internal chain-of-thought reasoning across ALL prior turns in a session, not just the current one. The §6.2 framing: long-horizon agent tasks where contradictions across turns are the actual failure mode. The public.ai_models row grounds it: "always-on reasoning trace, preserved thinking across sessions, reduces contradictions in long tasks."
  • DeepSeek's "thinking inside tool calls" — the model reasons while deciding which tool to invoke, self-corrects mid-execution based on tool results. The §6.3 framing: agent loops with many tool calls (40+ tool environments are typical). The cache trick is the price amplifier; the pattern is the quality amplifier.
  • GLM 5.1's "self-tests instead of just spitting code" — the model iterates inside the same session when something breaks. The §6.4 framing: one-shot coding builds where the model is given a brief and expected to deliver a working artifact. The "thinking inside tool calls" pattern overlaps with DeepSeek's; the difference is the iteration loop.

The pattern innovations are not mutually exclusive — Kimi can use the swarm pattern on a thinking-inside-tool-calls workload, and Qwen 3.6 Plus's preserved thinking benefits any agent loop. But the primary innovation per lab maps cleanly to the §6.5 routing table:

  • Research-heavy multi-step → Kimi (swarm)
  • Long-horizon autonomous refactor → Qwen 3.6 Plus / 3.7 Max (preserved thinking)
  • Long agent loop with many tool calls → DeepSeek V4 Flash (thinking inside tool calls)
  • One-shot coding build → GLM 5.1 (self-tests + iterates)

That mapping is the structural backbone of the routing table. The plan-tier decisions and the workload-by-workload assignments in §6.5 are downstream of which pattern the workload needs.
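
Read as a procedure, the mapping is a two-step lookup: workload to pattern, then pattern to the lab that owns it. A minimal sketch, with hypothetical dictionary and function names and the entries copied from the four-line mapping above:

```python
# Step 1: which pattern does the workload need?
PATTERN_FOR_WORKLOAD = {
    "research-heavy multi-step": "swarm",
    "long-horizon autonomous refactor": "preserved thinking",
    "long agent loop with many tool calls": "thinking inside tool calls",
    "one-shot coding build": "self-tests + iterates",
}

# Step 2: which lab's model owns that pattern?
OWNER_FOR_PATTERN = {
    "swarm": "Kimi",
    "preserved thinking": "Qwen 3.6 Plus / 3.7 Max",
    "thinking inside tool calls": "DeepSeek V4 Flash",
    "self-tests + iterates": "GLM 5.1",
}


def lab_for(workload: str) -> str:
    """Resolve a workload to a lab by way of the pattern it needs."""
    pattern = PATTERN_FOR_WORKLOAD[workload]
    return OWNER_FOR_PATTERN[pattern]
```

Keeping the two steps separate makes it possible to reassign a pattern to a different lab after a new release without touching the workload list.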

When the routing table needs to change

The §6.5 routing table is a snapshot. Three things in mid-2026 are likely to invalidate parts of it within the next quarter:

  1. Z.ai's open-weights drop. Z.ai has promised GLM 5.1 weights but hasn't published them. If they ship, a local-GLM path becomes viable for the privacy-sensitive workload currently served by §6.2's local Qwen 3.5 build. Re-benchmark the local builds; the channel's stance is "treat the promise as unconfirmed until the repo publishes."
  2. Kimi 2.8 or 2.9. Kimi 2.7 is a reasoning-efficiency drop, not a capability jump. The next Kimi major is likely to push the orchestrator-capability axis. Re-evaluate Kimi's slot in the orchestrator tier when the next release drops; the swarm pattern may stop being the load-bearing feature.
  3. DeepSeek V5. The channel has been waiting for V5 since the "Chinese AI labs are copying Claude" video. If V5 lands above V4 Pro on the car-wash prompt and the cost ratio holds, the routing table's "V4 Pro for codebases you don't understand" rule may flip to "V5 Pro for codebases you don't understand, V4 Flash for everything else." Pin DeepSeek V5 to your evaluation queue the week it drops.

The deeper point: the channel's four-lab coverage is a living routing table. The model cards in §6.1–§6.4 give you the static picture; the routing table in §6.5 gives you the dynamic decisions. Re-run the routing table monthly. The KimiClaw version pin, the Qwen 3.6 Plus free-period end, and the §6.4 Z.AI price hike are the three cautionary tales of mid-2026 — vendor terms change faster than your config does.
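
A small staleness check can turn "re-run monthly" into something a script reminds you about. The sketch below assumes you record the date of the last review yourself and note by hand when one of the three watched events happens. The 30-day interval, the event labels, and the `needs_review` function are illustrative assumptions.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=30)  # "monthly", approximated as 30 days

# The three invalidating events named above, as free-form labels.
WATCHED_EVENTS = ("GLM 5.1 open weights", "next Kimi release", "DeepSeek V5")


def needs_review(
    last_reviewed: date,
    today: date,
    events_seen: frozenset[str] = frozenset(),
) -> bool:
    """True if the table is overdue or a watched event has occurred."""
    overdue = today - last_reviewed >= REVIEW_INTERVAL
    triggered = any(event in events_seen for event in WATCHED_EVENTS)
    return overdue or triggered
```

For example, `needs_review(date(2026, 3, 1), date(2026, 3, 15), frozenset({"DeepSeek V5"}))` is true even though only two weeks have passed, because a watched release overrides the calendar.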

The data-trust and hosting-region axis

The four labs are not equivalent on data-trust and hosting-region, and the channel's coverage flags this on every model card. A short list to make the differences explicit:

  • Kimi (Moonshot). Servers in China mainland, USD billing. The §6.1 verdict: "we don't know what sort of data they're keeping" — the honest default is to assume Moonshot retains prompt data and not to send anything sensitive through KimiClaw. The OpenRouter route inherits this posture.
  • Qwen (Alibaba). Alibaba Cloud Model Studio is hosted in Alibaba's regions (Singapore, Frankfurt, US East, US West among others — pick at signup). The §6.2 verdict: privacy posture is comparable to other major cloud providers, and the local Qwen 3.5 path is the only "data stays on your machine" option.
  • DeepSeek. Hosted on Nous Portal in addition to DeepSeek's own infrastructure. The §6.3 verdict: data-retention posture is documented at the Nous Portal level (free tier has its own terms), and the V4 Flash cache mechanic interacts with the data-retention policy — cached reads are nearly free, but cached prompts are also retained per the portal's TOS.
  • GLM 5.1 (Z.ai / Zhipu). Z.ai's servers; Zhipu is a public company with documented infrastructure. The §6.4 verdict: data-retention is more transparent than Moonshot's because of public-company disclosure requirements, but the §6.4 China-factor context (Tsinghua lineage, non-Nvidia Chinese chips) means the data flow is China-region by default. The local path requires open weights, which haven't shipped.

The routing-table rule: for privacy-sensitive workloads (PII, healthcare, legal), default to the local Qwen 3.5 build or to a non-Chinese hosted model. Don't route sensitive data through KimiClaw. Don't route sensitive data through GLM 5.1 until the open weights drop and you can self-host. DeepSeek V4 Flash is the cleanest of the four for non-sensitive workloads because the cache mechanic is well-documented; Kimi 2.5/2.7 via OpenRouter is acceptable for non-sensitive work but the China-mainland hosting is the binding constraint.

The non-Chinese fallbacks (GPT 5.4, Gemini 3.1 Pro, Claude Fable 5) all have US/EU hosting with documented data-retention policies. If your workload is privacy-sensitive and also fits the Chinese-lab routing rules, the local Qwen 3.5 build is the only path that satisfies both constraints. Otherwise route to a non-Chinese fallback.
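
The privacy rule works as a gate that runs before the routing table's pick is used. A minimal sketch under that assumption follows. The `allowed_models` function and its parameters are hypothetical, and the model labels are the ones named in this section.

```python
NON_CHINESE_FALLBACKS = ("GPT 5.4", "Gemini 3.1 Pro", "Claude Fable 5")
LOCAL_MODEL = "Local Qwen 3.5"


def allowed_models(sensitive: bool, hosted_pick: str) -> tuple[str, ...]:
    """Models a request may be sent to, given its sensitivity.

    `hosted_pick` is whatever the routing table chose for the workload.
    """
    if not sensitive:
        return (hosted_pick,)
    # Sensitive data: the local build or a non-Chinese hosted model only.
    return (LOCAL_MODEL, *NON_CHINESE_FALLBACKS)
```

Placing the gate ahead of the workload lookup means a sensitive request cannot reach a hosted Chinese-lab endpoint through a routing-table entry alone.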

A final word on the four-lab framing

The channel chose to cover these four labs specifically, and the choice is not arbitrary. Kimi, Qwen, DeepSeek, and GLM are the four Chinese AI labs whose models the channel has actually wired into production agents — the four whose model cards survive the "is it real" question. Other Chinese labs (Baidu's Ernie, Tencent's Hunyuan, ByteDance's Seed) are mentioned in passing in the §2.8 cross-cutting coverage and in Course 7: The AI Industry Beat, but the channel's agent-stack coverage concentrates on the four where the routing decisions compound.

If you read the §6.1–§6.4 model cards and the §6.5 routing table as a unit, the takeaway is structural: each of the four Chinese labs has a load-bearing pattern innovation that defines its slot, and the four slots are not interchangeable. The Kimi swarm is for research-heavy multi-step work. The Qwen preserved thinking is for long-horizon autonomous refactors. The DeepSeek thinking-inside-tool-calls is for long agent loops with many tool calls. The GLM 5.1 self-tests-and-iterates is for one-shot coding builds. Pick the pattern the workload needs, then pick the lab that owns it. The §6.5 routing table is the decision rule; the four model cards are the receipts.

The plan-tier summary

A quick map of which plan tier to pick for each of the four labs:

  • Kimi 2.5 / 2.7 — OpenRouter pay-per-use ($0.50 input / $2 output per million tokens). Skip the KimiClaw wrapper; self-host on a $2/mo Zeabur VPS.
  • Qwen 3.6 Plus — Alibaba Cloud Model Studio pay-per-use (~$0.40 input / $1.60 output per million tokens per the current API platform rate). The "free on Hermes" framing is retired.
  • Qwen 3.7 Max — Alibaba Cloud Model Studio pay-per-use, $7.50 per 1M output tokens. The token plan is "credits per seat per month" — estimate precisely or stay on pay-per-use. Toggle "free quota only" before your first request.
  • Qwen 3.7 Plus — same Model Studio pay-per-use, ~40% cheaper than Max on token cost. The default for IDE work.
  • Local Qwen 3.5 — LM Studio (Mac / Windows), 9B / 6 GB on a 3060. Free after hardware investment; pay in electricity and time.
  • DeepSeek V4 Flash — Nous Portal free tier, with the cache trick. Set Hermes TUI reasoning to xhigh and verbose to verbose.
  • DeepSeek V4 Pro — Nous Portal Pro tier, ~$0.87/M output tokens. Use for codebases you don't understand and one-shot creative work.
  • GLM 5.1 — Z.ai Light plan at $7/mo yearly or $10/mo monthly first. Pro at $72/mo only after the Light-plan ceiling breaks and you've re-priced against V4 Flash / Qwen Plus / MiniMax M2.7.
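
The GLM 5.1 tier decision is easier to reason about with the annual figures written out. The sketch below uses the Z.ai prices quoted above and encodes the "Light first, Pro only after both conditions hold" rule. The plan keys and both function names are hypothetical, and the prices are time-stamped.

```python
# Monthly-equivalent prices in USD as quoted above (time-stamped; re-check).
ZAI_PLANS = {
    "light_yearly": 7,    # billed yearly
    "light_monthly": 10,
    "pro": 72,
}


def annual_cost(plan: str) -> int:
    """Twelve months at the plan's monthly-equivalent price."""
    return ZAI_PLANS[plan] * 12


def choose_glm_tier(light_ceiling_broken: bool, repriced_alternatives: bool) -> str:
    """Stay on Light unless the ceiling broke AND alternatives were re-priced."""
    if light_ceiling_broken and repriced_alternatives:
        return "pro"
    return "light"


for plan in ZAI_PLANS:
    print(f"{plan}: ${annual_cost(plan)}/yr")
```

At these prices a year of Pro costs $864 against $84 for Light billed yearly, which is why the re-pricing step comes before the upgrade.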

Try it yourself

The hands-on goal: build a single routing table that says, for each of your common agent workloads, which of the four Chinese labs (or non-Chinese fallback) you wire in. The §6.1–§6.4 model cards give you the four families; this capstone adds the workload axis and the plan tier axis so you have a config file, not a vibe.

  1. List your common agent tasks. Refactor. Repo map. Doc summary. One-shot creative. Vision. Cron. Long-doc research. Privacy-sensitive. GPU-bound. The full grid the channel tests.
  2. For each task, assign a model per §6.1–§6.4 and the routing table above. Kimi → orchestrator with /swarm for research; Qwen → Plus for IDE work, Max for terminal-only overnight, 3.5 local for heartbeat; DeepSeek → V4 Flash for cached refactors, V4 Pro for codebases you don't understand; GLM 5.1 → executor slot for one-shot coding builds (Light plan first, Pro at $72/mo only if the Light ceiling breaks).
  3. Pick a plan tier per model. Kimi → OpenRouter pay-per-use. Qwen → Alibaba Model Studio pay-per-use (or LM Studio local). DeepSeek → Nous Portal free tier for Flash, Pro plan for Pro. GLM → Z.ai yearly at $7/mo for testing, monthly at $10/mo if you've measured the quota, $72/mo Pro only after the Light-plan ceiling breaks.
  4. Set the "default tier" foot-guns before you benchmark. §6.3 (DeepSeek): Hermes TUI reasoning xhigh, verbose verbose. §6.2 (Qwen): Alibaba "free quota only" toggle. §6.1 (Kimi): /swarm on before the first prompt, not after. §6.4 (GLM): disable web search inside Claude Code.
  5. Cap the local Qwen 3.5 build at 9B / 6 GB and apply the disable-thinking-on-short-prompts rule from §6.2. The car-wash sanity check is the canary.
  6. Reserve one model per Chinese lab as the "promotion path." §6.1 Kimi 2.5 → 2.7 if you have a swarm workload. §6.2 Qwen 3.6 Plus → 3.7 Max if the budget survives. §6.3 DeepSeek V4 Flash → V4 Pro on codebases you don't understand. §6.4 GLM 5.1 → still in executor slot, not orchestrator.
  7. Re-run the routing table monthly. The Z.AI price hike in §6.4 is the cautionary tale — the table you wrote in March may already be wrong by April. The KimiClaw version pin and the Qwen 3.6 Plus free-period end are the same class of issue: vendor terms change faster than your config does.
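
Step 4 is the one most often skipped, so it helps to hold the foot-guns as an explicit pre-benchmark checklist. The sketch below is only a checklist helper: it does not talk to any platform, the `confirmed` flags are values you record by hand, and the `FOOT_GUNS` and `unresolved` names are hypothetical.

```python
# Foot-guns from step 4, keyed by lab.
FOOT_GUNS = {
    "DeepSeek": "Hermes TUI reasoning set to xhigh, verbose set to verbose",
    "Qwen": 'Alibaba "free quota only" toggle enabled',
    "Kimi": "/swarm turned on before the first prompt",
    "GLM": "web search disabled inside Claude Code",
}


def unresolved(confirmed: dict[str, bool]) -> list[str]:
    """Return the foot-guns that have not been confirmed as handled."""
    return [
        f"{lab}: {check}"
        for lab, check in FOOT_GUNS.items()
        if not confirmed.get(lab, False)
    ]


# Example: only the Kimi setting has been confirmed so far.
for item in unresolved({"Kimi": True}):
    print("TODO", item)
```

Benchmarking only once `unresolved(...)` returns an empty list keeps platform defaults from being mistaken for model quality.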

Common pitfalls

  • Treating the four Chinese labs as interchangeable. Kimi 2.5 is an orchestrator with a swarm; Qwen 3.7 Max is a text-only flagship for terminal-only overnight runs; DeepSeek V4 Flash is the cached-executor default; GLM 5.1 is the one-shot coding-build specialist. The routing table assigns each one a slot, and the slots are not fungible.
  • Trusting the "Qwen 3.6 Plus free on Hermes" title in 2026. The free period ended. The orchestrator-slot framing holds; the price doesn't. Plan around $0.40/M input, $1.60/M output.
  • Paying $39/month for KimiClaw when you can self-host for $2. The §6.1 verdict: "you get way more value for $2 than for $20 here." The wrapper is convenience, not features.
  • Adopting GLM 5.1 on the Pro plan at $72/mo without re-pricing. The §6.4 anchor. The Light plan at $7–10/mo is the budget tier; Pro is the catch-up-after-Opus-failed tier. Re-benchmark against MiniMax M2.7 / Qwen Plus / DeepSeek V4 Flash on your representative workload before committing.
  • Routing V4 Flash into the orchestrator slot. Flash loses on orchestration (§6.3) and the 14-minute sub-agent delegation test is the canary. Reserve V4 Pro / GPT 5.4 / Gemini 3.1 Pro for orchestrator planning; Flash is the executor.
  • Using GLM 5.1 for chat / Q&A work. King AI's review confirmed answering quality regressed vs GLM 5, and Z.ai itself only markets 5.1 to coders. Route the Q&A half to a different model.
  • Forgetting the "free quota only" toggle on Alibaba Model Studio. The 1M free tokens are real, but the card auto-charges the moment the quota is gone. The toggle is the only thing between "free trial" and "surprise bill."
  • Running graphics-intensive builds on a VPS. The Kimi 2.7 broken poem-to-game on a VPS is the canary. The model isn't the bottleneck — the host is.
  • Benching on Qwen 3.7 Max for short one-shot prompts. At $7.50 per 1M output tokens, Max is "one of the most expensive flagship models." The cost is justifiable on multi-hour agent runs, not on quick chat. Route chat to Plus or to a cheap model.
  • Reading V4 Flash as a contradiction to the "you don't need Opus" thesis. They're the same thesis at different price points. V4 Flash is the route-against-flagship version of the MiniMax M2.7 / M3 argument from Course 2 §2.3 — pick the model that wins the workload at the lowest cost, not the model with the highest benchmark score.
  • Letting the "thinking inside tool calls" framing of DeepSeek trick you into routing everything to it. The pattern is real, but Flash loses on vision, one-shot creative, and orchestration. Use the pattern where it wins; route around it where it doesn't.
  • Defaulting to Z.ai's "our competitors are giving you slop" framing as the channel's read. It's a Z.ai quote, not a channel quote. The channel reads it as "they know they can't beat Anthropic on quality, so they're beating them on price" — a strategic move tied to Opus's regression, not a quality claim. The model is genuinely good; the price increase is a market read.
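
Several of these pitfalls are mechanical enough to check automatically against a routing config. The sketch below encodes a handful of the discouraged pairings from the list above as a lookup. The `DISCOURAGED` table, the `lint` function, and the config shape (workload or slot name mapped to a model label) are assumptions made for illustration.

```python
# (workload_or_slot, model) pairings the pitfalls above warn against.
DISCOURAGED = {
    ("orchestrator", "DeepSeek V4 Flash"): "Flash loses on orchestration",
    ("orchestrator", "GLM 5.1"): "GLM 5.1 belongs in the executor slot",
    ("chat", "GLM 5.1"): "answer quality regressed vs GLM 5",
    ("chat", "Qwen 3.7 Max"): "overkill on cost for short prompts",
    ("vision", "DeepSeek V4 Flash"): "vision results are inconsistent",
}


def lint(config: dict[str, str]) -> list[str]:
    """Return one warning per discouraged (workload, model) pairing."""
    warnings = []
    for workload, model in config.items():
        reason = DISCOURAGED.get((workload, model))
        if reason:
            warnings.append(f"{workload} -> {model}: {reason}")
    return warnings


# Hypothetical config with one bad pairing.
print(lint({"chat": "GLM 5.1", "refactor": "DeepSeek V4 Flash"}))
```

The table covers only pairings stated in this article; pitfalls about pricing and hosting still need a human read.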

Sources

This is the aggregated Source list for all of §6. Every video in the four Chinese-lab sub-articles is listed here, plus the Top AI Models for Hermes Agent (Tier List) and Best Model for Openclaw (WildClaw Benchmarks!) videos that ground the routing decisions.

  • KimiClaw Review - Is it Worth it? — 3,783 views · video_id: 0WClbjO59HI
  • Kimi K2.7 Review (MUST Use Agent Swarm) — 206 views · video_id: nzG5KXBAYxs
  • KimiClaw Setup Guide (Openclaw on Kimi 2.5) — 5,899 views · video_id: gOL73ONY0J8
  • Qwen 3.7 Max is ACTUALLY INSANE! (Real Tests and Review) — 5,470 views · video_id: 2gDB-2ifLPw
  • Qwen 3.7 Plus is SO POWERFUL! (Real Tests and Review) — 4,290 views · video_id: 5L4W_KI3ca0
  • Qwen 3.6 Plus is FREE on Hermes Agent (USE it like this) — 3,577 views · video_id: Nqs_5RLg6QA · summary/transcript null in DB · section grounded via public.ai_models.qwen-3-6-plus and 9 top-liked viewer comments in public.youtube_comments (free-period-ended correction)
  • Qwen 3.5 Setup on Your Local Computer (Step-by-Step Guide) — 6,145 views · video_id: 4d1TOu-1Umk
  • Qwen 3.5 in YOUR BROWSER (Setup Guide) — 4,150 views · video_id: HM2W-lvUMok
  • Qwen 3.5 Local Model Review (Is it Good?) — 1,946 views · video_id: yh3oWLVPYYw
  • DeepSeek v4 Flash + Hermes Agent = Surprisingly STRONG — 4,893 views · video_id: s3Q9hvdlrmo
  • URGENT: GLM5.1 released and its Amazing (and cheap) — 5,809 views · video_id: JR-3e-BLWu0
  • Glm 5.1 Test : Making a Retro Style Game — 98 views · video_id: 3N0Pe3dkwBE
  • Top AI Models for Hermes Agent (Tier List) — 8,107 views · video_id: Af7Fg1m7Rw — referenced for the orchestrator vs executor slot framing in §6.5
  • Best Model for Openclaw (WildClaw Benchmarks!) — 4,574 views · video_id: 31Ij4Cum5tg — referenced for the GLM 5.1 37% score and the 4.5 / 4.6 family benchmark numbers
  • Supabase queries used to ground the article:
    -- Source videos for §6.1–§6.4
    SELECT video_id, title, views, summary_content, summary_key_takeaways
    FROM public.videos
    WHERE video_id = ANY(ARRAY[
      '0WClbjO59HI', 'nzG5KXBAYxs', 'gOL73ONY0J8',
      '2gDB-2ifLPw', '5L4W_KI3ca0', 'Nqs_5RLg6QA',
      '4d1TOu-1Umk', 'HM2W-lvUMok', 'yh3oWLVPYYw',
      's3Q9hvdlrmo',
      'JR-3e-BLWu0', '3N0Pe3dkwBE'
    ]);
    
    -- Viewer comments on the 3.6 Plus "free" video (N=20, all 2026-05+)
    SELECT comment_id, author_name, published_at, like_count, text_display
    FROM public.youtube_comments
    WHERE video_id = 'Nqs_5RLg6QA'
    ORDER BY like_count DESC NULLS LAST, published_at DESC;
    
    -- Tier table row that backs the Qwen 3.6 Plus orchestrator-slot framing
    SELECT slug, name, vendor, tier, short_description, strengths, weaknesses
    FROM public.ai_models
    WHERE slug = 'qwen-3-6-plus';
    
    -- AI briefing for the Qwen3.6-27B dense-beats-397B-MoE context
    SELECT id, title, published_at, excerpt, content
    FROM public.ai_updates
    WHERE id = '390df74c-64c0-4d52-8960-5c69c1c14edb';
    
    against project ttxdssgydwyurwwnjogq.

NOTE on pricing, version numbers, and roadmap claims: the $39/mo Allegretto plan price for KimiClaw, the OpenClaw 2.13 version pin on KimiClaw, the ~30% thinking-token cut on Kimi 2.7 vs 2.6, the OpenRouter Kimi 2.5 pricing of ~$0.50/M input and $2/M output, the 100-agent swarm budget and 9-agent picked-for-game-build, the 5x quota on the Allegro plan, the 8m 53s / 18-tool-call / 100% success-rate Qwen 3.7 Max benchmark, the $7.50/M output token price on Model Studio, the 1M free tokens / auto-charge "free quota only" toggle, the 9B / 6 GB local Qwen 3.5 build, the ~37 tok/s on a 3060, the ~2 minute car-wash loop with thinking on, the 64.7 Baby Vision score on 3.7 Plus, the 40% lower token cost on Plus vs Max, the 32,000-token Plus/Max decision rule, the audience-confirmed "free period ended" correction on Nqs_5RLg6QA, the $0.40 / $1.60 per million Qwen API platform price a current viewer reports, the #10-of-87 intelligence rank and #4-at-~134-output-tokens/second speed rank for V4 Flash, the 1M-token context with hybrid attention, the ~35% / ~4x cost ratio vs V4 Pro, the 14-minute 3-sub-agent research run, the ~2-minute 3-paper long-doc summarization, the cached-reads-are-nearly-free mechanic, the "xhigh" reasoning and "verbose" Hermes TUI settings, the 60 iterations / ~500 lines of Python safe envelope, the $10/mo monthly and $7/mo yearly Z.ai Light plan prices, the $30→$72/mo Z.AI coding plan price hike (the §6.4 anchor), the 47.9 vs 45.3 vs 35.4 Z.ai coding-eval numbers, the 25%-of-5-hour-quota / 6-million-token burn, the 2.5k-token web-search stall, the 37% WildClaw 60-test score, the executor-slot call in the Hermes tier list, and the "our competitors are giving you slop" Z.ai marketing quote are all drawn from the source videos, public.ai_models.qwen-3-6-plus, public.ai_updates 2026-04-23, and public.youtube_comments cited above.
These are time-stamped claims — re-check the official Moonshot, Qwen, Alibaba Cloud Model Studio, DeepSeek, Nous Portal, Z.ai / Zhipu, and Hermes Agent documentation if you read this article after a new release.