The single most useful pattern in the channel's coverage of the auxiliary slot isn't a model; it's a wiring diagram. BYOK stands for Bring Your Own Key: you subscribe to an agent platform (Hermes Agent, Mavis, Mavis Desktop, the OpenClaw family), then plug in a third-party model API key from any compatible provider. The magic is that the free-tier providers (MiniMax, Z.AI, Xiaomi Mimo) all work, and prompt caching is already configured for the named providers. That means a 24/7 agent that costs $0/month in model spend, runs with prompt caching on and request counts visible, and has the agent's own skill loop on top.

This is the structural reason the auxiliary slot is worth a course: the BYOK pattern is what turns "free" into "tier-list tier." Without it, free models are toy demos. With it, free models are the substrate of a production-grade agent.

The article walks through what BYOK is in Hermes Agent, why the three named providers (MiniMax, Z.AI, Xiaomi Mimo) are the entry points, what the prompt caching claim actually buys you, and how to wire a $0/month stack end-to-end.

What you'll learn

  • BYOK (Bring Your Own Key) is the Hermes Agent pattern: subscribe to the agent, then plug in a third-party model key. The agent handles orchestration, skills, cron, and Kanban; the model handles the actual completions.
  • The three free-tier providers the channel names explicitly are MiniMax, Z.AI (Zai), and Xiaomi Mimo. All three work. All three have free tiers. MiniMax is the recommended default because it's the Mavis substrate and has the deepest integration with the Mavis verifier pattern.
  • Prompt caching is already configured for the named providers — no JSON hacks, no manual config file edits, no per-provider hand-tuning. The channel's framing: this is what you couldn't do on OpenClaw without editing the JSON file by hand.
  • Request counts and per-model spend are explicit in Hermes. The contrast: Anthropic's Claude "doesn't tell you how many credits you've burned" — the channel's anti-pattern. Hermes shows you the numbers.
  • The 15-turn self-evolving skill loop is on by default in any BYOK setup, with no paid model required. The structural unlock: your stack gets better the longer it runs, even on a free model.
  • The Claude Code caveat: if you're already paying $200/mo for a Claude Max plan and don't mind locked-in limits, Claude Code is still a viable option. For everyone else, Hermes + BYOK wins on value.

What BYOK actually is

BYOK is the pattern where the agent platform is decoupled from the model vendor. You sign up for the agent (Hermes Agent, in this case), then plug in a model API key from a third party. The agent handles the runtime — orchestration, skills, cron jobs, Kanban, the dashboard — and the model provider handles the completions. The two contracts are independent: you can swap model providers without reinstalling the agent, and you can swap agents without changing your model key.
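The decoupling is easiest to see as an interface boundary. Below is a minimal Python sketch, assuming a hypothetical `ModelProvider` protocol and a stand-in `FakeProvider` in place of real MiniMax, Z.AI, or Mimo clients; none of these names come from Hermes itself. The agent keeps its own state and only ever calls `complete()`, so swapping the provider is a one-line change.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ModelProvider(Protocol):
    """Hypothetical contract: anything that turns a prompt into a completion."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class FakeProvider:
    """Stand-in for a real client; a real one would call the vendor's API with your key."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class Agent:
    """Owns the runtime state (here, just skills) and delegates completions to the provider."""

    provider: ModelProvider
    skills: list[str] = field(default_factory=list)

    def swap_provider(self, provider: ModelProvider) -> None:
        # Only the model contract changes; skills and other agent state are untouched.
        self.provider = provider

    def run(self, task: str) -> str:
        return self.provider.complete(task)


agent = Agent(FakeProvider("mimo-v2-pro"), skills=["summarise-inbox"])
print(agent.run("summarise the inbox"))
agent.swap_provider(FakeProvider("minimax-m2.7"))
print(agent.run("summarise the inbox"), agent.skills)
```

The skills list surviving the swap is the point: the agent's side of the contract does not depend on which vendor answers.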

The structural advantage is cost: the agent platform has its own pricing (Hermes is open-source, Mavis is $10/mo, Mavis Desktop is the same), and the model provider has its own pricing (MiniMax, Z.AI, and Mimo have free tiers; GPT 5.4 is $50–75/mo; Opus is $200+/mo). BYOK means you pay the agent price and the model price separately, which makes the cost arithmetic transparent. The contrast: Claude Code (Anthropic's CLI) doesn't support BYOK. In the channel's words from the Hermes vs OpenClaw video, with Claude Code "you cannot BYOK, so you're locked to whatever Anthropic gives you that week," and limits were "quietly reduced" the week prior.
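To make that arithmetic concrete, here is a small Python sketch using the monthly prices quoted in this article. Two assumptions are labelled in the code: "$200+" for Opus is treated as a $200 floor, and the $3/mo VPS from the wiring steps is added only for the self-hosted Hermes case.

```python
# Monthly prices quoted in this article, in USD, as (low, high) ranges.
AGENT_PRICE = {"hermes": (0, 0), "mavis": (10, 10)}  # Hermes is open-source
MODEL_PRICE = {
    "free-tier": (0, 0),   # MiniMax / Z.AI / Mimo free tiers
    "gpt-5.4": (50, 75),
    "opus": (200, 200),    # assumption: "$200+" treated as a $200 floor
}
VPS_PRICE = 3  # assumption: only the self-hosted Hermes case pays for a VPS


def monthly_cost(agent: str, model: str) -> tuple[int, int]:
    """BYOK bills the agent and the model separately, so the line items simply add."""
    agent_lo, agent_hi = AGENT_PRICE[agent]
    model_lo, model_hi = MODEL_PRICE[model]
    hosting = VPS_PRICE if agent == "hermes" else 0
    return agent_lo + model_lo + hosting, agent_hi + model_hi + hosting


for agent, model in [("hermes", "free-tier"), ("hermes", "gpt-5.4"), ("mavis", "free-tier")]:
    low, high = monthly_cost(agent, model)
    print(f"{agent} + {model}: ${low}-${high}/mo")
```

Swapping the model changes only the second term, which is the transparency BYOK buys.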

The three named free-tier providers

The channel's coverage names three free-tier providers that work in the BYOK pattern:

  1. MiniMax — the recommended default. The same company that ships Mavis, the desktop multi-agent product. The structural reason this is the recommended default: MiniMax is the substrate for Mavis, so the integration is native and the verifier pattern is built in. The Mavis video (86UIZVWkvF8) is the channel's most-viewed on this stack at 30,626 views. The MiniMax token plan bundles text, image, and video (Hailuo) under a single subscription, and the entry tier is $10/mo for the paid plan. The free tier exists for the BYOK pattern specifically — you wire the key, the agent uses it, you pay $0.
  2. Z.AI (Zai) — Z.AI's free tier is the second option. The Z.AI coding plan was $30/mo and recently more than doubled, to $72/mo (the channel's coverage flags this as a pricing move tied to Claude's regression; see Course 2 §2.7). The free tier is the BYOK entry point; the paid coding plan is for users who want the bundled GLM 5.1 features.
  3. Xiaomi Mimo — the Mimo V2 Pro free promotional period, covered in §7.1. The Mimo free tier is the third option. The channel's framing: try Mimo now while it's free, but have a backup model wired up because the free period will end.

The three providers aren't interchangeable. The functional differences:

  • MiniMax has the deepest integration with the Mavis verifier pattern. If you want the adversarial verifier (workers produce, separate agent reviews from first principles without shared conversation history), MiniMax is the cleanest path.
  • Z.AI / GLM 5.1 is the strongest on raw coding tasks. The GLM 5.1 numbers from the WildClaw benchmark coverage are 75%+ on coding tasks, the highest in the channel's coverage. If your workload is code-heavy, Z.AI is the right pick.
  • Mimo V2 Pro is the "high-volume king." If your workload is document processing, batch transforms, or skill-generation sweeps, Mimo is the right pick. The free period makes it the cheapest entry point during the promo.

The channel's overall recommendation: start with MiniMax for the Mavis substrate and verifier pattern, then add Mimo V2 Pro as a high-volume auxiliary, then add Z.AI for code-heavy workloads. All three work in the same BYOK pattern; the differences are in the use case, not the wiring.
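That recommendation boils down to a routing table. A minimal Python sketch follows; the workload labels are hypothetical, and the model names mirror the illustrative config later in this article rather than any confirmed Hermes identifiers.

```python
# Hypothetical workload labels mapped to the channel's recommended provider.
ROUTES = {
    "verifier": "minimax-m2.7",  # deepest integration with the Mavis verifier pattern
    "code": "z-ai-glm-5.1",      # strongest on raw coding tasks
    "bulk": "mimo-v2-pro",       # high volume: documents, batch transforms, skill sweeps
}
DEFAULT_MODEL = "minimax-m2.7"   # the recommended starting point


def pick_model(workload: str) -> str:
    """The BYOK wiring is identical for every provider; only the chosen key differs."""
    return ROUTES.get(workload, DEFAULT_MODEL)


for workload in ("code", "bulk", "chat"):
    print(workload, "->", pick_model(workload))
```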

The Hermes vs OpenClaw case

The Hermes vs OpenClaw: Why Everyone Is Migrating video (6,116 views) is the channel's migration case for Hermes Agent. The load-bearing claims for §7.2:

  • Hermes Agent is essentially OpenClaw rebuilt by Nous Research — the team that designed the Hermes architecture shipped it as an open-source fork with the features OpenClaw should have had months ago.
  • Built-in migration: the Hermes GitHub repo has a "Migrating from OpenClaw" section that transfers soul, memories, and settings. The creator's team ran this against their existing agent stack (a primary agent called Stark plus several others) and it worked.
  • 15-turn self-evolution: every 15 turns the agent audits its own performance and rewrites its skills. The 15-turn loop is on by default in any bring-your-own-key setup, no paid model required.
  • BYOK + prompt caching without JSON hacks: plug in MiniMax, Z.AI, or Xiaomi Mimo keys (all free tiers) and prompt caching is already configured. The contrast: on OpenClaw, the creator had to edit the JSON file manually to enable prompt caching.
  • Visibility into request counts and per-model spend: explicit in Hermes. The contrast: Anthropic's Claude "doesn't tell you how many credits you've burned" — the channel's anti-pattern.

The Claude Code caveat from the same video: the only case for not switching is if you're already paying $200/mo for a Claude Max plan and don't mind locked-in limits. Claude Code's "only edge" is the Anthropic integration; the BYOK pattern doesn't apply. For everyone else, Hermes wins on value.

Prompt caching — and why "already configured" matters

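Prompt caching is the technique where the model provider caches the system prompt and any other prefix that doesn't change between requests, so subsequent requests pay full price only for the new tokens and a discounted rate for the cached prefix. The savings are substantial: a 5,000-token system prompt that gets re-sent on every turn costs 5,000 full-price input tokens per turn without caching. With caching, those 5,000 tokens are billed as cached reads, which the DeepSeek v4 Flash coverage below puts at 5–10x cheaper than uncached input, so the prefix costs the equivalent of roughly 500–1,000 full-price tokens per turn. For a long-running agent, prompt caching can be the difference between a $50/mo bill and a $5/mo bill on the same workload.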
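The per-turn arithmetic, as a Python sketch. It assumes the prefix is already in the cache (the first request, which writes the cache, may be billed differently) and uses the 5–10x cached-read discount from the DeepSeek v4 Flash coverage; real discounts are provider-specific.

```python
def billed_input_tokens(prefix: int, delta: int, discount: float, cached: bool = True) -> float:
    """Per-turn input cost in full-price-token units.

    prefix:   tokens in the unchanging prefix (system prompt, tool definitions)
    delta:    new tokens sent this turn
    discount: how many times cheaper a cached read is than an uncached token
    cached:   whether the prefix is already in the provider's cache
    """
    if not cached:
        return prefix + delta
    return prefix / discount + delta


print("uncached:", billed_input_tokens(5_000, 200, discount=5, cached=False))
for discount in (5, 10):
    print(f"cached, {discount}x cheaper:", billed_input_tokens(5_000, 200, discount=discount))
```

With a 200-token user turn, a cached turn costs the equivalent of 700–1,200 full-price tokens against 5,200 uncached, roughly a 4–7x reduction in input cost for this example.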

The channel's claim: prompt caching is already configured for MiniMax, Z.AI, and Xiaomi Mimo in Hermes Agent. The structural reason this matters: on OpenClaw, the creator had to edit the JSON file manually to enable prompt caching. On Hermes, it ships pre-configured. The exact mechanism is provider-specific — the channel doesn't publish the specific cache TTLs or the per-provider implementation — but the user-facing experience is the same: plug in the key, run the agent, prompt caching is on.

The Mavis verifier pattern (see Course 2 §2.3) is the related trick: the verifier agent doesn't share conversation history with the worker, so the cached prefix is smaller and the cache hit rate is higher. For high-volume tasks where the verifier runs on the same model as the worker, prompt caching compounds.

The DeepSeek v4 Flash worked example

The DeepSeek v4 Flash + Hermes Agent = Surprisingly STRONG video (4,893 views) is the worked example for the BYOK + prompt caching claim. V4 Flash's prompt caching is the explicit reason it became the model with the highest token consumption on Hermes Agent. The pattern the source video describes: if you find yourself re-pasting the same context across runs, switch to V4 Flash and let the cache do the work.

V4 Flash is a paid model, but the prompt caching claim generalises to the free-tier providers. The structural argument: on Hermes, prompt caching is pre-configured for MiniMax / Z.AI / Mimo. Run the same long-context workload on a free provider with caching, and the cost-per-token is dominated by the cached reads, which are 5–10x cheaper than uncached input. For a 24/7 agent, this is the difference between a sustainable free stack and a bill you can't afford.
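Prefix caches only reuse a request's leading span when it is identical to an earlier request's, so the order in which you assemble context decides whether the cache can do the work. A rough Python sketch, using whitespace-split words as a stand-in for real tokens (an assumption: providers count model tokens and often require a minimum cacheable length):

```python
def tokens(text: str) -> list[str]:
    # Crude stand-in for a tokenizer; real providers count model tokens.
    return text.split()


def cacheable_fraction(prev: str, curr: str) -> float:
    """Share of `curr` that repeats the start of `prev`, i.e. what a prefix cache could reuse."""
    p, c = tokens(prev), tokens(curr)
    if not c:
        return 0.0
    shared = 0
    for a, b in zip(p, c):
        if a != b:
            break
        shared += 1
    return shared / len(c)


STABLE = "system prompt and reference material " * 40  # unchanging context, 200 words

# Stable context first, per-run details last: the long prefix repeats.
good_1 = STABLE + "task: summarise file A"
good_2 = STABLE + "task: summarise file B"

# A per-run ID in front breaks the prefix on the second word.
bad_1 = "run 1 " + STABLE + "task: summarise file A"
bad_2 = "run 2 " + STABLE + "task: summarise file B"

print(f"stable-first: {cacheable_fraction(good_1, good_2):.1%}")
print(f"id-first:     {cacheable_fraction(bad_1, bad_2):.1%}")
```

Putting anything that changes per run (a run ID, a timestamp) ahead of the stable context drops the reusable share to almost nothing; keeping the stable context first preserves it.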

Wiring a $0/month stack end-to-end

The hands-on path: stand up Hermes Agent on a $3/mo VPS, plug in Mimo V2 Pro (or MiniMax / Z.AI) as the BYOK provider, run a high-volume task, watch the cache hit rate, confirm request counts are visible.

  1. Subscribe to a free-tier provider. Sign up for Mimo V2 Pro (News Portal, Kilo.ai, or OpenRouter), MiniMax (Mavis tier), or Z.AI (free tier). Get the API key. The exact sign-up flow varies by provider; the structural pattern is the same.
  2. Install Hermes Agent on a VPS. See Course 3 §3.2 for the install guide. A 4 GB / $3-per-month VPS is enough for a 24/7 agent.
  3. Wire the BYOK pattern. In Hermes, navigate to Settings → Models and paste the provider's API key. Select the provider from the model dropdown. Confirm the model name appears in the chat interface.
  4. Verify prompt caching is on. Run a multi-turn conversation with a long system prompt. Check the per-model token breakdown in the dashboard's analytics tab (see Course 3 §3.4). The cached reads should be a substantial fraction of the input.
  5. Set the orchestrator and executor to the same free provider. For the $0/month stack, route both orchestrator and executor to Mimo V2 Pro. Add Gemini 3 Flash as the auxiliary (also free).
  6. Run a high-volume task. Document processing, batch transforms, skill-generation sweeps. The Mavis verifier pattern (see Course 2 §2.3) catches the bad outputs.
  7. Monitor the cache hit rate and request counts. The dashboard shows both. If the cache hit rate is low, check that prompt caching is actually enabled for the provider. If the request count is climbing faster than expected, check that the agent isn't looping or duplicating requests.
  8. Plan the transition. The free period will end for Mimo (see §7.1). Have MiniMax M2.7 wired up as the backup. Test it on the same high-volume task while Mimo is still free.
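Step 7's "climbing faster than expected" check can be automated. A minimal Python sketch, assuming you can export hourly request counts from the dashboard (the export format is not documented in the source) and treating anything above 3x the median hour as a possible loop; the threshold is an arbitrary starting point.

```python
from statistics import median


def flag_request_spikes(hourly_counts: list[int], factor: float = 3.0) -> list[int]:
    """Return the indices of hours whose request count exceeds `factor` x the median hour."""
    if not hourly_counts:
        return []
    baseline = median(hourly_counts)
    return [hour for hour, count in enumerate(hourly_counts) if count > factor * baseline]


# Hypothetical hourly request counts for a 24/7 agent; hour 5 looks like a loop.
counts = [40, 38, 45, 41, 39, 160, 42]
print(flag_request_spikes(counts))
```

A median baseline is used instead of a mean so that the spike itself does not inflate the threshold.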

The Claude Code caveat — and when not to BYOK

The channel's case for BYOK is strong, but the same coverage includes a caveat: the only case for not switching to Hermes is if you're already paying $200/mo for a Claude Max plan and don't mind the locked-in limits. The structural argument: Claude Code is a fine CLI, and the $200/mo plan includes Opus 4.6 / 4.7 access, but you can't BYOK, and the limits were "quietly reduced" in the week before the video.

The Hermes vs Claude Code trade-off, as the channel frames it:

  • Choose Claude Code if: you're already on a $200/mo Claude Max plan, you don't mind the locked-in limits, and the Opus 4.6/4.7 capability is worth the spend for your specific workload.
  • Choose Hermes + BYOK if: you want cost transparency, you want to swap model providers without reinstalling, you want prompt caching pre-configured, or you're not already paying Anthropic prices.
  • Choose Mavis if: you want a packaged desktop app with the verifier pattern built in, and the $10/mo entry tier is acceptable.

The channel's overall framing: for Starter, Plus, or Max plan users running side projects, Hermes wins on value. For $200/mo Claude Max subscribers with a stable workload, Claude Code is still a viable option. The default in 2026 is BYOK on Hermes.

The 15-turn self-evolving skill loop

The 15-turn self-evolution mechanic is on by default in any BYOK setup, with no paid model required. Every 15 turns, the agent audits its own performance and rewrites its skills. The structural implication: the longer your agent runs, the better it gets, even on a model that's a poor choice for one-shot tasks.

The channel's framing from the Hermes vs OpenClaw video: "Hermes runs an evolving mechanism every 15 turns where the agent audits its own performance and rewrites its skills. The creator previously had to do this by hand — opening code, reading scripts, troubleshooting — just to improve thumbnail quality. Hermes does it by default with any bring-your-own-key setup, no paid model required."

The implication for the free-tier stack: Mimo V2 Pro's 55% first-pass reliability is fine if the skill loop is rewriting the bad skills every 15 turns. A skill that's wrong 45% of the time and right 55% of the time gets audited, the failures get analysed, the skill gets rewritten, and the next 15-turn cycle has a better skill. The compounding effect is the structural reason the free tier is enough for a long-running agent — the agent gets better the longer it runs, and the gains are model-portable (skills survive a model swap).
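The scheduling half of that loop is simple to picture. A Python sketch follows, assuming a hypothetical `SkillLoop` class; the outcomes are hand-picked to mimic a skill improving between windows, and the rewrite step that would produce that improvement in Hermes is not modelled.

```python
AUDIT_EVERY = 15  # the turn interval the channel describes


class SkillLoop:
    """Schedules an audit every AUDIT_EVERY turns and records that window's failure rate.

    The skill rewrite itself is Hermes's job and is not modelled here.
    """

    def __init__(self) -> None:
        self.turn = 0
        self.outcomes: list[bool] = []       # True = the turn's output was accepted
        self.failure_rates: list[float] = []

    def record_turn(self, success: bool) -> None:
        self.turn += 1
        self.outcomes.append(success)
        if self.turn % AUDIT_EVERY == 0:
            self.audit()

    def audit(self) -> None:
        window = self.outcomes[-AUDIT_EVERY:]
        self.failure_rates.append(window.count(False) / len(window))
        # A real audit would analyse these failures and rewrite the skill here.


loop = SkillLoop()
# Hand-picked outcomes: 7 failures in the first window, 3 in the second.
for ok in [False] * 7 + [True] * 8 + [False] * 3 + [True] * 12:
    loop.record_turn(ok)
print([round(rate, 2) for rate in loop.failure_rates])
```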

The full BYOK config, step by step

The hands-on path for the BYOK pattern on Hermes Agent, with the commands and config the channel's coverage implies:

1. Sign up for the provider.

For MiniMax, the entry point is the Mavis product page (https://mavis.example — verify on the live site). The free tier exists for the BYOK pattern; the $10/mo Mavis Assistant tier is the paid on-ramp. Get the API key from the Mavis dashboard.

For Z.AI (Zai), the entry point is the Z.AI coding plan page. The free tier is the BYOK entry; the $30–72/mo coding plan is for users who want the bundled GLM 5.1 features. Get the API key from the Z.AI dashboard.

For Xiaomi Mimo, the entry point is the News Portal (https://news.example — verify on the live site) or Kilo.ai. The free period is promotional. Get the API key from the News Portal dashboard.

2. Install Hermes Agent on a VPS.

See Course 3 §3.2 for the full install guide. A 4 GB / $3-per-month VPS is enough. Patch the two fresh-VPS gotchas (sudo apt update, say "no" to the OpenClaw import prompt) before pasting the install command.

3. Wire the BYOK pattern.

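In Hermes, the BYOK wiring happens in ~/.hermes/config.yaml (or via the dashboard's Settings → Models panel). An illustrative sketch of the fields follows; the key names and base URLs are not confirmed by the source, so check them against the Hermes and provider documentation before use: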

models:
  # Illustrative schema: confirm key names against the Hermes docs.
  - name: mimo-v2-pro
    provider: xiaomi
    api_key: <your-mimo-key>
    base_url: https://api.xiaomi.example/v1  # verify on live site
    prompt_caching: enabled  # pre-configured for named providers
  - name: minimax-m2.7
    provider: minimax
    api_key: <your-minimax-key>
    base_url: https://api.minimax.io/v1  # verify on live site
    prompt_caching: enabled
  - name: z-ai-glm-5.1
    provider: zai
    api_key: <your-zai-key>
    base_url: https://api.z.ai/v1  # verify on live site
    prompt_caching: enabled

The prompt_caching: enabled field is the channel's claim — pre-configured for the named providers. Verify on your own setup by checking the dashboard's analytics tab. If cached reads are zero, the field isn't being respected by the provider.
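Before checking the analytics tab, a quick lint of the config catches the boring failures (a placeholder key left in, a missing field). This Python sketch takes the entries as already-parsed dicts shaped like the YAML above; the schema is the article's illustrative one, not a confirmed Hermes format, and passing the lint says nothing about whether the provider actually honours caching.

```python
REQUIRED_KEYS = ("name", "provider", "api_key", "base_url", "prompt_caching")


def config_problems(models: list[dict]) -> list[str]:
    """Lint model entries shaped like the illustrative YAML; returns human-readable problems."""
    problems = []
    for index, entry in enumerate(models):
        label = entry.get("name", f"entry {index}")
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"{label}: missing {key}")
        if str(entry.get("api_key", "")).startswith("<"):
            problems.append(f"{label}: api_key is still a placeholder")
        # A missing field is already reported above, so default to "enabled" here.
        if entry.get("prompt_caching", "enabled") != "enabled":
            problems.append(f"{label}: prompt_caching is not 'enabled'")
    return problems


models = [
    {"name": "mimo-v2-pro", "provider": "xiaomi", "api_key": "<your-mimo-key>",
     "base_url": "https://api.xiaomi.example/v1", "prompt_caching": "enabled"},
    {"name": "minimax-m2.7", "provider": "minimax", "api_key": "sk-example",
     "base_url": "https://api.minimax.io/v1"},  # prompt_caching left out on purpose
]
for problem in config_problems(models):
    print(problem)
```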

4. Verify the model swap.

In an active Hermes session, type /model mimo-v2-pro. The model dropdown should show the new model. Run a one-line test prompt. If the response lands, the BYOK wiring is correct.

5. Verify the cache hit rate.

Run a multi-turn conversation with a long system prompt (e.g. 5,000 tokens). Check the dashboard's analytics tab. The cached reads should be a substantial fraction of the input — typically 80%+ for a system prompt that doesn't change. If the cache hit rate is low, check the provider's documentation for the cache-enable flag.
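A token-weighted hit rate is the number to compare against that 80% rule of thumb. The Python sketch below assumes per-request `(input_tokens, cached_tokens)` pairs, a hypothetical shape for whatever the analytics tab exports.

```python
def cache_hit_rate(requests: list[tuple[int, int]]) -> float:
    """Token-weighted share of input served from cache.

    requests: (input_tokens, cached_tokens) per request (hypothetical export shape).
    """
    total = sum(inp for inp, _ in requests)
    cached = sum(hit for _, hit in requests)
    return cached / total if total else 0.0


# Hypothetical run with a 5,000-token system prompt. The first request writes the cache,
# and only the system prompt is counted as cached afterwards (a simplification).
requests = [(5_200, 0), (5_350, 5_000), (5_500, 5_000), (5_640, 5_000)]
rate = cache_hit_rate(requests)
print(f"cache hit rate: {rate:.0%}")
if rate < 0.80:
    print("below 80%: check the provider's cache-enable flag, or measure over more turns")
```

In this four-request example the uncached first request drags the rate down to about 69% and trips the warning; judge the rate over a longer run before concluding caching is off.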

6. Set the orchestrator and executor to the same free provider.

For the $0/month stack (an illustrative layout; the key names are assumptions):

orchestrator:
  model: mimo-v2-pro
executor:
  model: mimo-v2-pro
auxiliary:
  - model: gemini-2.5-flash  # default in Hermes
  - model: gemini-3-flash  # for web search

7. Run a high-volume task.

Document processing, batch transforms, skill-generation sweeps. Note the success rate, the cache hit rate, and the request count. The Mavis verifier pattern (see Course 2 §2.3) catches the bad outputs.

8. Monitor and migrate.

Set a calendar reminder for the end of the Mimo free period. Test the backup model (MiniMax M2.7) on the same high-volume task while Mimo is still free. The transition is smoother if the backup is already validated.
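The backup path is worth exercising in code, not just on paper. A minimal Python sketch of ordered fallback, assuming provider calls are plain functions that raise on failure; the wrapper and both stand-in providers are hypothetical, not a Hermes feature.

```python
from typing import Callable

Provider = Callable[[str], str]


def complete_with_fallback(prompt: str, chain: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, completion) from the first that succeeds."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch its own error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def mimo_after_promo(prompt: str) -> str:
    # Simulates the primary failing once the free period ends.
    raise PermissionError("free period ended")


def minimax_backup(prompt: str) -> str:
    return f"minimax-m2.7 says: {prompt}"


print(complete_with_fallback("ping", [("mimo-v2-pro", mimo_after_promo), ("minimax-m2.7", minimax_backup)]))
```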

Try it yourself

The hands-on goal: stand up a $0/month Hermes Agent stack on a VPS, with Mimo V2 Pro (or MiniMax / Z.AI) wired in as the BYOK provider, and confirm the prompt caching claim.

  1. Pick a free-tier provider. Mimo V2 Pro (cheapest during the free period), MiniMax (best Mavis integration), or Z.AI (best for code-heavy workloads). Sign up for the free tier, get the API key.
  2. Install Hermes Agent on a $3/mo VPS. See Course 3 §3.2 for the install guide. Patch the two fresh-VPS gotchas (sudo apt update, say "no" to the OpenClaw import prompt) before you paste the install command.
  3. Wire the BYOK pattern. In Hermes, navigate to Settings → Models and paste the provider's API key. Select the provider from the model dropdown. Confirm the model name appears in the chat interface.
  4. Verify prompt caching is on. Run a multi-turn conversation with a long system prompt. Check the per-model token breakdown in the dashboard's analytics tab. Cached reads should be a substantial fraction of the input — if not, check the provider's documentation for the cache-enable flag.
  5. Set the orchestrator and executor to the same free provider. For the $0/month stack, route both to Mimo V2 Pro. Add Gemini 3 Flash as the auxiliary.
  6. Run a high-volume task. Document processing, batch transforms, skill-generation sweeps. Note the success rate, the cache hit rate, and the request count.
  7. Add the Mavis verifier. Workers on Mimo (free), verifier on Gemini 3 Flash (also free). The verifier catches bad Mimo output before it ships, and a retry on rejection raises the effective reliability well above the 55% first pass, for $0.
  8. Plan the transition. Test your backup model (MiniMax M2.7, GPT 5.4) on the same task while Mimo is still free. The transition is smoother if the backup is already validated.
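How much a verifier plus retry buys can be estimated with a few lines of arithmetic. The Python sketch assumes independent attempts, a verifier that never rejects a good output, and a configurable catch rate for bad ones; those are simplifications, and a real Gemini 3 Flash verifier will miss some bad outputs.

```python
def effective_reliability(p_pass: float, attempts: int, catch_rate: float = 1.0) -> float:
    """Probability that a good output ships under verify-and-retry.

    p_pass:     first-pass success rate of the worker (e.g. 0.55 for Mimo V2 Pro)
    attempts:   maximum worker attempts per task
    catch_rate: share of bad outputs the verifier rejects (1.0 = perfect verifier)
    Assumes independent attempts and no false rejections of good outputs.
    """
    reach = 1.0         # probability the task reaches the current attempt
    good_shipped = 0.0
    for _ in range(attempts):
        good_shipped += reach * p_pass
        reach *= (1.0 - p_pass) * catch_rate  # bad output caught, so try again
    return good_shipped


for n in (1, 2, 3):
    print(n, "attempt(s):", round(effective_reliability(0.55, n), 3))
print("imperfect verifier, 3 attempts:", round(effective_reliability(0.55, 3, catch_rate=0.8), 3))
```

With a perfect verifier, one retry takes the 55% first pass to about 80% and two retries to about 91%; each retry also costs another worker call plus another verifier call.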

Common pitfalls

  • Plugging in a paid provider without checking the free tier first. MiniMax, Z.AI, and Mimo all have free tiers. If you're paying for the paid tier, you're missing the whole point of the BYOK pattern.
  • Assuming "free" means "unlimited." The free tiers have rate limits. Mimo's free period is a promotional subsidy; MiniMax's free tier is bundled with the Mavis plan; Z.AI's free tier is a teaser for the paid coding plan. Read the rate-limit documentation before you start a 24/7 cron job.
  • Assuming prompt caching is on without checking. The channel's claim is that prompt caching is "already configured" for the named providers, but the guarantee is provider-specific: verify on your own setup by checking the analytics tab. If cached reads are zero, caching isn't on.
  • Routing the orchestrator to a free model and the executor to a paid one. This inverts the channel's recommendation. The orchestrator is the planning slot, and a smart model is worth paying for there. The executor is the high-volume slot, and a free model is the right tool. The $0/month stack runs both on free; the transition plan moves the orchestrator to GPT 5.4 and the executor to MiniMax M2.7.
  • Not using the Mavis verifier pattern. The verifier is what makes 55% acceptable. Without it, the bad outputs ship. Wire the verifier on a different model (Gemini 3 Flash, also free), retry on rejection, and the effective reliability climbs well above 55% for $0.
  • Trusting the 55% first-pass reliability for production-critical work. The verifier catches the bad outputs, but the verifier is itself a model. For code that handles money, auth, or production data, the final review pass on GPT 5.4 or Opus is non-negotiable. The free tier is the executor, not the auditor.
  • Skipping the migration plan. Mimo V2 Pro's free period will end. The migration is smoother if you've already tested the backup model (MiniMax M2.7, GPT 5.4) on the same workload. Don't discover that your fallback doesn't work the day Mimo goes paid.
  • Locking in tooling that only works with one provider. Skills carry over, but skills that depend on provider-specific tool-call behaviour don't. Keep the integration thin — provider-portable skills are an investment, Mimo-specific skills are a sunk cost.
  • Not verifying the cache hit rate. Prompt caching is the reason V4 Flash has the highest token consumption on Hermes. If your cache hit rate is low, you're paying full input-token rates for what should be cached reads. The dashboard's analytics tab is the place to check.
  • Reading "BYOK" as a magic unlock. BYOK is a wiring pattern, not a capability upgrade. The model still has to do the work. The free tier is enough for high-volume workloads, but a one-shot complex task may need a paid model. Match the model to the workload.
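On the "free does not mean unlimited" point: a client-side request budget keeps a 24/7 cron job under a provider's limit instead of discovering the limit through errors. A minimal Python sketch of a sliding-window cap; the `limit` and `window` values are placeholders, since each free tier publishes its own limits.

```python
import time
from collections import deque
from typing import Callable


class RequestBudget:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: deque[float] = deque()  # timestamps of requests still inside the window

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False  # caller should wait or skip this cron tick


# Demo with a controllable clock instead of real time.
fake_now = [0.0]
budget = RequestBudget(limit=3, window=60.0, clock=lambda: fake_now[0])
print([budget.try_acquire() for _ in range(4)])
fake_now[0] = 61.0
print(budget.try_acquire())
```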

Sources

  • Hermes vs OpenClaw: Why Everyone Is Migrating — 6,116 views · video_id: 2NbfOOD2i1E · cited: BYOK pattern, MiniMax / Z.AI / Xiaomi Mimo as named providers, prompt caching pre-configured, 15-turn self-evolution on by default, request counts explicit, Claude Code BYOK caveat
  • Minimax Mavis: The BEST Multi-Agent Platform for Beginners — 30,626 views · video_id: 86UIZVWkvF8 · cited: MiniMax as Mavis substrate, $10/mo entry tier, text/image/video bundled token plan, verifier pattern without shared conversation history
  • Xiaomi MiMo V2 Pro Review: FREE AI Model That Rivals Claude Opus? · video_id: liSNV7kPnYg · cited: Xiaomi Mimo free period, News Portal / Kilo.ai / OpenRouter entry points
  • DeepSeek v4 Flash + Hermes Agent = Surprisingly STRONG — 4,893 views · video_id: s3Q9hvdlrmo · cited: prompt caching as the reason V4 Flash is the highest-consumed token, cached reads 5–10x cheaper than uncached input
  • Top AI Models for Hermes Agent (Tier List) — 8,107 views · video_id: Af7Fg1m7hRw · cited: auxiliary slot, Mimo V2 Pro as high-volume king, Gemini 2.5 Flash as default baked into Hermes
  • Best Model for Openclaw (WildClaw Benchmarks!) — 4,574 views · video_id: 31Ij4Cum5tg · cited: WildClaw numbers, Mimo V2 (Xiaomi) free extended access for ~6 days via Kilo Code
  • Supabase query against project ttxdssgydwyurwwnjogq: SELECT video_id, title, views, summary_content, summary_key_takeaways FROM public.videos WHERE video_id = ANY(ARRAY['2NbfOOD2i1E','86UIZVWkvF8','liSNV7kPnYg','s3Q9hvdlrmo','Af7Fg1m7hRw','31Ij4Cum5tg']); The "prompt caching is already configured" claim, the "MiniMax / Z.AI / Xiaomi Mimo" provider list, the "15-turn self-evolution" mechanic, and the "Anthropic doesn't tell you how many credits you've burned" contrast are all sourced from the summary_content and summary_key_takeaways columns of the relevant rows.
  • public.ai_models — confirmed rows: xiaomi-mimo (Mimo V2 Pro, vendor Xiaomi), minimax (MiniMax M2.7), minimax-m2-5 (MiniMax M2.5), glm-5-1 (Zhipu AI / Z.AI). The pricing_info column is null for every row pulled — the free-tier claims in this article come from the video transcripts and the Mimo V2 Pro review.
  • public.ai_updates — searched 2026-06-17 with title ~* '(hermes|byok|prompt caching|mimo|minimax|z.ai)' against the ai_updates table. The Hermes v0.12.0 "The Curator" release (AI Briefing 2026-05-01) mentions "4 new providers" as part of autonomous skill maintenance, which plausibly explains why prompt caching ships pre-configured for new providers. The specific BYOK provider list is in the video transcripts.