The auxiliary slot is what the channel's tier list calls "support only" — the cheapest, most-overlooked models in the agent stack. These are the models that don't try to be the smartest. They try to be the right tool for a narrow, high-volume workload: a web search, an image analysis, a quick classification, a one-shot HTML page, a batch transform. The auxiliary slot is where the cost math gets interesting, because the workloads that don't need flagship intelligence are a much larger fraction of any agent's total work than the workloads that do.
This article walks through the auxiliary slot as defined in the Top AI Models for Hermes Agent (Tier List) video. It covers the four models the channel names explicitly, including the high-volume king (Mimo V2 Pro) and the open-weight niche picks (Elephant Alpha, Trinity Large Preview), and the cost / performance matrix that puts each of them in its slot. The result is the full picture of what's available in the auxiliary tier.
What you'll learn
- The auxiliary slot in the channel's three-slot model (orchestrator / executor / auxiliary) is the "support only" tier — the cheapest, most-overlooked models, often free or near-free.
- The four models the channel names explicitly: Gemini 2.5 Flash (default in Hermes), Gemini 3 Flash (free Google Search grounding and URL context reading), Mimo V2 Pro (high-volume king, free via the Nous Portal), and the open-weight niche picks (Elephant Alpha, Trinity Large Preview).
- The cost / performance matrix: auxiliary models are free or low-cost, but they're not interchangeable. Gemini 2.5 Flash is the default for chat-adjacent tasks; Gemini 3 Flash is the right tool for web search and URL reading; Mimo V2 Pro is the right tool for high-volume document processing.
- The auxiliary slot is the right place to put a model that would be a poor choice for orchestrator or executor work. A 55% first-pass reliability is acceptable for an auxiliary task; the orchestrator and executor demand higher reliability.
- Hot-swapping between auxiliary models is the channel's recommended workflow. Use Mimo for high-volume execution, Gemini 3 Flash for web search, and Gemini 2.5 Flash for chat, and switch mid-session with `/model`.
- The auxiliary slot is also the right place to put open-weight niche picks. Elephant Alpha (100B params, 256K context) and Trinity Large Preview are open-weight models that you self-host or route through OpenRouter for specific tasks.
The three-slot model: orchestrator / executor / auxiliary
The channel's three-slot model is the structural argument for why the auxiliary tier matters. The slots are:
- Orchestrator (the brain): plans multi-step work, holds state across many turns, decides which executor to call and when. Cost: $50–200+/month.
- Executor (the hands): reliably calls tools, follows formatting instructions, doesn't get clever. Cost: $10–72/month.
- Auxiliary (support): specialised tasks, niche use cases, web search, image analysis, quick classifications. Cost: often free or included.
The auxiliary slot is the cheapest, but it's not the least important. A long-running agent's workload breaks down roughly as 20% orchestrator, 50% executor, 30% auxiliary. If the orchestrator and executor are well chosen, the auxiliary slot is where the cost savings compound: running 30% of the workload at $0/month is a substantial saving.
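The saving can be made concrete with a little arithmetic. The sketch below uses the article's rough 20/50/30 split. The per-million-token prices are purely hypothetical placeholders, not figures from the channel. It compares a free auxiliary tier against billing that same 30% at the orchestrator's price.

```python
# Hypothetical prices in USD per million tokens; NOT figures from the channel.
PRICES = {"orchestrator": 10.0, "executor": 1.0, "auxiliary": 0.0}

# The article's rough workload split for a long-running agent.
SHARES = {"orchestrator": 0.20, "executor": 0.50, "auxiliary": 0.30}


def monthly_cost(total_mtok: float, shares: dict[str, float], prices: dict[str, float]) -> float:
    """Cost of total_mtok million tokens, split across slots by `shares`."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("workload shares must sum to 1")
    return sum(total_mtok * share * prices[slot] for slot, share in shares.items())


# 100M tokens/month with the auxiliary share on a free model...
free_aux = monthly_cost(100, SHARES, PRICES)
# ...versus the same share billed at the orchestrator's price.
aux_on_orchestrator = monthly_cost(100, SHARES, {**PRICES, "auxiliary": PRICES["orchestrator"]})
saving = aux_on_orchestrator - free_aux
```

With these placeholder prices the free auxiliary tier avoids $300 of a $550 bill. The absolute numbers don't matter. The point is that the saving scales linearly with total volume.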
The four models the channel names explicitly for the auxiliary slot, in priority order:
- Gemini 2.5 Flash: the default baked into Hermes. Check `nano config.yaml`. This is the chat-adjacent auxiliary that runs by default; you don't need to configure it, just leave it on.
- Gemini 3 Flash: adds free Google Search grounding and URL context reading. A replacement for the $10/month Nous Research built-in tools.
- Mimo V2 Pro: the high-volume king, the most-used model on OpenRouter that month, free via the Nous API on the Nous Portal website.
- Elephant Alpha (100B params, 256K context) and Trinity Large Preview: open-weight niche picks for self-hosting or specialised tasks.
A fifth model, implicit in the tier list, is Mimo V2 Flash (the lightweight Mimo variant), which the channel describes as "the only thing it's good at is making HTML web pages in one shot." For the narrow workload of one-shot HTML generation, Mimo V2 Flash is the right tool. For anything else, use the full V2 Pro.
The cost / performance matrix
The auxiliary slot's cost / performance matrix is the easiest to reason about, because the workloads are narrow. The channel's coverage lays out the matrix as:
| Model | Cost | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| Gemini 2.5 Flash | Free (baked in) | Default, fast, reliable | Limited capability | Chat-adjacent tasks |
| Gemini 3 Flash | Free | Google Search grounding, URL reading | Not a primary model | Web search, URL context |
| Mimo V2 Pro | Free (during promo) | High-volume, agentic-trained | 55% reliability, free period ends | Document processing, batch |
| Mimo V2 Flash | Free (during promo) | One-shot HTML | Narrow use case | HTML generation |
| Elephant Alpha | Free (open-weight) | 100B params, 256K context | Self-hosting required | Privacy-critical, niche |
| Trinity Large Preview | Free (open-weight) | Open-weight | Niche | Specific workflows |
The matrix's structural argument: at the auxiliary tier, all of the recommended models are free. The cost-per-success math is dominated by the verifier pattern (see §7.2). Workers run on one free model and the verifier runs on a different free model. The share of bad outputs that ship is then roughly the worker's miss rate multiplied by the verifier's miss rate, assuming the two models fail independently.
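A quick calculation shows why the verifier dominates the math. It makes two simplifying assumptions: the verifier's misses are independent of which worker outputs are wrong, and the verifier never rejects a correct output. The 20% verifier miss rate is a hypothetical value, not a measurement.

```python
def shipped_error_rate(worker_accuracy: float, verifier_miss_rate: float) -> float:
    """Share of ALL worker outputs that are wrong and still pass the verifier.

    Assumes the verifier's misses are independent of which outputs are wrong.
    """
    return (1.0 - worker_accuracy) * verifier_miss_rate


def shipped_precision(worker_accuracy: float, verifier_miss_rate: float) -> float:
    """Share of SHIPPED outputs that are correct (assumes no false rejections)."""
    good = worker_accuracy
    bad = shipped_error_rate(worker_accuracy, verifier_miss_rate)
    return good / (good + bad)


# No verifier at all behaves like a verifier that misses everything.
print(shipped_precision(0.55, 1.0))   # ~0.55: what ships is the raw first pass
print(shipped_precision(0.55, 0.20))  # ~0.86 with a hypothetical 20%-miss verifier
```

If the two models tend to fail on the same inputs, the independence assumption breaks and the real figure will be worse. That is why the pattern pairs two *different* models.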
The hot-swap workflow, in practice:
```text
# Chat-adjacent tasks (default)
/model gemini-2.5-flash

# Web search and URL reading
/model gemini-3-flash

# High-volume execution
/model mimo-v2-pro

# Niche task with an open-weight model
/model elephant-alpha
```
The /model mid-session hot-swap mechanic (see Course 3 §3.1) is what makes the auxiliary slot practical. The session state survives the swap, so you can run a high-volume task on Mimo, switch to Gemini 3 Flash for a web search, and continue the original task without losing context.
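The decision behind each swap is a lookup from task type to model. The sketch below is a hypothetical routing table for that lookup, not a Hermes API. The task-type keys are invented for illustration. The model strings mirror the `/model` commands above, except `mimo-v2-flash`, which is an assumed identifier.

```python
# Hypothetical routing table: task type -> model name as typed after /model.
AUXILIARY_ROUTES = {
    "chat": "gemini-2.5-flash",
    "web_search": "gemini-3-flash",
    "url_read": "gemini-3-flash",
    "document": "mimo-v2-pro",
    "batch": "mimo-v2-pro",
    "one_shot_html": "mimo-v2-flash",  # assumed identifier for Mimo V2 Flash
    "niche_private": "elephant-alpha",
}

# Anything unrecognised falls back to the Hermes default auxiliary.
DEFAULT_MODEL = "gemini-2.5-flash"


def pick_model(task_type: str) -> str:
    """Return the auxiliary model for a task type, or the default."""
    return AUXILIARY_ROUTES.get(task_type, DEFAULT_MODEL)


def swap_command(task_type: str) -> str:
    """Build the /model command a skill or user would issue for this task."""
    return f"/model {pick_model(task_type)}"


print(swap_command("web_search"))  # /model gemini-3-flash
```

Keeping the table in one place makes the hot-swap policy easy to audit. A route to a paid model shows up in a single diff rather than being scattered across skills.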
Mimo V2 Pro as the high-volume auxiliary
The Mimo V2 Pro story is covered in detail in §7.1, but the auxiliary-slot framing is worth restating. The channel's coverage positions Mimo V2 Pro as the auxiliary model for high-volume workloads:
- Document processing: large PDFs, reports, documentation. The model is trained for agentic use cases, so tool calls integrate cleanly with Hermes' skill registry.
- Batch operations: processing many similar items. The 55% first-pass reliability is fine if the verifier pattern catches the bad outputs.
- Skill building: generating reusable Hermes skills. The self-evolving skill loop means skills carry over to future sessions and survive model switches.
- Testing: trying agentic workflows on real workloads without paying for GPT or Opus.
- High-volume tasks: thousands of operations. The cost savings compound at scale.
- Non-critical work: tasks where failures are acceptable.
The auxiliary slot is the right place for Mimo V2 Pro because its workloads are narrow and recoverable enough that 55% suffices, and the cost is $0 during the free period. For orchestrator or executor work, 55% is too low. For auxiliary work, it's fine.
Gemini 2.5 Flash as the default
Gemini 2.5 Flash is the default baked into Hermes. The channel's framing: "check nano config.yaml" — it's already configured, you don't need to set it up. The role of Gemini 2.5 Flash is the chat-adjacent auxiliary: small talk, quick classifications, simple JSON formatting, anything that doesn't need a smart model but does need a model.
The advantage of Gemini 2.5 Flash as the default is reliability. It's not the smartest model, but it's consistent, fast, and free. For workloads that don't need flagship intelligence, Gemini 2.5 Flash is the right tool. The channel's recommendation: don't switch it out unless you have a specific reason.
Gemini 3 Flash for web search and URL reading
Gemini 3 Flash is the auxiliary slot's web search and URL context reader. The channel's coverage: "adds free Google Search grounding and URL context reading, a replacement for the $10/month Nous Research built-in tools." The structural reason this is in the auxiliary slot: web search and URL reading are narrow, high-volume tasks that don't need flagship intelligence. The right tool is a free model with the right capabilities.
The Gemini 3 Flash workflow, in practice: the orchestrator (GPT 5.4 or Mimo) decides that a web search is needed; the agent hot-swaps to Gemini 3 Flash; the search runs; the result is fed back to the orchestrator. The cost is $0 for the search, and the orchestrator stays on the model that planned the task.
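The "swap out, run, swap back" flow is essentially a scoped model override. The sketch below models it with a context manager that restores the previous model even if the search fails. `Session`, `fake_search`, and the `gpt-5.4` identifier are hypothetical stand-ins for illustration, not Hermes internals.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for an agent session: active model plus history."""
    model: str
    history: list[str] = field(default_factory=list)


@contextmanager
def temporary_model(session: Session, model: str):
    """Swap to `model` for the duration of the block, then restore the original."""
    previous = session.model
    session.model = model
    try:
        yield session
    finally:
        session.model = previous  # orchestrator model comes back even on errors


def fake_search(session: Session, query: str) -> str:
    """Placeholder for a grounded web search run on the active model."""
    return f"[{session.model}] results for {query!r}"


session = Session(model="gpt-5.4")
with temporary_model(session, "gemini-3-flash") as s:
    result = fake_search(s, "hermes agent release notes")
session.history.append(result)  # result is fed back to the orchestrator
```

The `finally` clause is the design point: a failed search must not leave the session stranded on the auxiliary model.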
Elephant Alpha and Trinity Large Preview
The open-weight niche picks are the auxiliary slot's escape hatch for self-hosting and privacy-critical workflows. The channel's coverage:
- Elephant Alpha: 100B parameters, 256K context. Self-hostable behind a privacy wrapper. The right tool for workflows where data residency or privacy is a hard requirement.
- Trinity Large Preview: open-weight, niche. The right tool for specific workflows where the model is a particularly good fit.
The structural argument for open-weight auxiliary models: the auxiliary slot is the cheapest tier, and open-weight models are the cheapest option. If you can self-host, the cost is hardware only — no per-token fees, no rate limits, no API keys to manage. The trade-off: you're responsible for the model serving, which is its own operational burden.
The channel's overall recommendation: start with Gemini 2.5 Flash + Gemini 3 Flash + Mimo V2 Pro as the free-tier stack. Add Elephant Alpha or Trinity Large Preview only if you have a specific workflow that needs an open-weight model.
The self-hosting decision
The open-weight auxiliary models raise the self-hosting decision: is the operational burden of running a model server worth the cost savings and the privacy benefits? The channel's coverage doesn't go deep on the self-hosting workflow for Elephant Alpha or Trinity Large Preview (these are auxiliary models, and the channel's focus is on the orchestrator and executor slots). The high-level trade-off:
- Self-hosting wins for: data-residency requirements (EU, regulated industries), high-volume workloads where the per-token cost of a hosted model would dominate, workflows where the model's behaviour needs to be auditable in-house.
- Self-hosting loses for: low-volume workflows (the hardware cost is fixed, and a $3/mo VPS is cheaper than a GPU box), workflows where the model's behaviour is the vendor's responsibility (uptime, scaling, security patches), workflows where the team doesn't have ML ops capacity.
The hardware requirements for Elephant Alpha (100B params) are non-trivial. At 4-bit quantization the weights alone are ~50GB, which means a single A100 80GB or two A6000s (48GB each), before accounting for the KV cache a long context needs. That hardware costs $1,000–$2,000/month on a cloud GPU provider, or $10,000+ in upfront capex for owned hardware. Trinity Large Preview is smaller, but the same trade-off applies. The channel's recommendation: don't self-host an open-weight model to cut auxiliary costs, since the hosted free-tier auxiliaries already cost $0/month. Self-host only if you have a specific privacy or volume requirement that justifies the operational burden.
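The sizing above follows from parameters × bits per parameter ÷ 8. The sketch below encodes that rule along with a fit check and a monthly cloud-cost estimate. Three assumptions apply. The 80% headroom factor is a rough allowance for KV cache and activations. The $2/hour rate is a hypothetical cloud price. The month is taken as 730 hours.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB (excludes KV cache, activations)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits: int, gpu_memory_gb: list[float], headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` of the combined GPU memory.

    The 0.8 default is an assumed allowance, not a vendor figure.
    """
    return weight_memory_gb(params_billion, bits) <= headroom * sum(gpu_memory_gb)


def monthly_cloud_cost(hourly_rate_usd: float, hours: float = 730) -> float:
    """Always-on rental cost for one month (730 h is an average month)."""
    return hourly_rate_usd * hours


print(weight_memory_gb(100, 4))    # 50.0 GB at 4-bit
print(fits(100, 4, [80]))          # True: one A100 80GB
print(fits(100, 4, [48, 48]))      # True: two A6000s
print(fits(100, 16, [80]))         # False: unquantized weights are 200 GB
print(monthly_cloud_cost(2.0))     # 1460.0 at a hypothetical $2/hour
```

A 256K-token context can need a large KV cache on top of the weights, so treat a marginal "fits" result as a warning rather than a green light.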
The Nemotron 3 Super and Step 3.5 Flash alternates
The channel's full tier list (covered in Course 20: AI Model Tier Lists & Comparisons) names two more auxiliary candidates that aren't among the four explicit picks above:
- Nemotron 3 Super: NVIDIA's open-weight model. Strong for coding agents, 128K context, self-hostable behind a Nemo Claw privacy wrapper. The right tool for "pro developers, privacy-critical workflows" per the channel's tier list.
- Step 3.5 Flash: open-source, free, good for reinforcement learning (RL) workflows. The right tool for "self-improvement workflows, RL environments" per the channel's tier list. Limited general-purpose use.
The structural argument for adding these to the auxiliary stack: the auxiliary slot is broad enough to support multiple models, and the right tool for a given task depends on the workload. A long-running RL training workflow wants Step 3.5 Flash. A privacy-critical coding agent wants Nemotron 3 Super. A high-volume document processing workflow wants Mimo V2 Pro. The auxiliary slot is a collection, not a single pick.
The channel's overall recommendation: start with the three free-tier hosted models (Gemini 2.5 Flash, Gemini 3 Flash, Mimo V2 Pro), then add open-weight models as specific workflows demand. Don't self-host speculatively.
The 55% reliability and the auxiliary slot
The auxiliary slot is the right place for a model with 55% first-pass reliability. The structural argument: the auxiliary slot's workloads are narrow and high-volume, and the cost of a miss (roughly 45% of first attempts) is small. A miss on a web search means "the search didn't return the right result, run it again." A miss on a one-shot HTML generation means "the page didn't render right, run it again." A miss on a document classification means "the wrong label, run it again." All of these are recoverable. None of them are load-bearing.
The orchestrator and executor slots don't have the same property. A miss on an orchestrator's planning step means "the agent made a wrong decision and the entire workflow is now off-track." A miss on an executor's tool call means "the agent called the wrong tool and the user sees a broken workflow." These are not recoverable at the same cost.
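"Run it again" has a predictable price. The sketch below assumes two things: each attempt succeeds independently with probability p, and a failure is detectable (for example, by a verifier). Under those assumptions, the number of attempts until success follows a geometric distribution.

```python
def expected_attempts(p: float) -> float:
    """Mean number of attempts until the first success (geometric distribution)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def success_within(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - p) ** k


print(round(expected_attempts(0.55), 2))  # 1.82 attempts per success on average
print(round(success_within(0.55, 3), 3))  # 0.909 within three attempts
```

On a free model, 1.82 calls per success costs nothing extra. On the orchestrator, a wrong plan cannot simply be retried, because every downstream step has already consumed it.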
The auxiliary slot is also the right place for the Mavis verifier pattern (see Course 2 §2.3). Workers run on a free auxiliary model, the verifier runs on a different free auxiliary model, and the output only ships if it passes the verifier's check. The cost stays at $0. The reliability of what ships can exceed that of any single model in the slot, provided the verifier catches most bad outputs.
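The control flow of the pattern fits in a few lines. The sketch below is a minimal illustration, not the Mavis implementation. The worker and verifier are plain callables. The two stubs are hypothetical: a worker that is right about 55% of the time, and an idealised verifier that checks answers perfectly, which a real second model would not.

```python
import random
from typing import Callable, Optional


def run_with_verifier(
    task: str,
    worker: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry the worker until the verifier accepts an output; None means escalate."""
    for _ in range(max_attempts):
        candidate = worker(task)
        if verifier(task, candidate):
            return candidate  # only verified output ships
    return None  # nothing passed: hand off to a higher-tier model or a human


rng = random.Random(0)  # seeded so the simulation is repeatable


def mimo_worker_stub(task: str) -> str:
    """Hypothetical worker: correct ~55% of the time."""
    return task.upper() if rng.random() < 0.55 else "wrong answer"


def flash_verifier_stub(task: str, output: str) -> bool:
    """Hypothetical verifier, idealised here as a perfect checker."""
    return output == task.upper()


tasks = ["invoice", "receipt", "contract"]
shipped = [run_with_verifier(t, mimo_worker_stub, flash_verifier_stub) for t in tasks]
```

Returning `None` instead of the last failed candidate is deliberate. It keeps unverified output from shipping, which is the whole point of the pattern.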
Tier list cross-reference
The channel's full tier list (from the Top AI Models for Hermes Agent (Tier List) video, also covered in Course 20: AI Model Tier Lists & Comparisons) organises the auxiliary slot as:
- Tier C — Specialized/Auxiliary (the channel's published tier labels): Gemini 3 Flash, Step 3.5 Flash, Nemotron 3 Super, Trinity Large Preview, Elephant Alpha
- Tier B — Solid Budget Options (with auxiliary uses): Mimo V2 Pro (Xiaomi), Minimax M2.7
The structural argument: Mimo V2 Pro is the high-volume king, and it's borderline between Tier B and Tier C. For the auxiliary slot specifically, it's the right pick for high-volume document processing. For the executor slot, Minimax M2.7 is the right pick for general execution work.
The Best Model for Openclaw (WildClaw Benchmarks!) coverage adds the WildClaw numbers to the auxiliary slot:
- Mimo V2 Pro: ~55% success rate, $26/run (when paid), 500 min wall-clock
- Minimax M2.7: ~45% success rate, $8/run, 500 min wall-clock
- Grok: ~40% success rate, $15/run, 94 min wall-clock (the speed outlier)
The structural read: Mimo V2 Pro is the auxiliary pick for high-volume reliability, Minimax M2.7 is the executor pick for cost, and Grok is the speed pick when latency matters more than quality.
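The structural read follows from dividing each per-run figure by the success rate. The sketch below uses the WildClaw numbers quoted above. It assumes failed runs cost as much as successful ones and that repeated runs succeed independently. Mimo's $26/run applies only once it's paid.

```python
# WildClaw figures as quoted above (success rates are approximate).
WILDCLAW = {
    "Mimo V2 Pro": {"success_rate": 0.55, "usd_per_run": 26.0, "minutes_per_run": 500},
    "Minimax M2.7": {"success_rate": 0.45, "usd_per_run": 8.0, "minutes_per_run": 500},
    "Grok": {"success_rate": 0.40, "usd_per_run": 15.0, "minutes_per_run": 94},
}


def per_success(per_run: float, success_rate: float) -> float:
    """Expected cost (or time) per successful run under independent retries."""
    return per_run / success_rate


for model, m in WILDCLAW.items():
    usd = per_success(m["usd_per_run"], m["success_rate"])
    mins = per_success(m["minutes_per_run"], m["success_rate"])
    print(f"{model:14s} ${usd:6.2f}/success  {mins:6.0f} min/success")
```

Under these assumptions Minimax M2.7 is cheapest per success (about $18) and Grok is fastest (about 235 minutes). Mimo V2 Pro's edge is its first-pass rate, which is what matters most when it costs $0.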
Try it yourself
The hands-on goal: stand up a $0/month auxiliary stack on your Hermes Agent, with each auxiliary model wired in for its specific workload, and confirm the hot-swap mechanic works.
- Verify Gemini 2.5 Flash is the default. Check `nano config.yaml` in your Hermes install. Gemini 2.5 Flash should be the default auxiliary. If it's not, add it.
- Wire Gemini 3 Flash for web search. Add Gemini 3 Flash to the model list. Configure a skill that hot-swaps to Gemini 3 Flash when a web search or URL read is needed.
- Wire Mimo V2 Pro for high-volume work. Add Mimo V2 Pro to the model list. Configure a skill that routes high-volume document processing to Mimo.
- Test the hot-swap mechanic. In an active Hermes session, type `/model gemini-3-flash` to switch to web search mode. Run a search. Switch back with `/model gemini-2.5-flash`. Confirm the session state survives the swap.
- Add the Mavis verifier. Workers on Mimo, verifier on Gemini 3 Flash. The two models form a free verifier pattern.
- Test the 55% reliability on a real workload. Run a batch of 100 document classifications on Mimo and count the correct ones. The number should land near the channel's ~55% WildClaw measurement; if it doesn't, trust your own number for your own workload.
- Add an open-weight model if you have a self-hosting setup. Elephant Alpha or Trinity Large Preview. Route a niche task to the open-weight model. Confirm the integration works.
- Monitor the cost. The auxiliary stack should be $0/month. If you see charges, check that you didn't accidentally route a task to a paid model.
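With only 100 items, a measured success rate is noisy. Before comparing your count with the channel's ~55%, check how wide the uncertainty is. The sketch below computes a standard 95% Wilson score interval. The 55-of-100 result is a hypothetical example, not a measurement.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a success proportion."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p_hat = correct / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - margin, centre + margin


# Hypothetical result: 55 of 100 classifications were correct.
low, high = wilson_interval(55, 100)
print(f"{low:.2f} to {high:.2f}")  # prints 0.45 to 0.64
```

An interval nearly 20 points wide means a run of 100 can't distinguish 50% from 60%. Use a larger batch if the decision depends on that difference.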
Common pitfalls
- Routing a load-bearing workflow to an auxiliary model. The auxiliary slot is for narrow, high-volume, recoverable tasks. Production-critical code, money-handling workflows, anything that affects user trust — keep those on the orchestrator or executor slots.
- Trusting the 55% reliability for one-shot tasks. Auxiliary models are fine for high-volume recoverable workloads. For a single critical decision, the 55% miss rate is too high. Use a higher-tier model.
- Skipping the verifier pattern. A 55% first-pass reliability is acceptable only if the verifier catches the bad outputs. Without the verifier, the 45% ships.
- Switching out Gemini 2.5 Flash without a reason. It's the default for a reason. It's consistent, fast, and free. Don't replace it with a paid model unless you have a specific workload that needs more.
- Treating Mimo V2 Pro as a permanent free tier. The free period will end. Test the backup model (Minimax M2.7, GPT 5.4) on the same workload while Mimo is still free.
- Using Mimo V2 Flash for anything except one-shot HTML. The channel's blunt verdict: "the only thing that it's good at is making HTML web pages in one shot." For any other workload, use Mimo V2 Pro or another auxiliary.
- Self-hosting an open-weight model without a privacy use case. The operational burden of self-hosting is real. If you don't have a data-residency or privacy requirement, the free-tier hosted models (Gemini 2.5 Flash, Gemini 3 Flash, Mimo V2 Pro) are easier.
- Not using the hot-swap mechanic. The auxiliary slot's value comes from picking the right model for each task. If you're running one model for the entire workflow, you're not using the auxiliary slot — you're using a single model with extra steps.
- Ignoring the cache hit rate. Prompt caching is what keeps per-token costs low on any paid model in the stack, including Mimo once its free period ends. If the cache hit rate is low, you're paying full input-token rates for what should be cached reads.
- Reading "auxiliary" as "less important." The auxiliary slot is the cheapest tier, not the least important. 30% of a long-running agent's workload is in this slot, and the cost savings compound at scale.
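Once any model in the stack is billed per token, the blended input price depends directly on the cache hit rate. The sketch below computes that weighted average. The $1.00 uncached and $0.10 cached prices per million input tokens are hypothetical, not any provider's published rates.

```python
def effective_input_price(full: float, cached: float, hit_rate: float) -> float:
    """Blended price per million input tokens, given the share served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1.0 - hit_rate) * full


# Hypothetical prices: $1.00 per million uncached input tokens, $0.10 cached.
for rate in (0.2, 0.5, 0.9):
    print(rate, round(effective_input_price(1.00, 0.10, rate), 3))
```

Moving from a 20% to a 90% hit rate cuts the blended price from $0.82 to $0.19 in this example. Stable prompt prefixes (system prompt, skill definitions) are what make high hit rates possible.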
Sources
- Top AI Models for Hermes Agent (Tier List) · 8,107 views · `video_id: Af7Fg1m7hRw` · cited: auxiliary slot definition, Gemini 2.5 Flash as default baked into Hermes (`nano config.yaml`), Gemini 3 Flash with free Google Search grounding and URL context reading, Mimo V2 Pro as the high-volume king and most-used model on OpenRouter, Elephant Alpha (100B params, 256K context) and Trinity Large Preview as open-weight niche picks, Mimo V2 Flash good only at one-shot HTML
- Best Model for Openclaw (WildClaw Benchmarks!) · 4,574 views · `video_id: 31Ij4Cum5tg` · cited: Mimo V2 Pro 55% success rate, $26/run cost, 51% Opus / $80 cost, GPT 5.4 close second, Mimo V2 (Xiaomi) free extended access, Grok 94 min vs ~500 min
- AI Model Tier List for Agentic Workflows (April 2026) · `video_id: kOZzRRQHqR8` · the full auxiliary + executor + orchestrator ranking
- DeepSeek v4 Flash + Hermes Agent = Surprisingly STRONG · 4,893 views · `video_id: s3Q9hvdlrmo` · cited: V4 Flash as the highest-consumed token on Hermes, prompt caching as the explicit reason
- Hermes vs OpenClaw: Why Everyone Is Migrating · 6,116 views · `video_id: 2NbfOOD2i1E` · cited: BYOK pattern, MiniMax / Z.AI / Xiaomi Mimo as named free-tier providers, 15-turn self-evolution on by default
- Supabase query · `SELECT video_id, title, views, summary_content, summary_key_takeaways FROM public.videos WHERE video_id = ANY(ARRAY['Af7Fg1m7hRw','31Ij4Cum5tg','kOZzRRQHqR8','s3Q9hvdlrmo','2NbfOOD2i1E']);` against project `ttxdssgydwyurwwnjogq`. The auxiliary slot definition, the "Mimo V2 Pro is the high-volume king" claim, the "Gemini 2.5 Flash is the default baked into Hermes" claim, and the WildClaw numbers are all sourced from the `summary_content` and `summary_key_takeaways` columns of the relevant rows.
- `public.ai_models` · confirmed rows: `xiaomi-mimo` (Mimo V2 Pro, vendor Xiaomi), `gemini-2-5-flash` (Google), `gemini-3-flash` (Google), `elephant-alpha` (open-weight, 100B params), `trinity-large-preview` (open-weight). The auxiliary slot's model names cross-match these rows. The `pricing_info` column is `null` for every row pulled; the "free" claim for Gemini 2.5 Flash, Gemini 3 Flash, and Mimo V2 Pro comes from the video transcripts, not from the DB.
- `public.ai_updates` · searched 2026-06-17 with `title ~* '(auxiliary|gemini flash|mimo|tier list|wildclaw)'` against the `ai_updates` table. The closest match is `AI Briefing 2026-04-24` (Hermes v0.11.0 release notes), which mentions the auxiliary model list expansion as part of the v0.11 release arc. The specific auxiliary slot definition and the Mimo V2 Pro placement are in the video transcripts.