Hermes Kanban: durable, retryable, inspectable tasks - Sub-agents & the Kanban

The first two articles covered the primitive (sub-agents) and the pattern (orchestrator-and-workers). This article is the infrastructure that makes the pattern production-grade: the Hermes Kanban. The Kanban is the channel's answer to the question every multi-agent user hits: how do I know my workers actually did the work, and what happens when they fail? The answer is structural — tasks live on a board with named-profile workers, every task has a parent-child dependency, the worker logs are first-class, and a failed task retries automatically until it succeeds.

The load-bearing feature is the parent-child retry loop. A normal orchestrator notifies you when a run fails; the Kanban retries. The channel has logged a build that needed 6 runs and a separate test that needed 81. That is the killer feature — it is what makes the Kanban the workhorse orchestrator for multi-agent work, and it is the reason this article anchors on it.

What you'll learn

The Kanban is not just a UI — every assignee is a named profile under ~/.hermes/profiles/<name>/ with its own SOUL.md and config, and the parent agent's API key is not inherited. You set inference_provider and the key on every profile by hand.
The setup is a four-step sequence: hermes update, init the Kanban DB, hermes gateway start, then hermes profile create <role> for every assignee. On older builds, these commands do not exist.
Finished reports live in ~/hermes/kanban/, not ~/.hermes/profiles/. The worker logs and the final markdown file land in different trees, and the dashboard only flips to done on a successful write.
The Kanban's killer feature is the parent-child retry loop. A normal orchestrator notifies you when a run fails; the Kanban retries. The creator logged a build that needed 6 runs and a separate test that needed 81.
The worker logs are the audit trail. Every retry, every failure, every successful run is logged. Before assuming a pipeline is broken, read the worker logs.
The Kanban is durable across gateway crashes — child tasks wait for the parent, and the parent retries on next dispatch. A failed pipeline picks up where it left off.

The Kanban, in one diagram

The Kanban is a live dashboard inside the Hermes runtime where multiple named agents — each with its own role — collaborate on a project. The contrast the creator draws in the source video is the load-bearing framing (R_aLVXYzDac):

A standard orchestrator's sub-agents are disposable workers that hand in their homework and disappear. A Kanban profile is a persistent, configurable agent living under ~/.hermes/profiles/<name>/ with its own SOUL.md and a config file. They are not the same thing.

The structural difference: disposable sub-agents have no memory across tasks, no persistent role, no retry history. Kanban profiles have all three. The parent-child retry loop is what makes the "all three" meaningful — a worker that failed 5 times is still a worker, with its 5 failures in the log and a 6th attempt queued.

The setup sequence

The four-step setup (R_aLVXYzDac):

Run hermes update. The release was ~8 hours old at recording, so every command below is gated on first running hermes update. On older builds, the commands do not exist.
Initialise the Kanban DB. This creates the SQLite store for tasks, worker logs, and the parent-child dependency graph.
Start the gateway with hermes gateway start. The gateway is the runtime that dispatches parent tasks to child workers. On VPS, wrap it in a systemd service so it survives logouts (see §7.4).
Create a profile per assignee. Run hermes profile create <role> (e.g. researcher, backend-dev) for every role on the board.

The friction point the creator flagged on stream (R_aLVXYzDac): the researcher profile has no inference provider configured out of the box and does not inherit the parent agent's API key. You set inference_provider and the API key on every profile by hand. The ~/.hermes/profiles/researcher/.env file is where the key lives.
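Because the key is not inherited, it is easy to create several profiles and forget one. The check below is a minimal sketch, not part of Hermes: it assumes each profile keeps its key in a .env file of NAME=value lines under the profiles directory, and the variable name EXAMPLE_API_KEY is a placeholder for whatever your provider expects.

```python
from pathlib import Path


def profiles_missing_key(profiles_root: Path, key_name: str) -> list[str]:
    """Return names of profiles whose .env has no non-empty value for key_name."""
    missing = []
    for profile_dir in sorted(p for p in profiles_root.iterdir() if p.is_dir()):
        env_file = profile_dir / ".env"
        values = {}
        if env_file.exists():
            for line in env_file.read_text().splitlines():
                line = line.strip()
                # Skip blanks, comments, and anything that is not NAME=value.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                name, _, value = line.partition("=")
                values[name.strip()] = value.strip().strip('"').strip("'")
        # A missing file, a missing variable, and an empty value all count.
        if not values.get(key_name):
            missing.append(profile_dir.name)
    return missing


# Usage (EXAMPLE_API_KEY is a hypothetical variable name):
# profiles_missing_key(Path.home() / ".hermes" / "profiles", "EXAMPLE_API_KEY")
```

An empty list means every profile has a value set; it does not prove the key is valid.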

Resuming the right session

The TUI resume command is hermes --profile <name> --resume <session_id> (R_aLVXYzDac). The default hermes sessions list only shows the main agent's sessions — a subtle but common confusion. If you do not specify --profile <name>, you are looking at the wrong session list.
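If you script the resume step, building the argument list in one place makes it harder to forget the profile flag. This is a small sketch that uses only the flags quoted above; the helper name resume_command and the SESSION_ID value are placeholders.

```python
import shlex


def resume_command(profile: str, session_id: str) -> list[str]:
    """Build the argv for resuming a profile's session."""
    if not profile or not session_id:
        # Without a profile you would be looking at the main agent's sessions.
        raise ValueError("both profile and session_id are required")
    return ["hermes", "--profile", profile, "--resume", session_id]


cmd = resume_command("researcher", "SESSION_ID")  # SESSION_ID is a placeholder
print(shlex.join(cmd))  # copy into a shell, or pass cmd to subprocess.run
```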

Where the artefact lives

Finished reports save to ~/hermes/kanban/, not ~/.hermes/profiles/ (R_aLVXYzDac). The worker logs and the final markdown file land in different trees. The creator initially looked in the wrong folder and accused the agent of gaslighting him. The dashboard only flips to done on a successful write, so an empty or missing file in the profiles tree does not mean the run failed: the artefact is in the kanban tree.
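A quick way to avoid the wrong-folder hunt is to list the newest markdown files in the kanban tree. The helper below is a sketch under two assumptions: reports are .md files, and they sit somewhere under the directory you pass in. The function name is illustrative.

```python
from pathlib import Path


def newest_reports(kanban_dir: Path, limit: int = 5) -> list[Path]:
    """Return up to `limit` markdown files under kanban_dir, newest first."""
    reports = [p for p in kanban_dir.rglob("*.md") if p.is_file()]
    # Sort by modification time so the latest finished report comes first.
    reports.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return reports[:limit]


# Usage, with the path as this article gives it (adjust if your install differs):
# for report in newest_reports(Path.home() / "hermes" / "kanban"):
#     print(report)
```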

The four use cases, numbered Story 1–4

The creator enumerates the patterns (R_aLVXYzDac):

Story 1 — solo dev shipping a feature (one assignee; what was demoed in the source video). Beginner default. One worker, one task, one artefact. The retry loop is the safety net.
Story 2 — fleet farming (multiple schemas × multiple assignees). Same pattern, scaled. Multiple workers, multiple tasks, all writing to the same schema.
Story 3 — multi-role pipeline with retries — described as "sort of N8N territory" with inspectable nodes. This is the cron + Kanban pattern from §7.4.
Story 4 — circuit breaker / crash recovery — the parent-child dependency graph means a crashed pipeline picks up where it left off, not from scratch.

The recommendation: stay on Story 1 until it works twice in a row, then graduate. The "spin up the Kanban and watch" hands-off flow belongs to Story 3, not Story 1.
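Story 4's "picks up where it left off" behaviour follows from storing each task's status next to its parent links. The sketch below illustrates that idea only; it is not the Kanban's actual data model, and the status labels and task names are hypothetical.

```python
def runnable_tasks(parents: dict[str, list[str]], status: dict[str, str]) -> list[str]:
    """Return unfinished tasks whose parents are all done, in sorted order.

    parents maps each task to its parent tasks; status maps a task to
    "done", "failed" or "pending" (hypothetical labels).
    """
    ready = []
    for task in sorted(parents):
        if status.get(task) == "done":
            continue  # finished work is never repeated after a crash
        if all(status.get(p) == "done" for p in parents[task]):
            ready.append(task)
    return ready


pipeline = {"research": [], "draft": ["research"], "review": ["draft"]}
after_crash = {"research": "done", "draft": "failed"}
print(runnable_tasks(pipeline, after_crash))  # only "draft" is retried
```

Because "research" is already recorded as done, a restart resumes at "draft", and "review" keeps waiting on its parent.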

The parent-child retry loop: the killer feature

This is the anchor of the article. The Kanban's core advantage over a normal orchestrator/sub-agent flow is the parent-child dependency. If a run fails midway, the Kanban retries (fKoPRL0dhyk).

The 6-run Space Shooter build. The creator logged a Space Shooter game build that took 6 runs before he terminated it. A vanilla orchestrator would have notified him and stopped at run 1. The Kanban kept retrying — the worker logs show 6 attempts, with the failure modes on each one, and the 6th run was the one that produced the artefact (fKoPRL0dhyk).

The 81-run test. The creator remembers a separate test that hit 81 runs before succeeding. The test was a deliberately-hard integration that pushed the worker past its normal failure modes. On a normal orchestrator, the 2nd failure would have produced an alert; on the 3rd, the alert would have been escalated; by the 81st, the workflow would have been manually terminated. The Kanban's parent-child retry loop just kept going — the 81st run produced the artefact, and the worker logs show the 80 failure modes that preceded it (fKoPRL0dhyk).

The structural point: the retry loop is not "try the same thing 81 times." The Kanban's retry mechanism is parameterised — the parent task tracks which sub-tasks succeeded, which failed, and what the failure mode was. The next retry attempt is informed by the prior failures. The 81-run test was not 81 identical attempts; it was 81 progressively-informed attempts that converged on a solution.
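The difference between "try again" and "try again, knowing what went wrong" can be sketched in a few lines. This illustrates the pattern and is not Hermes code: the worker callable, the failure-message format, and the max_attempts cap (added so the example always terminates) are all assumptions.

```python
from typing import Callable, Optional


def run_with_retries(
    worker: Callable[[list[str]], str],
    max_attempts: int,
) -> tuple[Optional[str], list[str]]:
    """Call worker until it returns an artefact or max_attempts is reached.

    The worker receives the failure messages from all earlier attempts,
    so each retry can differ from the last. Returns (artefact, failures).
    """
    failures: list[str] = []
    for _ in range(max_attempts):
        try:
            return worker(list(failures)), failures
        except Exception as exc:  # a real worker would catch narrower errors
            failures.append(f"attempt {len(failures) + 1}: {exc}")
    return None, failures


def flaky_worker(prior_failures: list[str]) -> str:
    # Toy worker: succeeds only once it has two earlier failures to learn from.
    if len(prior_failures) < 2:
        raise RuntimeError("schema mismatch")
    return "report.md"


artefact, log = run_with_retries(flaky_worker, max_attempts=5)
print(artefact, log)
```

The failure list plays the role the worker logs play on the board: a record of what went wrong that the next attempt can use.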

The audit trail

The retry history is in Worker Logs in the Kanban UI (fKoPRL0dhyk). The full history lives there — every retry, every failure, every successful run. The channel's rule: check Worker Logs before assuming a pipeline is broken. The default workflow on a long-running Kanban is to let it run overnight and check the logs in the morning; a pipeline that ran 12 times overnight and succeeded on the 12th is a successful pipeline, not a broken one.
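The "12 runs, succeeded on the 12th" rule is simple to express as a query over an attempt log. The table below is a hypothetical layout used only to illustrate the rule; it is not the Kanban's real schema, and the task name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE worker_logs (task TEXT, attempt INTEGER, ok INTEGER, detail TEXT)"
)
# 11 failed attempts followed by one success, as in the overnight example.
rows = [("nightly-build", n, 0, "crashed") for n in range(1, 12)]
rows.append(("nightly-build", 12, 1, "artefact written"))
conn.executemany("INSERT INTO worker_logs VALUES (?, ?, ?, ?)", rows)


def task_summary(db: sqlite3.Connection, task: str) -> dict:
    """Summarise a task: attempt count, failure count, and final outcome."""
    total, failures = db.execute(
        "SELECT COUNT(*), SUM(ok = 0) FROM worker_logs WHERE task = ?", (task,)
    ).fetchone()
    last = db.execute(
        "SELECT ok FROM worker_logs WHERE task = ? ORDER BY attempt DESC LIMIT 1",
        (task,),
    ).fetchone()
    return {
        "attempts": total,
        "failures": failures or 0,  # SUM is NULL when there are no rows
        "succeeded": bool(last and last[0]),
    }


summary = task_summary(conn, "nightly-build")
print(summary)
```

The summary keys on the last attempt rather than the failure count, which is why eleven failures followed by a success still reads as a success.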

The relationship to disposable sub-agents

The retry loop is what makes the Kanban fundamentally different from disposable sub-agents. A disposable sub-agent fails once and is gone; the orchestrator gets a failure notification and has to decide whether to spawn a new one. A Kanban profile fails and retries; the parent task tracks the retry, the worker logs preserve the failure history, and the retry attempt is informed by the prior failure.

The two patterns compose. A multi-agent workflow that uses both — orchestrator-and-workers from §7.2 + Kanban profiles from this article — gets the planning/synthesis benefits of the orchestrator pattern and the durability/retry benefits of the Kanban. That is the production-grade multi-agent setup the channel is converging on.

The example report

The source video's demo used a single researcher assignee to map the AI funding landscape (R_aLVXYzDac). The worker pulled live TechCrunch articles, ran 14 attempts (the first 7 crashed), and produced a structured markdown report dated May 4, 2026. The report saved to ~/hermes/kanban/, not ~/.hermes/profiles/.

The "first 7 crashed" detail is the retry loop in action. A vanilla orchestrator would have reported the first failure and stopped; the Kanban logged each failed attempt and queued the next, until the 14th attempt produced the artefact. The Worker Logs show all 14 attempts.

NOTE: the example report is dated May 4, 2026 in the source video. If you are reading this later, treat that as a snapshot of the demo run, not a feature guarantee.

The "vibe-coded slop" fix, applied to the Kanban

The Kanban is a structural fix for the "vibe-coded slop" problem from §7.1. Four of the five sub-agent benefits from §7.1 (quality, parallelism, context optimisation, validation) map directly to Kanban features:

Quality — Kanban profiles have persistent SOUL.md and config; workers don't drift across tasks
Parallelism — parent tasks dispatch child tasks in parallel; the gateway tracks the dispatch
Context optimisation — each profile has its own inference_provider and its own context window; the cheap models stay in the smart zone
Validation — the retry loop is built-in validation; a worker that produces slop is retried until it produces a clean artefact

The fifth benefit from §7.1 — reduced hallucination — is what the retry loop addresses most directly. A hallucinated URL is caught on retry when the worker tries to fetch it; a hallucinated fact is caught on the next run when the synthesis step rejects it. The Kanban's job is to keep retrying until the artefact is clean.
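One way to picture the hallucinated-URL check is as a gate that rejects a report citing any URL the worker never fetched. This is an illustrative sketch, not how Hermes validates artefacts: the function name, the regex, and the idea of a fetched-URL set are all assumptions.

```python
import re

# Matches http(s) URLs up to whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def unverified_urls(report_markdown: str, fetched_urls: set[str]) -> list[str]:
    """Return URLs cited in the report that are not in the fetched set."""
    unverified = []
    for url in URL_PATTERN.findall(report_markdown):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in fetched_urls and url not in unverified:
            unverified.append(url)
    return unverified


report = "See [a](https://example.com/a) and https://example.com/b."
print(unverified_urls(report, {"https://example.com/a"}))
```

A non-empty result would be one reason to mark the attempt as failed and queue another run.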

The profile vs. the worker: a structural distinction

The profile is not a worker in the disposable sense. From the multi-board update video (fKoPRL0dhyk):

Sub-agents "get spawned, they do their job, and then they're gone." Kanban profiles are persistent agents with memory and system prompts, not disposable sub-agents.

The distinction matters for two reasons:

Memory persists. A Kanban profile that worked on task A and succeeded has its SOUL.md and config preserved for task B. The profile's "knowledge" of what works grows across tasks. A disposable sub-agent's "knowledge" is gone the moment the run completes.
Audit is per-profile, not per-run. A Kanban profile's worker logs span every task it has ever run. A disposable sub-agent's logs are per-run; cross-task correlation requires manual assembly.

The two patterns — disposable sub-agents and persistent profiles — are different tools for different jobs. The Kanban's persistent profile is the right choice for recurring work (daily news briefing, nightly build audit, weekly competitor scan). The disposable sub-agent is the right choice for one-shot work (a single research report, a one-off content piece).

What the Kanban is not

Three things the Kanban is not, worth stating plainly (R_aLVXYzDac, fKoPRL0dhyk):

Not a chat surface. The Kanban is a task board, not a chat app. Communication between profiles happens via the parent task and the worker logs, not via messages.
Not a one-click "spin up and watch" flow on Story 1. The setup requires manual profile creation per assignee; the dashboard does not auto-create profiles.
Not a replacement for the cron layer. The Kanban is the orchestrator for the child tasks; the cron (see §7.4) is the parent. The two layers are different surfaces that often get conflated.

Try it yourself

The hands-on goal: stand up a Story 1 Kanban with a single worker, prove the parent-child retry loop, and verify the artefact lands in the right place.

Update first. Run hermes update. Confirm hermes gateway start and hermes profile create <role> are present — if not, you are on the old build.
Pick one project, not a cron. Create a dedicated workspace folder (e.g. AI News/). Do not start with a scheduled pipeline — the cron pairing is Story 3 and is covered in §7.4.
Initialise the board and gateway. Run the Kanban DB init, then hermes gateway start. On VPS, wrap the gateway in a systemd service so it survives logouts.
Create one profile by hand. Run hermes profile create researcher (or your role name). Then, under ~/.hermes/profiles/researcher/, set inference_provider in the profile's config and put the API key in its .env file. The parent agent's key is not inherited.
Smoke-test the worker. Drop one task with a single web_search-only assignee that writes a single markdown file. The first run is a schema test, not a content test. If the first 7 attempts crash (the source video saw exactly that), let the retry loop do its job.
Find the artefact. Look in ~/hermes/kanban/, not ~/.hermes/profiles/. The dashboard only flips to done on a successful write, so finding no report in the profiles tree is normal.
Read the Worker Logs. Click into the task in the Kanban UI. Confirm the full retry history is preserved. The 6-run Space Shooter build from the multi-board update video (fKoPRL0dhyk) had 6 attempts in the logs; your smoke test will have however many it took.
Resume correctly. Use hermes --profile <name> --resume <session_id>. Do not use hermes sessions list — that command shows the main agent only.
Graduate slowly. Only when Story 1 works twice in a row, add a second assignee. Only when that works, move to Story 3 (cron + multi-role pipeline) — and only after the §7.4 bugs are addressed.

Common pitfalls

Skipping hermes update. If hermes kanban boards, hermes gateway start, or hermes profile create is missing, you are on the old Kanban — the multi-board UI will throw on board #2.
Assuming the parent agent's API key is inherited. Profile .env files start empty. Remove empty api_key fields from config.yaml and copy the real key into each profile .env.
Looking for the report in ~/.hermes/profiles/. Artefacts land in ~/hermes/kanban/. The worker logs and the final markdown file live in different trees.
Using hermes sessions list to resume a Kanban profile. Use hermes --profile <name> --resume <session_id> — the default command only shows the main agent's sessions.
Assuming a long retry is a failure. The Kanban's killer feature is the parent-child retry loop. The creator logged a test that took 81 runs to succeed. Check Worker Logs before declaring a pipeline dead.
Treating profiles as disposable sub-agents. Profiles are persistent roles with their own SOUL.md and memory. Sub-agents spawn, run, and die; profiles stick around. Different model.
Skipping the smoke test. Story 1 is a schema test, not a content test. If the first run fails 7 times, the schema is wrong, not the worker. Fix the schema before iterating on the prompt.
Bolting on paid API endpoints before the smoke test runs. Get the free web_search-only researcher working first (TechCrunch etc. is enough for the default profile), then expand.
Expecting a one-click "spin up the Kanban and watch" flow on Story 1. That level of automation is Story 3. Story 1 requires manual profile creation per assignee.
Reading the 81-run retry number as a config bug. It is the intended behaviour. The Kanban's job is to keep retrying until the artefact is clean. A pipeline that took 81 runs to succeed is a pipeline that succeeded.

Sources

Hermes Agent Kanban Setup Guide (Multi-Agent Task Board) — 16,341 views · video_id: R_aLVXYzDac · cited: 4-step setup sequence, profile directory tree, artefact output path, 4 use cases (Story 1-4), 14-attempt demo, "first 7 crashed" example
Hermes Agent Kanban UPDATE: Multiple Boards Setup — 3,350 views · video_id: fKoPRL0dhyk · cited: 6-run Space Shooter build, 81-run test, Worker Logs audit trail, profile vs. sub-agent distinction
Source MD — /home/ubuntu/boxai3/docs/courses/_archive-2026-06-18/03-hermes-agent.md §3.3 (kanban-board-the-heart-of-hermes), full. Every concrete claim in this article is sourced from §3.3: the 4-step setup, the profile directory tree, the kanban artefact path, the 4 use cases, the 14-attempt demo, the 6-run Space Shooter build, the 81-run test, the Worker Logs audit trail, and the profile-vs-sub-agent distinction.
Cross-reference — Course 3 §3.3 for the same Kanban from the "what is Hermes" angle.