Teaching the agent an API (paste the docs, let it iterate) - Integrations & APIs

Subtopic 8.1 was the 6-step process at the 30,000-foot view. Subtopic 8.4 zooms in on step 4 — the moment where you hand the agent the API documentation and let it work. This is where most of the friction lives. The "find docs, get key, store in .env, save as skill" steps are mechanical. The "teach the agent" step is the one where the agent either gets it right, hallucinates a result, or tries to scrape the website instead of using the API.

The channel's framing for why this step works at all: "you don't need to understand the API — your agent does." The LLM is the API client. Your job is the docs delivery service. The 75 / 20 / 5 reliability split from §8.1 is what you're navigating here — 75% of the time the agent reads the docs and implements the call correctly first try. The 25% is where you actually do work.

This article walks through the mechanics of teaching an agent an API: the prompt structure, the docs URL vs. pasted docs decision, the iteration loop when the agent gets it wrong, and the failure modes specific to step 4 (agent skips the docs, agent hallucinates, agent tries to scrape).

What you'll learn

The prompt structure for step 4 is rigid: paste the docs URL, paste the endpoint shape, name the env var (never the value), and tell the agent the four concrete next steps. Skipping any of these drops the reliability.
Paste the docs URL, not the docs content. An agent with a web-fetch tool can retrieve the full docs page from the URL alone, so pasting the entire page is wasteful and prone to truncation. The exception is when the docs are paywalled, behind a login, or behind Cloudflare: in that case, paste the content or upload a file.
Name the env var, never the value. The prompt says "the API key is in YOUTUBE_TRANSCRIPT_API_KEY" — never "the API key is yt_live_abc123." The agent reads env vars on demand. Pasting the value defeats the entire .env discipline from §8.3.
Tell the agent the four concrete next steps. "Read the docs → make a skill → test it → save the skill." Without the explicit next steps, the agent may read the docs and stop, or implement without testing, or implement and forget to save.
The iteration loop has three failure modes — agent skips the docs, agent hallucinates a result, agent tries to web-scrape. Each has a specific fix.
The 20% "needs a correction" case is mostly small things — wrong header name, missing query parameter, wrong content type. The cure is "show me the request you made" and a one-line correction.
The 5% "needs a retry" case is "agent pretended to call the API." The cure is starting a fresh session with a more explicit prompt: "use the API, not web scraping; show me the actual HTTP request before you say you're done."

The step-4 prompt, annotated

The canonical step-4 prompt from §8.1, with annotations explaining each piece:

Hey, I want you to learn how to use the YouTube Transcript API.   ← (1) Plain-English intent

Here's the API documentation:                                     ← (2) Docs URL, not content
https://youtubetranscript.io/docs

The API endpoint is:                                              ← (3) Endpoint shape (optional but helpful)
POST https://api.youtubetranscript.io/v1/transcript

Request format:                                                   ← (4) Request body shape
{
  "video_id": "VIDEO_ID_HERE",
  "format": "text"
}

Headers:                                                          ← (5) Headers, especially auth
Authorization: Bearer YOUR_API_KEY

I've stored the API key in the environment variable              ← (6) Env var name, NEVER the value
YOUTUBE_TRANSCRIPT_API_KEY.

Please:                                                           ← (7) Four explicit next steps
1. Read up on this API documentation
2. Make a skill to fetch transcripts
3. Test it by getting the transcript for video ID "dQw4w9WgXcQ"
4. Save the skill for future use

Each numbered piece has a job:

(1) Plain-English intent. Tells the agent what the goal is, not just what to do.
(2) Docs URL. The load-bearing reference. The agent fetches this and reads it. URL > pasted content unless the docs are behind auth.
(3) Endpoint shape. Optional but helpful — gives the agent the HTTP verb and URL so it doesn't have to dig.
(4) Request body shape. The JSON schema. Saves the agent a round-trip through the docs.
(5) Headers. Especially the Authorization: Bearer line — auth is where most first-attempt failures happen.
(6) Env var name. Never the value. The agent reads env vars on demand.
(7) Four explicit next steps. Tells the agent the workflow you want — read, build, test, save. Without these, the agent may stop after reading.

The 75 / 20 / 5 split assumes all seven pieces are present. Drop the request body shape and you might be in the 20% (agent guesses wrong). Drop the env var name and the agent has nothing to authenticate with, so expect it to stop and ask you for the key.
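
If you send this prompt often, the seven pieces can be assembled from a template. The following is a minimal sketch, not part of the original workflow: the function name, its parameters, and the uppercase-name check are all assumptions made for illustration. The check is only a rough guard against pasting a key value where the variable name belongs.

```python
import re

# Assumption: env var names are UPPER_SNAKE_CASE. A value such as
# "yt_live_abc123" fails this pattern, so it is rejected early.
ENV_VAR_NAME = re.compile(r"^[A-Z][A-Z0-9_]*$")


def build_step4_prompt(api_name, docs_url, method, endpoint, body_example,
                       auth_header, env_var, skill_goal, test_instruction):
    """Assemble the seven pieces of the step-4 prompt as plain text."""
    if not ENV_VAR_NAME.match(env_var):
        raise ValueError("env_var must be a variable NAME, not a key value")
    lines = [
        f"Hey, I want you to learn how to use the {api_name}.",  # (1) intent
        "",
        "Here's the API documentation:",                          # (2) docs URL
        docs_url,
        "",
        "The API endpoint is:",                                   # (3) endpoint
        f"{method} {endpoint}",
        "",
        "Request format:",                                        # (4) body shape
        body_example,
        "",
        "Headers:",                                               # (5) headers
        auth_header,
        "",
        f"I've stored the API key in the environment variable {env_var}.",  # (6)
        "",
        "Please:",                                                # (7) next steps
        "1. Read up on this API documentation",
        f"2. Make a skill to {skill_goal}",
        f"3. Test it by {test_instruction}",
        "4. Save the skill for future use",
    ]
    return "\n".join(lines)
```

The header line passed in should keep the YOUR_API_KEY placeholder, exactly as in the annotated prompt above.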

URL vs. pasted content

The default is URL. The exception is when the URL doesn't work.

Use the URL when:

The docs are on a public, no-auth page
The page loads reliably for the agent (no Cloudflare, no JS challenges)
The page is reasonably sized (under a few thousand words)

Paste the content when:

The docs are behind a login (Notion, internal wikis)
The page is blocked by Cloudflare or anti-bot measures
The agent has previously failed to fetch the URL
The docs are very large and you only need a section

Use a file when:

The docs are a PDF or HTML file you can upload
The docs are too large to paste in chat (over ~50K tokens)
You want to reference the same docs across multiple sessions

The "use a file" path is also the cure for the "agent can't access documentation" failure mode — download the docs as HTML or PDF, upload the file to the agent, and tell it to read the file.
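
The three lists above can be read as a small decision rule. Here is one possible encoding, offered as a sketch: the function and parameter names are hypothetical, the 50,000-token threshold is the approximate figure from the list, and the order in which the conditions are checked is an assumption rather than something the source specifies.

```python
def choose_docs_delivery(public, fetch_ok, approx_tokens,
                         need_section_only=False,
                         reuse_across_sessions=False,
                         is_pdf_or_html_file=False):
    """Return "url", "paste", or "file" for handing docs to the agent."""
    # File: uploadable PDF/HTML, too large for chat, or reused across sessions.
    if is_pdf_or_html_file or reuse_across_sessions or approx_tokens > 50_000:
        return "file"
    # Paste: behind a login, blocked or previously failed fetch, or only a
    # section of a large page is needed.
    if not public or not fetch_ok or need_section_only:
        return "paste"
    # Default: a public page the agent can fetch reliably.
    return "url"
```

For example, `choose_docs_delivery(public=True, fetch_ok=True, approx_tokens=2000)` returns `"url"`, the default case.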

Name the env var, never the value

This is the §8.3 anchor in the step-4 prompt. The wrong way:

The API key is "yt_live_abc123def456ghi789". Please use it.

The right way:

The API key is in the environment variable YOUTUBE_TRANSCRIPT_API_KEY. Please use it.

The two prompts look identical in effect — the agent uses the key either way. The difference is what happens next:

The wrong way pastes the value in chat. The agent now has the key in its context: it may use it once, may forget it on reset, may paste it into a Notion page or Discord log, or may delete it. The .env discipline is broken.
The right way references the env var by name. The agent reads os.environ["YOUTUBE_TRANSCRIPT_API_KEY"] (or whatever the equivalent is in your agent's runtime). The value stays in .env, persists across sessions, doesn't leak into chat.

The host's framing in the API integration video: "AI models are trained not to reveal environment variables." That's the security property of the right way. The wrong way defeats it.
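
For reference, this is roughly what "reading the env var on demand" could look like inside a Python-based skill. It is a sketch under stated assumptions: the endpoint, body, and header come from the example prompt above, the helper names are hypothetical, and the code assumes the values in .env have already been loaded into the process environment by your agent's runtime.

```python
import json
import os
import urllib.request


def require_env(name):
    """Return the value of an env var, failing without echoing any secret."""
    value = os.environ.get(name)
    if not value:
        # The error names the variable only; the value never appears in logs.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def build_transcript_request(video_id):
    """Build (but do not send) the POST request from the example prompt."""
    api_key = require_env("YOUTUBE_TRANSCRIPT_API_KEY")
    body = json.dumps({"video_id": video_id, "format": "text"}).encode("utf-8")
    return urllib.request.Request(
        "https://api.youtubetranscript.io/v1/transcript",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request would be a call to `urllib.request.urlopen(req)`. The key is read at call time and only ever lives in the process environment and the outgoing header, never in the chat.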

The four explicit next steps

The numbered list at the end of the prompt is non-negotiable. Without it, the agent may:

Read the docs and stop. "OK, I've read the docs. What would you like me to do?" — the agent is waiting for the next instruction.
Implement without testing. The agent builds the skill, declares it done, but never actually calls the API. The 5% hallucination case.
Implement, test once, and forget to save. The integration works in the current session but isn't persisted as a skill. Lost on context reset.
Implement the wrong thing. The agent reads the docs and picks a different endpoint or a different request shape than you intended.

The four explicit steps close all four gaps:

Read up on this API documentation — forces the read step
Make a skill to fetch [specific data] — forces implementation
Test it by getting [example] — forces the test step (catches hallucinations)
Save the skill for future use — forces persistence

The "test by getting [example]" step is especially load-bearing. Without an explicit test, the agent may not actually call the API. With an explicit test (a specific video ID, a specific query), the agent has to produce a result, and you can verify it's real.
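
One way to make the test step produce something checkable is to have the skill return a small artifact rather than a bare "done". The sketch below assumes a hypothetical `fetch` callable that takes an example ID and returns the transcript as a string; neither the name nor the artifact fields come from the source.

```python
def smoke_test(fetch, example_id):
    """Call the skill once and return a small, checkable artifact."""
    result = fetch(example_id)
    if not isinstance(result, str) or not result.strip():
        # An empty or non-text result means the test step did not pass.
        raise AssertionError(f"No usable result for {example_id!r}")
    text = result.strip()
    return {
        "example_id": example_id,
        "length": len(text),
        "preview": text[:80],  # enough to compare against the real video
    }
```

The preview is what you compare against an independent source: a non-empty string proves the call returned something, not that the transcript is correct.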

The iteration loop

The 75 / 20 / 5 split is what you're navigating. Here's the iteration loop for each case.

The 75% case — works first try

You send the step-4 prompt. The agent reads the docs, implements the call, tests with the example video ID, returns a transcript. You verify the transcript matches the video. You send: "save this as a skill named fetch_youtube_transcript." Done.

This is the case where the prompt structure pays off. Every numbered piece was load-bearing — without any one of them, you might be in the 20%.

The 20% case — needs a correction

The agent reads the docs and tries, but the result is wrong. The wrongness is usually small:

Wrong format parameter — the agent requests format: "json" but the docs say "text". Fix: "the format parameter should be text, not json. Try again."
Wrong header name — the agent sends X-API-Key: <token> but the docs say Authorization: Bearer <token>. Fix: "use the Authorization header with Bearer <token>, not X-API-Key."
Missing query parameter — the agent forgets &include_timestamps=true. Fix: "add the include_timestamps query parameter."
Wrong content type — the agent sends the body as form-encoded when the docs want JSON. Fix: "send the body as JSON, not form-encoded. Set Content-Type: application/json."

The fix is one line each: a specific correction, not a re-prompt. The agent usually gets it right on the second try.
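
When the agent shows you the request it made, spotting the mismatch is a field-by-field comparison against the docs. The following sketch automates that comparison. The dictionary layout (`headers`, `body`) and the function name are assumptions for illustration, and tokens should appear as placeholders such as `<token>`, never as real values.

```python
def diff_request(actual, expected):
    """Compare two request descriptions and return one-line corrections."""
    problems = []
    # HTTP header names are case-insensitive, so compare lowercased names.
    a_headers = {k.lower(): v for k, v in actual.get("headers", {}).items()}
    e_headers = {k.lower(): v for k, v in expected.get("headers", {}).items()}
    for name, value in e_headers.items():
        if name not in a_headers:
            problems.append(f"missing header: {name}")
        elif a_headers[name] != value:
            problems.append(
                f"header {name}: sent {a_headers[name]!r}, docs say {value!r}")
    for name in sorted(a_headers.keys() - e_headers.keys()):
        problems.append(f"unexpected header: {name}")
    a_body = actual.get("body", {})
    for key, value in expected.get("body", {}).items():
        if key not in a_body:
            problems.append(f"missing body field: {key}")
        elif a_body[key] != value:
            problems.append(
                f"body field {key}: sent {a_body[key]!r}, docs say {value!r}")
    return problems
```

Each returned string maps onto one of the one-line corrections above, for example `body field format: sent 'json', docs say 'text'`.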

The 5% case — needs a retry

The agent fails in a way that requires restarting. Three failure modes:

Agent hallucinates a result. The agent returns a transcript that looks plausible but doesn't match the video. Or the agent returns an error message that doesn't match any actual API response. Fix: "show me the actual HTTP request you made and the response you got." If the agent can't produce them, start a fresh session.
Agent tries to scrape instead of using the API. The response includes a 403 Forbidden, a Cloudflare block page, or HTML content instead of JSON. Fix: "stop scraping. Use the API. The endpoint is POST https://api.youtubetranscript.io/v1/transcript and the key is in YOUTUBE_TRANSCRIPT_API_KEY."
Agent skips the docs entirely. The agent guesses at the endpoint and request shape and gets it wrong. Fix: re-send the step-4 prompt with the docs URL more prominent. Or restart with: "read this URL first, then implement: [URL]."

The fresh-session cure is underused. If the agent is confused, lost, or stuck in a hallucination loop, the fix is to start over. Context is cheap; sessions are cheap. Don't try to recover from a confused state — just restart.
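
The scraping signals listed above (a 403, a Cloudflare block page, HTML instead of JSON) can be checked mechanically once you have the raw response. This is a heuristic sketch with a hypothetical function name. Note that a 403 can also come from the API itself, for instance with a rejected key, so treat a match as a prompt to look closer rather than as proof.

```python
import json


def looks_like_scrape(status, content_type, body):
    """Return a reason string if the response looks scraped, else None."""
    if status == 403:
        return "403 Forbidden"
    text = body.lstrip().lower()
    # Block pages are HTML documents that mention Cloudflare.
    if text.startswith("<") and "cloudflare" in text:
        return "Cloudflare block page"
    if "html" in content_type.lower() or text.startswith(("<!doctype", "<html")):
        return "HTML instead of JSON"
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "body is not valid JSON"
    return None
```

A `None` result only says the response has the shape of an API reply. Verifying that the content is real is still the job of the explicit test step.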

Try it yourself

The hands-on goal: take a single API integration from step 4 through step 6, catch all three iteration cases (75 / 20 / 5), and confirm the saved skill survives into a new session.

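Pick an API you haven't integrated yet. CoinGecko, OpenWeather, Twitter, Spotify: any HTTP API with key- or token-based auth and a free tier. Auth schemes differ between services (not all of them use a bearer token), so take the exact header or parameter name from the docs.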
Find the docs URL. Open the API's developer portal, find the "Getting Started" or "Authentication" page, copy the URL.
Get the API key. Sign up, generate a key, store in .env as SERVICE_NAME_API_KEY=value.
Send the step-4 prompt. Follow the structure from this article — plain-English intent, docs URL, endpoint shape, request body, headers, env var name, four explicit next steps.
Watch for the 75% case. If the agent returns a clean result on the first try, verify it matches reality (compare to a manual API call via Postman or curl). Then send: "save this as a skill named fetch_service_data."
Catch the 20% case deliberately. If the first try has a wrong format parameter or wrong header, send the one-line correction and watch the second try succeed. Note which piece of the prompt you could have included to avoid the correction — that's your prompt template refinement.
Catch the 5% case deliberately. Send a fresh step-4 prompt but skip the explicit test step ("test it by getting..."). Watch the agent implement without testing. Then send: "did you actually call the API? show me the HTTP request." If the agent never made a call, it will have no request to show. This is the hallucination failure mode. Cure: start a fresh session with the full step-4 prompt.
Verify the saved skill. Open a new chat session. Send: "fetch [example data] using your saved skill." The agent should auto-invoke the skill without you mentioning the API name. If it asks "what API?" the skill wasn't saved properly — go back to step 5.
Audit the chat history. Open the chat log and search for the API key value. It should not appear anywhere — only the env var name should be in the conversation. This is the §8.3 audit for this specific integration.
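
The audit in the last step can be scripted if you can export the chat log as text. The sketch below is one way to do it. The function name is hypothetical, and it assumes the key is available in the environment of the machine running the audit, so the value itself never has to be typed or pasted anywhere.

```python
import os


def audit_log(log_text, env_var):
    """Report whether the secret VALUE leaked and whether the NAME was used."""
    secret = os.environ.get(env_var)
    if not secret:
        raise RuntimeError(f"{env_var} is not set; nothing to audit against")
    return {
        "value_leaked": secret in log_text,    # should be False
        "name_mentioned": env_var in log_text,  # should be True
    }
```

A clean integration yields `{"value_leaked": False, "name_mentioned": True}`. If `value_leaked` is true, the usual remedy is to revoke the key and generate a new one.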

Common pitfalls

Pasting the API key value in chat. Defeats the entire .env discipline. Always name the env var, never the value.
Skipping the explicit test step. Without "test it by getting [example]," the agent may implement and declare success without ever calling the API. The 5% hallucination case becomes 30%.
Skipping the explicit save step. Without "save the skill for future use," the working integration is gone on context reset.
Letting the agent web-scrape when the API is the right tool. If the response includes 403, Cloudflare, or HTML content, the agent is scraping. Redirect: "use the [Service] API, not scraping. The env var is X."
Trying to recover from a confused agent by prompting harder. If the agent is stuck in a hallucination loop, start a fresh session. Context is cheap.
Including too many endpoints in the step-4 prompt. Pick one endpoint for the initial integration. The agent can read the docs and learn the others later. Don't dump the entire API surface in the first prompt.
Forgetting to verify the result. "Looks plausible" is not verification. Compare to a manual API call (Postman, curl) for the canonical example. If the manual call returns X and the agent returns Y, the agent is hallucinating.
Naming the skill something generic like "api_skill." The agent's auto-invoke is keyword-triggered. A skill named fetch_youtube_transcript is more discoverable than api_skill. Name skills after what they do.
Reusing the example video ID. The agent may memorize the example and return a cached transcript. Use a real video URL you actually care about for the test.
Skipping the iteration loop documentation. When you hit a 20% case and the correction is one line, save the correction. Future-you will hit the same API and the same edge case.
Trusting the agent's "I'm done" without a verification artifact. The agent should produce a real transcript, a real price, a real weather report — something you can check against an independent source. "I'm done, the skill is saved" is not an artifact.

Sources

OpenClaw + API Guide (Step-by-Step) — few thousand views · video_id: 2YPV2OmPZyo · the spine of the article. The step-4 prompt structure, the "you don't need to understand the API, your agent does" framing, and the 75 / 20 / 5 reliability split are all sourced from this video.
OpenClaw + API Integration Guide (Step-by-Step) — video_id: not-listed (cross-reference to archived Course 21, full version used as source) · the broader walkthrough that the step-4 prompt is condensed from. Adds the "test the connection" rule and the explicit failure-mode cures.
Best Practices and "AI models are trained not to reveal env vars" — cross-referenced from Course 9: Security & Best Practices §9.1. The "name the env var, never the value" rule is the §8.3 anchor in the step-4 prompt.
Skills-as-step-6 anchor — video_id: obET69yycFc (cross-listed to §8.5). The "save the skill for future use" instruction is the handoff into §8.5.
The "first try fails, people give up" framing — sourced from the OpenClaw + API Guide video transcript. The 75/20/5 split is the operationalization of that framing.