In earlier chapters, you saw the components that make up a Hermes agent: profiles, skills, memory, toolsets, gateway, and cron. You understand what each piece does in isolation. But what actually happens when you send a message to your agent? How does a prompt become an action, and how does that action become a result you can use?
The answer is the agent loop. Every Hermes agent runs the same fundamental cycle: assemble context, call the model, pick a tool, execute it, feed the result back, and repeat. This loop is what separates an agent from a chatbot. A chatbot generates text and stops. An agent generates text, acts on it, learns from the result, and keeps going until the task is done.
Understanding this loop matters because it determines what your agent can actually do, where things can go wrong, and where you need to step in with an approval before the agent proceeds.
When you send a message to a Hermes agent — whether through the terminal, Slack, Telegram, or any other channel — the following cycle begins:
The agent gathers everything it needs before calling the model: your profile's system prompt, relevant skills, memory documents, context files from the working directory, and the full conversation history. This becomes the input to the model.
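In code, that assembly is little more than list construction. Here is a minimal sketch in Python, assuming a hypothetical `profile` object and file layout; Hermes's actual internals will differ:

```python
from pathlib import Path

def assemble_context(profile, user_message, history):
    # Gather the pieces the model will see: role, skills, memory, history.
    system_parts = [
        profile.system_prompt,                            # role definition
        *(Path(s).read_text() for s in profile.skills),   # relevant skill docs
        Path("MEMORY.md").read_text(),                    # persistent memory
    ]
    messages = [{"role": "system", "content": "\n\n".join(system_parts)}]
    messages += history                                   # full conversation so far
    messages.append({"role": "user", "content": user_message})
    return messages
```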
The assembled prompt is sent to your configured model provider (OpenAI, Anthropic, OpenRouter, or any compatible endpoint). The model processes the prompt and returns a response.
If the model decides it needs to take an action — search the web, read a file, run a command — it returns a tool call instead of (or alongside) plain text. The agent examines the model's response for tool calls.
The agent runs the requested tool: it passes in the input parameters the model supplied, executes the tool, and collects the output. If the tool is marked as dangerous (like running a shell command), the approval system first checks whether it can proceed.
The tool's output is injected back into the conversation as a new message. The agent now has the tool result in its context, alongside the original prompt and prior turns.
The agent sends the updated conversation back to the model. If the model wants to call another tool, the cycle repeats. If the model returns only text — no tool calls — the loop ends, and the agent sends the response back to you.
This cycle — observe (assemble context), think (call model), act (run tool), repeat — is the engine behind every Hermes agent. A simple question might complete in a single pass. A complex task like researching keywords and saving the results might take several loops through the cycle before finishing.
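Stripped to its essentials, the cycle fits in a dozen lines. The sketch below is illustrative, not Hermes's real implementation; `model.complete`, the `response.text` / `response.tool_calls` shape, and the message format are all assumed:

```python
def agent_loop(model, tools, messages, max_turns=40):
    # Observe-think-act, repeated until the model returns plain text.
    for _ in range(max_turns):                    # hard cap prevents infinite loops
        response = model.complete(messages)       # think: one model call
        if not response.tool_calls:               # plain text only: the task is done
            return response.text
        for call in response.tool_calls:          # act: run each requested tool
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})  # feed the result back
    return "Turn limit reached before the task completed."
```

The later sketches in this chapter reuse these assumed names so the pieces read as one system.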
A tool call is the moment an agent moves from generating words to doing something. When the model decides it needs information it does not have — or needs to take an action it cannot perform with text alone — it returns a structured tool call instead of a plain text response.
In Hermes, tools are organized into toolsets. Each toolset contains related tools with shared dependencies. The web search toolset includes tools for searching, fetching pages, and reading results. The file toolset includes tools for reading, writing, and searching files. You control which toolsets each profile can access — a research specialist gets web search and file writing; an SEO manager gets file reading and writing but not shell execution.
When the model returns a tool call, the Hermes runtime handles it. It checks whether the requested tool is available to the current profile, passes the input parameters, runs the tool, and returns the output. The model never executes tools directly — the runtime acts as the intermediary. This separation is what makes approvals and security checks possible.
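Conceptually, what the runtime receives and does is simple: structured data in, a guarded dispatch, output back to the loop. A sketch, with the hypothetical `profile.allowed_tools` check standing in for the real availability logic:

```python
# A tool call as it might arrive from the model: structured data, not prose.
call = {"name": "web_search", "arguments": {"query": "AI agent teams keywords SEO"}}

def dispatch(profile, registry, call):
    # The runtime, not the model, executes tools; this is where checks live.
    if call["name"] not in profile.allowed_tools:     # toolset check per profile
        return f"Tool {call['name']!r} is not available to this profile."
    tool = registry[call["name"]]                     # look up the implementation
    return tool(**call["arguments"])                  # run it, return output to the loop
```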
Not every tool call runs immediately. Some actions are considered dangerous — running a shell command, deleting files, or making network requests to unfamiliar endpoints. These actions go through the approval system.
Hermes has three approval modes:
When the agent wants to run a dangerous command, it pauses and asks you for permission. You approve or deny the action. If you do not respond within the timeout period (60 seconds by default), the action is denied. This is the safest mode and the one you should start with.
An auxiliary LLM evaluates the dangerous command and decides whether it is safe to run. This reduces the number of manual interruptions, but it adds a second model call per approval — which costs more and can occasionally allow a risky action that a human would have caught.
All dangerous commands run without human review. Even in this mode, a hardline blocklist prevents the most destructive actions (like deleting system directories or running fork bombs). But most dangerous commands proceed without checks. Only use this mode if you fully trust the agent's tool access and context.
There is also a YOLO mode that bypasses all approval layers. Even in YOLO mode, the hardline blocklist stays active — it blocks actions that could damage the system (like rm -rf /). But every other dangerous command proceeds without review. YOLO mode is useful for fully trusted automation scripts. It is dangerous for interactive sessions where the agent might misinterpret a prompt and take an irreversible action.
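Taken together, the approval decision reduces to a short function. The mode names, blocklist entries, and helper stubs below are illustrative, not Hermes's actual configuration:

```python
HARDLINE_BLOCKLIST = ("rm -rf /", ":(){ :|:& };:")    # illustrative entries

def ask_human(command, timeout=60):
    # Stub: prompt the user in-channel; no answer within the timeout means denial.
    return False

def ask_judge_model(command):
    # Stub: a second model call classifies the command as safe or not.
    return False

def approve(command, mode):
    # The hardline blocklist applies in every mode, including YOLO.
    if any(pattern in command for pattern in HARDLINE_BLOCKLIST):
        return False
    if mode in ("autonomous", "yolo"):                # run without human review
        return True
    if mode == "llm":                                 # auxiliary model decides
        return ask_judge_model(command)
    return ask_human(command, timeout=60)             # default: ask, deny on timeout
```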
The approval system is part of a larger defense-in-depth model with seven layers — user authorization, dangerous command approval, container isolation, credential filtering, context file scanning, cross-session isolation, and input sanitization. You do not need to understand all seven layers to start using Hermes. But you should know that approvals default to the safest setting, and that you control how much autonomy the agent has.
Every turn in the agent loop adds to the conversation history. The model sees the full history — your messages, the agent's responses, tool calls, and tool results — every time it is called. This is how the agent maintains context across turns within a single session.
But conversation history has a limit. Every model has a context window — a maximum amount of text it can process in a single call. As the session grows longer, the history takes up more of that window. Eventually, the agent runs into the limit: the model cannot process the full history plus the current prompt plus tool results all at once.
Hermes handles this through session compression. When the conversation exceeds a threshold, the agent summarizes earlier turns into a compact form, preserving the key decisions and results while discarding the full text. The compressed history replaces the original turns, freeing up context window space for new work.
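A sketch of how that might work, using a made-up character threshold where a real implementation would count tokens:

```python
def maybe_compress(model, messages, threshold_chars=200_000, keep_recent=10):
    # Cheap size check; real compression would measure tokens, not characters.
    if sum(len(m.get("content", "")) for m in messages) < threshold_chars:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f'{m["role"]}: {m.get("content", "")}' for m in old)
    summary = model.complete([{
        "role": "user",
        "content": "Summarize this conversation, preserving key decisions "
                   "and results:\n" + transcript,
    }]).text
    # The compact summary replaces the original turns, freeing window space.
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"},
            *recent]
```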
This is not memory. Memory persists across sessions (stored in files like MEMORY.md). Session history exists within a single session and is compressed to stay within the model's context limits. When the session ends, the history is saved for future reference through session search — but the next session starts fresh, with memory and skills loaded instead of the full prior conversation.
In Chapter 3, you defined two profiles: an SEO manager that holds strategy context and a research specialist that finds keyword data. Now let us see what happens when you ask the research specialist to find keywords — step by step through the agent loop.
You send a message through Slack: "Find 20 keyword ideas for our next blog post about AI agent teams." The agent loop begins.
The research specialist's system prompt (role definition), its keyword research skill, its memory (which sources worked well last time, what niche you focus on), and your message are combined into the model input.
The model processes the assembled prompt and decides: to find keyword ideas, I should search the web.
The model returns a tool call to the web search tool, with a query like "AI agent teams keywords SEO".
The agent runs the web search tool. It returns results — titles, URLs, and snippets for the top matches.
The search results are added to the conversation. The model now has the raw data in its context.
The model sees the raw search results and decides to search for more specific long-tail variations. It calls the web search tool again with a refined query. The loop continues: tool call, execution, result injection, back to the model.
After gathering enough data, the model calls the file writing tool to save the filtered keyword list. Because file writing is not classified as dangerous, it proceeds without approval.
The model has completed the task. It returns a plain text response: "Found 22 keyword ideas. I saved the top 20 to keywords.md, filtered by relevance and search volume estimate. Here are the top 5..." The agent sends this response back to your Slack thread.
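By the final model call, the conversation history the model saw looked roughly like this; the message shapes follow the same assumed format as the earlier sketches, with contents abbreviated:

```python
history = [
    {"role": "system",    "content": "<profile prompt + keyword research skill + memory>"},
    {"role": "user",      "content": "Find 20 keyword ideas for our next blog post..."},
    {"role": "assistant", "tool_call": {"name": "web_search",
                          "arguments": {"query": "AI agent teams keywords SEO"}}},
    {"role": "tool",      "content": "<10 results: titles, URLs, snippets>"},
    {"role": "assistant", "tool_call": {"name": "web_search",
                          "arguments": {"query": "<refined long-tail query>"}}},
    {"role": "tool",      "content": "<10 more results>"},
    {"role": "assistant", "tool_call": {"name": "write_file",
                          "arguments": {"path": "keywords.md", "content": "..."}}},
    {"role": "tool",      "content": "Wrote keywords.md"},
    {"role": "assistant", "content": "Found 22 keyword ideas. I saved the top 20..."},
]
```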
Now the research specialist has found the keywords and saved them. The next step — drafting a content brief or writing the article — is the SEO manager's job. Before the SEO manager proceeds, you review the keyword list and decide: approve the top picks, or ask the research specialist to refine the list.
This is the approval workflow in practice. Not every action needs human review — the research specialist searched the web and saved a file without asking permission. But the transition from research to content production is a judgment call that you want to make yourself. You are the one who knows whether the keywords match your current content strategy, whether the search volume estimates are reasonable, and whether the angles are worth pursuing.
In Hermes terms, this kind of judgment gate is not a built-in approval layer (those guard against dangerous commands). It is a process design choice — you decide which steps in your workflow require human sign-off. The agent handles the research loop autonomously; you step in at the points where founder judgment matters.
The agent loop does not always complete cleanly. A tool call might fail — the web search returns no results, the file cannot be found, or the API endpoint returns an error. When this happens, the error message is injected into the conversation just like a successful tool result. The model sees the error and decides what to do next.
In most cases, the model adjusts its approach. If a web search returns nothing for a specific query, the model might try a broader query or a different source. If a file read fails because the path is wrong, the model might search for the correct path. The loop keeps running — the model gets the error, thinks about it, and either retries or takes a different path.
There are limits. The agent has a maximum number of turns per session — after a certain number of loop iterations, it stops and reports that it could not complete the task. This prevents infinite loops where the model keeps trying the same failing action. And if the model provider itself goes down (the API returns a network error), the loop cannot continue at all — the agent reports the failure and waits for you to try again.
The key insight: error handling is built into the loop. The model does not crash when a tool fails. It reads the error message, adds it to its context, and decides what to do next. This is why an agent loop is more resilient than a single-shot prompt — it can recover from individual tool failures without starting over.
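In code, that resilience comes from one decision: tool execution is wrapped, and exceptions become messages instead of crashes. A sketch, with illustrative names:

```python
def run_tool_safely(tool, call):
    # A failed tool does not crash the loop; the error becomes context.
    try:
        content = str(tool(**call["arguments"]))
    except Exception as exc:          # empty search, missing file, API error
        content = f"Tool error: {exc}"
    return {"role": "tool", "name": call["name"], "content": content}
```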
Understanding the agent loop changes how you design your agent team. Here are the practical implications:
The tools available to a profile determine what the agent loop can do. If your research specialist has no web search tool, it cannot research keywords — no matter how good its instructions are. Match toolsets to the role.
The model picks tools based on the descriptions it sees. A clear skill that says "search for keywords using the web search tool, filter results by relevance, save to a file" helps the model call the right tools in the right order. A vague skill leads to unpredictable loops.
Use the approval system for dangerous commands. But also use skill instructions and output contracts to create judgment gates at the points where you want to review work before it proceeds. Both are ways to keep the agent loop from running past the point where your input matters.
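In practice, a judgment gate can be an explicit stop instruction in the skill document itself. A hypothetical excerpt:

```
Output contract:
- Save the full keyword list to keywords.md.
- Post only the top 5 keywords in the channel, then stop.
- Do not hand off to the SEO manager until the user approves the list.
```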
Every tool result adds to the conversation. If your agent calls tools in a loop without compressing or filtering results, it will hit the context window limit and start losing earlier context. Skills that specify "return only the top 5 results" or "save full results to a file and return a summary" help manage this.
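The save-then-summarize pattern is easy to express as a tool wrapper. A sketch with hypothetical names:

```python
def search_and_digest(web_search, query, path="raw-results.md", top_n=5):
    # Save everything to disk; return only a digest to the conversation.
    results = web_search(query)
    with open(path, "w") as f:
        f.write("\n".join(f"{r['title']} - {r['url']}" for r in results))
    top = ", ".join(r["title"] for r in results[:top_n])
    return f"Saved {len(results)} results to {path}. Top {top_n}: {top}"
```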
At the start of each turn — before the first model call — the agent prefetches memory documents and injects them into the prompt. This prefetched memory is cached and reused across all tool-result iterations within that turn, so the agent does not re-fetch memory on every loop iteration. But a new turn does refresh memory, so updates to memory files become visible in the next turn. Skills, on the other hand, can be loaded mid-turn if the model decides a new skill is relevant.
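A sketch of that per-turn caching behavior, with hypothetical names:

```python
from pathlib import Path

class TurnMemoryCache:
    def __init__(self, paths=("MEMORY.md",)):
        self.paths = paths
        self._snapshot = None

    def start_turn(self):
        # Read memory files once, before the first model call of the turn.
        self._snapshot = "\n\n".join(Path(p).read_text() for p in self.paths)

    def get(self):
        # Every tool-result iteration this turn reuses the same snapshot;
        # edits to memory files become visible only on the next start_turn().
        return self._snapshot
```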
The agent loop is the engine. Your profile design, skill documents, tool access, and approval settings are the steering. The engine runs on its own — but you decide where it goes, what it can reach, and when it needs to stop and ask for directions.
Your research specialist runs a keyword search, gets 50 results, and calls the file writing tool to save all 50. The next step is drafting content — but you want to review the keywords first. Where in the agent loop would you insert a judgment gate, and how would you enforce it?
A terminal session with the research specialist might look like this:
```
$ hermes --profile research-specialist
You: Find 20 keyword ideas for a blog post about AI agent teams
Agent: I'll search for keyword ideas related to AI agent teams. Let me run a few searches.
[Tool call: web_search("AI agent teams keywords SEO")]
[Tool result: 10 results returned]
[Tool call: web_search("best AI agent frameworks for teams")]
[Tool result: 10 results returned]
Agent: I found 22 keyword ideas. I saved the top 20 to
keywords.md, filtered by relevance and search volume estimate.
Here are the top 5:
1. "AI agent team setup" — high relevance, medium volume
2. "multi-agent systems for SEO" — niche, low competition
3. "how to build an AI agent team" — informational, high volume
4. "AI agent workflow automation" — commercial intent
5. "agent teams vs single agent" — comparison angle
Waiting for your review before the SEO manager drafts the brief.
```

The agent loop ran multiple tool calls (web searches, file write), injected each result back into the conversation, and finished with a summary — pausing at the judgment gate where you review the keywords.