Chapter 7

Agentic Behavior

In earlier chapters, you saw the components of a Hermes agent: profiles, skills, memory, toolsets, gateway, and cron. But what actually happens when you send a message? How does a prompt become an action, and how does that action become a result you can use?

The answer is the agent loop: assemble context, call the model, pick a tool, execute it, feed the result back, and repeat. A chatbot generates text and stops. An agent generates text, acts on it, learns from the result, and keeps going until the task is done.

Understanding this loop matters because it determines what your agent can do, where things go wrong, and where you need to step in.

The core cycle

Observe, think, act, repeat

When you send a message to a Hermes agent — through the terminal, Slack, Telegram, or any other channel — this cycle begins:

Prompt assembly. The agent gathers the profile's system prompt, relevant skills, memory, context files, and conversation history into the model input.
Model call. The assembled prompt is sent to your configured model provider, which processes it and returns a response.
Tool selection. If the model decides it needs to act — search the web, read a file, run a command — it returns a tool call alongside or instead of plain text.
Tool execution. The agent runs the requested tool, passing input parameters and collecting output. If the tool is marked dangerous, the approval system checks whether it can proceed.
Result injection. The tool's output is injected back into the conversation, giving the model the result alongside prior context.
Repeat or finish. The updated conversation goes back to the model. If it wants another tool, the cycle repeats. If it returns only text, the loop ends and the response is sent to you.

A simple question completes in a single pass. A complex task might loop several times. In your SEO workflow, the research specialist runs this loop: search → filter → save → repeat until done.

How tools work

Tool calling: from text to action

When the model needs information it does not have — or needs to act beyond text — it returns a structured tool call instead of a plain response. Tools are organized into . You control which toolsets each profile can access — a research specialist gets web search and file writing; an SEO manager gets file reading and writing but not shell execution.

The model never executes tools directly — the Hermes runtime acts as the intermediary. It checks whether the requested tool is available to the current profile, passes the parameters, runs the tool, and returns the output. This separation is what makes approvals and security checks possible.

Guardrails

When the agent needs your permission

Not every tool call runs immediately. Dangerous actions — shell commands, file deletion, network requests to unfamiliar endpoints — go through the . Hermes has three modes:

Manual (default)

The agent pauses and asks you to approve or deny. If you do not respond within 60 seconds, the action is denied. Start here.

Smart

An auxiliary LLM evaluates the command and decides. Fewer interruptions, but a second model call per approval — costs more and can occasionally allow a risky action a human would catch.

Off

All dangerous commands run without review. A hardline blocklist still prevents the most destructive actions (like rm -rf /). Only use this if you fully trust the agent's tool access and context.

Approvals are part of a seven-layer defense-in-depth model — user authorization, dangerous command approval, container isolation, credential filtering, context file scanning, cross-session isolation, and input sanitization. You control how much autonomy the agent has; the default is the safest setting.

Context management

Session compression

Every turn adds to the conversation history the model sees. As the session grows, history consumes more of the context window — the maximum text the model can process in one call. When the conversation exceeds a threshold, Hermes compresses earlier turns into a compact summary, preserving key decisions and results while discarding the full text. This is not memory (which persists across sessions in files like MEMORY.md). Session history exists within a single session and is compressed to stay within the model's context limits. When the session ends, the next session starts fresh — with memory and skills loaded, not the full prior conversation.

Resilience

What happens when something goes wrong

When a tool call fails — no search results, file not found, API error — the error message is injected into the conversation like a successful result. The model sees the error, adjusts its approach, and either retries or takes a different path. The loop keeps running. The agent has a maximum turns-per-session limit to prevent infinite retry loops, and if the model provider itself goes down, the loop cannot continue at all.

Error handling is built into the loop. The model does not crash when a tool fails — it reads the error, adds it to context, and decides what to do next. This is why an agent loop is more resilient than a single-shot prompt: it can recover from individual tool failures without starting over.

Takeaway

Designing around the loop

Understanding the agent loop changes how you design your agent team. The practical implications:

Tool access shapes behavior. The tools available to a profile determine what the loop can do. If your research specialist has no web search tool, it cannot research keywords — no matter how good its instructions. Match toolsets to the role.
Skill descriptions guide tool selection. The model picks tools based on the descriptions it sees. A clear skill that says "search for keywords, filter by relevance, save to a file" helps the model call the right tools in the right order. A vague skill leads to unpredictable loops.
Approvals are process design, not just security. Use the approval system for dangerous commands. Use skill instructions and output contracts to create judgment gates at points where you want to review work before it proceeds. Both keep the loop from running past the point where your input matters.
Context window is finite. Every tool result adds to the conversation. Skills that specify "return only the top 5 results" or "save full results to a file and return a summary" prevent context window overflow.
Memory is prefetched per turn, not per loop iteration. Before the first model call in a turn, the agent prefetches memory and injects it into the prompt. This cached memory is reused across all tool-result iterations within that turn. A new turn refreshes memory, so updates to memory files become visible then. Skills can be loaded mid-turn if the model decides a new skill is relevant.

The agent loop is the engine. Your profile design, skill documents, tool access, and approval settings are the steering. The engine runs on its own — but you decide where it goes, what it can reach, and when it needs to stop and ask for directions.

Your research specialist runs a keyword search, gets 50 results, and calls the file writing tool to save all 50. The next step is drafting content — but you want to review the keywords first. Where in the agent loop would you insert a judgment gate, and how would you enforce it?

A terminal session with the research specialist:

bashVerified

$ hermes --profile research-specialist

You: Find keyword ideas for a blog post about AI agent teams

[Tool: web_search("AI agent teams keywords SEO")]
[Result: 10 results returned]
[Tool: web_search("best AI agent frameworks")]
[Result: 10 results returned]

Agent: Found 22 keywords. Saved top 20 to keywords.md.
Top 5:
1. "AI agent team setup" — high relevance
2. "multi-agent systems for SEO" — low competition
3. "how to build an AI agent team" — high volume
4. "AI agent workflow automation" — commercial intent
5. "agent teams vs single agent" — comparison angle

Waiting for your review before the SEO manager drafts.

The loop ran multiple tool calls, injected results back, and finished with a summary — pausing at the judgment gate.

Installation and Hosting Skills, Instructions, and Sharp Agents