Every chapter before this one focused on building something: a profile, a skill, a cron job, a full agent team. This chapter focuses on keeping what you built running — and making sure the wrong things do not happen along the way.
Hermes runs on your machine, with access to your files, your API keys, and your shell. That power is what makes it useful. It is also what makes security and reliability non-negotiable. An agent that can run commands on your server can also run the wrong commands. An agent that holds your API keys can also leak them. A cron job that fails silently can leave you thinking work happened when it did not.
This chapter covers the defenses Hermes has built in, the things you need to do yourself, and the operational habits that keep your setup reliable over time.
Hermes uses a defense-in-depth model. That means no single layer is expected to catch everything — each layer adds protection, and a failure in one layer is caught by the next. The model stacks seven layers, from the outside in.
You do not need to configure all seven layers yourself. Layers 2 through 7 are active by default. Layer 1 (user authorization) requires you to set up allowlists for your gateway platform, which Chapter 5 covered. The rest are built in.
Hermes stores API keys and other credentials in ~/.hermes/.env, not in config.yaml. This separation matters: config.yaml contains your agent's preferences (model, toolsets, approval mode) and may be shared or exported. The .env file contains secrets and should never be shared, committed to Git, or included in a profile export.
Secret redaction is on by default. When Hermes logs tool output, model responses, or gateway messages, it scans for patterns that look like API keys, tokens, and credentials, then masks them. Short tokens are fully replaced. Longer tokens preserve the first six and last four characters so you can identify which key was used without seeing the full value. This redaction cannot be disabled at runtime — it is frozen at import time to prevent an agent from turning it off mid-session.
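The masking rule described above can be sketched in a few lines. This is an illustration of the behavior, not Hermes's actual implementation; the token pattern and the 16-character threshold are assumptions.

```python
import re

# Hypothetical pattern for API-key-like tokens; real redactors use a
# much larger curated pattern set.
TOKEN_RE = re.compile(r"\b(?:sk|key|tok)[-_][A-Za-z0-9_-]{8,}\b")

def mask(token: str) -> str:
    # Short tokens are fully replaced; longer ones keep the first six
    # and last four characters so you can tell which key was used
    # without exposing a usable value.
    if len(token) <= 16:
        return "***REDACTED***"
    return token[:6] + "..." + token[-4:]

def redact(text: str) -> str:
    return TOKEN_RE.sub(lambda m: mask(m.group(0)), text)

print(redact("calling api with sk-abcdef1234567890XYZ done"))
# -> calling api with sk-abc...0XYZ done
```

The important design property is the one the text calls out: the real redactor is frozen at import time, so nothing an agent does mid-session can swap these functions out.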
The practical rule: never put API keys in config.yaml, never commit .env to Git, and never share .env files in profile exports. Hermes separates these on purpose — follow the separation.
Before Hermes executes a command, it checks the command against a curated list of dangerous patterns. Things like deleting files, overwriting configs, installing packages, and running network operations all trigger approval. When a match is found, the agent pauses and asks you for permission.
Three approval modes are available, configured in ~/.hermes/config.yaml:
Manual: Every dangerous command requires explicit approval. The agent pauses, shows you the command, and waits. If you do not respond within the timeout (60 seconds by default), the command is denied. This is the safest mode and the one you should use unless you have a specific reason to change it.

Smart: An auxiliary LLM assesses each dangerous command. Low-risk commands (like running a Python print statement) are auto-approved. Genuinely dangerous commands are auto-denied. Uncertain cases escalate to a manual prompt. This reduces interruptions without fully disabling safety.

Off: All approval prompts are disabled. Every command executes immediately, regardless of risk. This is equivalent to YOLO mode. Use it only in disposable environments — CI containers, one-off sandboxes, environments you can recreate from scratch.
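In config.yaml this is a single setting. The key names in the fragment below are assumptions for illustration; verify the exact spelling against your own config.yaml or the documentation.

```yaml
# Hypothetical fragment of ~/.hermes/config.yaml
approval_mode: manual    # manual | smart | off
approval_timeout: 60     # seconds; an unanswered prompt is denied
```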
There is also YOLO mode, which you can toggle per-session with /yolo in the CLI or gateway. It bypasses all approval prompts for that session only. YOLO is useful for trusted, well-tested automation scripts where you know exactly what commands will run. It is dangerous for exploratory work where the agent might generate unexpected commands.
One critical detail: YOLO mode has a floor. The hardline blocklist is always on — even in YOLO mode, even with approvals set to off, certain commands are never allowed. Commands like rm -rf /, fork bombs, and raw block device overwrites are blocked unconditionally. There is no override flag. If you hit the blocklist, the tool call returns an error and nothing runs. If your workflow legitimately needs one of these commands, run it outside the agent.
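The two-tier check (patterns that trigger approval, plus an unconditional blocklist floor) can be sketched like this. The patterns shown are illustrative examples, not Hermes's real lists.

```python
import re

# Illustrative hardline blocklist: matched commands never run, in any mode.
HARDLINE_BLOCKLIST = [
    r"rm\s+-rf\s+/\s*$",           # wipe the filesystem root
    r":\(\)\s*\{\s*:\|:&\s*\}",    # classic fork bomb
    r"\bdd\s+.*of=/dev/sd",        # raw block device overwrite
]

# Illustrative dangerous patterns: matched commands pause for approval.
DANGEROUS_PATTERNS = [
    r"\brm\b", r"\bmv\b.*/etc/", r"\bpip\s+install\b", r"\bcurl\b",
]

def check(command: str, yolo: bool = False) -> str:
    # The blocklist applies unconditionally -- even in YOLO mode.
    if any(re.search(p, command) for p in HARDLINE_BLOCKLIST):
        return "blocked"
    if yolo:
        return "run"
    if any(re.search(p, command) for p in DANGEROUS_PATTERNS):
        return "ask"    # pause and wait for user approval
    return "run"

print(check("rm -rf /"))               # -> blocked
print(check("rm -rf /", yolo=True))    # -> blocked (the floor holds)
print(check("pip install requests"))   # -> ask
print(check("echo hello"))             # -> run
```

Note the ordering: the blocklist is checked before the YOLO shortcut, which is what makes it a true floor rather than another bypassable layer.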
Your agent depends on a model provider to think. If that provider goes down — rate limits, server overload, auth failures, connection drops — your agent stops. Provider fallback is the mechanism that keeps it running.
Hermes supports three layers of resilience:
Key rotation: Rotate across multiple API keys for the same provider. If one key hits a rate limit, the next key takes over. Useful if you have several keys for the same service.

Provider fallback: Automatically switch to a different provider and model when your primary fails. Configured in config.yaml under fallback_model or fallback_providers. When the primary provider errors, Hermes tries the fallback on that same turn, then restores the primary for the next turn. Your conversation continues without manual intervention.

Per-task provider resolution: Side tasks like vision, context compression, and web extraction resolve their providers independently. If your main provider is down, these tasks can still work through a different route. They degrade gracefully — if no provider is available for compression, the session continues without summarization.
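Taken together, the three layers amount to a simple loop: try each key, then each provider, then degrade. A minimal sketch, with illustrative provider names and a stubbed API call in place of the real Hermes machinery:

```python
class RateLimited(Exception):
    pass

def call_model(provider: str, key: str, prompt: str) -> str:
    # Stand-in for a real API call: pretend the primary provider is
    # rate-limited on every key and the fallback provider works.
    if provider == "primary":
        raise RateLimited(f"{provider}/{key} hit a rate limit")
    return f"answer from {provider}"

def complete(prompt, providers):
    # providers: list of (name, [keys]) tried in order.
    for name, keys in providers:
        for key in keys:                 # layer 1: rotate keys
            try:
                return call_model(name, key, prompt)
            except RateLimited:
                continue                 # keys exhausted -> next provider (layer 2)
    return None                          # layer 3: degrade gracefully

print(complete("summarize this", [("primary", ["k1", "k2"]), ("fallback", ["k3"])]))
# -> answer from fallback
```

A `None` result is the degradation case: a side task like compression simply skips its work for that turn rather than crashing the session.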
The easiest way to set up fallback is the interactive command:
```
hermes fallback
```

This opens the same provider picker as hermes model. You can also add fallbacks directly in config.yaml. The key principle: always configure at least one fallback provider. Model providers are reliable but not infallible. A single rate limit event can halt an entire cron pipeline if no fallback exists.
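A fallback entry in config.yaml might look like the following. The fallback_model key is named above; the provider/model value format shown here is an assumption for illustration.

```yaml
model: provider-a/main-model
fallback_model: provider-b/backup-model
```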
Three operational problems come up repeatedly in long-running agent setups. Here is what each one looks like and what to do about it.
If you run the messaging gateway on a VPS, it will eventually restart — either because the server reboots, the process crashes, or you update Hermes. When the gateway comes back up, it reconnects to your messaging platforms (Telegram, Discord, Slack) and resumes listening. Pending messages queue on the platform side and arrive once the gateway is back. The risk is not data loss — it is downtime. Your agents cannot respond while the gateway is down. Solution: run the gateway with automatic restart (systemd or Docker restart policy). Set up monitoring so you know when it goes down, not just when it comes back up.
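A minimal systemd unit for automatic restart looks like the sketch below. The ExecStart command, unit name, and paths are assumptions; adapt them to however you start the gateway on your server.

```ini
# /etc/systemd/system/hermes-gateway.service (hypothetical)
[Unit]
Description=Hermes messaging gateway
After=network-online.target

[Service]
ExecStart=/usr/bin/env hermes gateway
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Restart=always covers both crashes and reboots (via the [Install] target); pairing it with an external uptime check gives you the "know when it goes down" half.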
Cron jobs run unattended. If a job fails — the model provider is down, the API key is invalid, the working directory is wrong — it fails without a human watching. The job records the error in its status, but you will not see it unless you check. Solution: use the deliver field on cron jobs to send results (and errors) to a messaging platform. If the job succeeds, you get the output. If it fails, you get the error message in the same channel. No output at all is the worst case — it means the job might not have run at all. Also run hermes doctor periodically to catch configuration problems before they cause silent failures.
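As a sketch, a cron job definition with delivery configured might look like this. Only the deliver field name comes from the text above; every other field name here is hypothetical.

```yaml
name: weekly-report
schedule: "0 9 * * 1"    # Mondays at 9 AM
prompt: "Generate the weekly report"
deliver: telegram        # output on success, the error message on failure
```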
If config.yaml has a syntax error — an unclosed quote, a wrong indent, a missing colon — Hermes cannot parse it. The agent will fail to start, or it will start with default settings and ignore your custom configuration. This is especially dangerous because the failure is silent: you might not realize your approval mode, model, or toolset configuration was lost. Solution: after editing config.yaml, run hermes doctor to validate the configuration. The doctor command checks for parse errors, missing keys, invalid values, and common misconfigurations. It takes seconds and catches problems that would otherwise surface as mysterious agent behavior.
Everything that matters lives in one directory: ~/.hermes. That includes your config, your profiles, your memory, your skills, your cron jobs, and your API keys. If you lose this directory, you lose your entire setup.
Hermes has a built-in backup command that creates a zip archive of the full ~/.hermes directory:
```
hermes backup
```

This produces a timestamped zip file in your home directory. The backup excludes the hermes-agent code repository (which you can re-clone), bytecode caches, Git metadata, and transient runtime files like gateway PID files. It includes everything else: config, profiles, memory, skills, cron jobs, session databases, and your .env file with API keys.
To choose where the archive is written, pass --output:

```
hermes backup --output /path/to/backup.zip
```

To restore from a backup, use the import command:

```
hermes import /path/to/backup.zip
```

The import overlays the backup onto your current ~/.hermes directory — existing files are overwritten, but files not in the backup are preserved.
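You can also schedule the backup itself. A crontab entry like the sketch below writes a dated archive every night; the destination path is illustrative, and note that % must be escaped as \% inside crontab.

```
# crontab -e entry: nightly backup at 02:00
0 2 * * * hermes backup --output "$HOME/backups/hermes-$(date +\%F).zip"
```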
Hermes updates frequently — the project is active, with regular releases that add features, fix bugs, and improve security. Updating is straightforward, but it is worth doing carefully when you have a running setup with cron jobs and gateway connections.
```
hermes update
```

The update command pulls the latest code, reinstalls dependencies, and restarts the gateway if it is running. Before you update, follow this sequence:
1. Run hermes backup before every update. If the update changes config format, breaks a profile, or introduces a regression, you can restore from the backup. This takes seconds and saves hours.

2. Read the release notes for the version you are updating to. Look for breaking changes to config format, new required fields, or deprecated features. The hermes update command handles most migration automatically, but knowing what changed helps you spot problems faster.

3. Run hermes update. The command handles the git pull, dependency reinstall, and gateway restart. If the gateway is running, the update restarts it with the new code. Active sessions are preserved across the restart.

4. After the update, run hermes doctor. This catches configuration problems introduced by the update — new required fields, changed defaults, or format migrations that did not apply cleanly.

5. Test one manual session with each active profile. Confirm the model connects, tools work, and memory loads. Then check that your cron jobs are still listed and that the gateway is accepting messages.
```
hermes doctor
```

The doctor command is your diagnostic tool. It checks: Python and Node availability, API key configuration, model provider connectivity, toolset loading, profile integrity, and gateway status. Run it after any change that might affect your setup — updates, config edits, new profiles, or when something just feels wrong.
Security features and backup commands are necessary but not sufficient. The difference between a setup that runs for months and one that breaks weekly is operational habits. Here are the six that matter most:
1. Back up before every change. Before updating Hermes, editing config.yaml, adding a new profile, or changing a cron schedule — run hermes backup. The command takes seconds. Restoring from a backup takes minutes. Rebuilding a lost setup takes hours or days.

2. Make hermes doctor a reflex. It catches configuration problems early, before they become silent failures in cron jobs or gateway sessions. Update, then doctor.

3. Always configure a fallback provider. Model providers go down. Rate limits happen. API keys expire. A single fallback provider in your config means your agent stays running when your primary provider fails. Without one, every provider outage halts your entire pipeline.

4. Set deliver on every cron job. Silent failures are the most dangerous kind. If your Monday scheduled report fails and you do not find out until Friday, you have lost a week. Configure the deliver field so errors arrive in your Slack, Telegram, or Discord channel immediately.

5. Use manual approval mode in production. Smart approval mode is convenient for development. Manual mode is the right choice for production profiles with access to sensitive files and systems. The few extra approval clicks are cheaper than recovering from an auto-approved destructive command.

6. Prune memory monthly. Memory grows. Skills accumulate. Outdated facts and stale procedures degrade agent quality over time. Review each profile's memory files monthly. Delete contradictions, remove outdated information, and keep files compact. Accurate context produces reliable output.
Hermes provides the mechanisms — approvals, redaction, fallbacks, backups, isolation. The mechanisms work automatically when they are configured. But no mechanism replaces judgment. You decide which approval mode to use. You decide whether to configure fallback providers. You decide whether to back up before an update. The security model gives you the tools. Using them consistently is your responsibility.
The reward for that consistency is a setup that runs reliably for months. The cost of skipping it is unpredictable failures at the worst possible time — a cron job that silently stops, an API key that leaks into a log, a config change that disables your approval mode without you noticing. These are not theoretical risks. They are the normal failure modes of any long-running system. The defenses in this chapter are how you avoid them.
Your cron job runs a scheduled report every Monday at 9 AM. You check on Wednesday and realize you never received Monday's results. The job has been failing silently since you changed your API key two weeks ago. What should you add to prevent this from happening again?