Every chapter before this one focused on building something. This one focuses on keeping what you built running — and making sure the wrong things do not happen along the way.
Hermes runs on your machine, with access to your files, your API keys, and your shell. That power is what makes it useful. It is also what makes security and reliability non-negotiable. An agent that can run commands on your server can also run the wrong commands. A cron job that fails silently can leave you thinking work happened when it did not.
Hermes uses a defense-in-depth model — no single layer catches everything, and a failure in one layer is caught by the next. Layers 2–7 are active by default. Layer 1 (user authorization) requires allowlist setup covered in Chapter 5.
Hermes stores API keys in ~/.hermes/.env, not in config.yaml. Config contains preferences (model, toolsets, approval mode) and may be shared or exported; .env contains secrets and must never be committed, shared, or included in a profile export. Secret redaction is on by default and cannot be disabled at runtime — it is frozen at import time to prevent an agent from turning it off mid-session. Hermes scans logs and output for API key patterns (vendor prefixes like sk-, sk-ant-, sensitive query parameter names, and long alphanumeric strings), then masks them — short tokens are fully replaced, longer ones preserve the first six and last four characters so you can identify which key was used. The full pattern list lives in agent/redact.py.
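The masking rule described above can be sketched in a few lines. This is only an illustration, not the contents of agent/redact.py: the single `sk-` pattern, the `SHORT_TOKEN_MAX` cutoff, and the `***` placeholder are assumptions chosen for the example, since the chapter does not state them.

```python
import re

# Hypothetical pattern for illustration only; the real list lives in agent/redact.py.
# It matches vendor-style keys such as sk-... and sk-ant-...
KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}")

# Assumed cutoff between "short" and "long" tokens; the chapter does not give one.
SHORT_TOKEN_MAX = 16


def mask_token(token: str) -> str:
    """Fully replace short tokens; keep the first six and last four characters of long ones."""
    if len(token) <= SHORT_TOKEN_MAX:
        return "***"
    return f"{token[:6]}...{token[-4:]}"


def redact(text: str) -> str:
    """Mask every substring that looks like a vendor-prefixed API key."""
    return KEY_PATTERN.sub(lambda m: mask_token(m.group(0)), text)
```

Keeping a short prefix and suffix is the design choice that lets you tell two keys apart in a log without exposing either of them.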
Before Hermes executes a command, it checks the command against a curated list of dangerous patterns: file deletion, writes to sensitive paths, package installation, network operations piped to a shell, and privilege escalation. When a match is found, the agent pauses and asks for permission. Chapter 7 covered how this works in the agent loop; this section covers the configuration.
Hermes has three approval modes (see Chapter 7):
manual (default, safest): every dangerous command requires explicit approval.
smart: an auxiliary LLM assesses risk. Low-risk commands are auto-approved, dangerous ones are auto-denied, and uncertain ones are escalated to you.
off (disposable environments only): every command executes immediately.
YOLO mode toggles per-session with /yolo and bypasses all approval prompts for that session only. The hardline blocklist is always on: even in YOLO mode, even with approvals off, commands like rm -rf /, fork bombs, and raw block device overwrites are blocked unconditionally. There is no override flag.
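One way to picture how the modes, YOLO, and the hardline blocklist interact is a small decision function. This is a sketch under stated assumptions, not Hermes's implementation: the regex lists, the `decide` function, its return values, and the `assess` callback standing in for the auxiliary LLM are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration; Hermes ships its own curated lists.
HARDLINE = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),      # rm -rf / (the root itself)
    re.compile(r"\bdd\b.*\bof=/dev/sd"),      # raw block device overwrite
]
DANGEROUS = [
    re.compile(r"\brm\s+-"),                  # file deletion with flags
    re.compile(r"\bsudo\b"),                  # privilege escalation
    re.compile(r"\|\s*(ba)?sh\b"),            # network output piped to a shell
]


def decide(command: str, mode: str = "manual", yolo: bool = False,
           assess: Callable[[str], str] = lambda c: "uncertain") -> str:
    """Return 'block', 'run', 'deny', or 'ask' for a shell command."""
    # The hardline blocklist is checked first, so nothing below can bypass it.
    if any(p.search(command) for p in HARDLINE):
        return "block"
    # Commands that match no dangerous pattern never trigger a prompt.
    if not any(p.search(command) for p in DANGEROUS):
        return "run"
    # YOLO and approvals-off skip the prompt for dangerous (non-hardline) commands.
    if yolo or mode == "off":
        return "run"
    if mode == "smart":
        verdict = assess(command)  # stand-in for the auxiliary LLM's risk rating
        return {"low": "run", "dangerous": "deny"}.get(verdict, "ask")
    return "ask"  # manual mode: every dangerous command needs explicit approval
```

Checking the hardline list before looking at the mode is what makes "there is no override flag" true in this sketch: `decide("rm -rf /", mode="off", yolo=True)` still returns `"block"`.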
Your agent depends on a model provider to think. If that provider goes down, your agent stops. Hermes provides three layers of resilience:
Credential pools rotate across multiple API keys for the same provider, so when one key is rate-limited the request moves to the next key.
Primary model fallback automatically switches to a different provider and model on the same turn, then restores the primary for the next turn.
Auxiliary task fallback gives side tasks like vision and context compression independent provider resolution, so they degrade gracefully even when the main provider is down.
Always configure at least one fallback. Without one, a single rate limit event can halt an entire cron pipeline.
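The first two layers can be sketched as two nested loops: rotate through the keys of one provider, then move on to the next provider. This is a minimal illustration, not Hermes's code. The `complete` function, the `RateLimited` exception, and the `call` callback are hypothetical names, and real failures include more than rate limits.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Raised by the hypothetical call function when a key is rate-limited."""


def complete(prompt: str,
             providers: Sequence[tuple[str, Sequence[str]]],
             call: Callable[[str, str, str], str]) -> str:
    """Try each key of each provider in order; return the first successful reply.

    providers is an ordered list of (provider_name, api_keys), primary first
    and fallbacks after. Every call starts again from the first entry, which
    is how the primary is retried on the next turn.
    """
    for name, keys in providers:
        for key in keys:  # credential pool: rotate within one provider
            try:
                return call(name, key, prompt)
            except RateLimited:
                continue  # this key is exhausted, try the next one
        # Every key for this provider failed, so fall through to the next provider.
    raise RuntimeError("all providers and keys exhausted")
```

With only one provider and one key in the list, the first `RateLimited` ends in `RuntimeError`, which is the single point of failure the advice about configuring a fallback is meant to remove.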
    hermes fallback
Gateway restarts. The messaging gateway will eventually restart (server reboots, process crashes, Hermes updates). Pending messages queue on the platform side and arrive once the gateway is back. The risk is not data loss but downtime: agents cannot respond while the gateway is down. Run the gateway with automatic restart (systemd or Docker restart policy) and set up monitoring so you know when it goes down, not just when it comes back up.
Silent cron failures. Cron jobs run unattended. If a job fails — provider down, invalid API key, wrong working directory — it records the error in its status, but you will not see it unless you check. Use the deliver field on every cron job to send results and errors to a messaging channel. No output at all is the worst case — it means the job might not have run. Also run hermes doctor periodically to catch configuration problems before they cause silent failures.
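Because silence is the hardest failure to notice, it can help to check how long a job has been quiet. The sketch below is not a Hermes feature: `is_overdue`, its parameters, and the one-hour default grace period are assumptions for illustration, and where the last delivery time comes from is left to you.

```python
from datetime import datetime, timedelta


def is_overdue(last_delivery: datetime, interval: timedelta,
               now: datetime, grace: timedelta = timedelta(hours=1)) -> bool:
    """True when a scheduled job has been quiet for longer than interval + grace."""
    return now - last_delivery > interval + grace


# Example: a weekly job whose last report arrived nine days ago is overdue.
weekly = timedelta(days=7)
last_seen = datetime(2025, 1, 6, 9, 0)
overdue = is_overdue(last_seen, weekly, now=datetime(2025, 1, 15, 9, 0))
```

The grace period keeps a job that runs a few minutes late from raising a false alarm.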
Broken config files. If config.yaml has a syntax error — unclosed quote, wrong indent, missing colon — Hermes cannot parse it. The agent fails to start, or starts with defaults and silently ignores your customization. After every config edit, run hermes doctor to validate. It takes seconds and catches problems that would otherwise surface as mysterious behavior.
Everything that matters lives in ~/.hermes. Config, profiles, memory, skills, cron jobs, API keys: lose this directory and you lose your entire setup.
    hermes backup
Produces a timestamped zip in your home directory. Excludes the hermes-agent code repository, bytecode caches, and transient runtime files. Includes everything else: config, profiles, memory, skills, cron jobs, session databases, and your .env file with API keys.
    hermes backup --output /path/to/backup.zip
    hermes import /path/to/backup.zip
Import overlays the backup onto your current ~/.hermes directory: existing files are overwritten, but files not in the backup are preserved.
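The overlay behavior is the same thing you get when extracting a zip archive over an existing directory. The sketch below demonstrates those semantics with Python's standard zipfile module in a throwaway directory. It is not how hermes import is implemented (the chapter does not say), and the `overlay` helper and file names are made up for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def overlay(backup_zip: Path, target: Path) -> None:
    """Extract backup_zip over target: same-named files are overwritten,
    files absent from the archive are left untouched."""
    with zipfile.ZipFile(backup_zip) as zf:
        zf.extractall(target)


# Demonstration in a temporary directory (never point this at a real ~/.hermes).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "home"
    target.mkdir()
    (target / "config.yaml").write_text("model: new\n")   # also present in the backup
    (target / "notes.txt").write_text("local only\n")     # not in the backup

    backup = root / "backup.zip"
    with zipfile.ZipFile(backup, "w") as zf:
        zf.writestr("config.yaml", "model: old\n")

    overlay(backup, target)
    restored_config = (target / "config.yaml").read_text()
    kept_notes = (target / "notes.txt").read_text()
```

After the overlay, `config.yaml` holds the backed-up contents while `notes.txt` is unchanged, so an import rolls back what the backup covers without removing files created since.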
Hermes updates frequently. The update command handles the git pull, dependency reinstall, and gateway restart. Active sessions are preserved across the restart.
    hermes update
When you update, follow this sequence:
1. Run hermes backup. It takes seconds and saves hours if the update introduces a regression.
2. Run hermes update. It handles the git pull, dependency reinstall, and gateway restart.
3. Run hermes doctor. It catches config problems introduced by the update: missing fields, changed defaults, failed migrations.
    hermes doctor
The doctor checks Python/Node availability, API key configuration, model provider connectivity, toolset loading, profile integrity, and gateway status. Run it after any change that might affect your setup.
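The backup, update, doctor sequence above can be wrapped in a small script so the steps always run in the same order. This is a sketch, not an official tool: `safe_update` and `run_step` are hypothetical helpers, and the script assumes each hermes command exits with a non-zero status on failure, which the chapter does not confirm.

```python
import subprocess
from typing import Callable, Sequence

# The three commands come from the sequence above; nothing else is assumed about the CLI.
STEPS: Sequence[Sequence[str]] = (
    ("hermes", "backup"),
    ("hermes", "update"),
    ("hermes", "doctor"),
)


def run_step(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def safe_update(run: Callable[[Sequence[str]], int] = run_step) -> bool:
    """Run backup, update, doctor in order; stop at the first non-zero exit."""
    for cmd in STEPS:
        if run(cmd) != 0:
            print(f"stopped: {' '.join(cmd)} failed")
            return False
    return True
```

Calling `safe_update()` runs the real commands. Passing a different `run` callable lets you dry-run the sequence, and stopping at the first failure means a failed backup never leads into an update.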
Security features and backup commands are necessary but not sufficient. The difference between a setup that runs for months and one that breaks weekly is operational habits.
Use deliver on every job so errors arrive immediately.
Hermes provides the mechanisms: approvals, redaction, fallbacks, backups, isolation. They work automatically when configured. But no mechanism replaces judgment. You decide which approval mode to use, whether to configure fallbacks, whether to back up before an update. The six habits in this chapter are how you use those mechanisms consistently.
The reward is a setup that runs reliably for months. The cost of skipping it is unpredictable failures at the worst possible time — a cron job that silently stops, an API key that leaks into a log, a config change that disables your approval mode without you noticing. These are not theoretical risks. They are the normal failure modes of any long-running system. The defenses in this chapter are how you avoid them.
Your cron job runs a scheduled report every Monday at 9 AM. You check on Wednesday and realize you never received Monday's results. The job has been failing silently since you changed your API key two weeks ago. What should you add to prevent this from happening again?