How we set up and configure Hermes.
Hermes is a briefing agent that reads a founder's calendar, inbox, and work signals and drafts the day. Here is how we actually set one up for a client, and where the hard decisions live.
What Hermes is.
Hermes is a briefing agent. Every morning it reads a founder's calendar, inbox, and the signals coming out of their tools, then writes the day in their own voice. Not a digest of unread items. A judgment call about what matters, who is waiting, and what is about to slip.
The output is a short brief, not a dashboard. Here are the three things that move the company today. Here is the meeting you are underprepared for. Here is the thread that has gone quiet for nine days and is now a risk. A founder reads it in ninety seconds and knows where to point their attention.
Generating text was never the hard part. The hard part is deciding what Hermes is allowed to see, what it is allowed to do, and how confident it has to be before it acts. Those decisions are the configuration.
The four configuration decisions.
Every Hermes deployment comes down to four choices. We make them deliberately with the client rather than defaulting our way through them.
First, which signals it reads. More sources are not better. A founder's inbox plus calendar plus their issue tracker is usually enough to write a sharp brief. Adding every Slack channel and every notification mostly adds noise the model has to fight through. We start narrow and widen only when the brief is visibly missing something.
Second, which tools it can touch. Reading is cheap and low-risk. Writing is where trust is spent.
Third, the voice it writes in. This is not cosmetic; a brief that sounds like the founder gets read, and one that sounds like a SaaS notification gets ignored.
Fourth, where the permission line sits. That line is the difference between an agent that drafts and an agent that sends.
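To make the four choices concrete, here is a minimal sketch of how they could be written down as one configuration object. Every name in it (HermesConfig, PermissionLine, with_signal, the signal and tool labels) is hypothetical and chosen for illustration; it is not the schema of any real deployment.

```python
from dataclasses import dataclass, replace
from enum import Enum


class PermissionLine(Enum):
    """Where the agent stops (hypothetical levels for illustration)."""
    DRAFT_ONLY = "draft_only"                  # the agent drafts, a human sends
    SEND_WITH_APPROVAL = "send_with_approval"  # the agent sends after sign-off
    SEND = "send"                              # the agent sends on its own


@dataclass(frozen=True)
class HermesConfig:
    # 1. Which signals it reads. Start narrow.
    signals: tuple[str, ...]
    # 2. Which tools it can touch, split into read and write.
    read_tools: frozenset[str]
    write_tools: frozenset[str] = frozenset()
    # 3. The voice it writes in (a label for a stored style profile).
    voice_profile: str = "founder-default"
    # 4. Where the permission line sits. The default is the safest level.
    permission_line: PermissionLine = PermissionLine.DRAFT_ONLY

    def with_signal(self, name: str) -> "HermesConfig":
        """Return a new config with one extra signal; the original is unchanged."""
        return replace(self, signals=self.signals + (name,))


config = HermesConfig(
    signals=("inbox", "calendar", "issue_tracker"),
    read_tools=frozenset({"linear", "stripe"}),
)
wider = config.with_signal("slack:#launch")
```

Making the object immutable means each widening step produces a new configuration that can be evaluated before it replaces the old one.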
A private retrieval index.
Hermes cannot reason about a company it does not understand. So before the first brief, we build a private retrieval index over the client's own material: internal docs, Slack history, email threads, planning notes. It stays inside the client's boundary and is never used to train anything.
The index is what lets Hermes know that the "Q-thing" in a Slack message is the contract renewal, that a name on the calendar is a board member and not a vendor, that a thread is quiet because the deal already closed. Retrieval gives the brief context a generic model would have to guess at.
Freshness matters more than size. A six-month-old doc that contradicts this week's decision is worse than no doc. We weight recency and let stale material fall out of reach rather than poison the brief.
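One way to express "weight recency and let stale material fall out of reach" is an exponential decay on the retriever's similarity score plus a hard age cutoff. The sketch below assumes that shape; the Hit type, the 30-day half-life, and the 180-day cutoff are illustrative assumptions, not values from a real deployment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    similarity: float  # score from the retriever, assumed to lie in [0, 1]
    age_days: float    # days since the document was last updated


def recency_weighted(hits, half_life_days=30.0, max_age_days=180.0):
    """Rank hits by similarity discounted by age; drop anything past the cutoff."""
    scored = []
    for hit in hits:
        if hit.age_days > max_age_days:
            continue  # stale material falls out of reach entirely
        decay = 0.5 ** (hit.age_days / half_life_days)  # halves every half-life
        scored.append((hit.similarity * decay, hit))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in scored]


hits = [
    Hit("old-plan", similarity=0.90, age_days=170),
    Hit("this-week-decision", similarity=0.60, age_days=2),
    Hit("last-year-notes", similarity=0.95, age_days=400),
]
ranked = recency_weighted(hits)
```

With these numbers the two-day-old decision outranks the older plan despite a lower raw similarity, and the 400-day-old notes never reach the brief.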
Permissioned tool access through MCP.
Hermes reaches the client's tools through MCP servers. MCP is the Model Context Protocol, an open standard introduced by Anthropic for connecting a model to tools and data. Linear, Notion, Stripe, and GitHub each sit behind their own server, and Hermes gets exactly the scopes that server grants and nothing more.
This is where the permission line becomes concrete. Hermes can read Linear issues and name the ones blocking a launch, but creating or closing them is a separate, narrower grant we add only when a founder asks. It can read Stripe to flag a failed payment from a key account; it has no path to move money. GitHub and Notion follow the same shape: broad read, deliberate and minimal write.
Scoping at the MCP layer instead of in the prompt is the point. A prompt that says "don't send anything" is a suggestion. A tool the agent was never handed is a guarantee. We put the boundary in the wiring, not the instructions.
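The difference between a boundary in the wiring and a boundary in the instructions can be shown without any real MCP server. In this minimal sketch the agent is handed a toolbox built only from granted tools, so an ungranted tool is not refused, it is simply absent. ScopedToolbox, ToolNotGranted, and the tool names are hypothetical stand-ins, not the API of MCP or of any vendor's server.

```python
from typing import Callable


class ToolNotGranted(KeyError):
    """Raised when the agent asks for a tool it was never handed."""


class ScopedToolbox:
    def __init__(self, all_tools: dict[str, Callable[..., object]], granted: set[str]):
        unknown = set(granted) - set(all_tools)
        if unknown:
            raise ValueError(f"cannot grant unknown tools: {sorted(unknown)}")
        # Only granted callables are copied in. Nothing else is reachable
        # through this object, whatever the prompt says.
        self._tools = {name: all_tools[name] for name in granted}

    def names(self) -> list[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs):
        try:
            tool = self._tools[name]
        except KeyError:
            raise ToolNotGranted(name) from None
        return tool(**kwargs)


# Hypothetical tools standing in for what a server might expose.
tools = {
    "linear.list_issues": lambda: ["ISS-1"],
    "linear.close_issue": lambda issue_id: f"closed {issue_id}",
}
toolbox = ScopedToolbox(tools, granted={"linear.list_issues"})
```

Adding a write scope later is then a change to the granted set, which is easy to review and easy to revert.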
The weekly eval-and-retrain loop.
A briefing agent lives or dies on its judgment, and judgment drifts. So Hermes ships with a weekly loop. The founder marks briefs that landed and briefs that missed: the meeting it should have flagged, the thread it over-weighted, the tone that felt off.
That feedback becomes an eval set. Each week we run the current configuration against the accumulated examples before any change goes live, so we can see whether a tweak to the voice or the signal mix actually improves the brief or just moves the errors around. Vibes do not ship. The eval set decides.
Retraining here mostly means tuning the configuration, not fine-tuning a model: adjusting which signals carry weight, sharpening the voice, moving the permission line as trust builds. The loop is what keeps Hermes useful in month six instead of just impressive in week one.
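The weekly comparison can be sketched as a diff between two configurations over the same eval set: which cases the candidate fixes, and which it breaks. Everything here is an assumption made for illustration, including the EvalCase fields, the idea that a brief can be reduced to a set of flagged items, and the ship rule (at least one fix and no regressions).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    inputs: dict
    must_flag: frozenset[str] = frozenset()      # items the brief should surface
    must_not_flag: frozenset[str] = frozenset()  # items it over-weighted before


def passes(case: EvalCase, flagged: set[str]) -> bool:
    """A case passes if every required item is flagged and no forbidden one is."""
    return case.must_flag <= flagged and not (case.must_not_flag & flagged)


def compare(
    cases: Iterable[EvalCase],
    baseline: Callable[[dict], set[str]],
    candidate: Callable[[dict], set[str]],
) -> dict:
    fixed, broken = [], []
    for case in cases:
        before = passes(case, baseline(case.inputs))
        after = passes(case, candidate(case.inputs))
        if after and not before:
            fixed.append(case.case_id)
        elif before and not after:
            broken.append(case.case_id)
    # Assumed gate: a change that only moves errors around does not ship.
    return {"fixed": fixed, "broken": broken, "ship": bool(fixed) and not broken}


cases = [
    EvalCase("missed-board-prep", {}, must_flag=frozenset({"board-prep"})),
    EvalCase("overweighted-newsletter", {}, must_not_flag=frozenset({"newsletter"})),
]
current = lambda inputs: {"board-prep", "newsletter"}
tweak = lambda inputs: set()
report = compare(cases, current, tweak)
```

In this example the tweak stops over-weighting the newsletter but also stops flagging the board prep, so the report shows one fix, one regression, and no ship.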
How it deploys.
Hermes runs on the Claude Agent SDK. That gives us the agent loop, tool calling, and the MCP integration without rebuilding the plumbing, so the work stays on the configuration specific to each founder.
Claude Opus 4.8, the current frontier model, does the morning reasoning where the judgment is hardest: deciding what matters and writing it in voice. Lighter passes run on Sonnet 4.6 where the task is more mechanical. The model choice follows the difficulty of the call, not a default.
Provider fallbacks sit underneath. If a request hits a refusal, a 429 rate limit, or a 529 overloaded response, it reroutes rather than failing the brief. A founder should never open their morning to an error message where the brief was supposed to be.
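A minimal sketch of that rerouting, assuming each provider is wrapped as a plain callable that either returns a reply or raises an error carrying the HTTP status. ProviderError, Reply, and generate_with_fallback are hypothetical names, not part of the Claude Agent SDK or any provider client.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

REROUTE_STATUS = {429, 529}  # rate limited, overloaded


class ProviderError(Exception):
    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status


@dataclass
class Reply:
    text: str
    refused: bool = False


def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], Reply]]],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first success."""
    failures = []
    for name, call in providers:
        try:
            reply = call(prompt)
        except ProviderError as err:
            if err.status not in REROUTE_STATUS:
                raise  # other errors are real bugs, not a reason to reroute
            failures.append(f"{name}: {err.status}")
            continue
        if reply.refused:
            failures.append(f"{name}: refusal")
            continue
        return name, reply.text
    raise RuntimeError("all providers failed: " + "; ".join(failures))


def overloaded(prompt: str) -> Reply:
    raise ProviderError(529)


def refuses(prompt: str) -> Reply:
    return Reply(text="", refused=True)


def healthy(prompt: str) -> Reply:
    return Reply(text="Three things move the company today.")


used, brief = generate_with_fallback(
    "write the brief",
    [("primary", overloaded), ("secondary", refuses), ("tertiary", healthy)],
)
```

Only the three conditions named above trigger a reroute; any other error is raised so it is noticed rather than masked.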
Where the tradeoffs actually bite.
The recurring tension is reach versus trust. Every new signal and every new write scope makes Hermes more capable and more able to embarrass you. We bias toward read-only and narrow until a client has watched the agent be right for a few weeks.
The other tradeoff is voice versus safety. A brief that sounds exactly like the founder is the goal, but the closer Hermes gets to their voice, the more careful we are about anything it can send under their name. Sounding like someone and acting as them are different permissions, and we keep them separate.