404.Technologies

AI-native product lab.

We take the hard problems in AI products — the agents, the systems, the parts that break in production — and ship them. A few client engagements a year.

See work

Services

Four ways the lab creates leverage.

AI systems, product surfaces, frontend rescue, launch paths. Pick the one whose constraint feels familiar.

Production agents shipped
12+
MCP servers in production
4
Operator response-time gain
~40%
Engagements per year
5–8

Source: 404 Technologies internal metrics, 2026. Track record across own products + selective client builds.

01 Executive · Bespoke

Bespoke agents for executives and operators.

Custom Hermes and OpenClaw agents trained on your operating model. Briefings, ops drafts, decision packets: drafted by the agent, signed by you.

  • Hermes briefing agent (calendar, inbox, signals)
  • OpenClaw ops agent (workflows, drafts, decisions)
  • Custom voice + tone fine-tuned to the founder
  • Private knowledge base + retrieval over docs/Slack/email
  • Permissioned tool access (Linear, Notion, Stripe, GH)
  • Weekly eval review + agent retraining loop
Best for
Founders, COOs, and chiefs of staff who need leverage without hiring a team.
Deliverable
Two production agents, retrieval index, eval harness, ops runbook.
Timeline
6-10 weeks. Live agent in week 3, refined through week 10.
02 Own product · Build

AI systems that answer to operators.

Voice, chat, routing, evals, logs, fallbacks, and dashboards built around the messy parts of real work.

  • Multi-model routing with provider fallbacks
  • Operator console + role-based access
  • Eval harness with regression tracking
  • Prompt versioning + diff review
  • Audit log + replay for every call
  • SMS / voice / chat surface adapters
Typical output
Agent workflow, operator dashboard, eval loop, launch runbook.
Stack
Anthropic · OpenAI · Postgres · Vercel
Timeline
4-8 weeks. Production-ready, not demo-grade.
03 Frontend · Design

Frontend + design for products that ship.

Design systems, polished interfaces, motion, and interaction craft. Built in code, not Figma slides. Tokens, components, real responsive behavior, real states.

  • Token-based design system + theming
  • Component library you actually reuse
  • Hi-fi flows, hero pages, marketing surfaces
  • Onboarding, empty, error, success states
  • Motion + micro-interactions (GSAP / CSS)
  • Responsive logic across every breakpoint
  • Accessibility pass (WCAG AA baseline)
  • UX copy + microcopy refinement
Best for
Founders with working software that needs a sharper public or product face.
Deliverable
Design system, shipped components, hi-fi flows, motion spec.
Timeline
3-6 weeks. Includes one revision round.
04 Backend · Architecture

Backend + architecture for AI-native products.

Schema, services, auth, billing, jobs, observability, infra. Architecture that survives traffic, audits, and the second product the founder hasn't pitched yet.

  • Data model + schema (Postgres, vector stores)
  • Auth + org / role / permission model
  • Stripe billing + idempotent webhooks
  • API design (REST, RPC, streaming)
  • Background jobs + retry semantics
  • Observability (logs, traces, alerts)
  • Migrations with rollback path
  • Multi-tenant + infra hardening
Best for
Teams that need a backbone that won't have to be rewritten in a year.
Deliverable
Production architecture, migration plan, runbook, observability stack.
Timeline
4-10 weeks, scoped per surface.

About

Three builders. Zero account managers. Products you can use today.

404 Technologies is an AI-native product lab — a three-person studio that designs, builds, and operates AI agents and AI-native SaaS in production. We don’t hand off decks or prototypes: we run agentic systems like Hermes and OpenClaw daily, operate our own products, and keep the client slate deliberately small.

Studio

OrderFlowAI, WizPrompt, and the Mike Will Made It Media Hub were built in this lab and run on this stack — alongside a short list of client engagements.

Proof beats pitch. No discovery-phase theater, no forty-slide strategy doc — working software, a direct line to the people building it, and replies measured in hours.

Founder
Andrew “404kidwiz” Morris
Team
Andrew Morris, Aremintto Morris & S. Farrow
Based
Atlanta, GA · America / Dubai / Switzerland
Stack
Claude · MCP · Next.js · Postgres · Vercel
Engagements
5-8 client projects per year
Status
Accepting briefs

What we build with

Google Gemini · OpenAI · Next.js · Postgres · Vercel · Stripe · Claude Code · MCP · GSAP · TypeScript

Thinking

Published notes from the lab.

Ship the workflow before the pitch.

Most AI integrations are demos with better lighting. We only trust the system after it survives real operators, real edge cases, and a week of being ignored by its creator.

The useful test is boring: can someone who did not build it use it when the room is busy, the data is imperfect, and the happy path is gone? That is where the interface, model routing, and fallback plan either become a product or reveal themselves as theater.

Full article →

Build the thing you wish existed.

We run our own products first, then take client work with fewer assumptions and sharper opinions. If the lab cannot operate its own stack, the advice gets theatrical fast.

Owning the stack changes the taste. Suddenly uptime, billing, support, retries, empty states, and onboarding are not abstract best practices. They are the work. That pressure makes the design less precious and the engineering less theoretical.

Full article →

Prompts need version control.

WizPrompt exposed the obvious failure mode: production behavior changing because a prompt changed in place. Once prompts are treated like code, the debugging gets honest.

A prompt without history is a production incident waiting for a name. Diffs, eval runs, staged deploys, and rollback paths do not make AI less creative. They make it possible to trust the product when it is no longer being babysat.
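
As a rough illustration of prompts treated like code, here is a minimal in-memory sketch. The `PromptRegistry` class and its method names are hypothetical and are not WizPrompt's implementation; a real system would persist versions and tie deploys to eval runs.

```python
import difflib
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Append-only prompt history: nothing is ever edited in place."""
    versions: list[str] = field(default_factory=list)
    live: int = -1  # index of the version serving traffic; -1 means none yet

    def commit(self, text: str) -> str:
        """Store a new version and return a short content hash for it."""
        self.versions.append(text)
        return hashlib.sha256(text.encode()).hexdigest()[:8]

    def deploy(self, index: int) -> None:
        """Point production at an existing version."""
        if not 0 <= index < len(self.versions):
            raise IndexError(f"no prompt version {index}")
        self.live = index

    def rollback(self) -> None:
        """Return to the version just before the live one."""
        self.deploy(self.live - 1)

    def diff(self, old: int, new: int) -> list[str]:
        """Unified diff between two versions, for review before a deploy."""
        return list(difflib.unified_diff(
            self.versions[old].splitlines(),
            self.versions[new].splitlines(),
            fromfile=f"v{old}", tofile=f"v{new}", lineterm="",
        ))

    def current(self) -> str:
        if self.live < 0:
            raise LookupError("nothing deployed yet")
        return self.versions[self.live]


registry = PromptRegistry()
registry.commit("Summarize the ticket.")
registry.commit("Summarize the ticket.\nFlag anything urgent.")
registry.deploy(1)
print("\n".join(registry.diff(0, 1)))  # what changed between v0 and v1
registry.rollback()                    # production is back on v0
```

The point of the append-only list is that a rollback is a pointer move, not an edit, so the version that misbehaved is still there to inspect.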

Full article →

Evals before vibes.

Most AI features ship on gut feel. The model "seemed" right in staging and the demo went well, but production is different. Evaluation sets are not optional infrastructure. They are the handbrake.

The pattern that works: write your golden set before you write your prompt. Define 20 inputs, their expected outputs, and a rubric for pass/fail. Run every prompt change against it. When the model regresses, you will know the commit that broke it. When the model improves, you will have proof. Vibes are a starting point. Evals are what turns a demo into a product.
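
The golden-set pattern can be sketched in a few lines. This is an illustration under stated assumptions: `GoldenCase`, `run_golden_set`, and `fake_model` are hypothetical names, the set has two cases instead of twenty, and the model is a stand-in so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    name: str
    input: str
    passes: Callable[[str], bool]  # the rubric: True if the output is acceptable


def run_golden_set(model: Callable[[str], str], cases: list[GoldenCase]) -> dict[str, bool]:
    """Run every case through the model and record pass/fail per case name."""
    return {case.name: case.passes(model(case.input)) for case in cases}


# Hypothetical stand-in for a real model call.
def fake_model(text: str) -> str:
    return "refund" if "money back" in text.lower() else "other"


GOLDEN_SET = [
    GoldenCase("refund_request", "I want my money back", lambda out: out == "refund"),
    GoldenCase("greeting", "Hello there", lambda out: out == "other"),
]

results = run_golden_set(fake_model, GOLDEN_SET)
pass_rate = sum(results.values()) / len(results)
print(results, f"pass rate {pass_rate:.0%}")
```

Storing `results` alongside the commit that produced them is what lets a later run point at the change that broke a case.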

Full article →

Claude Code is not just another IDE plugin.

Claude Code, OpenAI Codex, Cursor, Google Antigravity. The difference is not the model. It is the primitives: how tools are defined, how agents compose, and how context is scoped.

Every agentic coding tool lets you talk to a model. That is table stakes. What varies is the depth of the agent contract: how tools get registered, how sub-agents are chained, how hooks intercept actions, how the agent reads the filesystem, calls APIs, opens a browser, and knows when to stop. Claude Code with a well-designed CLAUDE.md and a custom skill is a different class of tool than an autocomplete extension. It runs commands, reads outputs, iterates, and escalates. The right tool is the one whose agent primitives match the workflow you are building.

Full article →

Field notes

Setting up Hermes is architecture, not a download.

Hermes is a briefing agent that reads a founder's calendar, inbox, and signals and drafts the day. The setup that matters isn't an install step — it's the operating model we encode around it.

Configuration comes down to four decisions: which signals it reads, which tools it can touch, the voice it writes in, and where the permission line sits. We scope a private retrieval index over docs, Slack, and email; wire permissioned tool access (Linear, Notion, Stripe, GitHub) through MCP servers; tune voice and tone to the founder; and run a weekly eval-and-retrain loop so it stays calibrated. It ships on the Claude Agent SDK against current Claude models, with provider fallbacks underneath. The agent drafts; the founder signs.
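
The four decisions can be written down as plain data. The sketch below is an assumption about shape only: `AgentConfig`, its fields, and the tool names are hypothetical and do not reflect how Hermes or the Claude Agent SDK is actually configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    signals: frozenset[str]      # what the agent reads
    read_tools: frozenset[str]   # tools it may call on its own
    write_tools: frozenset[str]  # tools that need a human signature
    voice: str                   # short style instruction for drafts

    def decide(self, tool: str) -> str:
        """Where the permission line sits for a given tool call."""
        if tool in self.read_tools:
            return "allow"
        if tool in self.write_tools:
            return "needs_signoff"
        return "deny"  # anything not listed is out of scope


# Hypothetical tool names, for illustration only.
config = AgentConfig(
    signals=frozenset({"calendar", "inbox", "slack"}),
    read_tools=frozenset({"linear.search", "notion.read"}),
    write_tools=frozenset({"linear.create_issue"}),
    voice="Direct, short sentences, no filler.",
)

for tool in ("notion.read", "linear.create_issue", "stripe.refund"):
    print(tool, "->", config.decide(tool))
```

Denying by default keeps the permission line explicit: a new tool does nothing until someone decides which side of the line it belongs on.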

Full article →

Running MCP servers in production.

Building your first MCP server is a weekend. Running one in production is the rest of the quarter — the gap is auth, scope, and the failure modes nobody demos.

Treat an MCP server like any other production surface. OAuth and least-privilege scopes per tool, secrets that never reach the model's context, rate limits and idempotency on anything that writes, versioning so a tool change doesn't silently break an agent mid-session, and traces on every call. The Model Context Protocol makes capabilities portable across agents — which is exactly why a sloppy server becomes everyone's incident.
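
Two of those controls, least-privilege scopes and idempotent writes, can be sketched without any MCP library. `ToolGuard`, the scope string, and `create_issue` are hypothetical names; a real server would persist the idempotency keys and expire them.

```python
from typing import Any, Callable


class ToolGuard:
    """Wraps tool handlers with a scope check and idempotent writes."""

    def __init__(self) -> None:
        self._handlers: dict[str, tuple[str, Callable[..., Any]]] = {}
        self._seen: dict[tuple[str, str], Any] = {}  # (tool, key) -> stored result

    def register(self, name: str, scope: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = (scope, handler)

    def call(self, name: str, granted: set[str], idempotency_key: str, **args: Any) -> Any:
        scope, handler = self._handlers[name]
        if scope not in granted:
            raise PermissionError(f"{name} requires scope {scope!r}")
        key = (name, idempotency_key)
        if key not in self._seen:      # first delivery: perform the write
            self._seen[key] = handler(**args)
        return self._seen[key]         # retry: replay the stored result


created: list[str] = []


def create_issue(title: str) -> int:
    created.append(title)
    return len(created)  # pretend issue id


guard = ToolGuard()
guard.register("create_issue", "issues:write", create_issue)
first = guard.call("create_issue", {"issues:write"}, "req-1", title="Fix login")
retry = guard.call("create_issue", {"issues:write"}, "req-1", title="Fix login")
print(first, retry, created)  # the retry does not create a second issue
```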

Full article →

Multi-model routing that survives an outage.

One model is a single point of failure. Routing by task with real provider fallbacks is the difference between a blip and a page at 2am.

Route by job, not by habit: a fast model for classification and extraction, a frontier model — Claude Opus 4.8 today — for hard reasoning, cheaper tiers for bulk work. Put provider fallbacks underneath so a refusal, a rate limit, or a 529 reroutes inside the same request instead of failing the user. Add cost and latency budgets, prompt caching on the stable prefix, and graceful degradation when the best option is unavailable. The user should never know which model answered.
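
A fallback chain inside a single request might look like the sketch below. It assumes each provider is a plain callable and that rate limits, overloads, and refusals all surface as one hypothetical `ProviderError`; real SDKs raise their own exception types, and budgets and caching are left out.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical: raised on a rate limit, overload, or refusal."""


Provider = Callable[[str], str]


def route(task: str, prompt: str,
          routes: dict[str, list[tuple[str, Provider]]]) -> tuple[str, str]:
    """Try each provider for the task in order; return (provider name, answer)."""
    errors: list[str] = []
    for name, provider in routes[task]:
        try:
            return name, provider(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # keep for the trace, then fall through
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def overloaded(prompt: str) -> str:
    raise ProviderError("529 overloaded")


def backup(prompt: str) -> str:
    return f"answer to: {prompt}"


ROUTES = {"reasoning": [("primary", overloaded), ("fallback", backup)]}
used, answer = route("reasoning", "plan the migration", ROUTES)
print(used, "|", answer)
```

The caller gets an answer either way; which provider produced it goes to the trace, not to the user.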

Full article →

Evals that catch regressions before users do.

A prompt change with no eval is a production incident waiting for a name. The fix is a golden set and a gate in CI.

Write the golden set before the prompt: representative inputs, expected outputs, and a pass/fail rubric per case. Run every prompt and model change against it, gate the deploy on the result, and feed real production traces back in so the set grows toward the cases that actually break. When the model regresses you get the commit that did it; when it improves you get proof instead of a vibe.
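
The gate itself can be small. In this sketch, `find_regressions` and `gate` are hypothetical helpers that compare per-case results from the last good run against the current one; the 0.9 pass-rate threshold is an assumed default, not a recommendation.

```python
def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Cases that passed on the baseline and fail (or are missing) now."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def gate(baseline: dict[str, bool], current: dict[str, bool],
         min_pass_rate: float = 0.9) -> int:
    """Return a process exit code: 0 lets the deploy through, 1 blocks it."""
    regressions = find_regressions(baseline, current)
    pass_rate = sum(current.values()) / len(current) if current else 0.0
    for name in regressions:
        print(f"REGRESSION: {name}")
    return 1 if regressions or pass_rate < min_pass_rate else 0


baseline = {"refund_request": True, "greeting": True, "edge_case": False}
current = {"refund_request": True, "greeting": False, "edge_case": True}
exit_code = gate(baseline, current)
print("exit code", exit_code)
```

In CI the return value would be passed to `sys.exit`, so a regression fails the job even when the overall pass rate looks unchanged.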

Full article →

Observability for agents in production.

You can't debug what you can't see. Agents need traces over prompts, tools, and outputs — and a live read on what they cost.

Structure traces around the loop: every prompt, every tool call, every model response, with token and cost attribution per step and per tenant. Alert on what actually hurts — runaway loops, climbing latency, failure spikes, cost drift — and redact PII at the boundary so the logs are safe to keep. Pair it with a token-and-cost dashboard an operator can read without a data team. Observability is what turns the vague 'the agent feels slow' into a fix.
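
A stripped-down version of that trace structure is sketched below, with redaction at the boundary and cost rolled up per tenant. `Span`, `Tracer`, and the prices are assumptions for illustration; the email regex is deliberately simple and is not a complete PII filter.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Assumed prices in USD per million tokens; placeholders, not real rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


@dataclass
class Span:
    tenant: str
    step: str           # e.g. "prompt", "tool:search", "response"
    text: str           # already redacted when stored
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_MTOK["input"]
                + self.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000


class Tracer:
    def __init__(self) -> None:
        self.spans: list[Span] = []

    def record(self, tenant: str, step: str, text: str,
               input_tokens: int, output_tokens: int) -> Span:
        """Redact emails before the text is ever stored, then keep the span."""
        span = Span(tenant, step, EMAIL.sub("[email]", text), input_tokens, output_tokens)
        self.spans.append(span)
        return span

    def cost_by_tenant(self) -> dict[str, float]:
        totals: dict[str, float] = defaultdict(float)
        for span in self.spans:
            totals[span.tenant] += span.cost_usd
        return dict(totals)


tracer = Tracer()
tracer.record("acme", "prompt", "Email jane@example.com the invoice", 1000, 0)
tracer.record("acme", "response", "Done", 0, 200)
print(tracer.spans[0].text, tracer.cost_by_tenant())
```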

Full article →

Build your own briefing agent.

You don't need our product to get the idea. A full, followable guide to building a Hermes-style briefing agent on the Claude Agent SDK — calendar, inbox, and Slack into a daily brief, read-only.

The whole build, end to end: scaffold the Claude Agent SDK, wire your calendar, inbox, and Slack through MCP servers, keep it read-only with a tool whitelist, write the brief prompt, and schedule a 7am run. Real commands, grounded in the current SDK and MCP docs. The last twenty percent — voice, retrieval, evals — is what turns a demo into something a founder trusts.

Full article →

Questions

Common questions, honest answers.

Start Here

Have something difficult to build?

The difficult ones — the parts that broke last time. Real problems only. Replies within 48 hours.

Open slots
2 of 8
Intake
Q3 2026
Reply
within 48h
Triage in one line

An AI screener categorizes your inquiry and suggests the right next step. The brief form follows.

or book a call ↗