How we built the artifact system
The engineering behind Spawn's real-time artifact rendering — how agents produce Word docs, dashboards, and live apps in a streaming interface.
The artifact system is the part of Spawn that turns an agent's output into something you can actually use — a Word doc, a deck, a spreadsheet, a live web app, a React dashboard. It's also the part of our stack we've rewritten the most times. Here's the current design and why we landed on it.
The problem
When an agent finishes a task, the result is almost never "a wall of text in chat." It's a structured deliverable. The naive way to ship that is to let the agent write a file and hand you a download link. The problem is that files are dead. You can't watch them being built. You can't interact with them. You can't hand them back to the agent for another round.
We wanted something closer to a Google Doc that the agent was writing live — you can see every word land, jump in and edit, and ask the agent to revise.
The three layers
1. The artifact schema
Every artifact is a typed document: docx, pptx, xlsx, pdf, react-app, dashboard, markdown, or image. Each type has a canonical representation that the agent writes: for Word, a JSON tree that mirrors python-docx's object model; for React apps, a file tree with a package manifest; for dashboards, a DSL we compile to a live component.
The schema matters because it's the contract between the agent and the renderer. The agent doesn't write raw PPTX XML — it writes a stable JSON shape, and we handle serialization to the final format on our side.
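To make the contract concrete, here's a hypothetical TypeScript shape for the Word intermediate. The type names and fields (`Run`, `Paragraph`, `DocxArtifact`, etc.) are illustrative assumptions, not Spawn's actual schema — the point is that the agent emits a small, stable JSON shape and never touches the binary format:

```typescript
// Hypothetical typed intermediate for a Word artifact.
// All names here are illustrative, not Spawn's real schema.
type Run = { kind: "run"; text: string; bold?: boolean; italic?: boolean };
type Paragraph = { kind: "paragraph"; style?: string; runs: Run[] };
type Table = { kind: "table"; rows: Run[][][] }; // rows -> cells -> runs
type DocxNode = Paragraph | Table;

interface DocxArtifact {
  type: "docx";
  title: string;
  body: DocxNode[];
}

// The agent emits this JSON; the server serializes it to a real .docx.
const report: DocxArtifact = {
  type: "docx",
  title: "Q3 Summary",
  body: [
    { kind: "paragraph", style: "Heading1", runs: [{ kind: "run", text: "Q3 Summary" }] },
    {
      kind: "paragraph",
      runs: [
        { kind: "run", text: "Revenue grew " },
        { kind: "run", text: "18%", bold: true },
      ],
    },
  ],
};
```

Because the intermediate is plain data, it's also trivially diffable and journalable, which the streaming layer below depends on.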
2. Streaming deltas
An agent producing a 40-slide deck doesn't emit one giant blob at the end. It streams deltas — "add slide 3 with these elements," "update chart on slide 7," "fix the typo on slide 12." Every delta is an idempotent operation against the artifact schema, journaled to durable storage.
The frontend subscribes to the delta stream and reconstructs the artifact in real time. You watch slides populate as the agent thinks. Mid-stream edits from the user merge into the same journal.
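A minimal sketch of how idempotent deltas and journal replay could fit together. The operation names and the `Deck` shape are assumptions for illustration; the property that matters is that applying the same delta twice yields the same state, so replaying a journal after a crash or reconnect is always safe:

```typescript
// Illustrative delta stream for a slide deck (not Spawn's real ops).
type Slide = { id: number; elements: string[] };
type Deck = { slides: Map<number, Slide> };

type Delta =
  | { op: "upsert_slide"; slide: Slide }
  | { op: "remove_slide"; id: number };

// Idempotent: applying the same delta twice gives the same state.
function apply(deck: Deck, d: Delta): Deck {
  switch (d.op) {
    case "upsert_slide":
      deck.slides.set(d.slide.id, d.slide);
      return deck;
    case "remove_slide":
      deck.slides.delete(d.id);
      return deck;
  }
}

// Replay the journal from durable storage to reconstruct the artifact.
function replay(journal: Delta[]): Deck {
  const empty: Deck = { slides: new Map() };
  return journal.reduce(apply, empty);
}

const journal: Delta[] = [
  { op: "upsert_slide", slide: { id: 3, elements: ["title", "chart"] } },
  { op: "upsert_slide", slide: { id: 3, elements: ["title", "chart", "caption"] } },
];
const deck = replay(journal);
```

User edits fit the same model: they're just more deltas appended to the same journal, interleaved with the agent's.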
3. The renderer
For document types (docx/pptx/xlsx) we render on the server — the browser gets a pre-rendered HTML preview plus a link to the materialized binary. For interactive types (React apps, dashboards) we spin up a sandboxed iframe that executes the agent's code live.
The iframe is the tricky part. Every user app runs in its own ephemeral worker with scoped secrets, CORS isolation, and a proxy layer for outbound HTTP so we can rate-limit and audit. Boot time from agent code to running preview is under 900ms.
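The proxy layer implies some per-app rate limiting on outbound requests. Here's a token-bucket sketch of how that could work; the class name, capacity, and refill numbers are assumptions for illustration, not Spawn's actual configuration:

```typescript
// Illustrative per-app rate limiter for the outbound-HTTP proxy.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    private now: () => number = () => Date.now(),
  ) {
    this.tokens = capacity;
    this.last = this.now();
  }

  // Refill tokens for elapsed time, then take one if available.
  tryTake(): boolean {
    const t = this.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((t - this.last) / 1000) * this.refillPerSec,
    );
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// With a fake clock and no refill: 2 tokens => third request is denied.
let fakeTime = 0;
const bucket = new TokenBucket(2, 0, () => fakeTime);
const results = [bucket.tryTake(), bucket.tryTake(), bucket.tryTake()];
// results: [true, true, false]
```

Denied requests can be logged to the same audit trail, so a misbehaving app leaves evidence rather than silently hammering an external API.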
What we learned the hard way
- Don't version artifacts as files. Version them as event journals. Every revision is a replay.
- Never let the agent write the final binary format. Always go through a typed intermediate.
- Streaming is table stakes. Users feel the 2-second gap between "done thinking" and "done rendering" as a bug, not a loading state.
- Sandboxing is non-negotiable. We've caught dozens of prompt-injection attempts that would have otherwise run arbitrary code in the iframe.
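The first lesson fits in a few lines: if a revision number is just a journal prefix length, then checking out any revision is a replay. This is a self-contained sketch with assumed names, not Spawn's implementation:

```typescript
// "Every revision is a replay": checkout = replay of a journal prefix.
type Op = { path: string; value: string };
type Doc = Record<string, string>;

function checkout(journal: Op[], revision: number): Doc {
  return journal.slice(0, revision).reduce<Doc>((doc, op) => {
    doc[op.path] = op.value; // last-writer-wins per path, idempotent
    return doc;
  }, {});
}

const history: Op[] = [
  { path: "title", value: "Draft" },
  { path: "title", value: "Final" },
];
// checkout(history, 1).title === "Draft"
// checkout(history, 2).title === "Final"
```

There's no separate "diff" machinery to maintain: history, undo, and branching all fall out of slicing the journal.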
What's next
We're working on collaborative artifacts — multiple users and multiple agents editing the same artifact simultaneously, with CRDT-backed conflict resolution. And we're building a public SDK so you can define your own artifact types with their own renderers. If you've got a format you want Spawn to produce natively, tell us.