Category-defining · Updated Apr 22, 2026 · 9 min read

AI agents that build their own tools

TL;DR

A real AI agent doesn't just plug into your stack — it ships the stack. It builds the CRM if you don't have one, the dashboard if you need one, the training pipeline if the work requires one. Runs → builds → ships → connects. One continuous loop.

The team at Spawnlabs

The narrowest definition of an AI agent — 'software that executes on a goal' — undersells what the good ones actually do. In production, a working AI agent isn't just intelligence hitting API endpoints. It's a system that builds the tools it needs, ships the artifacts the user wants, and connects the rest of the stack on the way.

This post makes that concrete: two agent loops, start to finish. Neither requires you to bring a stack. The agent ships the stack itself.

Loop one: a sales agent

You sign up and tell your new sales agent: run GTM for my seed-stage company. The agent's loop over the next few days looks like this:

Finds leads — pulls from LinkedIn, People Data Labs, public filings, community signals. Returns a ranked list scored against your ideal customer profile (ICP).
Researches accounts — for each lead, builds a one-pager with buying signals and personalization angles.
Drafts outreach — writes the first-touch message in your voice, handles follow-ups on the right cadence.
Builds a CRM if you don't have one — detects you lack a CRM and ships a lightweight one in your workspace, wired to your inbox.
Ships landing pages per campaign — generates landing pages per ICP segment, publishes them, routes replies into the CRM.
Updates the pipeline — every reply, every call, every contract moves through stages automatically.
Posts a daily dashboard — end of day, the agent posts pipeline health, at-risk deals, and tomorrow's top 10 accounts.

Notice what the agent never asks you to do: integrate a CRM, buy a dashboard tool, set up landing page software, wire analytics. It builds what it needs to do its job.
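
To make "a ranked list scored against your ICP" a little more concrete, here is a minimal Python sketch. The Lead fields, the ICP criteria, and the equal weighting of each criterion are illustrative assumptions, not a description of how Spawn agents actually score leads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    company: str
    industry: str
    employees: int
    stage: str


# Hypothetical ICP for a seed-stage seller; every value here is an assumption.
ICP = {
    "industries": {"fintech", "devtools"},
    "stages": {"seed", "series-a"},
    "max_employees": 50,
}


def icp_score(lead: Lead, icp: dict = ICP) -> float:
    """Return a score in [0, 1]: the fraction of ICP criteria the lead meets."""
    checks = [
        lead.industry in icp["industries"],
        lead.stage in icp["stages"],
        lead.employees <= icp["max_employees"],
    ]
    return sum(checks) / len(checks)


def rank_leads(leads: list[Lead]) -> list[Lead]:
    """Sort leads from best to worst ICP fit; ties keep their input order."""
    return sorted(leads, key=icp_score, reverse=True)
```

A real scorer would likely weight criteria differently and fold in buying signals; the point is only that "ranked against your ICP" reduces to a scoring function plus a sort.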

Loop two: an ML research agent

You hire an ML research agent to figure out whether a new architecture is worth pursuing. Its loop:

Pulls papers — scans arXiv, Semantic Scholar, conference proceedings. Extracts relevant experiments.
Builds the training pipeline — writes the training code, data loading, augmentation, logging. Deploys to your Modal / SageMaker / Runpod.
Ships the experiment dashboard — a real dashboard in your workspace with loss curves, sample outputs, per-config comparisons.
Triggers runs — kicks off experiments at the right cadence, watches logs, retries on failure.
Updates the dashboard — as runs complete, the dashboard populates with results you can actually compare.
Sends weekly results — a Slack digest summarizing what worked, what didn't, and the next experiments worth trying.

Again: you didn't need to bring MLOps. The agent built the pipeline, shipped the dashboard, and ran the loop.
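
The "retries on failure" part of the loop above can be sketched in a few lines of Python. This is a generic retry-with-exponential-backoff helper, not Spawnlabs' implementation; the function name, the default of three attempts, and the injectable sleep parameter are assumptions made for illustration.

```python
import time
from typing import Callable


def run_with_retries(
    run: Callable[[], dict],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call run() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run()
        except Exception as exc:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

Passing sleep in as a parameter keeps the helper testable without real waiting. In practice an agent would also want to tell retryable failures (a preempted machine) from permanent ones (a bug in the training code) before trying again.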

What every good agent loop has in common

Four verbs, one continuous flow:

Runs — executes the recurring workflow
Builds — ships internal infrastructure (CRMs, dashboards, trackers, pipelines) when the work needs it
Ships — delivers external artifacts (landing pages, apps, reports, memos) for the user
Connects — integrates with the rest of your stack (Slack, Gmail, Notion, Stripe, and 200+ others) in one click

Most alternatives cover only one or two of these. Chatbots only 'run' (in the loosest sense). Workflow tools integrate but don't build. Agent frameworks build but don't ship. The leverage is in all four running as one loop, which is what we mean by 'agent'.

Why this framing matters

The old model of knowledge-work software: buy a CRM, buy a dashboard tool, buy a landing page builder, buy an MLOps platform, pay someone to stitch them all together.

The new model: your agent ships the tools on demand, because the agent is the builder, operator, and user of those tools simultaneously. Your stack is whatever the agent builds for the current problem.

This is why agent platforms that only wrap LLM calls feel thin. They skipped the 'build' and 'ship' verbs. A proper agent owns all four.

"Most AI agents need you to bring the stack. Spawn agents ship the stack."
— Spawnlabs' framing

A few more role loops

Short versions, same pattern:

Marketing agent

Finds keywords → drafts briefs → ships landing pages → writes campaigns → builds the performance dashboard → iterates from the data. No marketing ops consultant needed — the agent assembled the full funnel.

Designer agent

Synthesizes user research → drafts microcopy → ships prototypes → builds the design-system audit → drafts launch docs → tracks drift. Not just Figma plugin assistance — a full research-through-launch loop.

Founder agent

Watches signals → drafts the investor update → builds the KPI board → ships the product landing page → triages inbox → briefs every candidate. The admin layer of running a company, automated.

Where this breaks down

To be honest: not every loop should ship its own stack. If your team already has a CRM and loves it, the agent should use it, not rebuild it. The 'builds' verb is a capability, not a mandate — the agent detects what's missing and fills the gap; it doesn't force new tools on you.

The point isn't 'agents replace your stack.' The point is 'the agent ensures the work gets done regardless of what stack exists.'
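
One way to picture "a capability, not a mandate" is as a planning step that checks what already exists before building anything. The sketch below is an assumption-heavy illustration: the Workspace type, the capability names, and the plan format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    # Maps a capability to the tool the team already uses for it,
    # e.g. {"crm": "your-existing-crm"}. Hypothetical structure.
    tools: dict[str, str] = field(default_factory=dict)


def plan_tooling(required: list[str], workspace: Workspace) -> dict[str, str]:
    """For each required capability, reuse an existing tool or plan to build one."""
    plan = {}
    for capability in required:
        existing = workspace.tools.get(capability)
        plan[capability] = f"use {existing}" if existing else "build"
    return plan
```

For a workspace that already has a CRM but no dashboard, plan_tooling(["crm", "dashboard"], workspace) keeps the CRM and marks only the dashboard as something to build.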

How Spawnlabs handles this

Spawn agents have four primitives: skills (what to do), tools (how to call things), code execution (build and run anything), and persistent memory (remember it all). The 'builds' and 'ships' verbs are made possible by code execution in a sandbox — the agent literally writes, deploys, and runs the tools it needs, then uses them.
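
As a rough illustration of the code execution primitive, the Python sketch below writes a piece of generated source to a scratch directory and runs it in a separate process with a timeout. It is not Spawnlabs' sandbox: the function name and file name are assumptions, and a child process alone is not a security boundary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(source: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Write source to a scratch directory and run it in a child Python process."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "tool.py"  # hypothetical file name
        script.write_text(source, encoding="utf-8")
        return subprocess.run(
            # -I runs Python in isolated mode: it ignores PYTHON* environment
            # variables and the user site-packages directory.
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        )
```

The caller gets back the exit code, stdout, and stderr, which is enough for an agent to decide whether the tool it just wrote works. A production sandbox would presumably add real isolation on top, such as containers or VMs, resource limits, and restricted network access.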

This is why Spawn agents feel qualitatively different from 'ChatGPT with integrations' products. Not because the model is different — we use Claude under the hood. Because the primitives are built for agents that ship infrastructure, not agents that wait for yours.

Common questions

Can AI agents really build their own apps?

How does this differ from workflow automation like Zapier or n8n?

What happens if I already have a CRM / dashboard / pipeline?

How do you prevent agents from building the wrong thing?

Is this the same as AI app builders like Replit Agent?