How AI agents work — inside the loop
Under the hood, most AI agents are a loop: plan → act → observe → repeat. The quality of an agent is the quality of its tools, its memory, and its ability to recover from failure — not the cleverness of its prompt.
There's a gap between how AI agents are marketed ('magical autonomous workers') and how they actually work ('a loop that calls APIs until a goal is satisfied'). This post closes that gap without burying you in ML jargon.
The core loop: ReAct
Most working agents follow a pattern first articulated in a 2022 paper called ReAct — Reasoning + Acting. The loop has four stages:
- Plan — given the goal, what should I do first?
- Act — call a tool (search the web, read an email, query a database)
- Observe — what did the tool return? What did it tell me?
- Update — revise the plan based on what I learned, then repeat
This repeats until the goal is satisfied or the agent hits a stopping condition (a step limit, an error that needs human escalation, or an explicit done signal).
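The loop above can be sketched in a few lines of Python. Here `llm_plan` and `call_tool` are hypothetical stand-ins for the model call and the tool dispatcher, not a real library API:

```python
# Minimal sketch of the plan -> act -> observe -> update loop.
# `llm_plan` and `call_tool` are hypothetical stand-ins for the
# model call and tool dispatch (not a real library API).

MAX_STEPS = 20  # stopping condition: too many steps

def run_agent(goal, llm_plan, call_tool):
    history = []  # the agent's working record of this run
    for step in range(MAX_STEPS):
        action = llm_plan(goal, history)        # Plan / Update
        if action["tool"] == "done":            # explicit done signal
            return action["result"]
        observation = call_tool(action)         # Act
        history.append((action, observation))   # Observe
    raise RuntimeError("step limit reached: escalate to a human")
```

Everything interesting lives in `llm_plan`: it sees the goal plus every action/observation pair so far, and decides the next step or signals completion.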
Tools are the capabilities
An agent can only do what its tools let it do. A 'brilliant' agent with no tools is a chatbot. A 'dumb' agent with well-designed tools can outperform it.
Tool types:
- Search — web search, internal DB search, vector retrieval from a knowledge base
- Read — fetch email, read a file, pull data from an API
- Write — send email, post to Slack, update a CRM, create a ticket
- Execute — run code in a sandbox, execute a database query, call a custom function
- Reason — delegate to a sub-agent, run a planning pass, do math precisely
The agent's 'intelligence' is mostly its ability to pick the right tool for the right step, interpret the tool's output correctly, and chain tools into a complete workflow.
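In practice, "well-designed tools" often means a registry where each tool carries a clear name and a description the agent reads when choosing. A minimal sketch, with assumed tool names and a trivial dispatcher (not any particular framework's API):

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: clear names and descriptions are what
# let the agent pick the right tool for the right step.

@dataclass
class Tool:
    name: str
    description: str   # the agent reads this when choosing a tool
    fn: Callable[[str], str]

TOOLS = {
    t.name: t
    for t in [
        Tool("crm_lookup",
             "Read a customer record from the CRM by email address.",
             lambda q: f"crm:{q}"),
        Tool("web_search",
             "Search the public web. Use only when the CRM has no answer.",
             lambda q: f"web:{q}"),
    ]
}

def dispatch(tool_name: str, arg: str) -> str:
    """Route an agent's chosen action to the matching tool."""
    return TOOLS[tool_name].fn(arg)
```

Notice that the descriptions encode when to use each tool, not just what it does; that single design choice prevents a large share of wrong-tool failures.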
Why agents fail (and what fixes it)
Failure 1: wrong tool, wrong time
The agent searches the web when it should've read the CRM. Fixed by giving the agent explicit guidance about when to use which tool, and by naming tools clearly.
Failure 2: bad observation
The agent gets JSON from an API and misreads a field. Fixed by giving tools better output descriptions and by having the agent cite specifically what it read.
Failure 3: stuck in a loop
The agent tries the same action 10 times hoping for a different result. Fixed by step limits, explicit retries with backoff, and escalation to a human after N failures.
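The retry-with-backoff-then-escalate pattern is simple to implement. A sketch, where `escalate_to_human` is a hypothetical hook (in production it might page someone or open a ticket):

```python
import time

# Retry with exponential backoff, then escalate after N failures.
# `escalate_to_human` is a hypothetical hook, injected for testability.

def act_with_retries(action, max_attempts=3, base_delay=1.0,
                     escalate_to_human=print, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception as exc:
            if attempt == max_attempts - 1:
                escalate_to_human(
                    f"failed after {max_attempts} attempts: {exc}")
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The key property is that the agent never retries forever: after `max_attempts` it surfaces the failure instead of burning steps on a dead end.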
Failure 4: lost the plot
The agent ran for 30 steps and lost track of the original goal. Fixed by periodic re-grounding: every N steps, the agent re-reads the goal and checks whether its current action still contributes to it.
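Re-grounding is a small check bolted onto the loop. A sketch, where `contributes_to` is a hypothetical judgment call (in practice, another model call):

```python
# Periodic re-grounding: every N steps, check the current action
# against the original goal. `contributes_to` is a hypothetical
# check; in practice it is usually another model call.

REGROUND_EVERY = 5

def maybe_reground(step, goal, current_action, contributes_to):
    off_track = (step > 0 and step % REGROUND_EVERY == 0
                 and not contributes_to(current_action, goal))
    if off_track:
        # Drop the drifting action and force a fresh planning pass.
        return {"tool": "replan", "goal": goal}
    return current_action
```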
Planning styles
Not all agents plan the same way:
- Reactive — decides the next step only, no look-ahead. Fast, shallow.
- Plan-then-execute — builds a full plan, then runs it. Brittle when reality deviates.
- Hierarchical — plans at multiple levels (goal → sub-goals → actions). Most common in production.
- Multi-agent — one agent orchestrates, others execute. Power increases, coordination cost increases faster.
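The hierarchical style decomposes in two passes: one turns the goal into sub-goals, another turns each sub-goal into concrete actions. A sketch, with `decompose` and `plan_actions` as hypothetical model calls:

```python
# Sketch of hierarchical planning: goal -> sub-goals -> actions.
# `decompose` and `plan_actions` are hypothetical model calls.

def hierarchical_plan(goal, decompose, plan_actions):
    plan = []
    for sub_goal in decompose(goal):           # goal -> sub-goals
        for action in plan_actions(sub_goal):  # sub-goal -> actions
            plan.append((sub_goal, action))
    return plan
```

Keeping the sub-goal attached to each action is what makes recovery cheap: when an action fails, the agent can replan one sub-goal instead of the whole plan.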
Spawnlabs uses hierarchical planning by default and supports multi-agent swarms for complex workflows.
Memory is the backbone
An agent without memory is a one-shot tool. Working agents maintain three memory layers (see our separate post on memory): short-term (this session), episodic (prior sessions), procedural (learned skills). The loop reads from and writes to all three.
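The three layers can be pictured as one structure the loop reads from and writes to. This shape is an assumption for illustration, not Spawnlabs' actual schema:

```python
# Illustrative shape of the three memory layers (an assumed structure,
# not Spawnlabs' actual schema). The loop reads from and writes to all
# three; here recall is a naive keyword match.

class AgentMemory:
    def __init__(self):
        self.short_term = []   # this session: recent steps and observations
        self.episodic = []     # prior sessions: summaries of past runs
        self.procedural = {}   # learned skills: name -> how-to

    def recall(self, query):
        hits = [m for m in self.short_term + self.episodic if query in m]
        hits += [how for name, how in self.procedural.items()
                 if query in name]
        return hits
```

In a real system the recall step is usually vector retrieval rather than substring matching, but the layering is the point: each layer has a different lifetime and a different write path.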
Sandboxes and safety
A well-designed agent runs inside a sandbox — an isolated environment where it can read, write, and execute without risking the user's system. This matters because:
- Agents make mistakes; the sandbox contains the blast radius
- Agents handle sensitive data; the sandbox enforces isolation
- Agents need to execute code; the sandbox prevents them from touching anything they shouldn't
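A toy illustration of the containment idea: run code in a separate process with its own scratch directory and a hard timeout. Production sandboxes (including Modal's) add process, network, and filesystem isolation far beyond this; the sketch only shows the shape.

```python
import subprocess
import sys
import tempfile

# Toy illustration of sandbox-style containment, NOT a real sandbox:
# untrusted code runs in a separate process, writes land in a throwaway
# scratch directory, and a hard timeout bounds runaway execution.

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    with tempfile.TemporaryDirectory() as scratch:  # private filesystem
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=scratch,               # file writes stay in the scratch dir
            capture_output=True,
            text=True,
            timeout=timeout,           # bound the blast radius in time
        )
    return result.stdout
```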
Spawnlabs agents run in Modal sandboxes with per-user isolation. Every agent has its own filesystem and execution environment — no cross-contamination.
What makes an agent actually good
Not the model's IQ. Not the prompt's cleverness. The things that matter:
- Clear, narrow goals — agents are bad at ambiguous objectives
- Well-designed tools — with good names, good outputs, and explicit scope
- Reliable memory — so the agent compounds instead of resetting
- Good failure handling — retries, escalation, and graceful degradation
- Human-in-the-loop where it counts — not for everything, but for the judgment calls
Everything else — model choice, prompt tricks, reasoning tricks — is secondary. We say that as operators, not marketers.