Technical Strategy
Do you actually need agents?
Everyone is building agents right now. Most of them don’t need agents. Here’s how to tell which camp you’re in — before you spend two months on infrastructure that’s solving the wrong problem.
“Agents” is the word of the moment. Every framework wants to help you build them. Every investor pitch mentions them. Every technical blog post assumes you need them. And so founders come to me having already decided: we’re building an agentic system. What they haven’t done is ask whether that’s actually the right call.
Most of the time, it isn’t. Not because agents are bad — they’re genuinely powerful for the right problems — but because they introduce a class of complexity that has a real cost in latency, debugging difficulty, and API spend. Getting talked into agents when a chain of two LLM calls would do the job is one of the more expensive mistakes I see at the V1 stage.
So let’s actually define the spectrum, and then work out where your product fits.
The four levels — and what separates them
There’s no official taxonomy here, but I think about AI complexity in four rough levels. The critical thing to understand is what’s added at each step — and what you’re signing up for when you move up.
The four levels of AI system complexity. Each step up adds real power — and real cost.
Level 1: A single LLM call. User sends input. You format it with a system prompt and context. Model responds. Done. This is underrated. A huge number of genuinely useful AI products are just this — well engineered, with good context, careful prompting, and solid evals. Don’t dismiss it.
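To see how little architecture Level 1 actually involves, here’s the whole thing in a few lines. A minimal sketch, not a prescription: call_llm is a hypothetical wrapper standing in for whichever provider SDK you actually use.

```python
def call_llm(system: str, user: str) -> str:
    """Hypothetical wrapper around your provider's chat API (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError  # plug in your SDK's completion call here

def answer(question: str, context: str) -> str:
    # Level 1: one well-engineered call. The leverage is in the system
    # prompt, the injected context, and your evals, not in architecture.
    system = "You are a support assistant. Answer only from the provided context."
    return call_llm(system, f"Context:\n{context}\n\nQuestion: {question}")
```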
Level 2: A chain. Multiple LLM calls in sequence, where the output of one feeds the next. First call classifies the input or transforms it. Second call generates the final response using that classification. Maybe a third call reformats it. You control the flow — it’s deterministic. No branching, no decisions made by the model about what to do next.
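In code, a chain is nothing more than calls in a fixed order, with your code carrying each output forward. A sketch of the classify-generate-reformat pattern just described, reusing the same hypothetical call_llm wrapper:

```python
def call_llm(system: str, user: str) -> str:
    """Hypothetical wrapper around your provider's chat API."""
    raise NotImplementedError

def run_chain(user_input: str) -> str:
    # Level 2: you decide the sequence. The model never chooses what happens next.
    category = call_llm("Classify this request as 'billing', 'bug', or 'other'. "
                        "Reply with the label only.", user_input)
    draft = call_llm(f"Write a response to this {category.strip()} request.", user_input)
    return call_llm("Rewrite the following response in a concise, friendly tone.", draft)
```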
Level 3: An agent. Now the model is making decisions about what to do next, not just generating output. It has tools — functions it can call, APIs it can hit, data it can read or write. The loop is: observe, decide, act, observe again, decide again. The model controls the flow. This is the key distinction: an agent has agency over its own execution path. A chain doesn’t.
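The distinction is easiest to see in code: the loop asks the model what to do next. A deliberately minimal sketch; the tool names, the JSON protocol, and call_llm are all illustrative assumptions, not any particular framework’s API.

```python
import json

def call_llm(system: str, user: str) -> str:
    """Hypothetical wrapper around your provider's chat API."""
    raise NotImplementedError

# Illustrative stand-in tools; real ones would hit your database or external APIs.
TOOLS = {
    "search_docs": lambda query: f"(results for {query!r})",
    "read_record": lambda record_id: f"(contents of record {record_id})",
}

def run_agent(task: str) -> str:
    system = ("Decide the next step. Reply with JSON, either "
              '{"tool": "...", "args": {...}} or {"answer": "..."}. '
              f"Available tools: {list(TOOLS)}")
    transcript = f"Task: {task}"
    while True:  # Level 3: the model, not your code, controls this loop
        decision = json.loads(call_llm(system, transcript))   # decide
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # act
        transcript += f"\nUsed {decision['tool']}: {result}"  # observe
```

In production you’d validate the model’s JSON and never leave that loop uncapped; the hardening advice near the end of this piece covers both. The point here is only who controls the flow.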
Level 4: Multiple agents. More than one agent, coordinated by an orchestrator. Each agent has a specialized role. They hand off to each other, work in parallel, review each other’s outputs. This is where the real complexity lives. Also where the real power is — for the right tasks.
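The shape, in the same sketch style: the orchestrator hard-codes the handoffs, and each specialist is its own agent loop. run_agent is hypothetical, standing in for something like the Level 3 loop above with a role-specific prompt and tools.

```python
def run_agent(role: str, task: str) -> str:
    """Hypothetical specialist agent: a Level 3 loop with a role-specific prompt."""
    raise NotImplementedError

def run_team(task: str) -> str:
    # Level 4: the orchestrator sequences specialists and routes outputs between them.
    findings = run_agent("You are a researcher. Gather the relevant facts.", task)
    draft = run_agent("You are a writer. Turn findings into a report.",
                      f"Task: {task}\nFindings: {findings}")
    review = run_agent("You are a reviewer. List factual problems, or reply OK.",
                       f"Draft: {draft}\nFindings: {findings}")
    if review.strip() != "OK":
        draft = run_agent("You are a writer. Revise the draft to fix these issues.",
                          f"Draft: {draft}\nIssues: {review}")
    return draft
```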
What “agency” actually costs you
Before deciding what level you need, understand what you’re paying for as you climb the ladder.
Latency. Every step in a chain or agent loop is an LLM call, and LLM calls are slow. A single call might take 1-3 seconds. A three-step chain: roughly 3-9 seconds. An agent that decides to use three tools before answering: potentially 15+ seconds. That’s a long time to show a spinner. If your users expect a fast response (and most do), every level of complexity you add has to justify itself against that latency cost.
Cost. More calls, more tokens. Each tool use typically means at least two model calls: one to decide which tool to call, and one to interpret the result before moving on. The token count adds up fast, especially since the growing transcript is resent with every call. A multi-agent system that feels elegant in a demo can be quietly expensive at scale.
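To make the arithmetic concrete, a back-of-envelope estimator. Every number in it is a placeholder assumption; substitute your own latency measurements and your provider’s actual pricing.

```python
def estimate(model_calls: int, secs_per_call: float = 2.0,
             tokens_per_call: int = 2_000,
             usd_per_1k_tokens: float = 0.01) -> tuple[float, float]:
    """Rough latency (seconds) and cost (USD) per user request.
    All rates here are illustrative assumptions, not real pricing."""
    latency = model_calls * secs_per_call
    cost = model_calls * tokens_per_call / 1_000 * usd_per_1k_tokens
    return latency, cost

for label, calls in [("single call", 1),
                     ("3-step chain", 3),
                     ("agent, 3 tool uses", 7)]:  # ~2 calls per tool use + final answer
    secs, usd = estimate(calls)
    print(f"{label}: ~{secs:.0f}s, ~${usd:.3f}/request")
```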
Debugging. This is the one that bites hardest. When a single LLM call produces a bad output, you know exactly where the problem is. When a five-step agent chain produces a bad output, the failure could be in any step, and the model’s decisions at step two might have been affected by noise in step one’s output. Tracing failures in agentic systems is genuinely hard. It requires proper logging of every intermediate state, and most V1 products don’t have that. So you end up with bad outputs and no idea why.
Reliability. An agent can decide to take a wrong path. It can get stuck in a loop. It can call the wrong tool, misinterpret the result, and confidently head in the wrong direction. The more autonomous the system, the more surface area for unexpected behavior. That’s not a reason to avoid agents — it’s a reason to make sure the problem you’re solving actually requires them.
The question that decides everything
Here’s the test I use when a founder asks what level they need: can you write out the steps of the task in advance?
If yes — if you can enumerate the steps, even roughly, before execution begins — you probably want a chain, not an agent. The model doesn’t need to decide what to do next. You already know. Hard-code it. Chains are predictable, fast, and easy to debug.
If no — if the steps genuinely depend on what the product finds as it goes, or if the task could branch in ways you can’t fully anticipate — now you have a case for an agent. The model needs to make real decisions mid-execution, and those decisions depend on information it doesn’t have at the start.
That’s it. That’s the core test. Everything else follows from that.
Tasks that genuinely need agents
Some patterns where agents earn their complexity:
Open-ended research. “Find me everything relevant about this company before my call tomorrow.” The model needs to search, read results, decide what to dig into further, maybe cross-reference sources. The path isn’t knowable upfront. Each step depends on what was found in the last one.
Code generation with iteration. Write code, run it, see the error, fix the error, run again. The loop is driven by what the execution environment returns — the model is responding to real feedback, not just following a script.
Multi-step workflows with conditionals. “Process this customer support ticket: if it’s a billing issue, do X; if it’s a bug report, do Y; if it’s something else, classify it and route it.” This can actually still be a chain if the conditionals are well-defined (the sketch after this list shows the chain version). But if the model needs to figure out which case applies and the categories are fuzzy, that’s where agent-style planning helps.
Tasks that interact with live systems. Any product where the AI needs to read from a database, call an external API, write a record, check the result, and potentially do something else based on that result. The real-world state shapes what happens next.
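To see where the chain-vs-agent line falls, here’s the ticket-routing workflow from the list above in its chain form: one classification call, then branching you wrote yourself. The model picks a label; your code decides what each label triggers. call_llm is the same hypothetical wrapper as in the earlier sketches.

```python
def call_llm(system: str, user: str) -> str:
    """Hypothetical wrapper around your provider's chat API."""
    raise NotImplementedError

def route_ticket(ticket: str) -> str:
    # Still a chain: the model classifies, but the conditionals are hard-coded.
    label = call_llm("Classify this ticket as 'billing', 'bug', or 'other'. "
                     "Reply with the label only.", ticket).strip().lower()
    if label == "billing":
        return call_llm("Draft a reply resolving this billing issue.", ticket)
    if label == "bug":
        return call_llm("Summarize this bug report for the engineering queue.", ticket)
    return call_llm("Suggest which team should handle this ticket and why.", ticket)
```

Only when the categories stop being enumerable, when “which case applies” is itself a judgment that depends on what the model finds along the way, does handing the branching decision to the model start to pay for itself.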
Tasks that don’t — even though it feels like they might
The failure mode I see most: founders use the word “multi-step” and conclude they need an agent. Multi-step doesn’t mean agent. It might just mean chain.
“Generate a report.” Step 1: retrieve data. Step 2: summarize it. Step 3: format into a report. That’s a chain. You know the steps before you start. Hard-code them. Don’t let the model decide.
“Answer questions from our documentation.” Retrieve relevant chunks, inject them into context, generate an answer. Also a chain — or even a single call with good retrieval. Agents are overkill here, and the added latency will hurt perceived quality.
“Write a personalized email.” Get user context, inject it, generate email. One call. Maybe two if you want a revision pass. That’s all.
The pattern: if the task is fundamentally about generation — taking inputs and producing an output — you probably don’t need an agent. If the task is about exploration, iteration, or reacting to live state, you might.
When you do build agents: a few things that matter
If you’ve thought it through and you genuinely need agents, a few things will save you pain later:
Log every intermediate state. Not just the final output — every tool call, every model response in the loop. When something goes wrong (and it will), this is how you figure out where. Without it you’re guessing.
Set hard limits on loop iterations. An agent that can run indefinitely is an agent that will eventually run indefinitely on a bad input. Cap it. Five iterations, ten at most for complex tasks. Escalate to a human if the cap is hit.
Give tools narrow scope. Tools that can do too much give the agent too much rope. A tool that reads one specific table is safer and easier to debug than a tool that can query anything. The more specific the tool, the more predictable the behavior.
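A minimal sketch that puts all three practices together: a log line for every intermediate state, a hard iteration cap that escalates instead of spinning, and a tool scoped to one specific table. The JSON protocol and names are illustrative assumptions, as in the earlier sketches.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def call_llm(system: str, user: str) -> str:
    """Hypothetical wrapper around your provider's chat API."""
    raise NotImplementedError

def read_invoice(invoice_id: str) -> str:
    """Narrow tool: reads one specific table, nothing else."""
    return f"(invoice {invoice_id})"  # stand-in for a parameterized SELECT

TOOLS = {"read_invoice": read_invoice}

def run_agent(task: str, max_steps: int = 5) -> str:
    system = ('Reply with JSON, either {"tool": "...", "args": {...}} '
              'or {"answer": "..."}. '
              f"Available tools: {list(TOOLS)}")
    transcript = f"Task: {task}"
    for step in range(max_steps):                      # hard cap, never while True
        raw = call_llm(system, transcript)
        log.info("step=%d decision=%s", step, raw)     # log every intermediate state
        decision = json.loads(raw)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        log.info("step=%d tool=%s result=%s", step, decision["tool"], result)
        transcript += f"\nUsed {decision['tool']}: {result}"
    raise RuntimeError("Iteration cap hit; escalate this task to a human.")
```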
And if you’re considering multi-agent: start with one agent first. Get it working. Understand where it breaks. Only split into multiple specialized agents when you have a clear reason — usually because the single agent is doing two things that require very different context windows or very different reasoning styles.
Founder Takeaway
Before you commit to an agentic architecture, write out the steps of your task on a whiteboard. Literally list them out. Can you enumerate them before execution starts, even roughly? If yes, you want a chain, not an agent. Build the chain. It will be faster, cheaper, easier to debug, and easier to hand off to another engineer.
Reserve agents for tasks where the model genuinely needs to make decisions mid-execution based on what it discovers — open-ended research, code that runs and iterates, workflows that branch based on live state. If your task doesn’t have that quality, the agent abstraction is adding complexity without adding power. Use the simplest level that actually solves the problem. Then add complexity when you have a specific reason to.