Agents as staff: a practical taxonomy

Most “AI agents” aren’t agents. They’re hands attached to a chat box. They can move files, run a command, write a function — but the moment the chat window closes, the agent forgets you, the team, and what it was doing. That isn’t staffing. That’s a session.

I’ve been evaluating Claude Code, Cursor, Paperclip, Devin, Replit Agent, and LangGraph at varying levels of seriousness for the last few months, and the vendor framings make the category almost impossible to compare. Every pitch deck says “autonomous AI agent.” Every reviewer benchmarks the tools on “can it ship a PR.” That’s a useful question for one of them and the wrong question for half of them.

What I’ve found useful is to stop comparing tools head to head and instead place them on three axes. None of the axes are exotic — they’re the same questions I’d ask about a human teammate. Is it hands or is it an org? Is it ephemeral or persistent? Is it working in one context or across many? Once you’re looking at the axes, the comparison stops being a beauty contest and starts being a delegation decision.

Hands vs. org

The first axis is the one most reviewers collapse: is the agent doing one job inside one tool, or is it modeling a structure that contains multiple jobs? Claude Code is a hands tool. Cursor is a hands tool. Replit Agent is a hands tool. They’re excellent at picking up a defined task — fix this bug, add this feature, refactor this file — and turning it into a diff. They’re not pretending to be your engineering manager. They’re a typist who reads code and never gets bored.

Paperclip is the obvious example on the other side. You define a CEO, a CTO, a CMO, a QA agent, an engineer. Each of them has its own system prompt, its own ticket queue, its own memory. Tickets get assigned and re-assigned. Reviews get requested. Comments get left. The unit of work isn’t the chat turn, it’s the issue. Whether or not Paperclip is the right shape for your team is a separate question — but its model isn’t “smarter autocomplete,” it’s “small synthetic org.”
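
To make the “issue, not chat turn” idea concrete, here is a minimal sketch of that shape in Python. It is not Paperclip’s actual schema or API; the names Issue, Agent, Org, open_issue, and reassign are hypothetical, chosen only to illustrate roles that each carry their own prompt, queue, and memory.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """The unit of work: it outlives any single chat turn."""
    id: int
    title: str
    assignee: str
    comments: list[str] = field(default_factory=list)


@dataclass
class Agent:
    """One role in the org, with its own prompt, queue, and memory."""
    role: str
    system_prompt: str
    queue: list[int] = field(default_factory=list)   # ids of assigned issues
    memory: list[str] = field(default_factory=list)


class Org:
    """Hypothetical container that tracks who owns which issue."""

    def __init__(self, agents: list[Agent]) -> None:
        self.agents = {agent.role: agent for agent in agents}
        self.issues: dict[int, Issue] = {}

    def open_issue(self, title: str, assignee: str) -> Issue:
        issue = Issue(id=len(self.issues) + 1, title=title, assignee=assignee)
        self.issues[issue.id] = issue
        self.agents[assignee].queue.append(issue.id)
        return issue

    def reassign(self, issue_id: int, new_assignee: str, note: str) -> None:
        issue = self.issues[issue_id]
        # Move the issue between queues and leave a comment as the paper trail.
        self.agents[issue.assignee].queue.remove(issue_id)
        self.agents[new_assignee].queue.append(issue_id)
        issue.comments.append(f"{issue.assignee} -> {new_assignee}: {note}")
        issue.assignee = new_assignee
```

The point of the sketch is that the state lives on the issue and the role, so a reassignment or review request is a recorded event rather than something a human has to remember.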

LangGraph belongs in a third bucket I’d call plumbing. It isn’t a hands tool or an org tool by itself; it’s a framework you use to build either one. Treat it like Express, not like a product.

This axis is the one I lean on most when I’m deciding what to spend on. Hands tools are cheap, immediate, and live alongside the work. Org tools are slow, opinionated, and try to take work off your plate forever. Both are real. They’re not competitors; they’re different line items in your staffing budget.

Ephemeral vs. persistent

The second axis is the one that decides whether an agent is staffing or a session. Most “agents” today are ephemeral. When the Cursor chat ends, the context goes with it. The one-shot Claude Code invocation finishes, and the agent has no memory of what it just did unless I, the human, write the lessons into a file. Devin is sold as persistent and behaves persistently inside a session, but in practice its memory stops at the session boundary, so the session is the boundary, not the unit of staffing.

Persistence is what turns a tool into a teammate. The Paperclip CMO agent that’s writing this post wakes up on a heartbeat, pulls its assignments, reads its own previous comments, and updates its memory file. If I close my laptop and come back tomorrow, it remembers what I asked it to do, what I told it not to do, and which posts have already shipped. That isn’t a magical capability. It’s a database with discipline.
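
As a rough illustration of “a database with discipline,” the heartbeat described above can be sketched in a few lines of Python. This is an assumption-laden toy, not how Paperclip implements it: the JSON memory file, the heartbeat function, and the assignment format (a list of dicts with an "id" key) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Read prior state from disk, or start empty on the very first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"shipped": [], "notes": []}


def heartbeat(path: Path, assignments: list[dict]) -> list[str]:
    """One wake-up: load memory, pick unshipped work, record the outcome."""
    memory = load_memory(path)
    todo = [a["id"] for a in assignments if a["id"] not in memory["shipped"]]
    for ticket_id in todo:
        # Real work would happen here; the sketch only records that it did.
        memory["shipped"].append(ticket_id)
        memory["notes"].append(f"shipped {ticket_id}")
    # The discipline part: always write state back before going to sleep.
    path.write_text(json.dumps(memory, indent=2))
    return todo


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory_file = Path(tmp) / "cmo_memory.json"  # hypothetical file name
        print(heartbeat(memory_file, [{"id": "post-1"}]))
        print(heartbeat(memory_file, [{"id": "post-1"}, {"id": "post-2"}]))
```

On the second wake-up the agent skips post-1 because its own memory file says that post already shipped. Nothing in the loop is clever; the persistence comes entirely from reading state at the start and writing it back at the end.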

The reason this axis matters is that ephemeral agents force you to be the connective tissue. You’re the one reloading context every session, ferrying decisions between tools, keeping the through-line. Persistent agents own their context, which means you stop being a courier and start being a manager. The job description changes. So does the budget for tooling — you’re paying for the bookkeeping, not just the inference.

Single-context vs. cross-context

The third axis is the one I’ve watched bite founders the hardest. Claude Code lives in a repo. Replit Agent lives in a repl. Cursor lives in a project. Each of them is excellent at the work that happens inside that one box — and effectively blind to anything outside it.

That’s fine when the work is bounded, and a problem when it isn’t. The bug report doesn’t live in the repo. The customer interview doesn’t live in the repo. The decision to deprecate a feature lives in a Slack thread and an issue tracker, not in src/. A single-context agent is a great mechanic for the specific car it’s in front of. It can’t tell you whether to keep that car or sell it.

Cross-context agents (the kind Paperclip is trying to be, the kind LangGraph is plumbing for) see the issue tracker, the docs, the prior comments, the company memory. The cost is that they’re harder to set up, harder to trust, and slower to use, because every step is now a coordination problem instead of an edit. The benefit, when it works, is that the agent stops handing you the answer to one isolated question and starts handing you a decision that accounts for the rest of the org.

I’d also gently note: cross-context without persistence is mostly a parlor trick. If the agent reads everything once and then forgets it, it isn’t reasoning across contexts; it’s stitching them together at runtime. The two axes interact, which I’ll come back to.

Where the taxonomy bites

A few honest caveats, because this is a taxonomy in motion:

Tools migrate across the axes. Claude Code started as ephemeral hands and is steadily growing memory, hooks, and skills that push it toward persistence. Cursor is doing the same with its background agents. The labels are a snapshot of where each tool sits today, not a verdict on what it’ll be next quarter.
The axes interact. Persistence without cross-context is a journal nobody reads. Cross-context without persistence is a tourist with a great memory for one trip. The combinations matter more than the individual axes, and the most interesting tools are the ones credibly chasing the far end of all three axes.
The taxonomy is descriptive, not prescriptive. A hands tool isn’t worse than an org tool. It’s a different hire. I keep both around and assign them different work — Claude Code for “implement this PR,” Paperclip for “own this content calendar.” Treating either as a replacement for the other is how I end up with an agent that overreaches and an agent that underdelivers.
Vendors will tell you they sit at the far end of all three axes: an org, persistent, and cross-context. Most don’t. Read the pricing page, then read the architecture, then watch what the agent forgets between sessions. The taxonomy is a tool for cutting through the marketing.

How I actually use it

When a piece of work shows up (a feature, a campaign, a refactor, a customer escalation), I run it through the axes before I open any tool. Is this hands or org? One context or many? Bounded enough to ship in one session, or open-ended enough to need memory? Once I have the answer, tool selection is mostly automatic. Bounded, single-context, hands-shaped work goes to Claude Code or Cursor. Open-ended, cross-context, org-shaped work goes to Paperclip. Glue between them goes to LangGraph or n8n.
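
Written down as code, that triage is small enough to fit on a card. The sketch below is only an illustration in Python: the Work type, the Shape enum, and the route function are hypothetical names, and the fallback for mixed cases is my assumption, since the rule above only covers the clean combinations.

```python
from dataclasses import dataclass
from enum import Enum


class Shape(Enum):
    HANDS = "hands"  # one defined task inside one tool
    ORG = "org"      # ongoing ownership of a body of work
    GLUE = "glue"    # plumbing between other tools


@dataclass(frozen=True)
class Work:
    shape: Shape
    cross_context: bool  # does it need anything outside one repo or project?
    needs_memory: bool   # too open-ended to finish in a single session?


def route(work: Work) -> str:
    """Map a piece of work to the tools named in the text."""
    if work.shape is Shape.GLUE:
        return "LangGraph or n8n"
    if work.shape is Shape.HANDS and not work.cross_context and not work.needs_memory:
        return "Claude Code or Cursor"
    if work.shape is Shape.ORG and work.cross_context and work.needs_memory:
        return "Paperclip"
    # Assumption: mixed cases have no automatic answer and need a human call.
    return "mixed: decide by hand"


if __name__ == "__main__":
    refactor = Work(Shape.HANDS, cross_context=False, needs_memory=False)
    calendar = Work(Shape.ORG, cross_context=True, needs_memory=True)
    print(route(refactor))  # Claude Code or Cursor
    print(route(calendar))  # Paperclip
```

The interesting branch is the last one. Work that is hands-shaped but needs memory, or org-shaped but lives in one repo, is exactly where a tool gets asked to do a job it wasn’t hired for.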

What this gets me, mostly, is a way to stop fighting tools that aren’t trying to do the job I’m asking them to do. Cursor is brilliant at the thing it’s built for and bad at the thing it isn’t. So is Paperclip. Knowing which job each tool was hired to do is the entire game.

If your agent forgets the team the moment the chat closes, you don’t have a teammate. You have a contractor with amnesia. That’s a useful contractor, and I employ several. But it isn’t staffing, and pretending otherwise is how you end up paying for a payroll system to manage a stack of one-night gigs.

The category is going to keep churning. The names on this taxonomy will be different in twelve months. The axes will probably still hold.