
Skills, Tools, and Agents: A Founder's Guide to Building With AI in 2026
Skills, tools, and agents are not the same thing — and picking the wrong one is the fastest way to overcomplicate the AI product you're building. The decision tree we're using as we build at Baxter Labs, plus three failure modes we keep watching out for.
TL;DR: Skills, tools, and agents are not the same thing. Tools = new capabilities the model can call. Skills = packaged ways the model should work. Agents = a loop that takes actions over time with state. Picking the wrong abstraction is the fastest way to make whatever you're building harder than it needs to be. Here's the decision tree we're using as we build at Baxter Labs.
If you've spent any time in AI builder communities in the last 18 months, you've heard the words skill, tool, and agent used as if they were synonyms. They're not. Picking the wrong one for your problem is the single most common reason we see founders ship overcomplicated products that drift in production.
This is the framing we'd hand our past selves 18 months ago — and we're still pressure-testing it as we build.
The three words, defined sharply
Forget the marketing copy. Here's how we're using these terms as we build at Baxter Labs.
Tool
A tool is a function the model can call. It has a name, a typed input schema, a typed output, and one job. send_email, query_database, get_weather. The model decides when to call it; you decide what it does.
Tools are stateless from the model's perspective. The model sees a result and reasons about it. Anthropic's tool-use API, OpenAI's function calling, MCP's tools — all the same concept under different names.
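To make the definition concrete, here is a minimal sketch of a tool as a plain data structure plus a handler. The name/description/input_schema shape loosely mirrors the JSON Schema style most tool-calling APIs use, but the Tool class, the call_tool helper, and the stubbed weather values are our own illustrative assumptions, not any provider's SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A named function with a typed input schema and one job."""
    name: str
    description: str
    input_schema: dict[str, Any]            # JSON Schema describing the arguments
    handler: Callable[..., dict[str, Any]]  # you decide what it does


def get_weather(city: str) -> dict[str, Any]:
    # Hypothetical stub: a real handler would call a weather API here.
    return {"city": city, "temp_c": 21, "conditions": "clear"}


GET_WEATHER = Tool(
    name="get_weather",
    description="Return current weather for a city.",
    input_schema={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    handler=get_weather,
)


def call_tool(tool: Tool, arguments: dict[str, Any]) -> dict[str, Any]:
    """Check required arguments, then run the handler. Nothing is kept between calls."""
    missing = [k for k in tool.input_schema.get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"{tool.name}: missing arguments {missing}")
    return tool.handler(**arguments)
```

The model only ever sees the schema and the returned dictionary; everything inside the handler is yours to define.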
Skill
A skill is a packaged way of working — instructions, conventions, and sometimes scripts — that the model loads when it encounters a relevant situation. Think of it as a runbook the model reads on demand. "When asked to write a blog post in the company voice, follow this checklist." "When debugging a Python ML pipeline, start by checking these five things."
Skills are about process, not function. They don't add new capabilities to the model — they shape how the model uses the capabilities it already has. Claude Code's Skills, OpenAI's Custom GPTs (sort of), and Cursor rules all live on this layer.
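One rough way to picture a skill in code is as a block of instructions that gets added to the prompt only when the request calls for it. The sketch below assumes a crude keyword trigger and a placeholder checklist; real skill systems typically let the model decide what to load, and none of these names come from an actual product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: tuple[str, ...]  # crude trigger, assumed here for illustration
    instructions: str          # the runbook the model reads on demand


# Hypothetical skill; the checklist text is a placeholder.
BLOG_VOICE = Skill(
    name="company-voice-blog-post",
    keywords=("blog post", "company voice"),
    instructions=(
        "1. Open with the reader's problem.\n"
        "2. One idea per paragraph.\n"
        "3. End with a concrete next step."
    ),
)


def build_system_prompt(base: str, request: str, skills: list[Skill]) -> str:
    """Append instructions for any skill whose keywords appear in the request."""
    text = request.lower()
    relevant = [s for s in skills if any(k in text for k in s.keywords)]
    sections = [base] + [f"Skill: {s.name}\n{s.instructions}" for s in relevant]
    return "\n\n".join(sections)
```

Note that nothing here gives the model a new capability. The skill only changes the instructions it works from.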
Agent
An agent is a loop. The model takes input, decides on an action (often by calling a tool), observes the result, and decides what to do next — possibly for many turns, possibly indefinitely. An agent has identity, memory, goals, and the authority to act over time.
A chatbot on its own is not an agent. A chatbot that schedules your week, books your travel, and sends you the receipts is one. The difference is autonomy across time and surfaces.
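The loop itself is small. Below is a minimal sketch, assuming a decide function that stands in for the model call and a dictionary of tool handlers. The action format, the step cap, and the scripted decision function are all illustrative assumptions rather than any framework's API.

```python
from typing import Any, Callable

Action = dict[str, Any]


def run_agent(
    decide: Callable[[list[Action]], Action],
    tools: dict[str, Callable[..., Any]],
    goal: str,
    max_steps: int = 10,
) -> list[Action]:
    """Loop: decide on an action, run it, observe the result, repeat."""
    history: list[Action] = [{"type": "goal", "content": goal}]  # state carried across turns
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        action = decide(history)
        history.append(action)
        if action["type"] == "finish":
            break
        result = tools[action["tool"]](**action["arguments"])
        history.append({"type": "observation", "content": result})
    return history


def scripted_decide(history: list[Action]) -> Action:
    # Hypothetical stand-in for a model call: book once, then stop.
    if not any(h["type"] == "observation" for h in history):
        return {
            "type": "tool_call",
            "tool": "book_appointment",
            "arguments": {"slot": "Tue 10:00"},
        }
    return {"type": "finish", "content": "Booked."}


def book_appointment(slot: str) -> dict[str, str]:
    # Stub handler; a real one would write to a calendar.
    return {"status": "confirmed", "slot": slot}


history = run_agent(scripted_decide, {"book_appointment": book_appointment}, "Book a call")
```

Everything that makes agents hard (memory, evaluation, permissions) lives around this loop, which is why the step cap and the explicit history are there even in a toy version.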
The decision tree we're using
When we're scoping a new feature at Baxter Labs, this is what we run through:
- Does the model need a new capability it doesn't have? (Talk to a database, hit an API, send a message.) → Tool.
- Does the model already have the capabilities, but you want it to use them in a specific, repeatable way? → Skill.
- Does the work require multiple steps, possibly across hours or days, with state and decisions in between? → Agent.
Most products need at least two of the three. The mistake we keep seeing (and have made ourselves) is using all three when one would do, or reaching for one when you needed all three.
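The three questions can be written down as a small function. This is only a sketch of the checklist: the Feature fields and their names are assumptions we made for illustration, and because the questions are independent, the function returns a set rather than a single answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    needs_new_capability: bool      # talk to a database, hit an API, send a message
    needs_repeatable_process: bool  # use existing capabilities in a specific way
    needs_multistep_state: bool     # many steps over time, with decisions in between


def choose_abstractions(feature: Feature) -> set[str]:
    """Apply the three questions independently; a feature can need more than one layer."""
    chosen: set[str] = set()
    if feature.needs_new_capability:
        chosen.add("tool")
    if feature.needs_repeatable_process:
        chosen.add("skill")
    if feature.needs_multistep_state:
        chosen.add("agent")
    return chosen
```

An empty result is a useful signal too: if none of the questions gets a yes, a plain prompt is probably enough.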
How we're mixing all three at Baxter Labs
Aeiva (mostly tools + skills, no agent loop — by design)
Aeiva, the longevity app we're building (AI health twin under the hood), is conversational but not agentic. The user opens the app, the model reads their wearable data (tools: get_sleep, get_hrv, get_workouts), follows the "health coach response" skill (a structured way of communicating recommendations with appropriate uncertainty), and outputs a plan. No long-running loop. No autonomous action.
That's a deliberate call. A health product that takes autonomous actions is a liability we don't want to underwrite right now. We're choosing not to make Aeiva an agent — at least at this stage. Knowing when not to reach for the bigger abstraction is half the skill.
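In code terms, "tools plus a skill, no loop" can look like a single pass that reads each tool once and hands the result to one model call. The sketch below is an assumption about shape only: the tool names come from the description above, but the return values are placeholders, the skill text is invented for illustration, and this is not Aeiva's actual implementation.

```python
from typing import Any, Callable


def get_sleep() -> dict[str, Any]:
    return {"hours": 6.5}  # placeholder value, not real data


def get_hrv() -> dict[str, Any]:
    return {"rmssd_ms": 48}  # placeholder value, not real data


def get_workouts() -> dict[str, Any]:
    return {"sessions_last_7_days": 3}  # placeholder value, not real data


# Hypothetical stand-in for the "health coach response" skill.
HEALTH_COACH_SKILL = "State each recommendation with its uncertainty. Do not diagnose."


def build_plan_request(
    tools: dict[str, Callable[[], dict[str, Any]]],
    skill: str,
) -> dict[str, Any]:
    """One pass: read every tool once, attach the skill, hand off to a single model call."""
    readings = {name: fn() for name, fn in tools.items()}
    return {"system": skill, "context": readings}


request = build_plan_request(
    {"get_sleep": get_sleep, "get_hrv": get_hrv, "get_workouts": get_workouts},
    HEALTH_COACH_SKILL,
)
```

There is no loop and no write action anywhere in this path, which is the property the design is protecting.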
A voice agent for lead qualification (full agent + tools + skills)
Consider a voice agent that picks up the phone, qualifies a lead, books an appointment, and writes up the call in a CRM. That's unambiguously an agent — it loops over many turns, holds state across the conversation, decides between branches (qualified vs. not, hot vs. nurture), and takes real-world actions (calendar invites, CRM writes).
Underneath, it uses tools (book_appointment, create_crm_lead, transfer_to_human) and a skill that defines the conversational style for the use case (a real-estate voice agent talks differently than a medical intake one). Three layers working together, each doing exactly its job.
AEDT (the platform we're designing underneath robotic agents)
AEDT goes one level higher: it's the platform we're building for designing, simulating, and deploying robotic agents. So AEDT is concerned with skills (how does a robot declare what behaviors it's good at?), tools (how does it safely call shared capabilities such as vision, motion planning, and perception?), and the agent runtime itself (memory, evaluation, permissioning, sim-to-real handoff). As we design AEDT, the goal is for all three abstractions to compose cleanly across software and hardware. We're not there yet, but we're building toward it.
Three failure modes we keep seeing
1. Tool overload
Founders give their model 47 tools and wonder why it picks the wrong one. In our experience, models get noticeably worse at tool selection past about a dozen options. Group related tools into a meta-tool with an enum, or split into multiple specialized agents. We've fallen into this ourselves more than once.
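As a sketch of the meta-tool idea, here is one way to fold several related operations behind a single tool whose schema exposes an enum. The crm tool, its three operations, and the payload fields are hypothetical examples, and the schema shape is the common JSON Schema style rather than a specific provider's format.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]

# Hypothetical CRM operations that would otherwise be three separate tools.
CRM_OPERATIONS: dict[str, Handler] = {
    "create_lead": lambda p: {"created": p["name"]},
    "update_lead": lambda p: {"updated": p["lead_id"]},
    "archive_lead": lambda p: {"archived": p["lead_id"]},
}

# One schema the model sees instead of three; the enum lists the valid operations.
CRM_TOOL_SCHEMA = {
    "name": "crm",
    "description": "Create, update, or archive a CRM lead.",
    "input_schema": {
        "type": "object",
        "properties": {
            "operation": {"type": "string", "enum": sorted(CRM_OPERATIONS)},
            "payload": {"type": "object"},
        },
        "required": ["operation", "payload"],
    },
}


def crm(operation: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Dispatch on the enum value; reject anything outside it."""
    if operation not in CRM_OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return CRM_OPERATIONS[operation](payload)
```

The trade-off is that the model now makes two smaller choices (which tool, then which operation) instead of one large one, and the per-operation payloads are less strictly typed unless you add validation inside the dispatcher.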
2. Skill as a substitute for tool
You can't write your way out of needing a real API. If the model needs to query a database, "tell the model to query the database in its prompt" isn't a skill — it's wishful thinking. Build the tool.
3. Agent when a single call would do
Most "AI agents" you see on Twitter are unnecessarily agentic. If the work is one input → one output and the user is willing to wait, you don't need a loop. You need a smart prompt with tools. Agents add complexity, evaluation surface, and failure modes — only reach for them when the work genuinely requires autonomy across time.
What's next
The next decade of AI software is going to be defined by how well teams compose these three layers. The frameworks (MCP, Anthropic's Agent SDK, Vercel's AI SDK, etc.) are all racing to make this composition easier. We'll cover the MCP layer specifically in the next post — it's the unification story for tools across providers, and it's a piece we're betting on heavily.
If you're building in this space and want to compare notes mid-build, my inbox is open. The fastest way to internalize the differences between these three layers is to design something real with all of them — pick a small workflow, identify which abstraction each step needs, and build it. We're learning the same way.
— Eshwar PK, Founder, Baxter Labs