I gave my AI agent hands
It could talk and remember. Now it can actually do things.
My claw could think, remember, and text me on WhatsApp. But it could only ever answer from what it already knew. It couldn't do anything — no browser, no files, no reaching into the real world.
So I gave it hands: three tools. Now I can text it
"send me my quarterly report" — and it digs through my Documents, finds the
PDF, and drops Q3-report.pdf straight back into the chat. Same brain, same memory
as the last three episodes; it just grew the ability to act.
The problem
Talking is half an assistant. The claw could remember my name and chat over WhatsApp, but if I asked it for a file, all it could do was apologise. A model on its own is a brain in a jar. Tools are how you let it touch things — open an app, search the disk, send something back.
The code
The whole lesson lives in claude.ts: three tool() definitions,
registered as one little MCP server, then listed in allowedTools.
import { execFile } from "node:child_process";
import { realpathSync } from "node:fs";
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

// text(), mdfind(), isInsideDocs() and fileSender are small helpers,
// assumed to be defined elsewhere in claude.ts (not shown in this post).

const claw = createSdkMcpServer({
  name: "claw",
  version: "1.0.0",
  tools: [
    // Open a real browser window. http(s)-only + execFile (no shell)
    // so a texted URL can't smuggle in a command.
    tool("open_url", "Open a URL in the user's browser.",
      { url: z.string() },
      async ({ url }) => {
        if (!/^https?:\/\//i.test(url)) return text("refused: not http(s)");
        execFile("open", [url]); // "open" is the macOS launcher
        return text(`opened ${url}`);
      }),

    // Search Documents (Spotlight). Returns matching absolute paths.
    tool("find_documents", "Search the user's Documents folder.",
      { query: z.string() },
      async ({ query }) => text((await mdfind(query)).join("\n"))),

    // Send a file back into the chat, but only from Documents.
    tool("send_document", "Send a file to the user in this chat.",
      { path: z.string() },
      async ({ path }) => {
        // Resolve ../ and symlinks; throws if the path doesn't exist.
        const real = realpathSync(path);
        if (!isInsideDocs(real)) return text("refused: Documents only");
        return text(await fileSender(real));
      }),
  ],
});
Then ask() hands them to the model:
options: {
  mcpServers: { claw },
  allowedTools: [
    "mcp__claw__open_url",
    "mcp__claw__find_documents",
    "mcp__claw__send_document",
  ],
}
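The tool code calls a couple of helpers that the listing doesn't show. Here is a minimal sketch of what text() and mdfind() might look like. These are my guesses, not the post's actual helpers: the DOCS constant, the mdfindArgs function, and the choice to scope Spotlight with mdfind's -onlyin flag are all assumptions, and text() assumes the handler is expected to return MCP-style text content.

```typescript
import { execFile } from "node:child_process";
import { homedir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const run = promisify(execFile);

// Assumption: "Documents" means ~/Documents for the current user.
const DOCS = join(homedir(), "Documents");

// Wrap a string in an MCP-style text result (assumed return shape).
const text = (s: string) => ({ content: [{ type: "text" as const, text: s }] });

// Build the argument list for macOS's mdfind CLI, scoped to one folder.
// Kept separate so it can be checked without running Spotlight.
function mdfindArgs(query: string, dir: string = DOCS): string[] {
  return ["-onlyin", dir, query];
}

// Run Spotlight and return one absolute path per match (macOS only).
// execFile passes the query as a single argument, so no shell ever sees it.
async function mdfind(query: string): Promise<string[]> {
  const { stdout } = await run("mdfind", mdfindArgs(query));
  return stdout.split("\n").filter(Boolean);
}
```

Keeping the search pinned to one folder means find_documents can only ever name paths that send_document would be willing to send anyway.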
How it works
A tool is tiny. It's four things: a name, a description,
an input schema (here, a zod shape), and a function
that runs when it's called. That's it.
You don't write any "if the user wants a file, call this" logic. The model reads the
descriptions and decides for itself when to call one. I text "send me my quarterly
report"; it figures out it should run find_documents with a query like
quarterly report, looks at the paths that come back, picks the most likely one,
and calls send_document on it. The PDF arrives. I never wrote that plan — the
model assembled it from the tool descriptions.
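To make the "four things" concrete, here is a toy version with no SDK and no model involved. It is not how the SDK works internally; it only shows the shape. Every name in it (Tool, callTool, requireString, the fake file list) is hypothetical, and a plain validate function stands in for the zod shape.

```typescript
// The four parts of a tool: name, description, input check, function.
interface Tool {
  name: string;
  description: string;
  // Stand-in for the zod shape: an error message, or null if the input is fine.
  validate: (input: Record<string, unknown>) => string | null;
  run: (input: Record<string, unknown>) => Promise<string>;
}

const requireString = (key: string) => (input: Record<string, unknown>) =>
  typeof input[key] === "string" ? null : `${key} must be a string`;

// A fake Documents folder so the sketch runs anywhere.
const fakeDocs = [
  "/Users/me/Documents/Q3-report.pdf",
  "/Users/me/Documents/notes.txt",
];

const tools: Tool[] = [
  {
    name: "find_documents",
    description: "Search the user's Documents folder.",
    validate: requireString("query"),
    run: async ({ query }) => {
      const q = String(query).toLowerCase();
      return fakeDocs.filter((p) => p.toLowerCase().includes(q)).join("\n");
    },
  },
  {
    name: "send_document",
    description: "Send a file to the user in this chat.",
    validate: requireString("path"),
    run: async ({ path }) => `sent ${String(path)}`,
  },
];

// What a runtime does when the model asks for a tool:
// look it up by name, check the input, run the function.
async function callTool(name: string, input: Record<string, unknown>): Promise<string> {
  const t = tools.find((x) => x.name === name);
  if (!t) return `unknown tool: ${name}`;
  const problem = t.validate(input);
  if (problem) return `invalid input: ${problem}`;
  return t.run(input);
}
```

The plan from the text is then just two calls in a row: callTool("find_documents", { query: "q3" }) returns a path, and that path goes into callTool("send_document", { path }). In the real thing the model picks those calls; here you would be playing the model by hand.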
One nice detail: WhatsApp plugs in the actual file-sender each turn, while the terminal leaves it off. Same tools, different hands depending on who's driving the chat.
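One way that per-turn wiring could look, as a rough sketch. I'm assuming a module-level sender that gets swapped in for the length of one turn; FileSender, noSender, askWith and sendDocument are hypothetical names, and the real ask() would call the model where this one just runs a callback.

```typescript
type FileSender = (path: string) => Promise<string>;

// Fallback for channels that can't send files, such as the terminal.
const noSender: FileSender = async () => "can't send files in this channel";

let fileSender: FileSender = noSender;

// Stand-in for ask(): installs the channel's sender for one turn only.
async function askWith(
  sender: FileSender | undefined,
  turn: () => Promise<string>,
): Promise<string> {
  fileSender = sender ?? noSender;
  try {
    return await turn();
  } finally {
    fileSender = noSender; // don't leak one chat's sender into the next turn
  }
}

// What the send_document handler would reach for.
const sendDocument = (path: string) => fileSender(path);
```

A shared variable like this is only safe if turns run one at a time. If two chats could be in flight at once, passing the sender through a per-turn context would be the safer design.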
Safety — the real lesson
This is the part that matters. The moment your agent can act, every tool is a door — and the input coming through it is a text message from the outside world. If you're sloppy, a single text could make your machine run a command or hand over your private files. So each tool is locked:
- open_url is http(s)-only, with no shell. It checks that the URL starts with http:// or https:// and refuses everything else, then launches it with execFile("open", [url]), passing the URL as an argument, never through a shell. There's no string to inject into, so a texted "URL" can't run a command.
- send_document is locked to Documents. It calls realpathSync first, which resolves ../ and symlinks down to the real file, then refuses anything that doesn't live inside ~/Documents. So no amount of ../../.ssh/id_rsa trickery will text someone your keys.
That's the whole discipline: validate the input, scope the filesystem, never touch a shell. An agent gets powerful and dangerous at exactly the same moment — when you give it hands. Lock every door.
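Here is a minimal sketch of the two checks, under stated assumptions. isInside and safeResolve are hypothetical stand-ins for the isInsideDocs helper, which the post doesn't show. isHttpUrl is a swapped-in alternative to the regex in the listing: it parses the string with the URL class and checks the protocol, which also rejects strings that aren't URLs at all.

```typescript
import { realpathSync } from "node:fs";
import { isAbsolute, relative, sep } from "node:path";

// True only if `real` sits strictly inside `root`.
// Both paths are expected to be realpath-resolved already.
function isInside(root: string, real: string): boolean {
  const rel = relative(root, real);
  return rel !== "" && rel !== ".." && !rel.startsWith(".." + sep) && !isAbsolute(rel);
}

// Resolve ../ and symlinks first, then check containment.
// Returns the real path, or null if it is missing or outside `root`.
function safeResolve(root: string, requested: string): string | null {
  try {
    const real = realpathSync(requested);
    return isInside(realpathSync(root), real) ? real : null;
  } catch {
    return null; // realpathSync throws when the path doesn't exist
  }
}

// Accept only well-formed http(s) URLs.
function isHttpUrl(raw: string): boolean {
  try {
    const u = new URL(raw);
    return u.protocol === "http:" || u.protocol === "https:";
  } catch {
    return false; // not parseable as a URL
  }
}
```

Comparing with relative() rather than a plain startsWith avoids a classic slip: /Users/me/Documents-evil/x.pdf starts with /Users/me/Documents, but it is not inside it.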
Try it yourself
- Add the three tool() definitions to claude.ts and wrap them in createSdkMcpServer.
- List them in allowedTools and pass mcpServers: { claw } to query().
- Run it: npm run whatsapp, then text yourself "send me my quarterly report".
Watch it search your Documents and text the PDF straight back. Try "play some lofi" too — it builds a YouTube URL and opens it on your screen. And then try to make it send a file outside Documents; it won't.
E05 — Give it a heartbeat
The claw can act now, but it still only speaks when spoken to. Next we give it a heartbeat — a timer that texts you first, reminds you to drink water, and tracks how much you've had. Read E05 →