zepto-claw · Episode 4

I gave my AI agent hands

It could talk and remember. Now it can actually do things.

Building your own claw — a series where we build a tiny AI agent one small piece at a time.

My claw could think, remember, and text me on WhatsApp. But it could only ever answer from what it already knew. It couldn't do anything — no browser, no files, no reaching into the real world.

So I gave it hands: three tools. Now I can text it "send me my quarterly report" — and it digs through my Documents, finds the PDF, and drops Q3-report.pdf straight back into the chat. Same brain, same memory as the last three episodes; it just grew the ability to act.

The problem

Talking is half an assistant. The claw could remember my name and chat over WhatsApp, but if I asked it for a file, all it could do was apologise. A model on its own is a brain in a jar. Tools are how you let it touch things — open an app, search the disk, send something back.

The code

The whole lesson lives in claude.ts: three tool() definitions, registered as one little MCP server, then listed in allowedTools.

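import { execFile } from "node:child_process";
import { realpathSync } from "node:fs";
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

// text(), mdfind(), isInsideDocs() and fileSender are helpers not shown in this excerpt.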

const claw = createSdkMcpServer({
  name: "claw",
  version: "1.0.0",
  tools: [
    // Open a real browser window. http(s)-only + execFile (no shell)
    // so a texted URL can't smuggle in a command.
    tool("open_url", "Open a URL in the user's browser.",
      { url: z.string() },
      async ({ url }) => {
        if (!/^https?:\/\//i.test(url)) return text("refused: not http(s)");
        execFile("open", [url]);
        return text(`opened ${url}`);
      }),

    // Search Documents (Spotlight). Returns matching absolute paths.
    tool("find_documents", "Search the user's Documents folder.",
      { query: z.string() },
      async ({ query }) => text((await mdfind(query)).join("\n"))),

    // Send a file back into the chat — but only from Documents.
    tool("send_document", "Send a file to the user in this chat.",
      { path: z.string() },
      async ({ path }) => {
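        let real: string;
        try {
          real = realpathSync(path);              // resolve ../ and symlinks
        } catch {
          return text("refused: file not found"); // realpathSync throws if the path does not exist
        }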
        if (!isInsideDocs(real)) return text("refused: Documents only");
        return text(await fileSender(real));
      }),
  ],
});
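
The tools above lean on a few helpers that the excerpt does not show. Here is a minimal sketch of two of them, under stated assumptions: text() wraps a string in the text-content shape that MCP tool results use, and mdfind() runs the macOS mdfind CLI scoped to ~/Documents with -onlyin. The names match the calls above, but the bodies and the DOCS_DIR constant are guesses for illustration, not the author's implementation.

```typescript
import { execFile } from "node:child_process";
import { homedir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Assumption: the folder the tools are scoped to lives at ~/Documents.
const DOCS_DIR = join(homedir(), "Documents");

// Wrap a plain string in the text-content shape used by MCP tool results.
function text(message: string) {
  return { content: [{ type: "text" as const, text: message }] };
}

// Run Spotlight's CLI, limited to Documents, and return one path per match.
// execFile passes the query as a single argument, so no shell ever parses it.
async function mdfind(query: string): Promise<string[]> {
  const { stdout } = await execFileAsync("mdfind", ["-onlyin", DOCS_DIR, query]);
  return stdout.split("\n").filter((line) => line.length > 0);
}
```

The -onlyin flag keeps the search inside Documents, so the paths handed back to the model are ones send_document would accept anyway.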

Then ask() hands them to the model:

options: {
  mcpServers: { claw },
  allowedTools: [
    "mcp__claw__open_url",
    "mcp__claw__find_documents",
    "mcp__claw__send_document",
  ],
}
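
The entries in allowedTools follow the pattern mcp__<server name>__<tool name>. As a small sketch, the list can be derived instead of typed by hand; mcpToolNames is a hypothetical helper for illustration, not something the SDK provides.

```typescript
// Hypothetical helper: build allowedTools entries from a server name and its tool names.
function mcpToolNames(server: string, tools: readonly string[]): string[] {
  return tools.map((tool) => `mcp__${server}__${tool}`);
}

const allowedTools = mcpToolNames("claw", [
  "open_url",
  "find_documents",
  "send_document",
]);
// allowedTools now holds the same three strings as the options block above.
```

Deriving the names keeps the list in step with the server when a tool is renamed, at the cost of one more function to read.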

How it works

A tool is tiny. It's four things: a name, a description, an input schema (here, a zod shape), and a function that runs when it's called. That's it.
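
To make those four parts concrete, here is a dependency-free sketch. SketchTool, shout, and runTool are hypothetical names, and the hand-written parse function stands in for the zod shape. The SDK's tool() builds its own representation, which this sketch does not claim to match.

```typescript
type ToolResult = { content: { type: "text"; text: string }[] };

// The four parts of a tool, written out as a plain interface.
interface SketchTool<Input> {
  name: string;                                   // what the model calls
  description: string;                            // what the model reads to decide
  parse: (raw: unknown) => Input;                 // input schema: reject bad input
  handler: (input: Input) => Promise<ToolResult>; // runs when the tool is called
}

const shout: SketchTool<{ message: string }> = {
  name: "shout",
  description: "Repeat the message in upper case.",
  parse: (raw) => {
    const message = (raw as { message?: unknown } | null)?.message;
    if (typeof message !== "string") throw new Error("message must be a string");
    return { message };
  },
  handler: async ({ message }) => ({
    content: [{ type: "text" as const, text: message.toUpperCase() }],
  }),
};

// Validate first, then run: the handler never sees unchecked input.
async function runTool<Input>(tool: SketchTool<Input>, raw: unknown): Promise<ToolResult> {
  return tool.handler(tool.parse(raw));
}
```

Calling runTool(shout, { message: "hi" }) resolves to a result whose text is "HI", while runTool(shout, {}) rejects before the handler runs.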

You don't write any "if the user wants a file, call this" logic. The model reads the descriptions and decides for itself when to call one. I text "send me my quarterly report"; it figures out it should run find_documents with a query like quarterly report, looks at the paths that come back, picks the most likely one, and calls send_document on it. The PDF arrives. I never wrote that plan — the model assembled it from the tool descriptions.

One nice detail: WhatsApp plugs in the actual file-sender each turn, while the terminal leaves it off. Same tools, different hands depending on who's driving the chat.
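
One way to wire that up, as a sketch only: keep the sender in a module-level variable that the transport sets before a turn. setFileSender, noFileSender, and sendToChat are hypothetical names, and the fallback message is an assumption; the excerpt only shows that the tool calls fileSender(real).

```typescript
type FileSender = (absolutePath: string) => Promise<string>;

// Fallback for transports that cannot deliver files, such as a terminal session.
const noFileSender: FileSender = async () => "file sending is not available in this chat";

// The send_document tool would call whatever is currently installed here.
let fileSender: FileSender = noFileSender;

// Hypothetical hook: a transport installs its sender for a turn, or passes null to clear it.
function setFileSender(sender: FileSender | null): void {
  fileSender = sender ?? noFileSender;
}

// What a WhatsApp transport might do at the start of a turn (sendToChat is a stand-in).
const sendToChat: FileSender = async (absolutePath) => `sent ${absolutePath}`;
setFileSender(sendToChat);
```

With a fallback in place, the tool definitions stay identical across transports and the terminal gets a polite refusal instead of a crash.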

Safety — the real lesson

This is the part that matters. The moment your agent can act, every tool is a door, and the input coming through it is a text message from the outside world. If you're sloppy, a single text could make your machine run a command or hand over your private files. So the two tools that act on the machine are locked:

open_url is http(s)-only, with no shell. It checks the URL starts with http:// or https:// and refuses everything else, then launches it with execFile("open", [url]) — passing the URL as an argument, never through a shell. There's no string to inject into, so a texted "URL" can't run a command.
send_document is locked to Documents. It calls realpathSync first, which resolves ../ and symlinks down to the real file, then refuses anything that doesn't live inside ~/Documents. So no amount of ../../.ssh/id_rsa trickery will text someone your keys.
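
Both checks can be sketched as small pure functions. This is an illustration under assumptions, not the author's code: isInsideDocs is named in the excerpt but its body is not shown, isHttpUrl is a hypothetical variant that parses the URL instead of using the regex above, and DOCS_DIR is assumed to be ~/Documents.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join, relative, sep } from "node:path";

// Assumption: the allowed folder lives at ~/Documents.
const DOCS_DIR = join(homedir(), "Documents");

// Accept only URLs whose parsed scheme is http or https.
function isHttpUrl(candidate: string): boolean {
  try {
    const { protocol } = new URL(candidate);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not a parseable absolute URL
  }
}

// True only when an already-resolved absolute path is strictly inside Documents.
// Comparing with relative() avoids the prefix trap where a sibling folder such as
// "Documents-old" would pass a naive startsWith() check.
function isInsideDocs(resolvedPath: string): boolean {
  const rel = relative(DOCS_DIR, resolvedPath);
  if (rel === "" || isAbsolute(rel)) return false;
  return rel !== ".." && !rel.startsWith(".." + sep);
}
```

The path check is meant to run on the output of realpathSync, as in send_document. If ~/Documents is itself a symlink on a given machine, DOCS_DIR would need to be resolved the same way before comparing.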

That's the whole discipline: validate the input, scope the filesystem, never touch a shell. An agent gets powerful and dangerous at exactly the same moment — when you give it hands. Lock every door.

Try it yourself

Add the three tool() definitions to claude.ts and wrap them in createSdkMcpServer.
List them in allowedTools and pass mcpServers: { claw } to query().
Run it: npm run whatsapp, then text yourself send me my quarterly report.

Watch it search your Documents and text the PDF straight back. Try "play some lofi" too — it builds a YouTube URL and opens it on your screen. And then try to make it send a file outside Documents; it won't.

Next episode

E05 — Give it a heartbeat

The claw can act now, but it still only speaks when spoken to. Next we give it a heartbeat — a timer that texts you first, reminds you to drink water, and tracks how much you've had. Read E05 →
