zepto-claw · Episode 4

I gave my AI agent hands

It could talk and remember. Now it can actually do things.

Building your own claw — a series where we build a tiny AI agent one small piece at a time.

▶ Watch the build on YouTube

My claw could think, remember, and text me on WhatsApp. But it could only ever answer from what it already knew. It couldn't do anything — no browser, no files, no reaching into the real world.

So I gave it hands: three tools. Now I can text it "send me my quarterly report" — and it digs through my Documents, finds the PDF, and drops Q3-report.pdf straight back into the chat. Same brain, same memory as the last three episodes; it just grew the ability to act.

The problem

Talking is half an assistant. The claw could remember my name and chat over WhatsApp, but if I asked it for a file, all it could do was apologise. A model on its own is a brain in a jar. Tools are how you let it touch things — open an app, search the disk, send something back.

The code

The whole lesson lives in claude.ts: three tool() definitions, registered as one little MCP server, then listed in allowedTools.

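import { execFile } from "node:child_process";
import { realpathSync } from "node:fs";
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

// text, mdfind, isInsideDocs and fileSender are small helpers not shown here.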

const claw = createSdkMcpServer({
  name: "claw",
  version: "1.0.0",
  tools: [
    // Open a real browser window. http(s)-only + execFile (no shell)
    // so a texted URL can't smuggle in a command.
    tool("open_url", "Open a URL in the user's browser.",
      { url: z.string() },
      async ({ url }) => {
        if (!/^https?:\/\//i.test(url)) return text("refused: not http(s)");
        execFile("open", [url]);
        return text(`opened ${url}`);
      }),

    // Search Documents (Spotlight). Returns matching absolute paths.
    tool("find_documents", "Search the user's Documents folder.",
      { query: z.string() },
      async ({ query }) => text((await mdfind(query)).join("\n"))),

    // Send a file back into the chat — but only from Documents.
    tool("send_document", "Send a file to the user in this chat.",
      { path: z.string() },
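      async ({ path }) => {
        let real: string;
        try {
          real = realpathSync(path);              // resolve ../ and symlinks
        } catch {
          return text("refused: file not found"); // realpathSync throws on a missing path
        }
        if (!isInsideDocs(real)) return text("refused: Documents only");
        return text(await fileSender(real));
      }),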
  ],
});

Then ask() hands them to the model:

options: {
  mcpServers: { claw },
  allowedTools: [
    "mcp__claw__open_url",
    "mcp__claw__find_documents",
    "mcp__claw__send_document",
  ],
}
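
The names in allowedTools follow the pattern mcp__<server>__<tool>, where the server part is the key used in mcpServers. As a small sketch (the helper name mcpToolName is made up for illustration, not part of the SDK), the list can be derived instead of typed out by hand:

```typescript
// Hypothetical helper: builds the fully qualified name an MCP tool gets
// when it is registered under a given server name.
function mcpToolName(server: string, toolName: string): string {
  return `mcp__${server}__${toolName}`;
}

// The three tools from this episode, registered under the "claw" server.
const clawTools = ["open_url", "find_documents", "send_document"] as const;

// Same three strings as the hand-written allowedTools list.
const allowedTools: string[] = clawTools.map((t) => mcpToolName("claw", t));
```

Deriving the list keeps the tool names and the allow-list from drifting apart when a tool is renamed.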

How it works

A tool is tiny. It's four things: a name, a description, an input schema (here, a zod shape), and a function that runs when it's called. That's it.
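
The function half of a tool has to hand something back to the model. The text() helper used in the code above isn't shown in this post; a minimal sketch, assuming it simply wraps a string in the MCP tool-result shape (a content array of typed blocks), could look like this:

```typescript
// Assumed shape of a text-only tool result: a list of content blocks.
type TextToolResult = {
  content: Array<{ type: "text"; text: string }>;
};

// Hypothetical implementation of the text() helper: wrap a plain string
// so a tool handler can return it to the model.
function text(message: string): TextToolResult {
  return { content: [{ type: "text", text: message }] };
}
```

Refusals like "refused: not http(s)" travel back the same way as successes, so the model can read them and explain to the user what happened.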

You don't write any "if the user wants a file, call this" logic. The model reads the descriptions and decides for itself when to call one. I text "send me my quarterly report"; it figures out it should run find_documents with a query like quarterly report, looks at the paths that come back, picks the most likely one, and calls send_document on it. The PDF arrives. I never wrote that plan — the model assembled it from the tool descriptions.

One nice detail: WhatsApp plugs in the actual file-sender each turn, while the terminal leaves it off. Same tools, different hands depending on who's driving the chat.
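
One way to get that behaviour is a swappable sender that the chat front end sets before each turn. This is only a sketch under that assumption: setFileSender, hasFileSender and the fallback message are invented names and wording, and the real claude.ts may wire it differently.

```typescript
// A sender takes an absolute path and resolves to a short status string.
type FileSender = (path: string) => Promise<string>;

// Whoever is driving the chat installs a sender; null means "no hands".
let currentSender: FileSender | null = null;

// Hypothetical hook: the WhatsApp loop would call this each turn with a
// real sender, and the terminal loop would leave it as null.
function setFileSender(sender: FileSender | null): void {
  currentSender = sender;
}

function hasFileSender(): boolean {
  return currentSender !== null;
}

// What the send_document tool awaits. Without a sender it reports that
// instead of throwing, so the model can tell the user.
async function fileSender(path: string): Promise<string> {
  if (currentSender === null) return "no file sender in this chat";
  return currentSender(path);
}
```

The tool definition stays identical in both front ends; only the installed sender changes.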

Safety — the real lesson

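This is the part that matters. The moment your agent can act, every tool is a door, and the input coming through it is a text message from the outside world. If you're sloppy, a single text could make your machine run a command or hand over your private files. So each tool is locked:

  - open_url: http(s) URLs only, launched with execFile, so no shell ever sees the text.
  - find_documents: searches the Documents folder and only returns matching paths.
  - send_document: resolves the real path (../ and symlinks included) and refuses anything outside Documents.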

That's the whole discipline: validate the input, scope the filesystem, never touch a shell. An agent gets powerful and dangerous at exactly the same moment — when you give it hands. Lock every door.
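
Two of those checks are easy to get subtly wrong, so here is a hedged sketch of each. Both function names are illustrative: isHttpUrl is a stricter alternative to the regex in open_url, and isInside is one possible shape for the isInsideDocs helper, which this post doesn't show. isInside is a purely lexical check, so it assumes the candidate path has already been through realpathSync.

```typescript
import * as path from "node:path";

// Accept only absolute URLs whose scheme is http or https.
// Parsing with URL rejects strings that are not URLs at all.
function isHttpUrl(raw: string): boolean {
  try {
    const { protocol } = new URL(raw);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as an absolute URL
  }
}

// True only if candidate sits strictly below root.
// Comparing via path.relative avoids the classic prefix bug where
// "/Users/me/Documents-old" passes a startsWith("/Users/me/Documents") test.
function isInside(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "" || path.isAbsolute(rel)) return false; // root itself, or another drive
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

In this sketch the root would be the user's Documents folder, and a false from either function maps onto the "refused" replies the tools already send.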

Try it yourself

  1. Add the three tool() definitions to claude.ts and wrap them in createSdkMcpServer.
  2. List them in allowedTools and pass mcpServers: { claw } in the options that ask() hands to query().
  3. Run it: npm run whatsapp, then text yourself "send me my quarterly report".

Watch it search your Documents and text the PDF straight back. Try "play some lofi" too — it builds a YouTube URL and opens it on your screen. And then try to make it send a file outside Documents; it won't.
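
For the lofi case, the model chooses the URL itself, so the exact link can vary from run to run. As an illustration only, a YouTube search URL of the kind it might pass to open_url can be built like this (youtubeSearchUrl is a made-up name):

```typescript
// Build a YouTube search URL; URLSearchParams handles the encoding.
function youtubeSearchUrl(query: string): string {
  const url = new URL("https://www.youtube.com/results");
  url.searchParams.set("search_query", query);
  return url.toString();
}

// youtubeSearchUrl("lofi beats")
// → "https://www.youtube.com/results?search_query=lofi+beats"
```

Because the result starts with https://, it passes the http(s) check in open_url.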

Watch the full build: the zepto-claw E04 Short, and subscribe on YouTube for E05. Catching up? Start with E01 · then E02 · then E03.