Decoding Claude: How Agentic AI Actually Works

đź’ˇ
Even the people who understand agentic AI tools are losing control of them.

Summer Yue runs AI alignment at Meta’s Superintelligence Lab. Her job is literally to make sure AI doesn’t do things it shouldn’t. In February she decided to use an AI agent to go through her inbox and tell her which emails she should delete. Instead, it started deleting everything. She asked the AI to stop. It didn’t. She wrote “STOP OPENCLAW”. It didn’t stop. She ran to her Mac Mini and killed the process herself. In her words, it was “like defusing a bomb”. How does an AI alignment specialist lose control of her own AI agent? Spoiler: it’s not what you think.

The answer is more mechanical than mysterious. Think of the AI agent as a worker in a late-90s office: all files, paper, and calculators. Its desk has an in-tray for the documents it needs to read, and it picks them out a few at a time. I imagine it also keeps a scratchpad and a pencil to jot down notes and think out loud. When Summer pointed the agent at her real inbox, hundreds of emails landed in that tray all at once. It got through the first few just fine, but soon there wasn’t enough space on the desk for everything it needed to sort through. So it summarized what it could to make room for the new arrivals. Somewhere in that summary, the instruction about not deleting emails got lost. The agent “forgot” that it was supposed to only flag emails, not delete them. This is the mental model I’ll use throughout this blog series. Not a perfect analogy, but a useful one.
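To make the desk metaphor concrete, here is a toy sketch of how naive context compaction can silently drop an instruction. This is a hypothetical illustration, not Claude Code's or OpenClaw's actual implementation: the "desk" is a message list with a word budget, and the "summary" crudely keeps only the first few words of each old message.

```python
# Toy model of context compaction (hypothetical, for illustration only).
# The "desk" holds messages up to a word BUDGET; on overflow, older
# messages are squashed into a lossy summary.

BUDGET = 25  # max words the "desk" can hold at once

def words(messages):
    """Total word count of everything currently on the desk."""
    return sum(len(m.split()) for m in messages)

def compact(messages, keep_last=2):
    """Squash all but the most recent messages into a lossy summary."""
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # Lossy: keep only the first three words of each old message.
    summary = "summary: " + "; ".join(" ".join(m.split()[:3]) for m in old)
    return [summary] + recent

context = ["instruction: only FLAG emails, never delete them"]
for i in range(6):
    context.append(f"email {i}: promo offer please review contents")
    if words(context) > BUDGET:
        context = compact(context)

print(context)
# The instruction survives only as a truncated fragment after the first
# compaction, and repeated compactions erode even that -- the crucial
# "never delete" clause is long gone by the end.
```

Each compaction re-summarizes the previous summary, so the original constraint degrades a little more every round. Real harnesses summarize far more intelligently than this, but the failure mode is the same shape: the instruction that matters most can be the one that gets compressed away.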

When OpenClaw arrived in late January this year, everyone was excited to explore what it would feel like to talk to agents through WhatsApp, Telegram, or Slack. Write a Slack message, and AI ships the software. We forgot that outside of established companies like Anthropic, open-source AI agentic frameworks are still the wild west. There are plenty of wonderful ideas, but they are not yet battle-tested for day-to-day use, let alone professional work.

Then the Claude Code source code leaked in late March, and we got to peek at how Anthropic put harnesses around AI so that it could do wonderful tasks more reliably. The only way I could wrap my head around how the harness worked was by imagining the AI agent as part of an office. I'm actually building something with OpenClaw myself, which is what sent me down this rabbit hole in the first place.

So I decided to write this series. It’s for all the tinkerers and ideators out there who have great ideas but don’t yet grasp how the AI systems around us do what they do. I believe we need to recognize not only the strengths of these agentic tools but also their limitations. The agentic harness works around those limitations and lets these systems do great things. This series is not a formal course on building an agentic system. It’s meant to help you understand the mechanics of these systems so that you can use them better and debug them when they go wrong. And go wrong they definitely will. Knowing that is half the battle.

I’m writing this series as I actively decode the Claude Code codebase, so it will grow over time. Each episode will be followed by references and a link to the code that inspired it.

Here are the episodes planned for this series:

  1. Decoding Claude: E01 - The Loop
  2. Decoding Claude: E02 - More Desks, Same Boss
  3. Decoding Claude: E03 - The Whiteboard That Keeps Everyone Honest
  4. Decoding Claude: E04 - Rent a Conference Room

And more ...

Subscribe to get each episode when it lands.

References

  • The Codebase that inspired this series: Learn Claude Code - Link
  • The Great Claude Code Leak of 2026: Accident, Incompetence, or the Best PR Stunt in AI History? - Link
  • Analyzing the Incident of OpenClaw Deleting Emails: A Technical Deep Dive - Link