I'll start with the time I made eight AI agents rap-battle each other.
The ideas and structure are mine; AI helped me write in English, which is not my first language.
In February 2026, Notion shipped custom agents. You set a trigger, assign permissions, pick an AI model, write instructions, and the agent runs on its own. Summarize incoming emails into three lines and post to Slack. Generate meeting notes and drop them in a shared folder. Notion's own employees had built 21,000 agents before the feature even launched. All of them did sensible things.
I built eight and made them rap-battle.
There was a real reason. A custom agent bundles four decisions into one package: what triggers it, what it can see, which AI model it uses, and what instructions it follows. Watching agents do normal work tells you what they produce. Watching them do something absurd tells you where the seams are — where they start, what they read, and where they stop.
I set up the bracket in Notion. Card view with categories for quarterfinals, semifinals, and the final. It looked professional. The activity was not.
It Broke Before I Said Go
Round one. Akagitsune on the east side, NIQUSU on the west. I assigned both rappers to the match card. I opened the page. The battle had already started.
I hadn't said "go" yet.
I had set the trigger to "respond when mentioned." Assigning an agent to a card counts as a mention. The moment I typed both names, both fired simultaneously. Worse, they weren't responding to each other. NIQUSU was rapping at Akagitsune, but Akagitsune was mumbling about "waiting for the eastern bracket." Each agent was talking to itself.
"Triggering" and "conversing" turned out to be two different things.
I restarted manually, mentioning each agent in sequence. Akagitsune's opening verse:
Before dawn, I light a cigarette in an alley where nobody's awake
Sharpening words, not with a stone, but with nails scraped from my own history
Is your face borrowed, secondhand lines lined up on a shelf?
My single word leaves a mark — ink that won't fade, ground that keeps going
NIQUSU's answer:
Sharpening your nails in an alley? What are you, a cat? Your little life story
I read it, got bored after three lines — a paper-thin mystery
Borrowed? Don't make me laugh. My words are homegrown
Harvested from the field and shipped same-day — my history's fresher
NIQUSU picked up every thread Akagitsune threw. "Alley," "nails," "borrowed" — caught and flipped, every one. Rules read, rules followed.
Then the second accident happened.
I mentioned the judge agent to start scoring. The rapper agents responded too. The match restarted during deliberation. One action triggered agents it was never meant to reach.
The fix was to switch the judge to a different kind of AI — an "AI block" embedded directly in the page. AI blocks run without being mentioned, so they don't wake up other agents. What the judge needed wasn't autonomy. It was the inability to misfire.
Third accident. In one matchup, I forgot to grant an agent access to the tournament page. That agent said "not found" and went silent. It didn't hallucinate an answer. It didn't improvise. It just stopped.
Remove the permissions, and nothing happens.
A cost note. The battles were priceless but not free:
Rapper | Model | Cost per round (approx.) |
|---|---|---|
NIQUSU | Opus 4.6 | ~$0.75 |
Rofu Karma | Opus 4.6 | ~$0.63 |
Akagitsune | Sonnet 4.6 | ~$0.42 |
Don Za Cheapness | MiniMax M2.5 | ~$0.20 |
Opus at seventy-five cents a round. MiniMax at twenty. Less than a third. Cheaper than a can of coffee. But what if this were a weekly-report agent running daily? Even at twenty cents, ten agents running every day add up to about $60 a month. Spread that across five departments and it's $300 a month in fixed cost that nobody budgeted for.
The rap battle cost table was settling a bar tab for a night out. A business agent cost table is a monthly line item.
Choosing Where It Breaks
Three scenes so far. A trigger that fired too wide. A chain reaction. A permission wall.
These three share a structure.
Two agents fired because the trigger was too broad — flip that, and narrowing the trigger confines the blast radius to one place. The agent without permissions went silent — flip that, and limiting what an agent can see means even when it breaks, the rest stays safe. Costs multiply — flip that, and setting a ceiling before you start keeps convenience from becoming a fixed expense.
Triggers, permissions, cost caps. All three are the same move: choose where it breaks before it breaks.
When a car bonnet crumples on impact, it might look like a weakness. It's the opposite. It's called a crumple zone — the hood and frame deform on purpose, absorbing the force so the passenger cabin doesn't have to. Decide where things break, and you protect what must not break. Béla Barényi patented this for Mercedes-Benz in the 1950s. Nearly every car on the road uses it today.
Cars aren't built to never break. They're built to break correctly.
My rap tournament had the same structure. Narrow the trigger. Limit the permissions. Set a cost ceiling. Choose where things break so you can protect what must not.
That is the first layer. Breakage you can design.
But there was another kind of failure in the tournament — one that no setting could have prevented.
What follows is the other half of designing how things break.
"You can design where it stops" only covers half the picture. The other half — breakage that no setting can prevent — unfolds alongside the rest of the rap battle. I'll also walk through three questions I now check every time I build an agent, and a testing approach that came from the dumbest possible use case.
Subscribe to Default to read the rest.
Become a paying subscriber of Default to get access to this post and other subscriber-only content.
Upgrade