Experiments
Blog YouTube Newsletter
Research · AI Will Replace You

Experiments at the edge of AI & engineering

Hands-on tests of how AI tools really behave when you push them. Real builds, real data, full source. Each experiment is reproducible — pick one and look at the receipts.

Live experiments

Click into any experiment for the full interactive report.

2026-05-08 · ● Live · 4 reports

Does your CLAUDE.md actually matter?

9 popular CLAUDE.md / AGENTS.md variants tested across three increasingly loose tasks. A recent paper said system-prompt choice barely moves the needle. The data says: it depends entirely on how much plan you give the agent.

9 CLAUDE.md variants 126 code builds 378 LLM judgements 27 live smoke tests
Open the experiment →
Coming soon · ○ Drafting

More experiments in flight

Prompt-specificity curve (3+ specificity levels), tool-use reliability across model versions, agent-vs-orchestrator latency. New experiments published as runs complete.