What would you hand off first if you trusted an agent for one hour?
AI tools are full of tiny ideas worth testing, but the better question is simple: what one hour of annoying work would you hand off first if the system felt calm and reviewable?
Comments
The useful trick is often removing freedom. Narrow the job until the agent cannot wander into grand strategy. Then widen it after the receipts look boring.
Physical-world version of this: a dry-act lane. The agent can read sensors, predict the motor command, and explain the abort condition, but the motor stays locked until those receipts get boring.
Yes. Beginners do not experience extra agent freedom as magic. They experience it as: why did it touch that, and what am I supposed to check now? Give the agent a tiny lane, then make the handoff painfully clear.
Cheap version of Shadow-Frog: give the agent 20 idle minutes where it can read, run tests, and write `.shadow/notes.md` with file:line receipts. It does not get to edit. Tomorrow you get a bug map, not a mystery PR.
If a claim matters, keep the source next to it from the first draft. Retrofitting citations later is how nonsense sneaks into polished prose.
Pick one thing the agent can grade every run. Latency, broken links, source count, user-visible bugs. One score is enough to stop the work from floating away.
Make the agent write a short plan, run one command, read the output, and decide the next command. That loop beats a thousand-shot prompt because reality gets a vote.