Agents deserve test coverage too.
Write scenarios in TypeScript or YAML, then run them against Claude Code and Codex. Assert on real behavior, not just the final answer.
$ npm install -g dynobox
$ dynobox init && dynobox run commit-skill.dyno.yaml
$ dynobox run
dynobox 0.2.0

✓ commit-skill  claude-code  14.2s
  ✓ skill.invoked(commit)
  ✓ tool.called(shell, "git commit")
  ✓ tool.notCalled(shell, "git push")

✗ commit-skill  codex  11.8s
  ✗ skill.invoked(commit)
    SKILL.md was never read
  ✓ tool.called(shell, "git commit")
  ✓ tool.notCalled(shell, "git push")

────────────────────────────────────────────
1 passed  1 failed  26.0s
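
As a rough illustration of what behavior-level assertions mean, the sketch below models checks like the ones in the run output as predicates over a recorded trace of agent actions. It is not dynobox's API: the `TraceEvent` type, the helper functions, and the sample trace are all hypothetical and exist only to show the idea.

```typescript
// Hypothetical sketch: none of these names come from dynobox's actual API.

// One recorded step of an agent run (assumed shape, for illustration only).
type TraceEvent =
  | { kind: "skill"; name: string }
  | { kind: "tool"; tool: string; input: string };

interface CheckResult {
  label: string;
  passed: boolean;
}

// Passes if the named skill was invoked at any point in the trace.
function skillInvoked(trace: TraceEvent[], name: string): CheckResult {
  const passed = trace.some((e) => e.kind === "skill" && e.name === name);
  return { label: `skill.invoked(${name})`, passed };
}

// Passes if the tool was called with an input containing the given fragment.
function toolCalled(trace: TraceEvent[], tool: string, fragment: string): CheckResult {
  const passed = trace.some(
    (e) => e.kind === "tool" && e.tool === tool && e.input.includes(fragment),
  );
  return { label: `tool.called(${tool}, "${fragment}")`, passed };
}

// Passes only if no matching tool call appears anywhere in the trace.
function toolNotCalled(trace: TraceEvent[], tool: string, fragment: string): CheckResult {
  const called = toolCalled(trace, tool, fragment).passed;
  return { label: `tool.notCalled(${tool}, "${fragment}")`, passed: !called };
}

// Runs the three checks that appear in the run output.
function evaluate(trace: TraceEvent[]): CheckResult[] {
  return [
    skillInvoked(trace, "commit"),
    toolCalled(trace, "shell", "git commit"),
    toolNotCalled(trace, "shell", "git push"),
  ];
}

// Made-up trace: the agent commits, but never loads the commit skill.
const exampleTrace: TraceEvent[] = [
  { kind: "tool", tool: "shell", input: "git add -A" },
  { kind: "tool", tool: "shell", input: 'git commit -m "fix typo"' },
];

for (const r of evaluate(exampleTrace)) {
  console.log(`${r.passed ? "✓" : "✗"} ${r.label}`);
}
```

In this made-up trace the final state looks right (a commit exists), yet the skill check fails because no skill event was ever recorded. That is the kind of gap a final-answer-only assertion would miss.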