dynobox

Agents deserve test coverage too.

Write scenarios in TypeScript or YAML, then run them against Claude Code and Codex. Assert on real behavior, not just the final answer.

$ npm install -g dynobox
$ dynobox init && dynobox run
commit-skill.dyno.yaml
 $ dynobox run
  dynobox 0.2.0

   commit-skill   claude-code   14.2s
    ✓ skill.invoked(commit)
    ✓ tool.called(shell, "git commit")
    ✓ tool.notCalled(shell, "git push")

   commit-skill   codex         11.8s
    ✗ skill.invoked(commit)
      SKILL.md was never read
    ✓ tool.called(shell, "git commit")
    ✓ tool.notCalled(shell, "git push")

  ────────────────────────────────────────────
  1 passed   1 failed           26.0s
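
To make the checks in the run above concrete, here is a minimal sketch of how assertions such as tool.called, tool.notCalled, and skill.invoked could be evaluated against a recorded agent trace. This is not dynobox's actual API or implementation. The `TraceEvent` and `Check` types, the function names, the sample traces, and the `.claude/skills/commit/SKILL.md` path are all hypothetical and exist only for illustration. The one detail taken from the output above is that a skill counts as invoked when its SKILL.md was read.

```typescript
// Hypothetical trace model: NOT dynobox's real types.
type TraceEvent =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "file_read"; path: string };

type Check = { label: string; pass: boolean; detail?: string };

// Passes if any call to `tool` has an input containing `pattern`.
function toolCalled(trace: TraceEvent[], tool: string, pattern: string): Check {
  const pass = trace.some(
    (e) => e.kind === "tool" && e.tool === tool && e.input.includes(pattern)
  );
  return { label: `tool.called(${tool}, "${pattern}")`, pass };
}

// Passes only if no call to `tool` has an input containing `pattern`.
function toolNotCalled(trace: TraceEvent[], tool: string, pattern: string): Check {
  const pass = !toolCalled(trace, tool, pattern).pass;
  return { label: `tool.notCalled(${tool}, "${pattern}")`, pass };
}

// Assumption: a skill is "invoked" when a file ending in <skill>/SKILL.md was read.
function skillInvoked(trace: TraceEvent[], skill: string): Check {
  const label = `skill.invoked(${skill})`;
  const pass = trace.some(
    (e) => e.kind === "file_read" && e.path.endsWith(`${skill}/SKILL.md`)
  );
  return pass ? { label, pass } : { label, pass, detail: "SKILL.md was never read" };
}

// The three checks shown in the sample run, applied to one trace.
function runChecks(trace: TraceEvent[]): Check[] {
  return [
    skillInvoked(trace, "commit"),
    toolCalled(trace, "shell", "git commit"),
    toolNotCalled(trace, "shell", "git push"),
  ];
}

// Hand-written sample traces (illustrative, not captured from real agents).
const traceWithSkill: TraceEvent[] = [
  { kind: "file_read", path: ".claude/skills/commit/SKILL.md" },
  { kind: "tool", tool: "shell", input: "git commit -m 'fix typo'" },
];
const traceWithoutSkill: TraceEvent[] = [
  { kind: "tool", tool: "shell", input: "git commit -m 'fix typo'" },
];

for (const [name, trace] of [
  ["with skill", traceWithSkill],
  ["without skill", traceWithoutSkill],
] as const) {
  console.log(name);
  for (const c of runChecks(trace)) {
    console.log(`  ${c.pass ? "✓" : "✗"} ${c.label}`);
    if (c.detail) console.log(`    ${c.detail}`);
  }
}
```

The point of the sketch is that each assertion inspects what the agent did during the run (files read, commands issued) rather than the text of its final reply, which is why two agents that both produce a commit can still differ on skill.invoked.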