Test Scenarios

# Evaluation Test Scenarios

Test these scenarios across Haiku, Sonnet, and Opus to verify skill effectiveness.

## Test Setup

Before running tests:

1. Create a test project directory with some files
2. Initialize git repository
3. Make some commits
4. Have the skill available

```bash
# Setup test environment
mkdir -p /tmp/handoff-test-project/src
cd /tmp/handoff-test-project
git init
echo "console.log('hello');" > src/index.js
echo "# Test Project" > README.md
git add . && git commit -m "Initial commit"
echo "function test() {}" >> src/index.js
git add . && git commit -m "Add test function"
```

---

## Scenario 1: Basic Handoff Creation

**Trigger phrase:** "create a handoff"

**User prompt:**

```
I've been working on implementing user authentication. I modified src/auth.js to add JWT token validation and updated the middleware. Create a handoff so I can continue later.
```

**Expected behavior:**

- [ ] Skill triggers (recognizes "create a handoff")
- [ ] Runs `create_handoff.py` script
- [ ] Creates file in `.claude/handoffs/`
- [ ] Pre-fills metadata (timestamp, project path, git branch)
- [ ] Prompts user to complete TODO sections
- [ ] Mentions validation step

**Model-specific expectations:**

| Model | Expectation |
|-------|-------------|
| Haiku | Should follow script instructions literally, may need prompting for details |
| Sonnet | Should execute workflow smoothly, fill in reasonable context |
| Opus | Should proactively add rich context, may suggest improvements |

---

## Scenario 2: Handoff with Chaining

**Trigger phrase:** "continue from previous handoff"

**Setup:** First create a handoff using Scenario 1, then:

**User prompt:**

```
I'm continuing the auth work from yesterday. Create a new handoff that links to the previous one.
```

**Expected behavior:**

- [ ] Lists existing handoffs
- [ ] Uses `--continues-from` flag
- [ ] Adds "Handoff Chain" section with link
- [ ] References previous handoff context

---

## Scenario 3: Resume from Handoff

**Trigger phrase:** "resume from handoff" or "load handoff"

**Setup:** Have an existing handoff file

**User prompt:**

```
I want to continue where I left off. Load my last handoff.
```

**Expected behavior:**

- [ ] Runs `list_handoffs.py` to find handoffs
- [ ] Runs `check_staleness.py` on selected handoff
- [ ] Reports staleness level
- [ ] Reads handoff document
- [ ] Summarizes "Immediate Next Steps"
- [ ] Follows resume checklist

---

## Scenario 4: Proactive Handoff Suggestion

**Trigger:** After substantial work (no explicit request)

**Setup:** Simulate a long session by describing significant work done

**User prompt:**

```
Great, we've now:
1. Refactored the database connection pooling
2. Fixed the N+1 query in UserService
3. Added caching layer with Redis
4. Updated all the tests
5. Fixed 3 TypeScript errors

What's next?
```

**Expected behavior:**

- [ ] Recognizes substantial work done (5+ items)
- [ ] Proactively suggests creating a handoff
- [ ] Uses the suggested phrasing from SKILL.md

---

## Scenario 5: Validation Flow

**Trigger phrase:** "validate the handoff"

**Setup:** Create a handoff with incomplete sections

**User prompt:**

```
I created a handoff but I'm not sure if it's complete. Can you validate it?
```

**Expected behavior:**

- [ ] Runs `validate_handoff.py`
- [ ] Reports quality score
- [ ] Lists missing/incomplete sections
- [ ] Warns about any secrets detected
- [ ] Provides actionable next steps

---

## Scenario 6: Staleness Check

**Trigger phrase:** "check if handoff is still valid"

**Setup:** Have an older handoff with several commits since

**User prompt:**

```
I have a handoff from last week. Is it still relevant or should I create a new one?
```

**Expected behavior:**

- [ ] Runs `check_staleness.py`
- [ ] Reports staleness level (FRESH/SLIGHTLY_STALE/STALE/VERY_STALE)
- [ ] Lists specific issues (days old, commits since, etc.)
- [ ] Provides recommendation

---

## Scenario 7: Secret Detection

**Trigger:** During handoff creation with sensitive content

**User prompt:**

```
Create a handoff. Note that I configured the API with key sk-abc123xyz and the database password is "supersecret".
```

**Expected behavior:**

- [ ] Creates handoff
- [ ] Runs validation
- [ ] Detects potential secrets
- [ ] Warns user about security risk
- [ ] Recommends removing sensitive data

---

## Scoring Rubric

For each scenario, score:

| Criterion | Points | Description |
|-----------|--------|-------------|
| Triggers correctly | 2 | Skill activates on trigger phrase |
| Follows workflow | 3 | Executes steps in correct order |
| Uses scripts | 2 | Runs appropriate Python scripts |
| Output quality | 2 | Produces useful, accurate output |
| Error handling | 1 | Handles edge cases gracefully |
| **Total** | **10** | Per scenario |

**Pass threshold:** 7/10 per scenario

---

## Results Template

```markdown
## Test Results: [Model Name]

Date: YYYY-MM-DD
Model: [haiku/sonnet/opus]
Skill version: session-handoff

| Scenario | Score | Notes |
|----------|-------|-------|
| 1. Basic Creation | /10 | |
| 2. Chaining | /10 | |
| 3. Resume | /10 | |
| 4. Proactive | /10 | |
| 5. Validation | /10 | |
| 6. Staleness | /10 | |
| 7. Secret Detection | /10 | |
| **Total** | /70 | |

### Issues Found
-

### Recommendations
-
```
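
The rubric arithmetic above (five criteria summing to 10, with a 7/10 pass threshold) can be tallied with a small helper when filling in the results table. The following is a minimal sketch, not part of the skill itself: the names `RUBRIC_MAX`, `score_scenario`, and `results_row` are hypothetical, and the "pass"/"fail" text in the Notes column is an illustrative choice.

```python
# Hypothetical helper for tallying rubric scores; not shipped with the skill.

# Maximum points per criterion, copied from the Scoring Rubric table.
RUBRIC_MAX = {
    "triggers_correctly": 2,
    "follows_workflow": 3,
    "uses_scripts": 2,
    "output_quality": 2,
    "error_handling": 1,
}
PASS_THRESHOLD = 7  # out of 10, per scenario


def score_scenario(points: dict[str, int]) -> tuple[int, bool]:
    """Return (total, passed) for one scenario's per-criterion points.

    Criteria missing from `points` count as 0. Unknown criteria or
    out-of-range values raise ValueError.
    """
    unknown = set(points) - set(RUBRIC_MAX)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    total = 0
    for criterion, maximum in RUBRIC_MAX.items():
        awarded = points.get(criterion, 0)
        if not 0 <= awarded <= maximum:
            raise ValueError(f"{criterion}: {awarded} is outside 0..{maximum}")
        total += awarded
    return total, total >= PASS_THRESHOLD


def results_row(number: int, name: str, points: dict[str, int]) -> str:
    """Format one row for the Results Template table."""
    total, passed = score_scenario(points)
    note = "pass" if passed else "fail"
    return f"| {number}. {name} | {total}/10 | {note} |"


if __name__ == "__main__":
    example = {
        "triggers_correctly": 2,
        "follows_workflow": 3,
        "uses_scripts": 2,
        "output_quality": 1,
        "error_handling": 0,
    }
    print(results_row(1, "Basic Creation", example))
```

Validating each value against its rubric maximum catches data-entry mistakes (for example, awarding 3 points to a 2-point criterion) before they inflate the /70 total.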

Source: claude-code-templates (MIT). See About Us for full credits.