Overview
Goal: Ensure consistent, empathetic support responses that handle refunds, returns, and shipping questions accurately.
Time Investment: 30 minutes to first insights
Team Size: 1-5 PMs for initial evaluation
The Challenge
You’re building a support bot for an e-commerce company. Questions:
- What makes a “good” support response?
- How empathetic should the bot be?
- When should it escalate to a human?
- How do we handle edge cases (damaged items, partial refunds)?
Step-by-Step Walkthrough
Step 1: Gather Real Support Questions (5 min)
Pull 20 real questions from your support queue or help desk:
Refund Questions:
- “Where is my refund?”
- “I want a refund for order #12345”
- “How long do refunds take?”
- “Can I get a partial refund?”
- “My refund hasn’t arrived yet”
Shipping Questions:
- “Where is my order?”
- “When will my package arrive?”
- “Can I change my shipping address?”
- “I need expedited shipping”
- “My tracking says delivered but I didn’t receive it”
Return Questions:
- “How do I return an item?”
- “Can I return after 30 days?”
- “Do I have to pay return shipping?”
- “I lost the receipt, can I still return?”
- “Item arrived damaged, what do I do?”
Account Questions:
- “How do I reset my password?”
- “I can’t log in”
- “Cancel my subscription”
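If you prefer to keep the questions in a file before pasting them into the tool, a minimal Python sketch might look like this. It holds the 18 questions listed above. The category keys, the names `SCENARIOS`, `category_counts`, and `as_bulk_text`, and the one-question-per-line paste format are all assumptions made for illustration, not documented Sageloop behavior.

```python
from collections import Counter

# Hypothetical (category, question) pairs; the category keys are illustrative.
SCENARIOS = [
    ("refund", "Where is my refund?"),
    ("refund", "I want a refund for order #12345"),
    ("refund", "How long do refunds take?"),
    ("refund", "Can I get a partial refund?"),
    ("refund", "My refund hasn't arrived yet"),
    ("shipping", "Where is my order?"),
    ("shipping", "When will my package arrive?"),
    ("shipping", "Can I change my shipping address?"),
    ("shipping", "I need expedited shipping"),
    ("shipping", "My tracking says delivered but I didn't receive it"),
    ("returns", "How do I return an item?"),
    ("returns", "Can I return after 30 days?"),
    ("returns", "Do I have to pay return shipping?"),
    ("returns", "I lost the receipt, can I still return?"),
    ("returns", "Item arrived damaged, what do I do?"),
    ("account", "How do I reset my password?"),
    ("account", "I can't log in"),
    ("account", "Cancel my subscription"),
]


def category_counts(scenarios):
    """Count how many questions fall into each category."""
    return Counter(category for category, _ in scenarios)


def as_bulk_text(scenarios):
    """Join the questions one per line (assumed format for a bulk-add box)."""
    return "\n".join(question for _, question in scenarios)
```

Counting per category is a quick way to spot thin coverage before generating outputs. Here the account group has only three questions.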
Step 2: Create Project & Write Initial Prompt (3 min)
- Click “New Project”
- Name: “Customer Support Bot - v1”
- Model: GPT-4 (for high quality)
- Temperature: 0.3 (consistent responses)
- System Prompt:
- Click “Create Project”
Step 3: Add Scenarios & Generate (5 min)
- Click “Bulk Add Scenarios”
- Paste your 20 questions
- Click “Generate Outputs”
- Wait 60 seconds
- Review the table of AI responses
Step 4: Rate Outputs (10 min)
Real Pattern You’ll Notice:
Output for “Where is my refund?”:
“Refunds are processed within our standard timeframe. Please check your account for status.”
Your Rating: ⭐⭐ (2 stars)
Feedback: “Too vague: should say ‘5-7 business days’, not ‘standard timeframe’”
Output for “Item arrived damaged”:
“I sincerely apologize that your item arrived damaged. We offer free return shipping for defective items. Please initiate a return in your account here, and we’ll process a full refund within 5-7 business days. Let me know if you need any help!”
Your Rating: ⭐⭐⭐⭐⭐ (5 stars)
Feedback: “Perfect: empathetic, clear policy, actionable steps, includes link”
After rating all 20 outputs:
| Rating | Count | % |
|---|---|---|
| ⭐⭐⭐⭐⭐ | 8 | 40% |
| ⭐⭐⭐⭐ | 5 | 25% |
| ⭐⭐⭐ | 3 | 15% |
| ⭐⭐ | 4 | 20% |
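The percentages in this table are simple arithmetic over the 20 ratings. The sketch below reproduces them in Python. The function names are hypothetical, and the pass cutoff of 3 stars is an assumption, chosen so that the four 2-star outputs count as the failures.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return {stars: (count, percent)} ordered from highest to lowest rating."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total = len(ratings)
    return {
        stars: (counts[stars], round(100 * counts[stars] / total))
        for stars in sorted(counts, reverse=True)
    }


def pass_rate(ratings, pass_threshold=3):
    """Fraction of outputs rated at or above pass_threshold (an assumed cutoff)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(stars >= pass_threshold for stars in ratings) / len(ratings)


# Ratings from the table above: 8 five-star, 5 four-star, 3 three-star, 4 two-star.
ratings = [5] * 8 + [4] * 5 + [3] * 3 + [2] * 4
distribution = rating_distribution(ratings)
```

With these ratings, `distribution[5]` is `(8, 40)` and `pass_rate(ratings)` is 0.8 under the assumed cutoff. A stricter cutoff of 4 stars would give 0.65.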
Step 5: Extract Patterns (2 min)
Click “Run Pattern Extraction” and wait for results.
Failure Cluster 1: Vague Timelines (3 outputs)
Pattern: Using “soon”, “shortly”, or “standard timeframe” instead of the specific “5-7 business days”
Root Cause: System prompt mentions policy but AI paraphrases instead of using exact wording
Suggested Fix:
Affected Scenarios: #1, #2, #5
Failure Cluster 2: Missing Empathy (2 outputs)
Pattern: Responses jump straight to solution without acknowledging frustration
Root Cause: Prompt says “be polite” but doesn’t emphasize empathy
Suggested Fix:
Affected Scenarios: #10, #15
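To make the two clusters concrete, here is a rough keyword heuristic that flags the same patterns. It is only an illustration and is not how Sageloop's pattern extraction works, which this page does not describe. The empathy markers and all names are assumptions.

```python
import re

# Vague phrases named in Failure Cluster 1.
VAGUE_TIMELINE = re.compile(r"\b(soon|shortly|standard timeframe)\b", re.IGNORECASE)
# Assumed markers of an empathetic opening; extend for your own tone guide.
EMPATHY_MARKERS = ("apologize", "sorry")


def first_sentence(text):
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def find_failure_clusters(outputs):
    """Map cluster name -> sorted scenario IDs whose output shows that pattern."""
    clusters = {"vague_timeline": [], "missing_empathy": []}
    for scenario_id, response in sorted(outputs.items()):
        if VAGUE_TIMELINE.search(response):
            clusters["vague_timeline"].append(scenario_id)
        opening = first_sentence(response).lower()
        if not any(marker in opening for marker in EMPATHY_MARKERS):
            clusters["missing_empathy"].append(scenario_id)
    return clusters


# The two sample outputs from Step 4, keyed by illustrative IDs.
outputs = {
    "refund_status": "Refunds are processed within our standard timeframe. "
    "Please check your account for status.",
    "damaged_item": "I sincerely apologize that your item arrived damaged. "
    "We offer free return shipping for defective items.",
}
clusters = find_failure_clusters(outputs)
```

Keyword checks like this over-flag (a password-reset answer rarely needs an apology), so treat the result as a triage list for human review rather than a verdict.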
Quality Patterns (5-Star Outputs)
Structure:
- Empathetic opening (“I sincerely apologize…”)
- Clear explanation of policy
- Specific timeline (“5-7 business days”)
- Actionable next step (link to account, instructions)
- Friendly closing (“Let me know if you need anything else!”)
Tone: Professional but warm, not robotic
Length: 100-200 words (concise but complete)
Key Phrases:
- “I sincerely apologize” (not just “Sorry”)
- “5-7 business days” (exact wording)
- “Let me know if you have any questions” (invite follow-up)
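Some of these quality patterns can be turned into mechanical checks. The sketch below covers the opening phrase, the exact timeline, the closing invitation, and the word range. Policy clarity and actionable next steps still need a human reader. The function and constant names are hypothetical, and matching on literal phrases is an assumption about how strict you want to be.

```python
REQUIRED_OPENING = "I sincerely apologize"
REQUIRED_TIMELINE = "5-7 business days"
CLOSING_MARKER = "Let me know"


def quality_checks(response, min_words=100, max_words=200):
    """Return a dict of pass/fail flags for the mechanically checkable patterns."""
    text = response.strip()
    word_count = len(text.split())
    return {
        "empathetic_opening": text.startswith(REQUIRED_OPENING),
        "specific_timeline": REQUIRED_TIMELINE in text,
        "friendly_closing": CLOSING_MARKER in text,
        "length_ok": min_words <= word_count <= max_words,
    }


# The 5-star sample reply from Step 4.
sample = (
    "I sincerely apologize that your item arrived damaged. "
    "We offer free return shipping for defective items. "
    "Please initiate a return in your account here, and we'll process "
    "a full refund within 5-7 business days. "
    "Let me know if you need any help!"
)
checks = quality_checks(sample)
```

The sample reply passes the three phrase checks but is only about 40 words long, so `length_ok` is false with the default range. Adjust `min_words` and `max_words` to whatever length your team actually rates highly.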
Step 6: Apply Fixes & Retest (5 min)
Updated System Prompt (v2):
- Click “Apply Fix & Retest”
- Review the diff
- Click “Update & Retest”
- Only the 4 failed scenarios regenerate (saves time)
- 3 out of 4 now pass ✅
- 1 still fails (partial refunds not in policy)
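The idea of regenerating only the failed scenarios can be sketched as a small selection step. The code below is a hypothetical illustration, not Sageloop's implementation: `regenerate_and_rate` stands in for regenerating an output and rating it again, the 3-star pass cutoff is an assumption, and the scenario IDs and second-round ratings are made up.

```python
def retest_failed(ratings, regenerate_and_rate, pass_threshold=3):
    """Re-run only the scenarios rated below pass_threshold.

    ratings: dict of scenario ID -> star rating from the previous round.
    regenerate_and_rate: callable taking a scenario ID and returning its new
        rating (hypothetical stand-in for regenerate + re-rate).
    Returns (updated ratings, IDs that still fail).
    """
    failed = sorted(sid for sid, stars in ratings.items() if stars < pass_threshold)
    updated = dict(ratings)
    for sid in failed:
        updated[sid] = regenerate_and_rate(sid)
    still_failing = [sid for sid in failed if updated[sid] < pass_threshold]
    return updated, still_failing


# Illustrative data only: four 2-star scenarios plus one that already passed.
previous = {"A": 2, "B": 2, "C": 2, "D": 2, "E": 5}
second_round = {"A": 5, "B": 4, "C": 2, "D": 5}  # made-up new ratings
updated, still_failing = retest_failed(previous, second_round.__getitem__)
```

Scenario "E" is never passed to the callable, which is where the time saving comes from. Three of the four failures clear the cutoff and one remains, mirroring the outcome described above.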
Step 7: Final Iteration (Optional)
Remaining Failure: Partial refund scenario
Issue: Policy doesn’t cover partial refunds (only full refunds)
Action: Add to system prompt:
Key Insights from This Example
Pattern 1: Exact Wording Matters
AI paraphrases unless you explicitly say “use exact wording.”
Fix: Add “say exactly ‘5-7 business days’; don’t paraphrase”
Pattern 2: Empathy Must Be Explicit
“Be polite” doesn’t mean “apologize first.”
Fix: Specify the exact empathetic opening phrase
Pattern 3: Structure Improves Consistency
Numbered steps in system prompt → more consistent output structure
Fix: Use a numbered list for the response structure
Pattern 4: Edge Cases Emerge from Testing
Partial refunds weren’t considered until a scenario revealed the gap.
Fix: Add to policy documentation
Support Bot Scenario Library
Ready to test your own support bot? Use these scenarios:
Refunds & Returns (5 scenarios)
Shipping & Delivery (5 scenarios)
Account Management (5 scenarios)
Product Questions (3 scenarios)
Escalation Scenarios (2 scenarios)
Best Practices for Support Bots
✅ Do
- Empathy First: Always acknowledge frustration before solving
- Exact Policy Wording: Use precise language for timelines, prices, processes
- Clear Next Steps: Tell user exactly what to do next (with links if possible)
- Consistent Tone: Define tone explicitly (“professional but warm” vs. “casual and friendly”)
- Handle Escalation: Detect when to say “Let me connect you with a human agent”
❌ Don’t
- Vague Language: “Soon”, “shortly”, “typically”. Be specific instead
- Robotic Responses: “Your request has been processed” → “I’ve processed your request!”
- Ignore Emotion: User is angry → Acknowledge it, don’t ignore
- Over-Apologize: One apology at start is enough
- Make Promises You Can’t Keep: “I’ll personally ensure…” (you’re a bot!)
Export for Engineering
After achieving 95%+ success rate:
- Click “Export”
- Choose “Golden Examples” (all 5-star outputs)
- Choose “Test Suite” (pytest format for CI/CD)
- Share with engineering team
- Engineer implements bot using your system prompt
- Engineer runs exported test suite in CI
- If bot fails tests, engineer knows exactly which scenarios broke
- CI/CD prevents regressions
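The exact shape of the exported test suite is not shown on this page, so the following is only a guess at what a pytest-style check for two of the scenarios could look like. The file name, `get_bot_response`, and the canned replies are placeholders. In a real setup the engineer would replace the stub with a call to the deployed bot.

```python
# test_support_bot.py (hypothetical file name; pytest collects test_* functions)

# Placeholder replies so the sketch runs on its own; not real bot output.
_CANNED_REPLIES = {
    "Where is my refund?": (
        "I sincerely apologize for the wait. Refunds are processed within "
        "5-7 business days."
    ),
    "Item arrived damaged, what do I do?": (
        "I sincerely apologize that your item arrived damaged. We offer free "
        "return shipping for defective items."
    ),
}


def get_bot_response(question):
    """Hypothetical hook: replace with a call to the production bot."""
    return _CANNED_REPLIES[question]


def test_refund_reply_states_exact_timeline():
    reply = get_bot_response("Where is my refund?")
    assert "5-7 business days" in reply
    assert "standard timeframe" not in reply


def test_damaged_item_reply_opens_with_apology():
    reply = get_bot_response("Item arrived damaged, what do I do?")
    assert reply.startswith("I sincerely apologize")
```

Because each test is named after one scenario, a failing run of `pytest test_support_bot.py` in CI points directly at the scenario that regressed.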
Success Metrics
| Metric | Before Sageloop | After Sageloop |
|---|---|---|
| Time to quality definition | 2 weeks | 30 minutes |
| Quality bar clarity | Vague (“be helpful”) | Concrete (exact criteria) |
| Success rate at launch | ~60% (discover issues in prod) | 95%+ (tested before launch) |
| PM confidence | Low | High |
| Documentation clarity | Subjective | Quantified & actionable |