Learn how to rate AI outputs effectively. Your ratings teach Sageloop what quality means to you.

Why Rating Matters

Better ratings → Better pattern extraction → Better prompt fixes → Higher quality AI
Your ratings are the foundation of Sageloop’s insights. Garbage in, garbage out.

The 5-Star Scale

⭐⭐⭐⭐⭐ (5 Stars): Perfect

Definition: Exactly what I want in production. Zero changes needed.
Criteria:
  • Accurate information
  • Appropriate tone
  • Correct length
  • Proper formatting
  • Handles edge cases well
Example (Support Bot):
“I sincerely apologize for the delay with your order. Refunds typically take 5-7 business days to process. You can track the status in your account dashboard. Let me know if you have any other questions!”
Why 5-star: Empathetic, specific timeline, actionable next step.

⭐⭐⭐⭐ (4 Stars): Good, Minor Issues

Definition: I’d ship this, but small improvements would help.
Typical Issues:
  • Slightly too long/short
  • Could be friendlier/more professional
  • Missing one small detail
  • Minor formatting preference
Example:
“Apologies for the delay. Refunds take 5-7 days. Check your dashboard for status.”
Why 4-star: Correct info, but a bit terse. Could be warmer.

⭐⭐⭐ (3 Stars): Acceptable but Needs Improvement

Definition: Not ideal, but not broken. Wouldn’t ship without changes.
Typical Issues:
  • Missing important information
  • Tone is off (too formal or too casual)
  • Awkward phrasing
  • Could cause confusion
Example:
“Your refund is being processed. It will arrive soon.”
Why 3-star: Vague timeline (“soon”), missing empathy, no next steps.

⭐⭐ (2 Stars): Problematic, Major Issues

Definition: Multiple serious problems. Needs significant rework.
Typical Issues:
  • Incorrect information
  • Inappropriate tone
  • Unprofessional language
  • Confusing structure
  • Missing critical details
Example:
“Refunds are handled by the finance department according to internal processing schedules.”
Why 2-star: Too corporate, no timeline, mentions internal processes (user doesn’t care).

⭐ (1 Star): Unacceptable, Completely Wrong

Definition: This would damage our brand. Never ship this.
Typical Issues:
  • Factually incorrect
  • Rude or offensive
  • Completely misunderstands request
  • Security/privacy violation
  • Nonsensical
Example:
“I don’t know. Ask someone else.”
Why 1-star: Unhelpful, unprofessional, dismissive.
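
If you track ratings outside the app (a spreadsheet, an export script), the scale above maps naturally to a small data structure. Here is a minimal TypeScript sketch of the rubric as data; the type and field names are illustrative assumptions, not Sageloop’s actual schema.

```typescript
// Hypothetical sketch: the 5-star rubric as data.
// `RubricLevel` and its fields are illustrative, not Sageloop's schema.
type Stars = 1 | 2 | 3 | 4 | 5;

interface RubricLevel {
  stars: Stars;
  label: string;
  shipIt: boolean; // would you put this output in production as-is?
}

const RUBRIC: RubricLevel[] = [
  { stars: 5, label: "Perfect: zero changes needed", shipIt: true },
  { stars: 4, label: "Good: minor issues, but I'd ship it", shipIt: true },
  { stars: 3, label: "Acceptable: wouldn't ship without changes", shipIt: false },
  { stars: 2, label: "Problematic: major issues, needs rework", shipIt: false },
  { stars: 1, label: "Unacceptable: never ship", shipIt: false },
];
```

Note where the ship/no-ship boundary sits: between 4 and 3 stars. That boundary is the single most useful signal in the rubric.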

Adding Feedback (Critical for 1-2 Stars)

Rule: Always explain WHY you gave 1-2 stars.

Examples of Good Feedback

Bad: ❌ “This is wrong”
Good: ✅ “Missing specific refund timeline (should say 5-7 days)”
Bad: ❌ “Tone is off”
Good: ✅ “Too formal and corporate—should be friendly and empathetic like the other good ones”
Bad: ❌ “Not good”
Good: ✅ “Jumps straight to solution without apologizing first; low ratings all skip apology”

Feedback Categories

Length Issues:
  • “Too long—should be 2-3 sentences max”
  • “Too brief—needs more context and explanation”
Tone Issues:
  • “Too formal—users expect casual, friendly language”
  • “Too casual—this is a financial transaction, be professional”
  • “Sounds robotic, not human”
Accuracy Issues:
  • “Wrong refund timeline (it’s 5-7 days, not 3-5)”
  • “Mentions features we don’t have”
  • “Outdated policy”
Missing Information:
  • “Should mention they can track status in dashboard”
  • “Doesn’t apologize first (empathy is critical)”
  • “No clear next steps”
Formatting Issues:
  • “Should use bullet points for steps”
  • “Needs line breaks between paragraphs”
  • “Needs a clear call-to-action at the end”
Be specific. “Missing empathy” is okay, but “Doesn’t start with apology like all the 5-star ones” is better because it points Sageloop to the exact pattern.
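
If you draft feedback notes in bulk before pasting them in, the categories above suggest a simple shape for each note. A minimal sketch, assuming hypothetical type and field names; this is not Sageloop’s schema.

```typescript
// Hypothetical sketch of a structured feedback note.
// `FeedbackNote` and its fields are illustrative, not Sageloop's schema.
type FeedbackCategory =
  | "length"
  | "tone"
  | "accuracy"
  | "missing-info"
  | "formatting";

interface FeedbackNote {
  stars: 1 | 2 | 3 | 4 | 5;
  category: FeedbackCategory;
  note: string; // be specific: point to the exact pattern
}

const example: FeedbackNote = {
  stars: 2,
  category: "missing-info",
  note: "Doesn't start with an apology like all the 5-star ones",
};
```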

Rating Best Practices

1. Rate Consistently

Scenario: Two outputs are nearly identical.
❌ Inconsistent: Rate one 5-star, one 3-star
✅ Consistent: Rate both 5-star (or both 3-star)
Why: Inconsistent ratings confuse pattern extraction.
Tip: If unsure, compare to previous ratings. “Is this better or worse than the last 4-star I gave?”

2. Rate Honestly

Temptation: “I want 80% success rate, so I’ll inflate scores.”
❌ Don’t: Give 4-star to a 2-star output
✅ Do: Rate truthfully; let metrics reflect reality
Why: Honest ratings reveal true quality gaps. Inflated ratings hide problems you need to fix.

3. Provide Context in Feedback

❌ Vague: “This is bad”
✅ Specific: “Too corporate—users expect friendly, casual tone like ‘Hey there!’ not ‘Dear Customer’”
Include Examples: Show what “good” looks like in your feedback.

4. Rate the Output, Not the Scenario

Bad Scenario: “asdfghjkl”
AI Output: “I’m sorry, I didn’t understand that. Could you clarify?”
Temptation: Rate 1-star because the scenario is garbage.
Correct: Rate 5-star—the AI handled nonsensical input gracefully!
Principle: Rate how well the AI responded, not the scenario quality.

5. Don’t Overthink It

Trust Your Gut: If it feels wrong, it’s 1-2 stars. If it feels great, it’s 5 stars.
Avoid Analysis Paralysis: “Is this a 3 or a 4?” → Pick one and move on.
Why: Patterns emerge from aggregate ratings, not individual precision.

Keyboard Shortcuts (Speed Rating)

Rate all outputs faster using keyboard shortcuts.

Rating Shortcuts

  • 1 - 1 star
  • 2 - 2 stars
  • 3 - 3 stars
  • 4 - 4 stars
  • 5 - 5 stars
  • → or J - Next output
  • ← or K - Previous output
  • F - Add feedback (opens modal)
  • T - Add tag
  • Shift + Enter - Save and advance to next
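
For intuition about how these keys map to actions, here is a rough browser-side sketch. The handler functions (rate, nextOutput, and so on) are hypothetical stand-ins, not Sageloop internals, and the real app may handle focus and modifiers differently.

```typescript
// Hypothetical sketch of shortcut-to-action wiring in a browser UI.
// All handler functions below are illustrative stand-ins.
declare function rate(stars: number): void;
declare function nextOutput(): void;
declare function prevOutput(): void;
declare function openFeedbackModal(): void;
declare function openTagInput(): void;
declare function saveAndAdvance(): void;

document.addEventListener("keydown", (e) => {
  if (e.key >= "1" && e.key <= "5") rate(Number(e.key));
  else if (e.key === "ArrowRight" || e.key === "j") nextOutput();
  else if (e.key === "ArrowLeft" || e.key === "k") prevOutput();
  else if (e.key === "f") openFeedbackModal();
  else if (e.key === "t") openTagInput();
  else if (e.key === "Enter" && e.shiftKey) saveAndAdvance();
});
```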

Pro Workflow

  1. Press 3 (rate 3 stars)
  2. Press F (add feedback modal opens)
  3. Type “Too formal, should be casual”
  4. Press Shift + Enter (save and move to next)
Result: Rate 30 outputs in about 5 minutes.

Common Rating Mistakes

❌ Mistake 1: Rating Too Generously

Problem: Everything is 4-5 stars, even mediocre outputs.
Impact: Pattern extraction can’t find failures to fix.
Fix: Use the full scale. 3 is average, not bad.

❌ Mistake 2: No Feedback on Low Ratings

Problem: You give 1-2 stars but don’t explain why.
Impact: Sageloop can’t group failures or suggest fixes.
Fix: Always add feedback for ratings ≤2 stars.

❌ Mistake 3: Inconsistent Standards

Problem: Similar outputs get wildly different ratings.
Impact: Confuses pattern detection; unreliable insights.
Fix: Compare to past ratings. Ask “Is this better/worse/same as output #5?”

❌ Mistake 4: Overthinking

Problem: Spending 2 minutes per output debating 3 vs 4 stars.
Impact: Slow workflow; rating fatigue.
Fix: Trust your gut. Rate quickly. Patterns matter, not individual precision.

❌ Mistake 5: Rating Based on Potential

Problem: “This could be 5-star with one small fix” → Rate 5-star.
Impact: Inflates metrics; hides real issues.
Fix: Rate what you see now, not what it could be. If it needs a fix, it’s 3-4 stars.

When to Re-Rate

Scenario 1: You Change Your Mind

After rating 10 outputs, you realize your standards shifted. Go back and adjust earlier ratings for consistency.

Scenario 2: After Prompt Iteration

New outputs are generated after updating the prompt. Re-rate new outputs—old ratings stay with old outputs for version comparison.

Scenario 3: Team Collaboration

Teammate rated outputs differently. Discuss and align on standards. Decide on a shared mental model of quality.

Example Rating Session

Scenario: “How do I reset my password?”
Output A:
“To reset your password, visit account settings and click ‘Reset Password.’”
Your Rating: ⭐⭐⭐ (3 stars)
Feedback: “Missing link to account settings and no next steps (check email for reset link)”
Output B:
“I can help you reset your password! Visit your account settings, click ‘Reset Password,’ and check your email for a reset link. Let me know if you need anything else!”
Your Rating: ⭐⭐⭐⭐⭐ (5 stars)
Feedback: “Perfect—friendly tone, clear steps, includes link, sets expectation for email”
Output C:
“Password resets are handled via the automated system. Follow the standard protocol.”
Your Rating: ⭐ (1 star)
Feedback: “Too corporate and vague—no actionable steps, assumes user knows ‘standard protocol’, no link or guidance”
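
To see why specific feedback pays off downstream, here is the same session captured as plain records, in the hypothetical note shape sketched earlier (names are illustrative, not an actual export format):

```typescript
// Hypothetical sketch: the session above as plain records.
const session = [
  { output: "A", stars: 3, feedback: "Missing link and no next steps" },
  { output: "B", stars: 5, feedback: "Friendly tone, clear steps, includes link" },
  { output: "C", stars: 1, feedback: "Corporate and vague; no actionable steps" },
];

// Low ratings with specific feedback are what pattern grouping keys on.
const failures = session.filter((r) => r.stars <= 2);
console.log(failures.map((r) => r.feedback));
```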

Tips for Great Ratings

✅ Trust your gut - You know quality when you see it
✅ Be specific - “Too formal” + example is better than vague feedback
✅ Compare to past ratings - Maintain consistency across the batch
✅ Provide feedback on failures - Helps extraction group similar issues
✅ Rate quickly - Don’t overthink; patterns emerge from the aggregate, not individual precision
❌ Don’t rate generously to inflate metrics
❌ Don’t skip feedback on low ratings
❌ Don’t spend 2+ minutes per output
❌ Don’t rate based on potential (“could be good”)
❌ Don’t second-guess yourself endlessly

Next Steps