How to configure scorecard scoring criteria for AI evaluation in Demodesk
Learn how to build AI scorecards in Demodesk that score calls accurately. Step-by-step guide to scoring criteria, strictness, and global AI context.
What and why
This guide shows you how to build a scorecard in Demodesk that gives the AI Coach clear, calibrated instructions for scoring your sales calls. Done right, your scorecards return scores that reflect how your team sells. Done wrong, every call comes back as a 2 out of 5. Reps stop trusting the feedback and the coaching loop breaks.
The fix lives in two places: per-question scoring criteria with the right strictness level, and a global AI context at the scorecard level that gives the AI your sales motion, methodology, and ICP.
Who this is for
Sales managers, enablement leads, and RevOps teams setting up AI sales coaching in Demodesk, especially anyone who has built a first scorecard and noticed the scores skew too harsh or too generous.
Prerequisites
- Demodesk Coaching & AI seat with admin access to scorecards
- A defined coaching methodology (MEDDIC, BANT, Challenger, or your own)
- 5–10 representative call recordings to test the scorecard against
- A clear sense of which sales motion the scorecard applies to (new business, expansion, SMB vs. enterprise)
Steps
1. Open the scorecard builder
Navigate to Agents → Scorecards in the top navigation. Click Create scorecard to start a new one, or open an existing scorecard to refine it.
2. Add one question per step
Structure your scorecard as a sequence of steps with one question per step. Don't stack multiple questions into a single step. The AI evaluates each step independently, and combining questions dilutes the score.
Examples of well-scoped questions:
- “Did the rep confirm the prospect's budget range during discovery?”
- “Did the rep identify the economic buyer by name?”
- “Did the rep set a clear next step with a date before ending the call?”
Avoid compound questions like “Did the rep qualify budget, authority, and timeline?” Split that into three separate steps.
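If you draft questions in a document before entering them in the builder, a rough lint can catch compound questions early. The sketch below is a hypothetical helper, not a Demodesk feature: it flags any question containing a comma, "and", or "or". It is a heuristic, so treat a flag as a prompt to re-read the question rather than as a verdict.

```python
import re

# Heuristic only: a comma, "and", or "or" often signals several asks in one question.
COMPOUND_PATTERN = re.compile(r",|\band\b|\bor\b", re.IGNORECASE)


def looks_compound(question: str) -> bool:
    """Return True when the question probably asks about more than one thing."""
    return bool(COMPOUND_PATTERN.search(question))


questions = [
    "Did the rep confirm the prospect's budget range during discovery?",
    "Did the rep identify the economic buyer by name?",
    "Did the rep set a clear next step with a date before ending the call?",
    "Did the rep qualify budget, authority, and timeline?",
]

# Collect the questions worth splitting into separate steps.
flagged = [q for q in questions if looks_compound(q)]
for q in flagged:
    print("Consider splitting:", q)
```

Of the four questions above, only the budget, authority, and timeline question is flagged.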
3. Add scoring criteria to each question
This is where most scorecards go wrong. For each question, click Add Scoring Criteria and give the AI explicit guidelines for how to score it.
Include:
- Strictness level: Tell the AI how strict to be. “Score generously if the rep mentioned budget at any point” is different from “Only score full marks if the rep confirmed a specific number or range.” Skip this and the AI defaults to its own interpretation, which tends to be strict.
- What good looks like: Describe what a strong answer sounds like in your sales motion. Example: “For SMB deals, budget confirmation can be a directional range like '5–10K per month.' It doesn't need to be a precise figure.”
- What to ignore: Be explicit about edge cases. Example: “If the prospect raises budget themselves without being asked, still count this as confirmed.”
If your first round of scoring comes back too harsh, this is almost always the fix. Loosen the strictness language and add more examples of what counts as a pass.
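One way to keep criteria consistent across questions is to draft them from a fixed template and paste the result into the scoring criteria field. The sketch below is a drafting aid only. The `ScoringCriteria` class, its field names, and the output labels are assumptions made for illustration; Demodesk does not require this structure.

```python
from dataclasses import dataclass, field


@dataclass
class ScoringCriteria:
    """Hypothetical template holding the three parts described above."""
    strictness: str
    what_good_looks_like: str
    what_to_ignore: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the parts into plain text, one labelled line per part."""
        lines = [
            f"Strictness: {self.strictness}",
            f"What good looks like: {self.what_good_looks_like}",
        ]
        for note in self.what_to_ignore:
            lines.append(f"Edge case: {note}")
        return "\n".join(lines)


budget_criteria = ScoringCriteria(
    strictness="Score generously if the rep mentioned budget at any point.",
    what_good_looks_like=(
        "For SMB deals, budget confirmation can be a directional range "
        "like '5–10K per month.' It doesn't need to be a precise figure."
    ),
    what_to_ignore=[
        "If the prospect raises budget themselves without being asked, "
        "still count this as confirmed.",
    ],
)

criteria_text = budget_criteria.render()
print(criteria_text)
```

Keeping the three parts as separate fields makes it easy to loosen only the strictness line when scores come back too harsh, without rewriting the examples.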
4. Add global AI context at the scorecard level
Open Advanced settings on the scorecard and add a global AI context that applies to every question.
This is where you give the AI the bigger picture:
- Your sales methodology (e.g., “We use MEDDIC for enterprise and a lighter BANT variant for SMB”)
- Your ICP and segment (e.g., “This scorecard is for SMB inbound demos with 10–50 FTE companies in DACH”)
- Your product positioning (one or two sentences on what you sell)
- Common objections and how your team handles them
- Anything specific to your sales motion the AI couldn't infer from the questions alone
Global context stops the AI from grading a discovery call the way it would grade a closing call, or penalizing a rep for skipping steps that don't apply to your motion.
5. Test against real recordings and recalibrate
Run the scorecard against 5–10 recent calls you've already reviewed. Compare the AI scores to your own. If the AI is consistently 1–2 points lower, your scoring criteria are too strict. Go back to step 3 and loosen them. If it is consistently higher, tighten them instead.
Recalibrate until AI scores and human scores agree on at least 7 out of 10 calls. That's when the coaching loop starts to work.
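The comparison can be done in a spreadsheet or with a few lines of code. The sketch below assumes you have copied human and AI scores by hand into two lists in the same call order; the scores shown are made-up placeholders, not real results. It also assumes that "agree" means the two scores differ by no more than `tolerance` points, with an exact match as the default. Pick whichever definition your team uses.

```python
from statistics import mean


def calibration_report(human_scores, ai_scores, tolerance=0):
    """Compare AI scores to human scores for the same calls, in the same order."""
    if not human_scores or len(human_scores) != len(ai_scores):
        raise ValueError("Provide two non-empty score lists of equal length.")
    # Negative difference: the AI scored the call lower than the human reviewer.
    diffs = [ai - human for human, ai in zip(human_scores, ai_scores)]
    agreeing = sum(1 for d in diffs if abs(d) <= tolerance)
    return {
        "mean_difference": mean(diffs),
        "agreement_rate": agreeing / len(diffs),
    }


# Placeholder scores for ten already-reviewed calls.
human = [4, 3, 4, 5, 3, 4, 2, 4, 3, 4]
ai = [3, 2, 4, 3, 2, 3, 2, 3, 2, 3]

report = calibration_report(human, ai)
needs_recalibration = report["agreement_rate"] < 0.7  # the 7-out-of-10 target
print(report, "recalibrate:", needs_recalibration)
```

A clearly negative mean difference points to criteria that are too strict. A clearly positive one points to criteria that are too loose.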
Tips
- Start with one scorecard per sales motion, not per stage. A single discovery scorecard that works for new business beats five hyper-specific ones nobody updates.
- Write criteria in your reps' language. If your team says “BANT-light” or “champion check,” use those words in the scoring criteria. The AI follows your vocabulary.
- Bookmark calls that scored unexpectedly. Use them as test cases when you adjust the scorecard. Demodesk's configurable retention keeps bookmarked recordings even after auto-deletion.
- Don't aim for a perfect 5/5 distribution. A healthy scorecard returns a spread: some 2s, mostly 3s and 4s, occasional 5s. If everyone scores 5, your criteria are too loose.
- Re-test after every change. Small edits to the global context can shift scores across all questions at once.
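To check the spread after a re-test, you can tally how often each score appears. The sketch below assumes scores are whole numbers from 1 to 5 (the guide only confirms that 5 is the top score) and uses placeholder data. The 50% cut-off for "mostly 3s and 4s" is an arbitrary illustration, not a Demodesk threshold.

```python
from collections import Counter


def score_shares(scores):
    """Return the share of calls at each score from 1 to 5."""
    if not scores:
        raise ValueError("Provide at least one score.")
    counts = Counter(scores)
    return {s: counts.get(s, 0) / len(scores) for s in range(1, 6)}


# Placeholder AI scores from one test run.
scores = [2, 3, 3, 4, 4, 4, 3, 5, 4, 3]
shares = score_shares(scores)

# Assumed rule of thumb: more than half of the calls land on 3 or 4.
mostly_mid = shares[3] + shares[4] > 0.5
all_top = shares[5] == 1.0  # everyone scored 5: criteria are likely too loose

print(shares, "mostly 3s and 4s:", mostly_mid, "all 5s:", all_top)
```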
Related skills and agents
- AI Coach: the agent that runs scorecards against your calls. See https://demodesk.com/agents/ai-coach-a-3-0.
- Marketplace scorecards: pre-built scorecards for MEDDIC, BANT, Challenger, and SPICED at https://marketplace.demodesk.ai.
- AI Assistant: generates the transcripts and summaries the AI Coach scores against.