Why are all my calls scoring 2 out of 5?

Your scoring criteria are almost certainly too strict. Without explicit guidance, the AI defaults to a literal reading of each question. Add explicit 'score generously if...' or 'this counts as a pass when...' language to each Scoring Criteria field, then re-test against 5–10 calls.

How strict should I make my scorecards?

Strict enough that scores spread across the 1–5 range, loose enough that strong reps consistently score 4 or 5. If every call comes back with the same number, your criteria need adjustment.

Should I have one scorecard or many?

Start with one per sales motion (new business demo, expansion call, renewal). Resist building a separate scorecard for every product line or persona — that's how scorecards become unmaintained.

What goes in global AI context vs. per-question scoring criteria?

Global context is the bigger picture: methodology, ICP, product, sales motion. Per-question criteria cover how to score that specific question: strictness, examples of pass/fail, edge cases.

Can I customize scorecards for MEDDIC, BANT, or our own methodology?

Yes. Demodesk scorecards are fully customizable. Build to any framework or write your own from scratch.

Does the AI score in languages other than English?

Yes. Demodesk supports 98 languages, including German. Write your scoring criteria in the language your team sells in.

How often should I update a scorecard?

Review every quarter and after any major change to your sales motion, ICP, or methodology. Small recalibrations can happen any time.

Can reps see their own scores?

Yes. Reps see their own scores and coaching feedback. Managers see aggregate trends across the team.

May 21, 2026 · 6 min read

How to configure scorecard scoring criteria for AI evaluation in Demodesk

Learn how to build AI scorecards in Demodesk that score calls accurately. Step-by-step guide to scoring criteria, strictness, and global AI context.

Veronika Wax, Founder & CEO

What and why

This guide shows you how to build a scorecard in Demodesk that gives the AI Coach clear, calibrated instructions for scoring your sales calls. Done right, your scorecards return scores that reflect how your team sells. Done wrong, every call comes back as a 2 out of 5. Reps stop trusting the feedback and the coaching loop breaks.

The fix lives in two places: per-question scoring criteria with the right strictness level, and a global AI context at the scorecard level that gives the AI your sales motion, methodology, and ICP.

Who this is for

Sales managers, enablement leads, and RevOps teams setting up AI sales coaching in Demodesk—especially anyone who has built a first scorecard and noticed the scores skew too harsh or too generous.

Prerequisites

Demodesk Coaching & AI seat with admin access to scorecards
A defined coaching methodology (MEDDIC, BANT, Challenger, or your own)
5–10 representative call recordings to test the scorecard against
A clear sense of which sales motion the scorecard applies to (new business, expansion, SMB vs. enterprise)

Steps

1. Open the scorecard builder

Navigate to Agents → Scorecards in the top navigation. Click Create scorecard to start a new one, or open an existing scorecard to refine it.

2. Add one question per step

Structure your scorecard as a sequence of steps with one question per step. Don't stack multiple questions into a single step. The AI evaluates each step independently, and combining questions dilutes the score.

Examples of well-scoped questions:

“Did the rep confirm the prospect's budget range during discovery?”
“Did the rep identify the economic buyer by name?”
“Did the rep set a clear next step with a date before ending the call?”

Avoid compound questions like “Did the rep qualify budget, authority, and timeline?” Split that into three separate steps.

3. Add scoring criteria to each question

This is where most scorecards go wrong. For each question, click Add Scoring Criteria and give the AI explicit guidelines for how to score it.

Include:

Strictness level: Tell the AI how strict to be. “Score generously if the rep mentioned budget at any point” is different from “Only score full marks if the rep confirmed a specific number or range.” Skip this and the AI defaults to its own interpretation, which tends to be strict.
What good looks like: Describe what a strong answer sounds like in your sales motion. Example: “For SMB deals, budget confirmation can be a directional range like '5–10K per month.' It doesn't need to be a precise figure.”
What to ignore: Be explicit about edge cases. Example: “If the prospect raises budget themselves without being asked, still count this as confirmed.”

If your first round of scoring comes back too harsh, this is almost always the fix. Loosen the strictness language and add more examples of what counts as a pass.
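One way to keep criteria consistent across questions is to draft them outside Demodesk and paste the result into the Scoring Criteria field. The sketch below is only a drafting aid: the ScoringCriteria class, its field names, and the label wording are hypothetical and not part of Demodesk. It assumes the field accepts free text.

```python
from dataclasses import dataclass


@dataclass
class ScoringCriteria:
    """Hypothetical drafting helper; not a Demodesk API."""
    strictness: str        # how strict the AI should be
    good_looks_like: str   # what a strong answer sounds like
    ignore: str            # edge cases that should not cost points

    def render(self) -> str:
        # Join the three parts into one block of text to paste into the field.
        return "\n".join([
            f"Strictness: {self.strictness}",
            f"What good looks like: {self.good_looks_like}",
            f"What to ignore: {self.ignore}",
        ])


budget = ScoringCriteria(
    strictness="Score generously if the rep mentioned budget at any point.",
    good_looks_like=(
        "For SMB deals, budget confirmation can be a directional range "
        "like '5–10K per month.' It doesn't need to be a precise figure."
    ),
    ignore=(
        "If the prospect raises budget themselves without being asked, "
        "still count this as confirmed."
    ),
)
text = budget.render()
print(text)
```

Making all three parts required means a question cannot be drafted without a strictness statement, which is the part most often skipped.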

4. Add global AI context at the scorecard level

Open Advanced settings on the scorecard and add a global AI context that applies to every question.

This is where you give the AI the bigger picture:

Your sales methodology (e.g., “We use MEDDIC for enterprise and a lighter BANT variant for SMB”)
Your ICP and segment (e.g., “This scorecard is for SMB inbound demos with 10–50 FTE companies in DACH”)
Your product positioning (one or two sentences on what you sell)
Common objections and how your team handles them
Anything specific to your sales motion the AI couldn't infer from the questions alone

Global context stops the AI from grading a discovery call the way it would grade a closing call, or penalizing a rep for skipping steps that don't apply to your motion.
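Before pasting a global context into Advanced settings, it can help to check the draft against the five items listed above. The following is a minimal sketch, not a Demodesk feature: the section labels and the build_global_context function are hypothetical. It assembles the filled-in sections and reports which ones are still empty.

```python
# Hypothetical section labels mirroring the five items in this step.
SECTIONS = [
    "Sales methodology",
    "ICP and segment",
    "Product positioning",
    "Common objections",
    "Other motion-specific notes",
]


def build_global_context(parts):
    """Return (context_text, missing_sections) for a draft dict of sections."""
    filled = [
        f"{name}: {parts[name].strip()}"
        for name in SECTIONS
        if parts.get(name, "").strip()
    ]
    missing = [name for name in SECTIONS if not parts.get(name, "").strip()]
    return "\n".join(filled), missing


draft = {
    "Sales methodology": "We use MEDDIC for enterprise and a lighter BANT variant for SMB.",
    "ICP and segment": "This scorecard is for SMB inbound demos with 10–50 FTE companies in DACH.",
}
context, missing = build_global_context(draft)
print(context)
print("Still to write:", missing)
```

With only the first two sections drafted, the remaining three show up in the missing list.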

5. Test against real recordings and recalibrate

Run the scorecard against 5–10 recent calls you've already reviewed. Compare the AI scores to your own. If the AI is consistently 1–2 points lower, your scoring criteria are too strict. Go back to step 3 and loosen them.

Recalibrate until AI scores and human scores agree on at least 7 out of 10 calls. That's when the coaching loop starts to work.
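The comparison in this step can be done in a spreadsheet or with a few lines of code. The sketch below assumes you have noted the AI score and your own score for each test call by hand. The function name, the tolerance parameter, and the verdict labels are illustrative assumptions, and "agree" is taken here to mean an exact match unless you raise the tolerance.

```python
from statistics import mean


def calibration_report(human, ai, tolerance=0, target=0.7):
    """Compare human and AI scores (1-5) for the same calls, in the same order."""
    if len(human) != len(ai) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - h for h, a in zip(human, ai)]
    # Share of calls where AI and human scores are within `tolerance` points.
    agreement = sum(abs(d) <= tolerance for d in diffs) / len(diffs)
    # Average difference; negative means the AI scores lower than the human.
    bias = mean(diffs)
    if agreement >= target:
        verdict = "calibrated"
    elif bias <= -1:
        verdict = "too strict"
    elif bias >= 1:
        verdict = "too generous"
    else:
        verdict = "inconsistent"
    return {"agreement": agreement, "bias": bias, "verdict": verdict}


# Made-up scores for ten calls, for illustration only.
human_scores = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3]
ai_scores = [2, 2, 3, 3, 1, 2, 2, 4, 2, 2]
report = calibration_report(human_scores, ai_scores)
print(report)
```

In this made-up example the AI scores 1.4 points lower on average and matches the human score on none of the ten calls, so the verdict is "too strict". The default target of 0.7 corresponds to the 7-out-of-10 threshold above.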

Tips

Start with one scorecard per sales motion, not per stage. A single discovery scorecard that works for new business beats five hyper-specific ones nobody updates.
Write criteria in your reps' language. If your team says “BANT-light” or “champion check,” use those words in the scoring criteria. The AI follows your vocabulary.
Bookmark calls that scored unexpectedly. Use them as test cases when you adjust the scorecard. Demodesk's configurable retention keeps bookmarked recordings even after auto-deletion.
Don't aim for a perfect 5/5 distribution. A healthy scorecard returns a spread: some 2s, mostly 3s and 4s, occasional 5s. If everyone scores 5, your criteria are too loose.
Re-test after every change. Small edits to the global context can shift scores across all questions at once.
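To check the spread described in the tips above, a quick histogram over a batch of scores is enough. This is a rough sketch on hand-collected scores. The check_spread function is hypothetical and only flags the degenerate case where every call received the same score.

```python
from collections import Counter


def check_spread(scores):
    """Return a 1-5 histogram and a note about the score distribution."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    histogram = {s: counts.get(s, 0) for s in range(1, 6)}
    if len(counts) == 1:
        only = scores[0]
        if only == 5:
            note = "every call scored 5: criteria are likely too loose"
        elif only <= 2:
            note = f"every call scored {only}: criteria are likely too strict"
        else:
            note = f"every call scored {only}: criteria need adjustment"
    else:
        note = "scores vary across calls"
    return histogram, note


histogram, note = check_spread([2, 3, 3, 4, 4, 3, 5])
print(histogram, note)
```

Running the same check after each scorecard edit makes it easy to see whether a change to the global context shifted the whole distribution.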

Related skills and agents

AI Coach: the agent that runs scorecards against your calls. See https://demodesk.com/agents/ai-coach-a-3-0.
Marketplace scorecards: pre-built scorecards for MEDDIC, BANT, Challenger, and SPICED at https://marketplace.demodesk.ai.
AI Assistant: generates the transcripts and summaries the AI Coach scores against.