AI-022: Einstein for Service: Case Summarisation, Recommendations, and Knowledge

What you will learn in this tutorial

How Einstein case summarisation works technically and what data prerequisites it requires
Next Best Action configuration for service — strategy design, recommendation surfaces, and acceptance rate measurement
Knowledge management requirements for AI-powered article recommendations to perform reliably
Einstein Search versus standard Knowledge search — when semantic search changes agent behaviour
How to measure the impact of service AI features against handle time, CSAT, and deflection
The common failure patterns that prevent service AI from delivering on its potential

Case Summarisation: What It Does and What It Needs

Einstein case summarisation generates a structured text summary of a case from its description, email thread, and activity history. The practical value for agents is in the warm-transfer scenario: when a case is escalated or transferred between agents, the receiving agent reads the AI-generated summary rather than scrolling through a long email thread. A well-configured summarisation implementation consistently reduces warm-transfer handle time by 25–35% — not because the AI is exceptional, but because the previous workflow (read entire email chain, form mental model, pick up conversation) was genuinely slow.

Case summarisation uses the Prompt Builder infrastructure and the Einstein Trust Layer. The prompt is pre-built by Salesforce and available out of the box, but most production deployments customise it to match the organisation's case structure — particularly to include custom fields (product line, entitlement tier, previous case history) and to constrain the output format to what agents actually need. The standard output is a paragraph summary; many deployments replace this with a structured four-field output (situation, customer sentiment, actions taken, recommended next step) that agents find more immediately actionable.
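To make the four-field format concrete, here is a minimal Python sketch that checks a generated summary contains all four sections before it is shown to an agent. The field labels, the "Label: text" line layout, and the parse_summary helper are assumptions made for illustration; none of them come from Prompt Builder or any Salesforce API.

```python
from dataclasses import dataclass

# Hypothetical labels; a real prompt template would define its own.
REQUIRED_FIELDS = (
    "Situation",
    "Customer sentiment",
    "Actions taken",
    "Recommended next step",
)


@dataclass
class CaseSummary:
    situation: str
    customer_sentiment: str
    actions_taken: str
    recommended_next_step: str


def parse_summary(text: str) -> CaseSummary:
    """Parse 'Label: value' lines; raise if any required field is missing or empty."""
    found = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if sep and label.strip() in REQUIRED_FIELDS:
            found[label.strip()] = value.strip()
    missing = [f for f in REQUIRED_FIELDS if not found.get(f)]
    if missing:
        raise ValueError(f"summary is missing fields: {', '.join(missing)}")
    return CaseSummary(
        situation=found["Situation"],
        customer_sentiment=found["Customer sentiment"],
        actions_taken=found["Actions taken"],
        recommended_next_step=found["Recommended next step"],
    )
```

A check of this kind is one way to catch summaries that drift back to free-form paragraphs after a prompt change, so agents always see the same four fields in the same order.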

The data prerequisites are straightforward but often under-estimated. Summarisation quality scales directly with case description quality and activity log completeness. Organisations where agents log case updates inconsistently — free-form notes rather than structured resolution steps — receive summaries that reflect the inconsistency. Rolling out case summarisation without first establishing activity logging standards produces results that agents dismiss as unhelpful, which kills adoption regardless of the underlying capability.

💡

Insight

Case summarisation is most valuable as an input to the next agent, not as a convenience for the agent currently working the case. The highest-impact deployment pattern is surfacing the summary automatically when a case is assigned or transferred — not requiring agents to actively request it. Passive display (always visible in the case header) beats on-demand access (a button that generates it) on adoption and handle time reduction in every deployment I have seen.

Next Best Action in the Service Console

Next Best Action (NBA) in Service Cloud presents contextual recommendations to agents during live interactions — which knowledge article to send, which product to upgrade, which retention offer to make, which escalation path to take. The AI component sits inside the recommendation strategy: Einstein uses predictive models to score which recommendation is most likely to be accepted given the current case context, rather than presenting every applicable recommendation in an undifferentiated list.

The technical architecture is a combination of Recommendation records (what to present), Strategies (a decision tree built in Strategy Builder that determines eligibility and filtering logic), and the Einstein Recommendation engine (which scores and ranks eligible recommendations). Configuring NBA for service requires designing the strategy first — determining what types of recommendations exist, what eligibility rules apply, and how conflicts between multiple applicable recommendations are resolved — before the AI component can add value.
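The strategy design questions above (what is eligible, and which recommendation wins when several apply) can be sketched outside Salesforce as plain logic. The following Python is an illustration only: the Recommendation class, its fields, and the one-per-category conflict rule are assumptions, and it does not reproduce Strategy Builder's actual elements.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Recommendation:
    name: str
    category: str                      # e.g. "knowledge", "retention", "escalation"
    priority: int                      # lower number = higher rule-based priority
    is_eligible: Callable[[dict], bool]


def select_recommendations(catalogue, case, max_per_category=1):
    """Filter by eligibility, then resolve conflicts by keeping only the
    highest-priority recommendation(s) within each category."""
    eligible = [r for r in catalogue if r.is_eligible(case)]
    eligible.sort(key=lambda r: r.priority)  # stable sort keeps catalogue order on ties
    kept = []
    per_category = {}
    for rec in eligible:
        count = per_category.get(rec.category, 0)
        if count < max_per_category:
            kept.append(rec)
            per_category[rec.category] = count + 1
    return kept
```

Writing the rules down in this form before building the strategy tends to expose gaps, such as two retention offers with the same priority or a case type with no eligible recommendation at all.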

The performance signal that drives NBA is acceptance rate: which recommendations agents accepted, which they dismissed, and what the subsequent case outcome was. This signal trains the Einstein model over time, progressively improving recommendation ranking for the specific patterns in your case population. New NBA deployments start with rule-based ranking (explicit priority ordering) and shift to AI-driven ranking once sufficient acceptance history has accumulated — typically 3–6 months of usage for a high-volume service centre.
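The handover from rule-based to history-driven ranking can be illustrated with a simple stand-in. The sketch below ranks by observed acceptance rate only once every recommendation has been presented often enough, and otherwise keeps the explicit priority order. It is not Einstein's model, which the text above does not describe; the rank function and the threshold of 200 presentations are arbitrary assumptions.

```python
def rank(recommendations, history, min_presentations=200):
    """Order recommendation names for display.

    recommendations: list of (name, rule_priority); lower priority shows first.
    history: dict mapping name -> (accepted_count, presented_count).
    """
    threshold = max(1, min_presentations)  # guards against division by zero

    def has_signal(name):
        return history.get(name, (0, 0))[1] >= threshold

    if all(has_signal(name) for name, _ in recommendations):
        # Enough history for every item: rank by observed acceptance rate.
        def rate(item):
            accepted, presented = history[item[0]]
            return accepted / presented

        ordered = sorted(recommendations, key=rate, reverse=True)
    else:
        # Not enough signal yet: keep the explicit rule-based order.
        ordered = sorted(recommendations, key=lambda item: item[1])
    return [name for name, _ in ordered]
```

With 50 presentations each, rank([("A", 1), ("B", 2)], {"A": (10, 50), "B": (40, 50)}) keeps the rule order A, B even though B is accepted more often; the order only flips once both items pass the threshold.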

🔑

Key Concept

NBA is only as good as the recommendations available to present. Before investing in AI ranking, audit the recommendation catalogue: are the knowledge articles accurate and up to date? Are the retention offers competitive with what agents would escalate to anyway? Is the escalation path guidance current? A well-ranked catalogue of poor recommendations produces poor outcomes. Address the catalogue quality first, then let AI improve the ranking.

Knowledge Management for AI-Powered Recommendations

The performance of every AI feature that touches knowledge — article recommendations, Einstein Search, agent-assisted resolution, autonomous agent responses — is capped by the quality of the Knowledge base. This is the single most consistent root cause of under-performing service AI deployments: the AI capability is functional, but it is working with a knowledge base that has structural problems the technology cannot compensate for.

Three knowledge base characteristics matter most for AI performance.

Currency: articles that describe deprecated product versions, outdated procedures, or prices that no longer apply are retrieved and presented confidently by AI recommendation systems, because no mechanism detects staleness automatically. Establish an article review cadence (quarterly for high-traffic articles, annually for low-traffic) and make article owners accountable for it.

Specificity: AI retrieval systems match against article content, not article intent. Articles that try to cover a broad topic in one long document produce diluted matches: a single 2,000-word article covering ten product variants will be retrieved for all ten but be accurately useful for none. Segment articles into single-purpose resolution documents.

Structural consistency: articles with a consistent template (symptom, cause, resolution steps, related articles) are retrieved, ranked, and summarised more reliably than free-form narrative articles.
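The review cadence described above (quarterly for high-traffic articles, annually for low-traffic) is straightforward to check mechanically. The following Python sketch flags overdue articles from an exported list. The dictionary keys, the 1,000 monthly views cut-off for "high-traffic", and the 90 and 365 day windows are assumptions for illustration, not Salesforce Knowledge field names or defaults.

```python
from datetime import date, timedelta


def overdue_articles(articles, today, high_traffic_views=1000):
    """Return titles of articles whose last review is older than their cadence.

    articles: iterable of dicts with 'title', 'last_reviewed' (date)
              and 'monthly_views' (int).
    """
    overdue = []
    for article in articles:
        # Quarterly review for high-traffic articles, annual for the rest.
        high_traffic = article["monthly_views"] >= high_traffic_views
        cadence = timedelta(days=90 if high_traffic else 365)
        if today - article["last_reviewed"] > cadence:
            overdue.append(article["title"])
    return overdue
```

Running a report like this on a schedule and routing the results to article owners is one way to make the accountability concrete rather than relying on owners to remember review dates.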

Einstein Search adds semantic search capability on top of the standard keyword-based Knowledge search. The practical difference is that agents can search for "customer can't log in after password reset" and retrieve articles about authentication failure, locked accounts, and SSO configuration — even if none of those articles contain the agent's exact search terms. For organisations where agents have highly variable search behaviour (because the agent population has diverse tenures and product knowledge), semantic search measurably improves first-article relevance. For organisations where agents already use precise, consistent search terms, the improvement is marginal.

Measuring Service AI Impact

Service AI features produce measurable outcomes across three dimensions: handle time, first-contact resolution, and knowledge utilisation. Measuring correctly requires isolating the AI feature's contribution from other confounding variables — agent tenure, case type distribution, seasonal volume changes — which means establishing baselines before deployment and running controlled comparisons rather than before/after averages.

Handle time is the most direct measurement. Compare average handle time (AHT) for case types where the AI feature is active against AHT for the same case types in a control group (or the same period in the prior year, adjusted for volume). Case summarisation impact is measurable on warm-transfer cases specifically; NBA impact is measurable by comparing cases where recommendations were accepted against cases where they were dismissed; knowledge recommendation impact is measurable on time-to-first-relevant-article.
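As a rough sketch of the controlled comparison, the Python below computes the fractional AHT reduction per case type from exported case records. The tuple layout and the "ai" and "control" group labels are assumptions for illustration; a real analysis would also need to check sample sizes and the confounders mentioned earlier.

```python
from collections import defaultdict
from statistics import mean


def aht_reduction_by_case_type(cases):
    """Compute AHT reduction of the AI group relative to control, per case type.

    cases: iterable of (case_type, group, handle_minutes),
           where group is either 'ai' or 'control'.
    Returns {case_type: fractional reduction}; case types that lack
    either group are skipped rather than guessed at.
    """
    buckets = defaultdict(lambda: {"ai": [], "control": []})
    for case_type, group, minutes in cases:
        buckets[case_type][group].append(minutes)

    result = {}
    for case_type, groups in buckets.items():
        if groups["ai"] and groups["control"]:
            control_aht = mean(groups["control"])
            ai_aht = mean(groups["ai"])
            result[case_type] = (control_aht - ai_aht) / control_aht
    return result
```

For example, a control AHT of 20 minutes against an AI-group AHT of 14 minutes on warm-transfer cases gives (20 - 14) / 20 = 0.3, a 30% reduction for that case type only.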

First-contact resolution (FCR) is the more meaningful business outcome but harder to attribute. FCR improvements from knowledge AI are real but lag — agents must first adopt the AI-recommended articles and trust them enough to resolve cases at first contact. Expect a three-to-six-month adoption curve before FCR impact is visible. Measuring agent acceptance rate (the proportion of AI recommendations acted on) is an earlier leading indicator of whether the adoption curve is progressing.
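Because acceptance rate is the leading indicator, it helps to watch it as a weekly trend rather than a single number. The following Python sketch groups recommendation events by ISO week; the (date, accepted) event format is an assumption about how the data might be exported, not a Salesforce report type.

```python
from collections import defaultdict
from datetime import date


def weekly_acceptance_rate(events):
    """Compute the share of recommendations accepted per ISO week.

    events: iterable of (event_date, accepted) pairs, where accepted is a bool.
    Returns a chronologically sorted list of ((iso_year, iso_week), rate).
    """
    counts = defaultdict(lambda: [0, 0])  # week -> [accepted, total]
    for event_date, accepted in events:
        iso = event_date.isocalendar()
        week = (iso[0], iso[1])
        counts[week][0] += int(accepted)
        counts[week][1] += 1
    return [(week, acc / total) for week, (acc, total) in sorted(counts.items())]
```

A flat or falling weekly rate in the first months suggests the adoption curve has stalled, well before any FCR movement would be visible.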

CSAT correlation is worth measuring but rarely attributable directly to a specific AI feature. Where service AI reduces handle time substantially (20%+), CSAT improvements are typically observed — customers respond positively to faster resolution. Where AI reduces agent effort but resolution time remains similar, CSAT impact is minimal.

✅

Leader Perspective

Avoid reporting AI impact through aggregate metrics (total AHT across all case types) in the first six months. AI features deliver impact unevenly: summarisation helps most on complex multi-touch cases, NBA helps most on specific case types where the recommendation catalogue is strong, and knowledge AI helps most for agents with shorter tenure. Report impact per case type and per agent segment, and lead with the strong signals rather than averaging them away.

Avoiding Common Failure Patterns

Service AI deployments fail in predictable ways. Recognising these patterns early prevents the kind of failed rollout that entrenches scepticism about AI capability in the service organisation for years.

Feature-before-foundation: deploying AI summarisation or NBA before establishing the activity logging standards, knowledge base quality, and agent adoption of the Service Console that AI features depend on. AI features amplify the quality of the underlying process — they do not substitute for it. If agents are not logging case activities consistently, summarisation will summarise sparse data consistently.

No change management: treating service AI as a technology deployment rather than a behavioural change programme. Agents who understand why case summarisation exists and what it is trying to do adopt it at significantly higher rates than agents who see it as a new panel in the console with no context. Invest in the "why" communication as much as the technical configuration.

Passive display without adoption monitoring: deploying NBA or knowledge recommendations but not tracking whether agents are interacting with them. Passive display of recommendations that agents do not use produces no value and consumes configuration and maintenance effort. Track acceptance rates from go-live and treat low acceptance as a signal to investigate the recommendation quality, display placement, or agent training — not to conclude that "AI doesn't work."

Treating AI output as infallible: particularly relevant for knowledge AI deployments where agents stop forming their own judgement about article appropriateness and present whatever the AI recommends. Train agents to treat AI recommendations as a starting point that requires professional judgement — this is both correct practice and a guard against the confident-wrongness failure mode where a stale or miscategorised article is recommended authoritatively.

Key Takeaways

Case summarisation delivers most value in warm-transfer scenarios — passive display triggered on case assignment outperforms on-demand generation on both adoption and handle time reduction
Next Best Action value scales with the quality of the recommendation catalogue, not just the sophistication of the Einstein ranking model — audit catalogue quality before investing in AI ranking
Knowledge base currency, specificity, and structural consistency cap the performance ceiling for all AI features that retrieve and surface knowledge content
Einstein Search's semantic capability delivers the most measurable benefit when agent search behaviour is variable; for organisations with consistent, precise search terms the improvement is marginal
Service AI impact should be measured per case type and per agent segment in the first six months — aggregate metrics average away the strong signals and can mask real improvements
The four recurring failure patterns — feature-before-foundation, no change management, passive display without adoption monitoring, and treating AI output as infallible — account for the majority of service AI deployments that fail to deliver measurable value

Checkpoint: Test Your Understanding

1. A service AI deployment shows poor case summarisation quality even though the Einstein configuration is correct. What is the most likely root cause?

A. The LLM selected in the Einstein configuration is too small for case summarisation tasks

B. Agents are not logging case activities consistently, so the model is summarising sparse, unstructured input — the AI capability is functional, but the underlying data quality is insufficient

C. Case summarisation requires a minimum of 500 cases in the org before the Einstein model can train effectively

D. The Prompt Builder template is using standard Salesforce fields; custom fields must be explicitly added before quality improves

2. Why does a new Next Best Action deployment typically start with rule-based recommendation ranking before switching to AI-driven ranking?

A. Salesforce requires a rule-based configuration phase for compliance reasons before Einstein personalisation can be enabled

B. AI-driven ranking requires a minimum number of users to be licensed before it can activate

C. The Einstein model needs sufficient acceptance history — typically 3–6 months of agent usage — to learn which recommendations perform best in your specific case population; rule-based ranking provides value while that signal accumulates

D. Rule-based ranking provides a cheaper licensing option during the pilot phase before committing to AI-powered ranking

3. Service AI adoption is low three months after go-live. Agents are not interacting with knowledge recommendations despite technically correct configuration. What is the most appropriate immediate action?

A. Escalate to Salesforce Support to investigate whether the Einstein models have trained correctly for this org's data

B. Replace the AI-ranked recommendations with rule-based ranking, which agents will find more predictable and trustworthy

C. Add a mandatory interaction requirement where agents must accept or dismiss every recommendation before closing a case

D. Investigate recommendation quality, display placement, and agent training — low acceptance rate is a signal to diagnose root cause, not to conclude that AI doesn't work for this use case
