AI-010: The Salesforce AI Trust Layer: What It Does and Doesn't Protect

What you will learn in this tutorial

What the Einstein Trust Layer is and the specific components it comprises
How data de-identification and tokenisation work before prompts reach the LLM
What toxicity filtering enforces and where its limits lie
How the Trust Layer's audit logging works and what it does and does not capture
The specific threat vectors and compliance scenarios the Trust Layer does not address
The architectural compensating controls required to fill the Trust Layer's gaps

What the Trust Layer Actually Is

The Einstein Trust Layer is Salesforce's collective term for the set of security and safety controls that sit between Salesforce data and the external large language models used by Einstein and Agentforce features. It is not a single technical component — it is a suite of controls: a data de-identification pipeline, a toxicity filter, an audit logging system, and a set of contractual and architectural commitments about how Salesforce manages the LLM vendors it partners with.

A core claim of the Trust Layer is that your Salesforce data is never used to train the LLMs that Einstein and Agentforce use. When your data is sent to an external LLM provider for inference, Salesforce contractually prohibits that provider from using the data to retrain or fine-tune its model, and from retaining prompts or responses once the request has been processed. Together these make up the "zero data retention" commitment. It addresses the concern that activating Einstein features would cause proprietary customer data to leak into the LLM's training corpus and potentially surface in other customers' outputs.

Understanding what this commitment actually covers — and, critically, what it does not — is the prerequisite for evaluating whether the Trust Layer is sufficient for your organisation's security and compliance requirements.

🔑

The Trust Layer is a controls suite, not a security boundary: The Trust Layer reduces specific risks around data exfiltration into LLM training and harmful output generation. It does not replace your org's sharing model, field-level security, or data classification programme. An architect who tells a CISO "the Trust Layer handles AI security" without this qualification is giving an incomplete and potentially misleading assurance.

Data De-identification and Tokenisation

Before a prompt is sent from Salesforce to the external LLM, the Trust Layer applies a de-identification scan. The scanner identifies field values that match configured sensitive data patterns — names, email addresses, phone numbers, national identification numbers, financial account references — and replaces them with opaque tokens. The LLM receives the tokenised version of the prompt; it never processes the actual sensitive values.

When the LLM returns a response, the Trust Layer intercepts it before it reaches the Salesforce application and re-hydrates the tokens with the original values. From the user's perspective, the output contains the actual customer names and contact details; the LLM only ever processed tokens. This architecture means that even if the LLM provider had a data exposure incident, the exposed data would be a set of tokens rather than your customers' actual PII.
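The mask-then-rehydrate round trip described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not Salesforce's implementation: the two regular expressions, the token format, and the function names `mask` and `rehydrate` are all assumptions made for the example.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; a production detector covers many more categories.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask(prompt: str) -> Tuple[str, Dict[str, str]]:
    """Replace matched values with opaque tokens; return masked text and token map."""
    token_map: Dict[str, str] = {}
    counters: Dict[str, int] = {}

    for label, pattern in PATTERNS.items():
        def substitute(match, label: str = label) -> str:
            value = match.group(0)
            # Reuse the existing token if the same value appears more than once.
            for token, original in token_map.items():
                if original == value:
                    return token
            counters[label] = counters.get(label, 0) + 1
            token = f"<{label}_{counters[label]}>"
            token_map[token] = value
            return token

        prompt = pattern.sub(substitute, prompt)
    return prompt, token_map


def rehydrate(response: str, token_map: Dict[str, str]) -> str:
    """Swap tokens in the model response back to the original values."""
    for token, value in token_map.items():
        response = response.replace(token, value)
    return response


prompt = "Email ana@example.com or call +44 20 7946 0958."
masked, token_map = mask(prompt)
restored = rehydrate(masked, token_map)
```

In this sketch only `masked` would ever leave the platform; the token map stays on the trusted side, which is why a provider-side exposure would reveal tokens rather than the underlying values.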

The scope of de-identification is configured, not automatic: The Trust Layer de-identifies fields that match the patterns in its PII detection configuration. It does not automatically understand that your custom field "Contract_Value__c" contains commercially sensitive pricing information, or that your "Internal_Account_Code__c" field maps to a specific high-value customer in a way that should be protected. De-identification coverage is bounded by the PII pattern configuration — fields that are sensitive but do not match standard PII patterns will not be de-identified unless you explicitly extend the configuration.

💡

Business-sensitive fields require explicit de-identification configuration: Standard PII detection covers names, email addresses, and similar personal data. Commercial sensitivity — contract values, pricing tiers, competitive intelligence stored in text fields — is not a standard PII category. For regulated industries or commercially sensitive deployments, audit your prompt template merge fields against your data classification policy and explicitly extend the de-identification configuration to cover fields beyond standard PII.
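One way to run the audit suggested above is to compare the merge fields in each prompt template with your data classification register. The sketch below assumes merge fields appear as `{!...Object.Field}` and that you maintain the classification levels and the set of fields already covered by de-identification yourself; the function names and data shapes are hypothetical and should be adapted to how your templates and policy are actually stored.

```python
import re
from typing import Dict, FrozenSet, List, Set

# Assumed merge-field shape: {!...Object.Field__c}. Adjust to your templates.
MERGE_FIELD = re.compile(r"\{!([^}]+)\}")


def merge_field_names(template: str) -> List[str]:
    """Return the trailing field API name of each merge field, in order."""
    return [ref.strip().split(".")[-1] for ref in MERGE_FIELD.findall(template)]


def unmasked_sensitive_fields(
    template: str,
    classification: Dict[str, str],
    masked_fields: Set[str],
    sensitive_levels: FrozenSet[str] = frozenset({"Confidential", "Restricted"}),
) -> List[str]:
    """List merge fields classified as sensitive but not covered by masking."""
    findings = []
    for name in merge_field_names(template):
        level = classification.get(name, "Unclassified")
        if level in sensitive_levels and name not in masked_fields:
            findings.append(name)
    return findings


template = (
    "Summarise {!$Input:Account.Name} on tier "
    "{!$Input:Account.Contract_Pricing_Tier__c}."
)
classification = {"Name": "Confidential", "Contract_Pricing_Tier__c": "Restricted"}
masked_fields = {"Name"}
gaps = unmasked_sensitive_fields(template, classification, masked_fields)
```

Here `gaps` contains only `Contract_Pricing_Tier__c`: the field is classified as sensitive, yet nothing in the assumed masking configuration covers it.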

Toxicity Filtering and Content Policy

The Trust Layer applies a content filter to LLM outputs before they are returned to the Salesforce application. The filter screens for responses that contain harmful, hateful, sexually explicit, or otherwise policy-violating content. If the filter identifies a policy violation in the output, the response is blocked and the user receives an error message rather than the harmful content.

The toxicity filter is applied to the LLM's response, not to the user's input. This is an important asymmetry: the filter prevents the platform from surfacing harmful outputs to users, but it does not prevent users from attempting to elicit those outputs through adversarial prompts. If a user submits a prompt designed to manipulate the LLM into producing harmful content, the Trust Layer will attempt to block the output — but the attempt was still made, and the attempt itself may warrant investigation.

The filter is calibrated for general enterprise content — it is not domain-specific. For highly regulated industries (healthcare, financial services, legal), the Trust Layer's general content policy may not align precisely with your sector's specific prohibited content definitions. A financial services firm whose compliance policy prohibits the LLM from making any statement that could be construed as investment advice will find that the Trust Layer's general filter does not enforce that specific prohibition. Domain-specific content restrictions require additional guardrail design at the prompt template and agent instruction level.

Audit Logging and the Accountability Chain

The Trust Layer logs every LLM interaction — every prompt sent and every response received — in the Einstein audit trail. The log captures the timestamp, the user identity, the feature invoked, and metadata about the interaction. For Agentforce agents, the log also captures the sequence of actions the agent took and the data it accessed during the session.

This audit capability is genuinely useful for compliance and incident investigation. If a data subject submits a GDPR access request asking what AI features processed their data, the Einstein audit trail provides a queryable record. If an Agentforce agent takes an unexpected action, the audit log provides the reconstructable sequence of what the agent did and why.

The limitation of the audit log is that it captures what happened, not whether what happened was appropriate. The log records that the agent updated a record; it does not flag whether that update was outside the agent's intended scope. Identifying anomalies requires either automated monitoring rules against the audit log, or periodic manual review — neither of which the Trust Layer provides out of the box. The audit log is evidence; the governance programme provides the interpretation.
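An automated monitoring rule of the kind mentioned above can be as simple as an allowlist check over exported audit records. The sketch below is written under stated assumptions: the `AuditRecord` shape is a hypothetical export format rather than the actual audit trail schema, and the per-agent allowlist is something your governance programme would have to define.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical export shape, not the actual audit trail schema.
    timestamp: str
    user: str
    agent: str
    action: str
    target_object: str


def out_of_scope(
    records: Iterable[AuditRecord], allowed: Dict[str, Set[str]]
) -> List[AuditRecord]:
    """Flag records whose action/object pair is not allowlisted for the agent."""
    flagged = []
    for record in records:
        key = f"{record.action}:{record.target_object}"
        if key not in allowed.get(record.agent, set()):
            flagged.append(record)
    return flagged


allowed = {"RenewalAgent": {"update:Opportunity", "read:Account"}}
records = [
    AuditRecord("2025-03-01T09:00:00+00:00", "u1", "RenewalAgent", "update", "Opportunity"),
    AuditRecord("2025-03-01T09:05:00+00:00", "u1", "RenewalAgent", "update", "Contact"),
]
flagged = out_of_scope(records, allowed)
```

The rule supplies the interpretation the log itself lacks: the second record is a perfectly valid log entry, but it falls outside the scope declared for that agent and so is surfaced for review.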

⚠️

Audit log retention is limited: Salesforce retains Einstein audit log data for a limited period (typically six months). If your compliance framework requires AI interaction records to be kept for longer, you must export and archive the audit log through an integration; it is not preserved indefinitely by default. Discover this requirement before a compliance audit, not during one.
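An export job needs some rule for deciding which records to archive before they age out. The following sketch assumes a retention window of roughly 180 days (an approximation of the six months mentioned above), a hypothetical record shape with an ISO 8601 `timestamp`, and a 30-day safety margin; none of these values come from Salesforce documentation.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def due_for_archive(
    records: Iterable[Dict[str, str]],
    now: datetime,
    retention_days: int = 180,      # assumed platform retention window
    safety_margin_days: int = 30,   # archive this long before records expire
) -> List[Dict[str, str]]:
    """Select records old enough that they should be archived now."""
    cutoff = now - timedelta(days=retention_days - safety_margin_days)
    return [r for r in records if datetime.fromisoformat(r["timestamp"]) <= cutoff]


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialise records as JSON Lines for an append-only archive."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


now = datetime(2025, 7, 1, tzinfo=timezone.utc)
records = [
    {"id": "a", "timestamp": "2025-01-15T00:00:00+00:00"},
    {"id": "b", "timestamp": "2025-06-20T00:00:00+00:00"},
]
due = due_for_archive(records, now)
archive_payload = to_jsonl(due)
```

Running the selection on a schedule shorter than the safety margin means no record should reach the end of the retention window without first being written to the archive.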

What the Trust Layer Does Not Protect Against

The Trust Layer is designed around a specific threat model: data leakage into LLM training corpora, harmful content generation, and the absence of an accountability record. It addresses those threats meaningfully. It does not address the following threat vectors, which architects must compensate for explicitly.

Prompt injection: An adversarial user who crafts an input designed to override the agent's instructions or extract information beyond their access rights is not blocked by the Trust Layer. The Trust Layer filters outputs for toxicity; it does not validate that the LLM is honouring its instruction boundaries. Prompt injection resistance must be designed into the agent's system instructions and validated through adversarial testing before production deployment.

Excessive data exposure via grounding: The Trust Layer de-identifies specific sensitive values, but it does not limit how much data Atlas retrieves from the org to include in the grounding context. An agent whose Actions allow it to query broad record sets and include them in the LLM context is exposing that data to the LLM — de-identified at the field level, but still potentially revealing through the aggregate. The grounding scope must be designed with the principle of minimum necessary data, not maximum available context.

Business logic violations: An agent that invokes a valid Action in an inappropriate business context is not caught by the Trust Layer. If an agent approves a discount request it should have escalated to a manager, or sends an outreach email to a customer who has opted out, the Trust Layer's controls do not prevent this. Business logic guardrails must live in the Action definitions themselves and in the agent's Topic instructions — they cannot be delegated to the Trust Layer.

Sharing model bypass through agentic context: When an Agentforce agent runs a session, it accesses data using the running user's sharing permissions. But the agent may request data across multiple records in ways that a human user would not typically query — effectively combining information from multiple records into a single context that the human user could technically access individually but would rarely assemble in practice. This is not a Trust Layer bypass; it is a valid use of existing permissions. But it is a data minimisation concern that the Trust Layer is not designed to address.

Designing for the Gaps

Filling the Trust Layer's gaps requires deliberate architectural decisions at the design phase, not controls bolted on after deployment. Three principles guide this design.

Minimum necessary grounding: Every merge field in a prompt template and every record query in an agent Action should be justified by a specific output quality requirement. If including a field does not demonstrably improve the quality of the LLM output, it should not be included. This principle limits the exposure surface without degrading output quality.
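The minimum necessary grounding principle can be made mechanical by requiring a recorded justification for every field that enters the LLM context. The sketch below is one possible shape for that rule, using plain dictionaries as stand-ins for queried records; the field names and the `minimal_context` helper are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Each grounded field carries the output-quality reason it is included.
JUSTIFIED_FIELDS: Dict[str, str] = {
    "Name": "needed to address the customer correctly",
    "Renewal_Date__c": "needed to state the renewal deadline",
}


def minimal_context(
    records: Iterable[Dict[str, Any]], justified_fields: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Keep only fields that have a non-empty justification on file."""
    keep = {name for name, reason in justified_fields.items() if reason.strip()}
    return [{k: v for k, v in record.items() if k in keep} for record in records]


records = [
    {
        "Name": "Acme Ltd",
        "Renewal_Date__c": "2025-09-30",
        "Contract_Value__c": 250000,
        "Internal_Account_Code__c": "HV-0042",
    }
]
context = minimal_context(records, JUSTIFIED_FIELDS)
```

Fields without a justification never reach the prompt, so widening the grounding scope becomes an explicit, reviewable change rather than a side effect of a broad query.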

Action-level business rule enforcement: Business logic that must never be violated — cannot-contact flags, manager approval requirements, regulatory disclosures — should be enforced in the Flow or Apex that implements the Action, not in the LLM's instructions. The LLM can be instructed not to approve discounts above 20%; the Flow that executes the discount approval should enforce the 20% ceiling regardless of what the LLM decided. Defence in depth at the execution layer compensates for instruction drift at the reasoning layer.
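The discount example above can be sketched as follows. In a real org this logic would live in the Flow or Apex behind the Action; Python is used here only to show the shape of the check, and the `apply_discount` function and its result type are hypothetical.

```python
import math
from dataclasses import dataclass

MAX_AUTONOMOUS_DISCOUNT = 20.0  # percent; the ceiling used in the example above


@dataclass(frozen=True)
class DiscountResult:
    status: str            # "approved", "escalated" or "rejected"
    discount_percent: float
    reason: str


def apply_discount(requested_percent, manager_approved: bool = False) -> DiscountResult:
    """Enforce the ceiling at execution time, whatever the LLM decided."""
    is_number = isinstance(requested_percent, (int, float)) and not isinstance(
        requested_percent, bool
    )
    if not is_number or math.isnan(requested_percent) or not 0 <= requested_percent <= 100:
        return DiscountResult("rejected", 0.0, "invalid discount value")
    if requested_percent > MAX_AUTONOMOUS_DISCOUNT and not manager_approved:
        return DiscountResult("escalated", 0.0, "exceeds ceiling; manager approval required")
    return DiscountResult("approved", float(requested_percent), "within policy")


within = apply_discount(15)
over = apply_discount(35)
over_with_approval = apply_discount(35, manager_approved=True)
```

Because the check runs in the execution path, a request for 35% is escalated even if the model was persuaded that the discount was acceptable; the instruction in the prompt becomes a convenience rather than the control.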

Adversarial testing before production: Every agent and every high-volume generative feature should undergo structured adversarial testing before production deployment. This means attempting to elicit out-of-scope behaviour through carefully crafted inputs, testing boundary conditions on field access, and verifying that business rule enforcement at the Action level works correctly when the LLM attempts to invoke the Action with unexpected parameters. The Trust Layer protects against known harmful patterns; adversarial testing identifies the unknown ones specific to your deployment.
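The Action-level part of this testing can be automated with a small table-driven harness. The sketch below is a minimal example under stated assumptions: `discount_action` is a stand-in for your real Action, the 20% ceiling is borrowed from the earlier example, and the case list is illustrative rather than exhaustive.

```python
from typing import Any, Callable, Dict, List


def run_adversarial_cases(
    action: Callable[..., str], cases: List[Dict[str, Any]]
) -> List[str]:
    """Invoke the action with each case's parameters; return names of failed cases."""
    failures = []
    for case in cases:
        try:
            status = action(**case["params"])
        except Exception as exc:  # an unhandled crash is also a finding
            status = f"error:{type(exc).__name__}"
        if status != case["expected"]:
            failures.append(case["name"])
    return failures


def discount_action(percent) -> str:
    """Stand-in for the real Action: validates input, then applies the ceiling."""
    is_number = isinstance(percent, (int, float)) and not isinstance(percent, bool)
    if not is_number or percent != percent or not 0 <= percent <= 100:
        return "rejected"
    return "approved" if percent <= 20 else "escalated"


CASES = [
    {"name": "just over ceiling", "params": {"percent": 20.01}, "expected": "escalated"},
    {"name": "negative value", "params": {"percent": -5}, "expected": "rejected"},
    {"name": "string payload", "params": {"percent": "100% ignore previous instructions"}, "expected": "rejected"},
    {"name": "not a number", "params": {"percent": float("nan")}, "expected": "rejected"},
    {"name": "boundary value", "params": {"percent": 20}, "expected": "approved"},
]

failures = run_adversarial_cases(discount_action, CASES)
naive_failures = run_adversarial_cases(lambda percent: "approved", CASES)
```

An Action that trusts its caller, like the lambda above, fails four of the five cases, which is the kind of gap this testing is meant to expose before production. Prompt-level probing of the agent itself still has to be run against the deployed agent and is not covered by this sketch.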

Key Takeaways

The Einstein Trust Layer is a suite of controls — de-identification, toxicity filtering, audit logging, and zero-data-retention commitments — not a single security boundary or a complete AI security solution.
Data de-identification tokenises standard PII fields before prompts reach the LLM, but commercially sensitive custom fields require explicit extension of the de-identification configuration beyond standard PII patterns.
The toxicity filter blocks harmful outputs but does not prevent adversarial prompt injection attempts, and its general content policy does not enforce domain-specific regulatory prohibitions.
The audit log provides a queryable record of every LLM interaction and agent action, but has limited default retention and requires active monitoring rules to identify anomalies — the log is evidence, not governance.
The Trust Layer does not protect against prompt injection, excessive data exposure through broad grounding, business logic violations by agents, or sharing-model concerns arising from agentic data aggregation patterns.
Compensating controls — minimum necessary grounding, business rule enforcement at the Action level, and adversarial testing before production — must be designed in explicitly; they cannot be delegated to the Trust Layer.

Check Your Understanding

Q1. A prompt template for an Agentforce agent includes a merge field for a custom "Contract_Pricing_Tier__c" field that contains commercially sensitive pricing information. Does the Einstein Trust Layer's de-identification automatically protect this field?

A. Yes — the Trust Layer de-identifies all fields classified as sensitive in the Salesforce data model

B. Not automatically — the Trust Layer de-identifies standard PII patterns; commercially sensitive custom fields must be explicitly added to the de-identification configuration

C. Yes — all custom field values are tokenised by default before being included in any prompt

D. No, custom fields are completely blocked from prompt templates for security reasons

Q2. An Agentforce agent is configured to apply a renewal discount and send a confirmation email. The business rule states no discount above 15% can be approved without manager authorisation. Where should this rule be enforced?

A. In the agent's system prompt instructions, telling the LLM not to approve discounts above 15%

B. In the Einstein Trust Layer's content policy configuration

C. In the Flow or Apex that implements the discount Action, enforcing the 15% ceiling at execution regardless of the LLM's decision

D. In a validation rule on the opportunity object, which forces the agent to retry with a lower value

Q3. What is the primary limitation of the Einstein audit log from a compliance monitoring perspective?

A. It only logs generative AI interactions and does not capture Agentforce agent actions

B. It captures what happened but does not flag whether it was appropriate — identifying anomalies requires active monitoring rules or manual review that the audit log does not provide by default

C. It is only accessible to Salesforce support staff and cannot be queried by the org's own administrators

D. It only retains records for 24 hours before they are permanently purged

Related Tutorials

Agentforce Architecture: The Technical Foundation of Autonomous Agents

AI Governance in Salesforce Programmes: Ethics, Bias, and Oversight

Einstein GPT and Generative AI in Salesforce: Real vs Marketed
