- What Salesforce's 99.9% SLA actually means in hours — and what is and isn't covered by it
- How to read trust.salesforce.com proactively to detect degradation before your users report it
- The three incident severity levels and what each means for your business response
- How Salesforce's planned maintenance windows and tri-annual releases affect availability
- How to build a business continuity plan that treats Salesforce outages as an expected architectural constraint
What Salesforce's Trust Model Actually Guarantees
Salesforce's trust model is the set of commitments Salesforce makes about availability, data integrity, and incident transparency. It is codified in the Master Subscription Agreement (MSA) and the Salesforce Trust and Compliance Documentation. Most CTOs and programme directors know the headline number — 99.9% uptime — but haven't read what that number actually guarantees.
The 99.9% uptime SLA means Salesforce commits to no more than 0.1% unplanned unavailability per month. For a 30-day month, that allows roughly 43 minutes of unplanned downtime, or approximately 8.8 hours per year. This is not "always available." It is "available with acceptable brief interruptions." For business-critical workflows that run during market hours, a 40-minute outage on a Tuesday afternoon is a material business impact even though it's contractually within the SLA.
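The arithmetic behind those figures can be checked with a short sketch. It assumes a 30-day month and a 365-day year; the function name `allowed_downtime_minutes` is illustrative and not part of any Salesforce tooling.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Minutes of unplanned downtime permitted in a period at a given SLA."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


# Assumption: a 30-day month and a 365-day year.
monthly_minutes = allowed_downtime_minutes(99.9, 30)
yearly_hours = allowed_downtime_minutes(99.9, 365) / 60

print(f"{monthly_minutes:.1f} minutes per month, {yearly_hours:.2f} hours per year")
# prints "43.2 minutes per month, 8.76 hours per year"
```

Running the same function with 99.99 shows how much tighter a "four nines" commitment would be, which is useful context when a board asks why 99.9% is not "always available".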
Salesforce's SLA measurement is binary: the platform is either "available" (users can log in and transact) or "unavailable" (login is impossible). Degraded performance (slow page loads, timeouts on specific features, API response times exceeding 30 seconds) may not constitute "unavailability" under the SLA, but it absolutely constitutes business impact. Real-world Salesforce reliability includes both outages (which the SLA covers) and degradation events (which the SLA typically doesn't). Track both in your operational metrics.
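One way to track both categories is to classify the results of your own synthetic checks. The sketch below is illustrative only: the `Probe` type and the idea of a periodic synthetic login or transaction are assumptions, and the 30-second threshold is simply the figure mentioned above, not a Salesforce-defined limit.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: 30 seconds is borrowed from the text; tune it to your own workload.
DEGRADED_LATENCY_SECONDS = 30.0


@dataclass
class Probe:
    succeeded: bool          # did the synthetic login/transaction complete?
    latency_seconds: float   # how long the check took


def classify(probe: Probe) -> str:
    """Label one probe as an outage, a degradation event, or healthy."""
    if not probe.succeeded:
        return "outage"
    if probe.latency_seconds > DEGRADED_LATENCY_SECONDS:
        return "degraded"
    return "healthy"


def summarise(probes: list[Probe]) -> Counter:
    """Count probes per category so both outages and degradation are reported."""
    return Counter(classify(p) for p in probes)


samples = [Probe(True, 1.2), Probe(True, 42.0), Probe(False, 0.0), Probe(True, 0.9)]
summary = summarise(samples)
print(dict(summary))
```

Reporting the "degraded" count alongside the "outage" count keeps the events the SLA does not cover visible in your operational metrics.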
Reading the SLA: What 99.9% Really Means
Several important exclusions and clarifications in Salesforce's SLA are worth understanding before presenting it to a board as a reliability commitment.
Planned maintenance is excluded. Salesforce has planned maintenance windows (typically on weekends) that are not counted against the 99.9% SLA. These windows are scheduled months in advance and published on trust.salesforce.com. Depending on your organisation's operating schedule, weekend maintenance windows may or may not impact your users — but they represent real scheduled unavailability that your BCP should account for.
The SLA applies to the standard platform — not all features. Some Salesforce features (certain Einstein capabilities, Heroku infrastructure, third-party AppExchange services) have separate SLAs or no SLA at all. If your critical business process depends on an Einstein AI feature, check whether that specific feature has an SLA commitment.
Financial remedies are limited. Salesforce's SLA violation remedy is typically a service credit of a percentage of monthly fees — not compensation for business losses caused by the outage. A 4-hour outage that prevents order entry during a peak period is not compensated by the 5% monthly fee credit that the SLA may provide. Understand the financial remedy model when making business cases for continuity investment.
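The gap between a service credit and actual business loss is easy to quantify for a business case. Every number below is hypothetical (the monthly fee, the hourly revenue, and the 5% credit rate); substitute the figures from your own contract and finance team.

```python
def service_credit(monthly_fee: float, credit_percent: float) -> float:
    """Credit paid for an SLA breach, as a percentage of the monthly fee."""
    return monthly_fee * credit_percent / 100


def outage_loss(outage_hours: float, revenue_per_hour: float) -> float:
    """Rough business loss: revenue that could not be processed during the outage."""
    return outage_hours * revenue_per_hour


# Hypothetical inputs for illustration only.
credit = service_credit(monthly_fee=50_000, credit_percent=5)
loss = outage_loss(outage_hours=4, revenue_per_hour=20_000)
uncovered = loss - credit

print(f"credit={credit:,.0f} loss={loss:,.0f} uncovered={uncovered:,.0f}")
```

The uncovered amount, not the credit, is the figure that justifies continuity investment.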
trust.salesforce.com: How to Read It
trust.salesforce.com is Salesforce's real-time status dashboard. It shows per-instance (NA1, EU10, etc.) availability status, active incidents, maintenance windows, and historical incident data. Every Salesforce organisation is hosted on a specific instance, and your organisation's service status is on the row for that instance.
The dashboard displays five status levels: Operational (green), Degraded Performance (yellow), Partial Outage (orange), Service Disruption (red), and Under Maintenance (grey). Degraded Performance and Partial Outage statuses often precede Service Disruption — monitoring trust.salesforce.com proactively allows your operations team to begin contingency procedures before users report widespread problems.
// trust.salesforce.com also provides a status API
// Programmatic access to current instance status
GET https://api.status.salesforce.com/v1/instances/NA64/status

// Response structure (simplified):
{
  "key": "NA64",
  "location": "North America",
  "environment": "production",
  "releaseVersion": "254.3.0",
  "releaseNumber": "254.3",
  "isActive": true,
  "status": {
    "key": "OK",
    "message": "The service is available"
  },
  "incidents": []
}

// Automate monitoring: poll this API every 5 minutes
// and send alerts to your ops channel when status.key != "OK"
Find your organisation's instance now (Setup → Company Information → Instance) and subscribe to trust.salesforce.com notifications for that specific instance. During an active incident, trust.salesforce.com is the authoritative source — it's updated more frequently than Salesforce Support tickets and tells you whether a problem is platform-wide (everyone on your instance is affected) or specific to your org. This distinction matters for your incident response: platform-wide issues require waiting; org-specific issues require opening a Salesforce Support case.
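The polling approach described above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive client: it assumes the response is shaped like the simplified example (status code under `status.key`) and also tolerates a bare string in case the live payload differs. The helper names `status_key`, `needs_alert`, and `fetch_status` are illustrative.

```python
import json
import urllib.request

# URL pattern taken from the example request above.
STATUS_URL = "https://api.status.salesforce.com/v1/instances/{instance}/status"


def status_key(payload: dict) -> str:
    """Extract the status code from a response shaped like the simplified example."""
    status = payload.get("status")
    # Assumption: the code is nested under status.key; accept a bare string too,
    # because the exact live response shape is not confirmed here.
    if isinstance(status, dict):
        return str(status.get("key", "UNKNOWN"))
    return str(status) if status is not None else "UNKNOWN"


def needs_alert(payload: dict) -> bool:
    """True when the instance reports anything other than OK."""
    return status_key(payload) != "OK"


def fetch_status(instance: str, timeout: float = 10.0) -> dict:
    """Fetch the current status document for one instance (makes a network call)."""
    url = STATUS_URL.format(instance=instance)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A scheduler or cron job could call `fetch_status("NA64")` every five minutes and post to your ops channel whenever `needs_alert` returns `True`. Treat a failed fetch as "unknown" rather than "OK" so that a monitoring failure does not hide an incident.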
Incident Severity Levels
Salesforce classifies incidents into three severity levels, each with different response obligations and business impact profiles.
P0 — Critical: Complete service unavailability. Users cannot log in. All functionality is inaccessible. Salesforce's commitment is immediate response and continuous updates until resolution. P0 incidents are rare but have occurred. Their business impact is total — every process that depends on Salesforce is down simultaneously.
P1 — Major: Significant functionality is impaired but the service is accessible. Users can log in, but key features — report generation, approval processes, specific API endpoints — are degraded or unavailable. P1 incidents are more common than P0 and often affect specific features while others remain operational. Your BCP should identify which P1 scenarios are most likely to impact critical workflows.
P2 — Minor: Limited impact on non-critical functionality. Users can perform most tasks with minimal degradation. P2 incidents may not even be visible to typical users. Salesforce publishes P2 incidents for transparency but they typically don't require business response.
P0 is total outage — obvious, immediate, and everyone knows about it. P2 is minor — typically invisible. P1 is the dangerous middle: specific features fail while the platform appears functional. A P1 affecting the API endpoint your ERP uses for order sync may stop order processing without any visible error in the Salesforce UI. Users won't report it as a "Salesforce issue" — they'll report it as an "order processing issue." Train your operations team to check trust.salesforce.com as the first step in any unexplained business process failure.
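The triage logic for the three levels can be written down as a small decision function that an operations team could adapt. The feature names in `CRITICAL_FEATURES` are hypothetical placeholders for whatever your critical workflows depend on, and the returned strings are illustrative, not Salesforce terminology.

```python
# Hypothetical: the platform features your critical workflows depend on.
CRITICAL_FEATURES = {"api", "approvals"}


def first_response(severity: str, affected_features: set[str]) -> str:
    """Suggest a first response for an incident at the given severity."""
    if severity == "P0":
        # Total outage: every dependent process is down.
        return "activate outage runbook"
    if severity == "P1":
        # Partial failure: only act if a critical dependency is hit.
        hit = affected_features & CRITICAL_FEATURES
        if hit:
            return "start manual fallback for: " + ", ".join(sorted(hit))
        return "monitor; no critical workflow affected"
    if severity == "P2":
        return "log only"
    raise ValueError(f"unknown severity: {severity}")


print(first_response("P1", {"api", "reports"}))
# prints "start manual fallback for: api"
```

The P1 branch is the one worth maintaining carefully: it only works if the list of critical dependencies is kept up to date.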
Planned Maintenance and Releases
Salesforce's three annual releases (Spring, Summer, Winter) represent the most significant planned changes to the platform. Unlike typical SaaS updates, Salesforce releases upgrade all tenants simultaneously to the same version — there is no "opt out" of a major release (though some specific features can be enabled/disabled post-release). Releases introduce new features, deprecate old ones, and occasionally change default behaviour.
Releases are scheduled months in advance, and sandboxes receive the release 4–6 weeks before production. This window exists specifically so organisations can test their customisations against the new platform version before the production upgrade. The consequences of skipping sandbox testing are breaking changes discovered in production, automation failures, and layout changes that confuse users without warning.
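To plan regression testing, the sandbox preview window can be derived from the production release date. The date used below is a hypothetical example, not a published Salesforce release date; take the real one for your instance from the maintenance calendar.

```python
from datetime import date, timedelta


def sandbox_test_window(production_release: date) -> tuple[date, date]:
    """Return the earliest and latest expected sandbox upgrade dates.

    Based on the 4-6 week lead time described above: the earliest date is
    6 weeks before production, the latest is 4 weeks before.
    """
    earliest = production_release - timedelta(weeks=6)
    latest = production_release - timedelta(weeks=4)
    return earliest, latest


# Hypothetical production release date for illustration.
earliest, latest = sandbox_test_window(date(2025, 10, 11))
print(earliest.isoformat(), latest.isoformat())
# prints "2025-08-30 2025-09-13"
```

Planning against the later date is the conservative choice, since it assumes the shortest available testing window.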
Planned maintenance windows (distinct from releases) are shorter-duration activities: infrastructure updates, security patches, database maintenance. These are communicated 2–4 weeks in advance for standard maintenance and 7+ days for emergency security patches. Check the Maintenance Calendar on trust.salesforce.com at least monthly.
When Salesforce Goes Down: The Runbook Leaders Never Write
Most Salesforce-dependent organisations have no documented runbook for Salesforce unavailability. The response during an incident is improvised: support tickets are raised, IT teams scramble, and business processes halt while waiting for restoration. This is preventable with a minimal runbook that doesn't require weeks to create.
A minimal Salesforce outage runbook covers: who is notified (the escalation chain), where to check status (trust.salesforce.com, on the row for your instance), what manual fallbacks exist for the top 5 critical business processes, how long each manual fallback can be sustained before business impact becomes unacceptable, and when to escalate to a Salesforce Premier Support case versus waiting for the trust.salesforce.com resolution timeline.
The "top 5 critical business processes" is the most important part. Not every Salesforce process needs a manual fallback. Identify the 3–5 processes where Salesforce unavailability causes immediate, material business impact (order entry, customer service case creation, financial approval workflows) and document the manual alternative for each. This is 80% of the value of a BCP document in 20% of the work.
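The runbook described above is small enough to capture as structured data, which makes it easy to review and to query during an incident. This is one possible shape, not a standard format: the class names, field names, and example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CriticalProcess:
    name: str
    manual_fallback: str        # what people do while Salesforce is unavailable
    max_fallback_minutes: int   # how long the fallback can be sustained


@dataclass
class OutageRunbook:
    escalation_chain: list[str]
    status_page: str
    processes: list[CriticalProcess]

    def breached(self, outage_minutes: int) -> list[str]:
        """Processes whose manual fallback is no longer sustainable."""
        return [p.name for p in self.processes
                if outage_minutes > p.max_fallback_minutes]


# Hypothetical entries for illustration.
runbook = OutageRunbook(
    escalation_chain=["on-call ops", "head of IT", "COO"],
    status_page="trust.salesforce.com",
    processes=[
        CriticalProcess("order entry", "shared spreadsheet template", 120),
        CriticalProcess("case creation", "direct support mailbox", 240),
        CriticalProcess("financial approvals", "signed email approval", 60),
    ],
)

print(runbook.breached(outage_minutes=90))
# prints "['financial approvals']"
```

The `breached` check answers the escalation question directly: once any process appears in the list, waiting for the published resolution timeline is no longer acceptable on its own.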
Building Business Continuity Around the Trust Model
Mature Salesforce-dependent organisations treat platform availability as an architectural constraint, not an assumption. Business continuity planning for Salesforce is not about preventing outages (Salesforce controls that) — it's about designing business processes that can tolerate brief Salesforce unavailability without catastrophic business impact.
Practical continuity architecture: design offline capabilities for the most critical field-facing workflows (mobile offline via Briefcase), maintain fallback communication channels that don't depend on Salesforce (if your customer communications route through Service Cloud, have a direct email fallback), and ensure that critical integrations (order processing, payment capture) have a queue-based design that buffers transactions during Salesforce outages and replays them on restoration rather than losing them.
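The queue-based buffering pattern mentioned above can be illustrated with a minimal in-memory sketch. It assumes the integration raises `ConnectionError` when Salesforce is unreachable; `BufferedSender` and `fake_send` are hypothetical names. A production design would use a durable queue rather than process memory, so that buffered transactions survive a restart.

```python
from collections import deque
from typing import Callable


class BufferedSender:
    """Buffers transactions while the target is down and replays them in order."""

    def __init__(self, send: Callable[[dict], None]):
        self._send = send
        self._pending: deque[dict] = deque()

    def submit(self, txn: dict) -> bool:
        """Queue the transaction, then try to drain. True if nothing is pending."""
        self._pending.append(txn)
        return self.replay()

    def replay(self) -> bool:
        """Send pending transactions oldest-first; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # still down: keep the item and its ordering
            self._pending.popleft()  # remove only after a successful send
        return True

    @property
    def pending(self) -> int:
        return len(self._pending)


# Simulated target: a flag stands in for Salesforce availability.
state = {"up": False}
delivered: list[dict] = []


def fake_send(txn: dict) -> None:
    if not state["up"]:
        raise ConnectionError("Salesforce unavailable")
    delivered.append(txn)


sender = BufferedSender(fake_send)
sender.submit({"order": 1})   # buffered during the outage
sender.submit({"order": 2})   # buffered during the outage
state["up"] = True            # service restored
sender.replay()               # both orders delivered in original order
print(delivered, sender.pending)
```

Removing an item only after a successful send gives at-least-once delivery, so the receiving side should tolerate an occasional duplicate (for example, by using an idempotency key on each transaction).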
Once a year, simulate a Salesforce outage in a controlled way: block Salesforce access for a small team rather than taking production down. Can your team identify what's affected within 5 minutes, communicate status within 10 minutes, and begin manual fallbacks within 15 minutes? If any of these fail, you have a BCP gap. The simulation reveals process knowledge gaps (who knows the manual order entry process?), tool gaps (does the email fallback template exist?), and escalation gaps (does anyone have the Salesforce Premier Support number saved?). These are cheap to discover in a simulation and expensive to discover in a real incident.
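Scoring the drill against the 5, 10, and 15 minute targets can be as simple as the sketch below. The step names are hypothetical labels for the three checkpoints, and a step that was never completed is treated as a gap.

```python
# Targets from the text, in minutes from the start of the simulated outage.
DRILL_TARGETS_MINUTES = {
    "identify_impact": 5,
    "communicate_status": 10,
    "begin_fallback": 15,
}


def drill_gaps(observed_minutes: dict[str, float]) -> list[str]:
    """Return the steps that missed their target or were never completed."""
    gaps = []
    for step, target in DRILL_TARGETS_MINUTES.items():
        took = observed_minutes.get(step)
        if took is None or took > target:
            gaps.append(step)
    return gaps


# Hypothetical drill result: status went out late, fallback never started.
print(drill_gaps({"identify_impact": 4, "communicate_status": 18}))
# prints "['communicate_status', 'begin_fallback']"
```

Keeping the results from each annual drill makes it possible to see whether the gaps are closing over time.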
Key Takeaways
- Salesforce's 99.9% SLA permits 43 minutes of unplanned downtime per month — it is not a zero-downtime commitment, and planned maintenance and certain features are excluded from the guarantee
- The SLA measures binary availability; degraded performance (slow pages, feature-specific failures) does not constitute "unavailability" but causes real business impact — track both in your operational metrics
- trust.salesforce.com is the authoritative real-time status source — know your instance, subscribe to notifications, and train your operations team to check it before raising internal incidents
- P1 (major, partial) incidents are the most important BCP scenario — platform appears accessible while specific features fail, often missed until downstream systems report errors
- Salesforce's tri-annual releases are mandatory, simultaneous upgrades — testing in sandbox during the 4–6 week pre-production window is the only way to catch breaking changes before they affect users
- A minimal outage runbook with the top 5 critical business processes and their manual fallbacks provides 80% of BCP value with 20% of the effort
- Mature continuity architecture buffers transactions during outages (queue-based integrations, Briefcase offline) rather than failing hard — design for Salesforce unavailability as an expected constraint, not a surprise
Checkpoint: Test Your Understanding
1. Salesforce's 99.9% monthly uptime SLA translates to approximately how much permitted unplanned downtime?
2. A P1 incident is posted on trust.salesforce.com while your ERP's order sync integration appears to have stopped. What should your operations team do first?
3. Why is it critical to test your Salesforce customisations in a sandbox 4–6 weeks before a major Salesforce release?