DEL-007 Delivery Management 25 min read For: Delivery Managers

Managing a Big Bang Go-Live: Risks, Mitigations, and the Decision to Flip

The definitive guide to big bang go-lives — what makes them fail, what makes them survive, and how to make the go/no-go decision when the pressure is at its highest.


Vishal Sharma

Salesforce Delivery Specialist · Updated May 2026

What you will learn in this tutorial
  • Why most programmes end up with a big bang go-live even when the delivery team knows it is higher risk
  • The three failure modes that kill big bang go-lives — and the warning signs that appear weeks beforehand
  • What a credible go-live risk register looks like and what "done" actually means for each item
  • How to run the week before go-live and go-live day itself in a way that keeps you in control
  • How to make and communicate a go/no-go recommendation when the pressure to say "go" is at its maximum

Big Bang vs Phased: How the Decision Gets Made

Let's be honest about how the big bang decision gets made on most Salesforce programmes. It is almost never made by the delivery manager. It is made by the commercial team, the licence contract, the executive sponsor, or the board presentation that committed to a live date six months before the project started. By the time you arrive, the date is often already fixed. Your job is not to relitigate the decision — it is to deliver safely within it.

That said, you need to understand the forces that drive the big bang preference, because understanding them helps you manage the stakeholders who are invested in maintaining it.

The Commercial Forces Behind Big Bang

The most common driver is licence cost. When an organisation is running a legacy system in parallel with a new Salesforce implementation, they are paying for both. Every month of phased delivery is another month of dual running costs. For a large organisation with a complex legacy estate, those costs can easily run to six figures per month. The CFO sees the go-live date not as a technical milestone but as the date on which a significant cost line disappears from the budget. Compressing the timeline is not stubbornness — it is a rational financial decision, made by people who are not wrong to make it.

The second driver is stakeholder impatience. Enterprise Salesforce programmes typically run for twelve to twenty-four months. By month eighteen, senior stakeholders have heard the word "delivery" many more times than they have seen delivered capability. The pressure to show the whole thing working — not a pilot, not a subset, but the real thing — is genuine and understandable. A phased approach that delivers something in month twelve but doesn't deliver the full promised scope until month twenty-four may feel, to a board, like the project is never going to end.

The third driver, less often acknowledged, is integration complexity. Many Salesforce implementations touch five to fifteen downstream and upstream systems. In genuinely complex integration landscapes, a phased approach that keeps the legacy system partially live creates a hybrid state that is harder to manage than a clean cutover. You end up with two sets of integrations, two sets of data flows, and an operational team that has to know which system is the system of record for which subset of data. Some organisations reasonably conclude that the clean break of a big bang is less risky than the sustained complexity of a long phased transition.

When Big Bang Is Genuinely the Right Call

There are situations where big bang is not just commercially driven but architecturally correct. If your data model requires referential integrity across objects that cannot be split across systems, a phased approach breaks data quality. Regulatory environments — particularly those involving financial services or healthcare data — sometimes mandate that there be a single system of record with no overlap period. And when the legacy system is genuinely failing — not just old, but actively unreliable — the risk of running it in parallel for another six months may exceed the risk of a clean cutover.

None of this means big bang is easy. It means that when someone tells you phased delivery is not an option, they may not be wrong. This tutorial is written for that situation: you have been told the go-live date is fixed, big bang is the approach, and your job is to make it work.

💡
Insight

The most dangerous point in any big bang programme is when the delivery team privately believes the go-live date is unachievable but nobody has said so formally. This silence is rational — nobody wants to be the person who delays a multimillion-pound programme. Your job as delivery manager is to create the conditions where the honest assessment can be surfaced early enough to act on it, not on the night before go-live.

The Three Things That Kill Big Bang Go-Lives

In the dozens of Salesforce programme post-mortems I have reviewed, the failure modes cluster into three categories. They are not surprising in retrospect. They are almost always visible in advance, if you know what to look for.

Failure Mode 1: Data Migration

Data migration is the most common cause of big bang go-live failure, and it fails in predictable ways.

The first is volume surprise. The programme team estimated the data migration based on record counts from a legacy report. When the actual extract ran, it was four times larger than expected — because the report had been filtering out inactive records, historical data, and soft-deleted entries that the legacy system was still carrying. The migration that was expected to complete in six hours took twenty-six. The go-live window closed before the data was clean in the target system.

The second pattern is data quality discovered at midnight. A programme I reviewed had run multiple data migration rehearsals against a static extract. The rehearsals passed. What they hadn't tested was a live extract from the legacy system in its final operational state — with all the last-minute data entry that users had done in the final weeks before the freeze. That data included relationship references to records that had been merged, deleted, or never existed in the source system. The referential integrity failures weren't caught until the migration ran in the live window, at 11 PM, with go-live scheduled for 7 AM. The team spent the night triaging records manually. Some made it. Some didn't. Users went live the next morning with incomplete data, which took three weeks to fully remediate.

The warning signs are visible weeks before go-live. If your data migration rehearsals are consistently finding new categories of error — not the same errors being resolved, but new error types appearing — your data quality assessment is incomplete. If the time taken to run the migration is not stabilising across rehearsals, your volume estimates are wrong. If the data migration team is routinely staying late to meet rehearsal windows, the real migration is going to be worse.
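
The first two warning signs can be checked mechanically after every rehearsal. What follows is a minimal sketch, assuming you log each rehearsal's duration and the set of error categories it produced. The function name, the 10% stability tolerance, and the category labels are illustrative assumptions, not part of any standard tooling.

```python
def rehearsal_warnings(durations_hours, error_types_per_run, tolerance=0.10):
    """Return warning strings from rehearsal history (oldest run first).

    tolerance is an assumed cut-off: durations count as stable when the
    last two runs differ by no more than 10%.
    """
    warnings = []

    # Warning sign: run time is not stabilising across rehearsals.
    if len(durations_hours) >= 2:
        prev, last = durations_hours[-2], durations_hours[-1]
        if abs(last - prev) / prev > tolerance:
            warnings.append("migration duration has not stabilised")

    # Warning sign: the latest run produced error categories never seen before.
    if len(error_types_per_run) >= 2:
        seen_before = set().union(*error_types_per_run[:-1])
        new_types = set(error_types_per_run[-1]) - seen_before
        if new_types:
            warnings.append("new error types: " + ", ".join(sorted(new_types)))

    return warnings
```

An empty list does not prove the migration is ready. It only means these two specific signals are quiet for the latest rehearsal.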

Failure Mode 2: Integration

The integration failure that kills go-lives is almost never the integration everyone was worried about. It is the one that was considered low-risk, low-volume, or somebody else's responsibility.

A common pattern: a programme had a complex real-time ERP integration that the team had spent months testing. It worked. What failed was the overnight batch feed from a billing system that nobody had load-tested because the volumes were described as "small." On go-live night, the batch ran with three months of accumulated transactions — because the parallel running period had generated a backlog — and the Salesforce API limits were hit at 2 AM. The billing system retried, generated duplicate records, and by morning the Salesforce data was in an inconsistent state with the billing platform.

The warning sign here is any integration that has been tested in isolation but never end-to-end at production volume. If your integration testing has been conducted against synthetic data sets with capped volumes, you do not know how those integrations behave at real load. And "real load" almost always includes edge cases — concurrent calls, retry logic, error handling paths — that are invisible until production.

Failure Mode 3: Change Management

Users who were "trained" but not ready is a failure mode that is underappreciated because it does not show up on the technical readiness checklist. The system works. The integrations work. The data is clean. And then 400 people log in for the first time and cannot do their jobs.

The most reliable warning sign is training completion metrics that measure attendance rather than competence. If your training sign-off says "95% of users have completed training" but nobody has validated whether those users can actually perform their key workflows, you have a participation metric, not a readiness metric. The second warning sign is when the business team responsible for change management has been consistently deprioritised in programme governance — their RAG status is reported as amber but nobody escalates it because the technical workstream is the real priority.

⚠️
Warning

If you reach the week before go-live and your change management lead is still writing training materials, your go-live is not ready. Training material creation should be complete no later than four weeks before go-live. The final four weeks are for delivery, competency validation, and support preparation — not content creation. A change management workstream that is still in build mode at T-4 weeks is a go-live risk, regardless of what the technical RAG says.

The Go-Live Risk Register: What Should Be on It

Every programme has a risk register. Most risk registers are maintained for governance reporting, not for actual decision-making. The go-live risk register is different. It is a live document, updated daily in the final four weeks, that contains the specific risks that could prevent or derail the go-live. It is the document you read at every morning stand-up in the war room. Here is what it must contain.

Data Migration Sign-Off: What "Done" Actually Means

Data migration is done when: the full production dataset has been migrated in a rehearsal, the migration completed within the allocated time window with at least a 20% buffer, the error rate is below your agreed threshold (typically less than 0.1% of records), all rejected records have a documented remediation path, and a senior representative of the business has signed off on the data quality in the target system. Not "reviewed a sample." Signed off. In writing.

If any of those criteria are not met, data migration is not done. Calling it done when it isn't is the single most common source of go-live night surprises.
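
The sign-off criteria above can be written down as an explicit check, so that "done" cannot drift in conversation. This is a sketch under stated assumptions: the field names are hypothetical, and the "20% buffer" is read as finishing in no more than 80% of the allocated window, which is one reasonable interpretation rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class RehearsalResult:
    full_production_dataset: bool     # rehearsal used the full production dataset
    duration_hours: float             # how long the rehearsal actually took
    window_hours: float               # allocated migration window
    records_total: int
    records_rejected: int
    rejects_with_remediation: int     # rejected records with a documented fix path
    business_signoff_in_writing: bool


def unmet_signoff_criteria(r, buffer=0.20, max_error_rate=0.001):
    """Return the sign-off criteria not yet met (an empty list means done)."""
    unmet = []
    if not r.full_production_dataset:
        unmet.append("full production dataset not rehearsed")
    # Assumed reading of the buffer: finish in at most (1 - buffer) of the window.
    if r.duration_hours > r.window_hours * (1 - buffer):
        unmet.append("time window buffer not met")
    # Treat an empty dataset as a failure rather than dividing by zero.
    error_rate = r.records_rejected / r.records_total if r.records_total else 1.0
    if error_rate >= max_error_rate:
        unmet.append("error rate not below threshold")
    if r.rejects_with_remediation < r.records_rejected:
        unmet.append("rejected records without remediation path")
    if not r.business_signoff_in_writing:
        unmet.append("no written business sign-off")
    return unmet
```

The default `max_error_rate=0.001` mirrors the 0.1% figure quoted above. Substitute whatever threshold your programme has actually agreed.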

Performance Testing at Production Load

Performance testing in UAT runs at UAT load. UAT load is almost never production load. Your performance testing for go-live sign-off should be run at a minimum of 150% of expected peak production load — because peak production load on day one is usually higher than normal operating load, as users log in for the first time, run their first reports, and the system handles the backlog of work that accumulated during the freeze period.

The specific metrics you need: page load times for the five most critical user journeys under peak load, API response times for all integrations under concurrent call scenarios, batch processing times for overnight jobs at production data volumes, and report execution times for the reports the business will run on day one.

Hypercare Plan

Hypercare is the period immediately after go-live — typically two to four weeks — when the full implementation team is available on short notice to resolve production issues. A hypercare plan that exists as a paragraph in a project document is not a hypercare plan. A hypercare plan that will actually work specifies: which team members are on call, what their on-call hours are, what the escalation path is (user issue to support team to implementation team to vendor), what the severity classification system is, what the response time SLA is for each severity level, and who has the authority to make configuration changes in production outside of normal change control.

Rollback Criteria

This is the conversation nobody wants to have before go-live. Every big bang go-live needs defined rollback criteria (the specific conditions under which you will halt the go-live and revert to the legacy system) and a rollback plan that has been rehearsed, not just documented. The rollback criteria should be agreed by the executive sponsor before the go-live window opens. If you are agreeing rollback criteria at 3 AM on go-live night, you will not make a rational decision.

The Go-Live Risk Register Table

| Risk Item | Likelihood | Impact | Mitigation | Owner |
| --- | --- | --- | --- | --- |
| Data migration exceeds time window | Medium | Critical | 3 full rehearsals at production volume; go-live window sized with 25% buffer | Data Lead |
| Integration failure at production load | Medium | High | End-to-end load test at 150% peak; all integrations tested with real retry scenarios | Integration Lead |
| User adoption failure on day one | Medium | High | Competency assessment completed; floor walkers deployed for first 5 days | Change Manager |
| Data quality errors in migrated records | High | High | Business sign-off on data sample; remediation scripts tested; error threshold defined | Data Lead / Business Owner |
| Performance degradation under peak load | Low | High | Performance baseline established; Salesforce support case pre-raised for go-live day | Technical Lead |
| Legacy system unavailable for rollback | Low | Critical | Legacy system in read-only state, not decommissioned, for minimum 72 hours post go-live | Infrastructure Lead |
| Key team member unavailable on go-live night | Low | Medium | Named backup for each critical role; runbook documented so any team member can execute | Delivery Manager |

💡
Insight

A risk register that contains items with "likelihood: low, impact: critical" and no mitigation owner is not a risk register — it is a liability log. Every critical-impact risk must have a named individual accountable for the mitigation, a dated milestone for when the mitigation will be in place, and a verification method. If the mitigation is "we'll deal with it on the night," that is not a mitigation.
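
The rule in this insight is easy to audit automatically if the register is kept as structured data. The following is a minimal sketch. The `Risk` fields and the `liability_log` function are hypothetical names chosen for illustration, and the example assumes dates are stored as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Risk:
    item: str
    likelihood: str
    impact: str
    owner: Optional[str] = None           # a named individual, not a team
    mitigation_due: Optional[str] = None  # dated milestone for the mitigation
    verification: Optional[str] = None    # how the mitigation will be verified


def liability_log(risks):
    """Return (item, missing_fields) for critical-impact risks that lack
    a named owner, a dated milestone, or a verification method."""
    gaps = []
    for risk in risks:
        if risk.impact.lower() != "critical":
            continue
        missing = [
            name
            for name in ("owner", "mitigation_due", "verification")
            if not getattr(risk, name)
        ]
        if missing:
            gaps.append((risk.item, missing))
    return gaps
```

Running this before each morning stand-up gives a short list of critical risks that are not yet properly owned, which is a more useful agenda item than re-reading the whole register.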

The Week Before Go-Live: What Good Looks Like

The week before go-live is not a normal project week. It is a focused operational preparation period with a specific agenda each day. If your team is still resolving open UAT defects in the week before go-live, you have a problem that no amount of good planning in that week will solve. The week before go-live assumes that the build is complete and signed off. Here is what that week should look like.

Day-by-Day Outline

Monday (T-7): Dress rehearsal. A full end-to-end rehearsal of the go-live sequence — data migration, validation, integration smoke tests, user access provisioning — run as closely as possible to the real thing, including the timing. The dress rehearsal is not for finding defects. It is for validating that the sequence works, that the timings are accurate, and that every team member knows their role. Any significant issues found in the dress rehearsal are escalation items, not normal defect resolution.

Tuesday (T-6): Dress rehearsal debrief and remediation. Address any issues found on Monday. Update the runbook based on what actually happened vs what was planned. Confirm all go-live day roles and contact details. Confirm that the legacy system team has a freeze date and knows what it means.

Wednesday (T-5): Data freeze. The legacy system is put into a read-only or restricted state. No new data entry except for transactions that have a defined migration path. This is also the day for final communications to end users: what is happening, when, what they need to do, and who to contact if they have issues.

Thursday (T-4): Final data extract and validation. Run the data validation scripts against the final extract. Confirm record counts and error rates. Any errors found on Thursday that were not present in previous rehearsals are a serious concern. This is also the day for the war room setup — physical or virtual — and the final confirmation of everyone's availability for go-live night.

Friday (T-3 for a Monday go-live, or T-1 for a weekend go-live): Final readiness review with executive sponsor. All exit criteria reviewed. Go/no-go formally confirmed. Stakeholder communications sent. On-call rota confirmed. Everyone rests.

Who Should Be in the War Room

The war room on go-live day should contain: the delivery manager (running the room), the technical lead, the data migration lead, the integration lead, the infrastructure lead, the change management lead, and, critically, a business representative who has the authority to make decisions about data quality and user access. It should not contain senior executives other than the go/no-go decision-maker (additional executives create pressure that distorts decision-making) or third-party vendor representatives who aren't actively involved in the go-live sequence (they observe, they don't contribute). Nor should it contain more than fifteen people in total. A war room with thirty people in it is not a decision-making environment.

Technical Readiness Checklist

This is not exhaustive, but these are the items that most often catch teams off-guard:

  • All production profiles and permission sets have been reviewed against the UAT configuration — not assumed to be the same
  • Email deliverability settings in production have been tested (workflow email alerts, case notification emails)
  • All scheduled Apex jobs and scheduled flows have been reviewed and confirmed for production scheduling
  • Connected App OAuth settings for all integrations have been validated in production, not just sandbox
  • Named credentials and custom metadata pointing to external endpoints are confirmed as production URLs, not sandbox URLs
  • Data storage limits have been reviewed — you are about to add a large volume of migrated data
  • All Salesforce support cases for known platform issues have been reviewed for any active bugs affecting your use cases
  • The deactivation sequence for the legacy system has been confirmed with the team responsible for it
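
The endpoint item in the checklist above is one of the few that can be partly automated. This is a rough sketch, assuming you have exported the configured endpoint URLs into a simple name-to-URL mapping by whatever means your team uses. The marker list is a heuristic assumption and will need tuning to your own naming conventions; a clean result is a prompt for manual confirmation, not a replacement for it.

```python
from urllib.parse import urlparse

# Assumed hostname fragments that suggest a non-production endpoint.
NON_PROD_MARKERS = ("sandbox", "uat", "staging", "test", "dev")


def suspicious_endpoints(endpoints):
    """Return the names of endpoints whose hostname looks non-production.

    endpoints maps a label (e.g. a named credential) to its URL.
    """
    flagged = []
    for name, url in endpoints.items():
        host = (urlparse(url).hostname or "").lower()
        # Split on dots and hyphens so "billing-uat.example.com" yields "uat".
        labels = host.replace("-", ".").split(".")
        if any(marker in labels for marker in NON_PROD_MARKERS):
            flagged.append(name)
    return sorted(flagged)
```

For example, a mapping containing `https://billing-uat.example.com` would be flagged, while `https://api.erp.example.com/v1` would not.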

The 48-Hour Stakeholder Email

Forty-eight hours before go-live, send a brief communication to all executive stakeholders. It should cover: the go-live sequence and timing, the readiness status (green/amber/red for each workstream — be honest), the communication plan for go-live day, who the named escalation contact is, and what the plan is if go-live is postponed. Sending this email serves two purposes. It manages expectations and it forces you to articulate your readiness position in writing — which surfaces any concerns you have been privately carrying but not escalating.

Go-Live Day: Hour by Hour

Go-live day is the most operationally intensive day of any programme. The sequence matters. Deviating from the agreed sequence under pressure is one of the most common ways go-lives fail.

The Sequence That Works

Data migration first. Nothing else starts until the data migration is complete and validated. This is not negotiable. Teams that start provisioning user access while the data migration is still running create a situation where users can log in to an incomplete or inconsistent system. Keep user access locked until data validation is complete.

Integration validation second. Once the data is confirmed clean in the target system, run the integration smoke tests. Each integration should have a defined set of test transactions that validate the end-to-end flow. This is not a full regression test — it is a confirmation that each integration is alive and producing expected results.

Business sign-off third. Before users are given access, the business representative in the war room formally reviews and signs off on the data quality, the integration validation results, and the system state. This takes fifteen to thirty minutes. It is not optional. It is the moment at which responsibility for the system transfers from the delivery team to the business.

User access last. Only after data, integrations, and business sign-off are complete do you provision user access. Stagger the rollout if you can — release access to a pilot group first, confirm their experience is as expected, then open access to all users.

Decisions You Will Face

The decisions that matter on go-live day are the ones you cannot fully anticipate. But some decisions are predictable enough to prepare for. When to pause: if the data migration is running significantly behind the rehearsal timing, pause and reassess. Do not let the migration continue to completion on hope alone if the evidence suggests it will not finish within the window. When to push through: minor errors with known remediation paths — a few hundred records that failed validation and have a clean fix — are not a reason to pause. Your error threshold was agreed in advance. If you are within it, continue. When to call it: if you reach the rollback criteria you agreed before the go-live window opened, you call it. Even if it is 5 AM and go-live is at 7 AM.
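
The "when to pause" decision is easier to defend if it rests on a projection rather than a feeling. Here is a minimal sketch that assumes the migration proceeds at a roughly constant rate, which is a simplification: real migrations often slow down on larger or more heavily related objects. The function names are illustrative.

```python
def projected_duration_hours(elapsed_hours, records_done, records_total):
    """Linear projection of total run time from progress so far."""
    if records_done <= 0:
        raise ValueError("no progress recorded yet")
    return elapsed_hours * records_total / records_done


def should_pause(elapsed_hours, records_done, records_total, window_hours):
    """True when the projected finish falls outside the go-live window."""
    projected = projected_duration_hours(elapsed_hours, records_done, records_total)
    return projected > window_hours
```

Three hours in with a quarter of the records loaded projects to twelve hours. Against an eight-hour window, that is the evidence that says pause and reassess now rather than at hour seven.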

Communication Cadence

During the go-live window, send a status update to the executive sponsor every two hours, regardless of whether anything has changed. The update should be three lines: where you are in the sequence, whether you are on time, and any items requiring awareness. Do not wait for something to go wrong before you communicate. Silence during a go-live window creates anxiety in people who are not in the room, and anxious executives make phone calls that interrupt the team.

The Most Important Role in the War Room

The most important role in the war room is not the delivery manager. It is the person who has the authority to make the go/no-go call. On most programmes, this is the executive sponsor or their nominated deputy. They need to be available throughout the go-live window — not "reachable by phone" but actively present, either physically or on a video call. A decision to halt a go-live at 3 AM that requires waking up a board member who was not expecting to be involved is a decision that is likely to be delayed, made with incomplete information, or overridden. The decision-maker should be in the room from the start.

The Go/No-Go Decision Framework

The go/no-go decision is the most consequential decision in a big bang programme. It is also the decision that is most susceptible to pressure, exhaustion, and motivated reasoning. Understanding the forces at work helps you make it more clearly.

The Pressure to Say Go When the Answer Is No

The pressure is real and it comes from multiple directions simultaneously. The business has scheduled hypercare support, floor walkers, and management briefings for the morning after go-live. Marketing has prepared a launch communication. The executive sponsor has told the board the system is going live this weekend. The legacy system team has already partially decommissioned their support capacity. Every one of these creates a gravitational pull toward saying go, even when the delivery team knows the system is not ready.

The antidote to this pressure is not willpower. It is exit criteria that were agreed in advance, in writing, by the executive sponsor. When exit criteria exist, the go/no-go decision is not a judgment call under pressure — it is a factual assessment against agreed standards. The delivery manager's job is to report the facts. The decision is owned by the person who signed off the exit criteria.

Exit Criteria: What These Must Cover

If these items are not green, you do not go live:

  • Data migration: completed within the time window, error rate below the agreed threshold, business sign-off obtained
  • Integration: all critical integrations passed smoke tests with no unresolved errors
  • Performance: system response times within agreed thresholds under load
  • Security: all user profiles and permission sets reviewed; no unintended data access
  • Rollback: legacy system confirmed available and restorable within the agreed recovery time objective
  • Business readiness: business representative present and willing to accept the system

Note that "training completion" is not on this list. Training completion is a leading indicator, not a readiness indicator. What belongs here is business readiness — a human being who represents the business has reviewed the system and is prepared to accept it.
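
Because the exit criteria are a fixed list agreed in advance, the go/no-go assessment reduces to a lookup. A sketch of that idea follows; the criterion keys and the status strings are assumed labels, and the real evidence behind each status still has to be gathered by people.

```python
# The six exit criteria from the list above, as assumed dictionary keys.
EXIT_CRITERIA = (
    "data_migration",
    "integration",
    "performance",
    "security",
    "rollback",
    "business_readiness",
)


def go_no_go(status):
    """Return ("GO" or "NO-GO", criteria_not_green).

    status maps each criterion to "green", "amber" or "red". A criterion
    that is missing from the mapping counts as not green.
    """
    not_green = [c for c in EXIT_CRITERIA if status.get(c) != "green"]
    decision = "GO" if not not_green else "NO-GO"
    return decision, not_green
```

Treating a missing entry as not green is deliberate: an exit criterion nobody has reported on is not evidence of readiness.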

What a Postponement Looks Like at 23:00

A postponement at 23:00 the night before a 07:00 go-live is not a catastrophe. It feels like one. But a failed go-live that requires emergency rollback at 10:00 AM — with users in the system and data in an inconsistent state — is an actual catastrophe. The right comparison is not "postponement vs smooth go-live." The right comparison is "postponement vs failed go-live."

When you make the call to postpone, you need four things ready: a clear statement of why (the specific exit criterion not met, not a vague reference to "readiness concerns"), a proposed revised date, an immediate communication to the executive sponsor, and a plan for the next morning's communication to the wider business. The statement of why should be factual and specific. "Data migration completed at 89% with an error rate of 2.3% against a threshold of 0.5%" is a statement that a board can understand and accept. "We don't feel ready" is not.

Presenting a No-Go Recommendation

A no-go recommendation presented to a board should follow a simple structure. First, the specific exit criterion that is not met and the evidence. Second, the risk of proceeding (quantified where possible — "proceeding with a 2.3% data error rate means approximately X records will have incorrect data, affecting Y business process"). Third, the proposed remediation and revised timeline. Fourth, what can be done in the interim to reduce business impact (keeping the legacy system operational, providing workarounds for critical processes).
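
Both the factual statement and the quantified risk are simple arithmetic on numbers you already have. A small sketch, with hypothetical function names; the total record count used to fill in "X records" is an input you supply from your own migration figures.

```python
def no_go_statement(completed_fraction, error_rate, threshold):
    """Format the specific, factual statement of why the criterion is not met."""
    return (
        f"Data migration completed at {completed_fraction:.0%} "
        f"with an error rate of {error_rate:.1%} "
        f"against a threshold of {threshold:.1%}"
    )


def estimated_affected_records(error_rate, records_total):
    """Approximate number of records with incorrect data if you proceed."""
    return round(error_rate * records_total)
```

With a hypothetical one million migrated records, a 2.3% error rate corresponds to roughly 23,000 affected records, which is a far more concrete figure for a board than the percentage alone.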

What you are not doing is apologising. A no-go recommendation made on the basis of objective evidence is professional delivery management. The board may be disappointed. They will respect a delivery manager who gave them an honest, evidence-based recommendation significantly more than one who said go and then spent the next month managing a crisis.

Post Go-Live: The First 30 Days

A go-live that concludes at 7 AM is not the end of your work. It is the beginning of the period in which the programme's success or failure will actually be determined. The first thirty days are when most big bang go-lives either stabilise into sustainable operations or spiral into a crisis that damages the programme's credibility beyond recovery.

Hypercare: What It Is and What It Isn't

Hypercare is not a help desk. It is an intensive, time-limited operational support model in which the full implementation team (developers, configuration specialists, data team, integration team) is available at significantly faster response times than normal support to address issues as they emerge. The distinction matters because hypercare is expensive. A twenty-person implementation team on two-hour response times is a significant cost. It exists because the risk profile of the first weeks after a big bang go-live justifies the cost. As the risk profile reduces, hypercare should wind down systematically.

The characteristic issues of the first week are different from the issues of the first month. In the first week, you will see: user access and permission issues, data quality questions from users who cannot find records they expected to be there, integration errors surfacing in real transaction volumes, and user workflow confusion in business processes that were not adequately tested in UAT. These are expected. They are not signs that the go-live failed. They are the normal operational noise of a complex system under first-use conditions.

In the first month, the issues shift. By week two, the acute access and data issues should be resolved. What you start seeing is: business process gaps — workflows that users need but were not built because they were not surfaced in requirements — report and dashboard gaps, performance issues under sustained rather than peak load, and integration issues related to the edge cases that only appear in real-world transaction patterns. These issues require a structured triage process: which are genuine gaps that should be addressed in a post-go-live sprint, which are working-as-designed features that require user education, and which are defects that require emergency fixes.

Stabilisation Criteria

Hypercare should end when specific, measurable criteria are met, not on a calendar date. The criteria should include: support ticket volume has dropped below an agreed threshold for a sustained period (typically one week), no critical or high-severity incidents have been raised for a defined period, key business KPIs are within expected ranges (conversion rates, case closure times, or whatever metrics the business uses to measure operational health), and the system has been successfully handed over to the business IT team that will own long-term support, which has handled at least one issue independently.
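
These criteria can be evaluated from data the support team already collects. What follows is a sketch only: the parameter names, the seven-day defaults, and the way KPI health is reduced to a single boolean are all simplifying assumptions.

```python
def unmet_stabilisation_criteria(
    daily_tickets,                 # ticket counts per day, oldest first
    ticket_threshold,              # agreed daily ticket ceiling
    days_since_major_incident,     # days since last critical/high incident
    kpis_in_range,                 # True if key business KPIs look healthy
    issues_handled_by_support,     # issues the long-term team resolved alone
    sustained_days=7,
    incident_free_days=7,
):
    """Return the stabilisation criteria not yet met (empty list = exit)."""
    unmet = []
    recent = daily_tickets[-sustained_days:]
    # Volume must be below the threshold on every day of the sustained period.
    if len(recent) < sustained_days or any(n >= ticket_threshold for n in recent):
        unmet.append("ticket volume")
    if days_since_major_incident < incident_free_days:
        unmet.append("incident-free period")
    if not kpis_in_range:
        unmet.append("business KPIs")
    if issues_handled_by_support < 1:
        unmet.append("support handover")
    return unmet
```

Hypercare winds down when the returned list is empty, not when the calendar says so.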

The Retrospective That Usually Doesn't Happen

Most programmes hold a retrospective after UAT. Almost none hold a meaningful retrospective after go-live. The go-live retrospective is the most valuable retrospective in the programme, because it is the only retrospective conducted after real users have used the real system with real data. It answers questions that no other review can: which risks materialised and which didn't, what the early warning signs were that the team saw but didn't escalate, what the data migration experience tells you about your source system assumptions, and what users needed that wasn't built.

The output of the retrospective should not be a list of lessons learned that is filed and forgotten. It should be a set of specific actions: changes to the estimation process for the next programme, changes to the data migration methodology, changes to the go/no-go exit criteria, and a frank assessment of whether the go-live date was realistic and what drove the decision if it wasn't. This is institutional knowledge. It is worth capturing.

Key Takeaways

  • Big bang go-lives are usually driven by commercial pressure, not technical preference — understanding this helps you manage stakeholders rather than fight the decision
  • The three failure modes are data migration, integration, and change management — all have visible warning signs weeks before go-live if you know what to look for
  • Data migration is "done" only when it has been completed within the time window, below the error threshold, with business sign-off — not when the technical team says it is done
  • Exit criteria agreed in advance by the executive sponsor are the only reliable protection against the pressure to say "go" when the honest answer is "no"
  • The go-live sequence — data migration, integration validation, business sign-off, user access — should be followed exactly, regardless of time pressure
  • A postponement at 23:00 is not a failure; a failed go-live with 400 users in an inconsistent system is a failure
  • Hypercare should end against measurable stabilisation criteria, not calendar dates, and the go-live retrospective should be treated as the most valuable learning event of the programme

Checkpoint: Test Your Understanding

1. A data migration rehearsal has been completed and the error rate is 2.1% against an agreed threshold of 0.5%. The go-live is in 48 hours. What is the correct action?

A. Proceed — a 2.1% error rate means 97.9% of records are clean, which is acceptable for a go-live
B. Escalate to the project manager and ask them to make the call
C. The exit criterion is not met — escalate to the executive sponsor with the specific evidence and a recommendation to postpone, citing the agreed threshold
D. Run the migration anyway and fix the errors manually after go-live during hypercare

2. It is 02:30 on go-live night. Data migration is complete and validated. Integration smoke tests are underway. An integration with a billing system is producing duplicate records in Salesforce under retry conditions. This integration was not in the critical path for go-live. What should you do?

A. Continue with go-live — the integration was not critical and can be fixed in hypercare
B. Assess whether the duplicate records would affect the business process, whether the integration can be temporarily disabled without breaking core functionality, and make the call based on evidence — not on the desire to meet the go-live time
C. Wake up the executive sponsor immediately and ask them to decide
D. Roll back immediately — any integration failure is a blocker regardless of severity

3. Which of the following is the most reliable indicator that change management is on track for a big bang go-live?

A. 95% of users have completed the required training sessions
B. The change management workstream is reporting green in the programme RAG status
C. Users have been assessed against their ability to perform key workflows, floor walkers are confirmed and briefed, and the business representative is prepared to accept the system
D. Training materials have been completed and distributed to all users two weeks before go-live
