Data Migration Strategy: The 6-Week vs 6-Month Debate

What you will learn...

The five factors that determine migration complexity — and which ones are usually underestimated
Data profiling: what to measure before writing a single migration script
The three migration execution strategies: big-bang, phased, and parallel-run
How to design for cutover — delta sync, freeze windows, and rollback plans
Data quality and cleansing: when to fix data in the source versus the target
The architecture of a migration pipeline: extraction, staging, transformation, and load

What Makes a Migration Complex

The 6-week vs 6-month question resolves to five factors: data volume, data quality, relationship complexity, business rule complexity, and system availability constraints. A migration that is small on volume but severe on data quality will take longer than a large-volume migration of clean, well-structured data. Most teams correctly estimate volume — it's the other four factors where estimates go wrong.

Data quality is the most underestimated complexity driver. Legacy CRM systems (Siebel, older Salesforce orgs, custom databases) accumulate years of inconsistent data entry: duplicate accounts with different names and addresses, contacts with invalid emails, opportunities missing fields that only became required years after the records were created, and referential integrity violations where lookup IDs point to records that no longer exist. Profiling source data before estimation is mandatory, not optional.

Relationship complexity determines load order. If you are migrating Accounts, Contacts, Opportunities, Cases, and their related attachments, you must load them in dependency order: Accounts first (other objects reference them), then Contacts and Opportunities (which reference Accounts), then Cases (which may reference Contacts and Accounts), then Attachments (which reference Cases). Any parent-child relationship where both parent and child are migrated adds an ordering constraint and requires temporary ID mapping between source IDs and target Salesforce IDs.
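The dependency ordering described above can be derived mechanically rather than maintained by hand. The following is a minimal sketch using Python's standard-library topological sorter; the dependency map is a hypothetical example built from the objects named in this section, not a complete Salesforce schema.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each object lists the parents it references.
# Object names follow the example in the text; adjust to the real schema.
DEPENDENCIES = {
    "Account": set(),
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "Case": {"Account", "Contact"},
    "Attachment": {"Case"},
}


def load_order(dependencies):
    """Return objects ordered so that every parent precedes its children."""
    # static_order() raises graphlib.CycleError if the map contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(DEPENDENCIES)
print(order)
```

A cycle in the map (for example, two objects that reference each other) raises an error instead of producing an order, which is a useful early signal that one of the lookups has to be populated in a second pass.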

⚠️ Never estimate without data profiling: The most expensive mistake in migration planning is providing a timeline before profiling the source data. A team that commits to 6 weeks based on record count alone, without profiling data quality, will discover the real complexity at week 4 when cleansing work doubles the remaining timeline. Always profile first, estimate second.

Data Profiling Before Migration

Data profiling is the systematic analysis of source data to understand its quality, completeness, and structural characteristics. Before writing any migration code, every field that will be migrated should be profiled: completeness rate (what percentage of records have a non-null value), uniqueness rate (for ID and email fields, what percentage are unique), format validity (are date fields in consistent formats, are phone numbers in parseable formats), and referential integrity (what percentage of foreign keys resolve to existing parent records).

The profiling output drives the migration plan. A field with 40% completeness means 60% of target records will have a null value — is that acceptable, or does it require a default value strategy? A field with 15% invalid email formats means 15% of Contact records will fail Salesforce's email validation — those records need cleansing before migration or they will fail at load time. Profiling turns surprises into planned work.

-- Sample data profiling query against the source database
SELECT
  COUNT(*) AS total_accounts,
  COUNT(email) AS has_email,  -- COUNT(column) skips NULLs
  -- NULLIF avoids a divide-by-zero error on an empty table
  ROUND(COUNT(email) * 100.0 / NULLIF(COUNT(*), 0), 1) AS email_completeness_pct,
  COUNT(DISTINCT email) AS unique_emails,
  -- Coarse format check only; NULL emails are not counted as invalid here
  SUM(CASE WHEN email NOT LIKE '%@%.%' THEN 1 ELSE 0 END) AS invalid_email_format,
  SUM(CASE WHEN phone IS NULL THEN 1 ELSE 0 END) AS missing_phone,
  SUM(CASE WHEN billing_country IS NULL THEN 1 ELSE 0 END) AS missing_country
FROM source_accounts;
-- Run an equivalent profile for every object before estimating migration effort
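The same four metrics can also be computed in Python once the data is in staging, which makes it easier to add the referential integrity check that the query above does not cover. This is a minimal sketch: the function names, the sample rows, and the coarse email pattern are illustrative assumptions, and the pattern is not a full email validator.

```python
import re

# Coarse pattern, comparable to a LIKE '%@%.%' check; not a full validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when there is nothing to measure."""
    return round(part * 100.0 / whole, 1) if whole else 0.0


def profile_field(rows, field, pattern=None):
    """Completeness, uniqueness and (optionally) format validity for one field."""
    values = [row.get(field) for row in rows]
    present = [v for v in values if v not in (None, "")]
    result = {
        "completeness_pct": pct(len(present), len(values)),
        "uniqueness_pct": pct(len(set(present)), len(present)),
    }
    if pattern is not None:
        valid = [v for v in present if pattern.match(v)]
        result["format_valid_pct"] = pct(len(valid), len(present))
    return result


def referential_integrity_pct(children, fk_field, parent_ids):
    """Share of non-null foreign keys that resolve to an existing parent."""
    fks = [c[fk_field] for c in children if c.get(fk_field) is not None]
    resolved = [fk for fk in fks if fk in parent_ids]
    return pct(len(resolved), len(fks))


# Made-up sample rows for illustration only.
accounts = [
    {"id": "A1", "email": "ops@example.com"},
    {"id": "A2", "email": "ops@example.com"},
    {"id": "A3", "email": "not-an-email"},
    {"id": "A4", "email": None},
]
contacts = [
    {"id": "C1", "account_id": "A1"},
    {"id": "C2", "account_id": "A9"},  # orphan: A9 does not exist
]

email_profile = profile_field(accounts, "email", EMAIL_PATTERN)
ri_pct = referential_integrity_pct(contacts, "account_id", {a["id"] for a in accounts})
print(email_profile, ri_pct)
```

Each percentage that falls short of 100 maps to a line of planned work: a default-value decision, a deduplication pass, a cleansing rule, or an orphan-handling rule.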

Migration Execution Strategies

Big-bang migration migrates all data in a single cutover event. The source system is frozen (no new data entry), the final delta is migrated to Salesforce, and users switch to Salesforce on a defined date. The advantage is simplicity — one migration, one cutover. The risk is that if the migration fails or produces unacceptable data quality, rollback means reverting to the source system while fixing the issue. Big-bang migrations work for smaller datasets (under 2 million records) with high data quality and a clear freeze window.

Phased migration migrates data in stages — typically by business unit, geography, or object type. Phase 1 might migrate Accounts and Contacts; Phase 2 adds Opportunities; Phase 3 adds historical Cases. Each phase has its own cutover, with the source system partially decommissioned as each phase completes. Phased migration reduces risk per phase but introduces complexity: for the duration of the phased migration, data exists in both systems, requiring a synchronisation layer to keep them consistent.

Parallel-run migration runs both systems simultaneously with bidirectional sync until confidence is established that Salesforce is correct and complete. Users work in Salesforce while the legacy system remains live as a fallback. This is the lowest-risk strategy but the highest-cost — maintaining bidirectional sync between two live systems requires significant integration effort and increases the total migration duration.

The Migration Pipeline Architecture

A production migration pipeline has four stages: extraction, staging, transformation, and load. Extraction pulls data from the source system into an intermediate staging environment — typically a cloud database or object storage — without any transformation. Keeping extraction separate from transformation allows the full source dataset to be preserved for re-running transformation and load stages when issues are discovered, without re-extracting from the (often slow, often production) source system.
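As an illustration of keeping extraction free of transformation, the sketch below writes source rows into a staging table exactly as received and reads them back for later transformation runs. It uses an in-memory SQLite database as a stand-in for the staging environment; the table name `raw_extract`, its columns, and the sample row are assumptions made for this example.

```python
import json
import sqlite3
from datetime import datetime, timezone


def stage_raw(conn, object_name, rows, id_field="id"):
    """Store source rows untransformed, keyed by object name and source ID."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_extract (
               object_name  TEXT NOT NULL,
               source_id    TEXT NOT NULL,
               payload      TEXT NOT NULL,  -- the source row as JSON, unmodified
               extracted_at TEXT NOT NULL,
               PRIMARY KEY (object_name, source_id)
           )"""
    )
    extracted_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR REPLACE INTO raw_extract VALUES (?, ?, ?, ?)",
        [(object_name, str(r[id_field]), json.dumps(r), extracted_at) for r in rows],
    )
    conn.commit()


def read_staged(conn, object_name):
    """Read staged rows back for a transformation run; the source is not touched."""
    cursor = conn.execute(
        "SELECT payload FROM raw_extract WHERE object_name = ? ORDER BY source_id",
        (object_name,),
    )
    return [json.loads(payload) for (payload,) in cursor]


conn = sqlite3.connect(":memory:")  # stand-in for the real staging database
source_rows = [{"id": 1, "name": " Acme Ltd ", "phone": "(020) 7946-0000"}]
stage_raw(conn, "Account", source_rows)
staged = read_staged(conn, "Account")
print(staged)
```

Note that the untidy name and phone values are stored as they are. Cleansing belongs to the transformation stage, so a bug in a cleansing rule can be fixed and re-run against the same staged rows.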

The staging database is the migration team's working area. Data profiling runs against the staging data. Transformation scripts (SQL or Python) convert source data to Salesforce-compatible formats, apply cleansing rules, generate External ID values, and produce the final load files. The transformation stage also performs the lookup resolution — replacing source system IDs with the Salesforce IDs of already-loaded parent records.
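Lookup resolution is the step most likely to fail silently, so it helps to separate rows that resolved from rows that did not. The following is a minimal sketch under stated assumptions: the ID map is a plain dictionary built from the results of an earlier parent load, the field names are hypothetical, and the target IDs are placeholder strings rather than real Salesforce IDs.

```python
def resolve_lookups(children, fk_field, id_map, target_field):
    """Split child rows into loadable rows and rows whose parent is unmapped."""
    resolved, unresolved = [], []
    for row in children:
        target_id = id_map.get(row.get(fk_field))
        if target_id is None:
            # Keep the original row so the gap can be reported and fixed.
            unresolved.append(row)
            continue
        out = {k: v for k, v in row.items() if k != fk_field}
        out[target_field] = target_id
        resolved.append(out)
    return resolved, unresolved


# Hypothetical map from source Account IDs to target IDs (placeholder values).
account_id_map = {"SRC-A1": "001-PLACEHOLDER-1", "SRC-A2": "001-PLACEHOLDER-2"}
opportunities = [
    {"source_id": "SRC-O1", "name": "Renewal", "account_src_id": "SRC-A1"},
    {"source_id": "SRC-O2", "name": "Upsell", "account_src_id": "SRC-A7"},
]

ready, orphans = resolve_lookups(
    opportunities, "account_src_id", account_id_map, "AccountId"
)
print(len(ready), "ready;", len(orphans), "unresolved")
```

Treating a non-empty unresolved list as a blocking error before the load starts is one way to avoid loading children with a null parent reference and finding out only during post-load validation.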

The load stage uses Salesforce Bulk API 2.0 to push transformed data to Salesforce. Load jobs are idempotent via External ID upsert — each run either creates or updates records based on the External ID, allowing the load to be re-run safely when failures occur. Post-load validation queries compare record counts and key field values between staging and Salesforce to confirm completeness.
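To show why upsert by External ID makes re-runs safe, the sketch below models the target org as an in-memory dictionary and pairs the load with a simple post-load reconciliation. It does not call the Bulk API; the `upsert` and `reconcile` functions and the `Legacy_Id__c` field name are hypothetical stand-ins for illustration.

```python
def upsert(target, records, external_id_field):
    """Create or update records keyed by External ID; return (created, updated)."""
    created = updated = 0
    for record in records:
        key = record[external_id_field]
        if key in target:
            updated += 1
        else:
            created += 1
        target[key] = dict(record)
    return created, updated


def reconcile(staged, target, external_id_field, compare_fields):
    """Compare staging with the target: counts, missing keys, field mismatches."""
    missing, mismatched = [], []
    for record in staged:
        key = record[external_id_field]
        loaded = target.get(key)
        if loaded is None:
            missing.append(key)
        elif any(loaded.get(f) != record.get(f) for f in compare_fields):
            mismatched.append(key)
    return {
        "staged": len(staged),
        "loaded": len(target),
        "missing": missing,
        "mismatched": mismatched,
    }


target_org = {}  # in-memory stand-in for the target org
load_file = [
    {"Legacy_Id__c": "SRC-A1", "Name": "Acme Ltd"},
    {"Legacy_Id__c": "SRC-A2", "Name": "Globex"},
]

first_run = upsert(target_org, load_file, "Legacy_Id__c")
second_run = upsert(target_org, load_file, "Legacy_Id__c")  # re-run after a failure
report = reconcile(load_file, target_org, "Legacy_Id__c", ["Name"])
print(first_run, second_run, report)
```

The second run updates the two existing records instead of creating duplicates, so the record count in the target stays at two. A plain insert would have doubled it.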

💡 Run the migration at least three times before production: The first mock migration run reveals transformation bugs and data quality issues. The second run, after fixes, measures actual throughput and confirms the production load will complete within the freeze window. The third run (ideally in a full sandbox) is a dress rehearsal. Teams that run only one test migration before production almost always discover something they didn't expect during the real cutover.

Cutover Planning and Delta Sync

Cutover is the period between freezing the source system and users going live on Salesforce. For a large migration, the initial load may have run weeks earlier, so the final cutover load only needs to migrate the delta (records created or modified since the initial extraction). Delta sync requires timestamp-based extraction from the source: all records whose last_modified is at or after the start of the initial extraction are re-extracted, re-transformed, and re-loaded as upserts, which creates the new records and updates the already-migrated ones.
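Timestamp-based delta selection can be sketched as a filter against a watermark. In this minimal example the watermark is assumed to be the time the initial extraction started, the comparison is inclusive so that boundary rows are re-sent rather than skipped, and the dates and rows are made up for illustration.

```python
from datetime import datetime, timezone


def select_delta(rows, watermark, modified_field="last_modified"):
    """Return rows changed at or after the watermark.

    The comparison is inclusive: re-sending a boundary row is harmless when
    the load is an upsert, while skipping it would lose a change.
    """
    return [row for row in rows if row[modified_field] >= watermark]


# Hypothetical watermark: when the initial extraction started (UTC).
initial_extract_started = datetime(2024, 3, 1, 22, 0, tzinfo=timezone.utc)

source_rows = [
    {"id": "A1", "last_modified": datetime(2024, 2, 20, 9, 30, tzinfo=timezone.utc)},
    {"id": "A2", "last_modified": datetime(2024, 3, 1, 22, 0, tzinfo=timezone.utc)},
    {"id": "A3", "last_modified": datetime(2024, 3, 9, 14, 5, tzinfo=timezone.utc)},
]

delta = select_delta(source_rows, initial_extract_started)
print([row["id"] for row in delta])  # ['A2', 'A3']
```

A timestamp filter only sees rows that still exist. Records that were hard-deleted in the source after the initial extraction do not appear in the delta and need a separate check, such as comparing key lists between the staged extract and the source.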

The freeze window — the period when the source system is locked for new entry — should be as short as possible while allowing the delta load to complete and the validation checks to pass. For a 10-million-record migration, the initial load might run over a weekend; the final delta of changed records since the initial load might be 50,000 records that load in under an hour. Design for the delta, not just the full load.
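The sizing argument above is simple arithmetic once throughput has been measured in a mock run. The sketch below is illustrative only: the throughput of 5,000 records per minute, the 60 minutes of validation, and the 50% contingency buffer are assumed numbers, not Salesforce figures.

```python
import math


def load_minutes(record_count, records_per_minute):
    """Minutes needed to load record_count at a measured throughput."""
    return math.ceil(record_count / records_per_minute)


def freeze_window_minutes(delta_records, records_per_minute,
                          validation_minutes, buffer_ratio=0.5):
    """Delta load plus validation, padded with a contingency buffer for re-runs."""
    base = load_minutes(delta_records, records_per_minute) + validation_minutes
    return math.ceil(base * (1 + buffer_ratio))


# Assumed throughput taken from a hypothetical mock run.
THROUGHPUT = 5_000  # records per minute

full_load = load_minutes(10_000_000, THROUGHPUT)
delta_load = load_minutes(50_000, THROUGHPUT)
window = freeze_window_minutes(50_000, THROUGHPUT, validation_minutes=60)

print(full_load, "minutes for the full load")   # 2000 minutes, about 33 hours
print(delta_load, "minutes for the delta")      # 10 minutes
print(window, "minutes of freeze window")       # 105 minutes
```

With these assumed numbers, validation rather than loading dominates the freeze window, which is a reason to time the validation queries during the mock runs as well.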

Rollback planning is non-negotiable. If the migration validation fails after cutover (data quality issues, missing records, broken relationships), the fallback must be tested and ready. For big-bang migrations, rollback means restoring the source system to production use and invalidating the Salesforce data. This requires that the source system has not been decommissioned, that users can still work in it, and that the Salesforce org can be reset to its pre-migration state. Test the rollback procedure as rigorously as the forward migration.

Data Quality: Fix at Source or Target

The principle is fix at source wherever possible. If an Account record has an invalid billing address in the source system, fixing it in the migration transformation layer means the source system retains the bad data, the migration appears to succeed, and if anyone ever needs to roll back or re-migrate, the same cleansing must be re-applied. Fixing at source means the source system (which may continue to be used as a reference) also has clean data.

In practice, fixing everything at source is rarely feasible within a migration timeline. The pragmatic approach is a cleansing matrix: for each data quality issue, categorise it as either source-fixable (the business owner of the source system can correct the data before migration begins) or migration-fixable (the transformation layer applies a standard cleansing rule). Wrong or missing values in critical fields (email, phone, primary identifier) should be fixed at source. Low-criticality formatting issues, such as phone number format normalisation and country name standardisation, are appropriate to handle in transformation.
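A cleansing matrix can be kept as data next to the transformation rules it refers to. The following is a minimal sketch: the issue names, the alias table, and the two normalisation rules are hypothetical examples, and real phone and country handling usually needs more cases than shown here.

```python
import re

# Hypothetical cleansing matrix: issue name -> where the fix is applied.
CLEANSING_MATRIX = {
    "invalid_email": "source",
    "missing_primary_identifier": "source",
    "phone_format": "migration",
    "country_name_variant": "migration",
}

# Illustrative alias table only; a real one would be far longer.
COUNTRY_ALIASES = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "usa": "United States",
    "u.s.a.": "United States",
}


def normalise_phone(raw):
    """Strip formatting characters, keeping digits and a leading plus sign."""
    if raw is None:
        return None
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    if not digits:
        return None
    return ("+" if raw.startswith("+") else "") + digits


def standardise_country(raw):
    """Map known aliases to one spelling; pass unknown values through trimmed."""
    if raw is None:
        return None
    cleaned = raw.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def route_issues(issues):
    """Split detected issues into a source-fix worklist and transformation rules."""
    plan = {"source": [], "migration": []}
    for issue in issues:
        # An issue missing from the matrix raises KeyError, forcing a decision.
        plan[CLEANSING_MATRIX[issue]].append(issue)
    return plan


plan = route_issues(["phone_format", "invalid_email", "country_name_variant"])
print(plan)
print(normalise_phone("+1 415-555-0100"), standardise_country(" UK "))
```

Keeping the matrix explicit means every profiled issue has an owner before the first mock run: either the business owner of the source system or a named transformation rule.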

Key Takeaways

Migration complexity is driven by five factors: data volume, data quality, relationship complexity, business rule complexity, and system availability. Data quality is the most commonly underestimated driver of extended timelines.
Profile every field of every migrated object before estimating. Completeness rate, uniqueness rate, format validity, and referential integrity are the four key profiling metrics that reveal hidden complexity.
Choose the execution strategy based on risk tolerance: big-bang (simple, highest risk), phased (balanced), parallel-run (lowest risk, highest cost). The choice must be made before the migration architecture is designed.
The migration pipeline has four stages: extraction (source to staging, untransformed), staging (preserve and profile the raw extract), transformation (source to Salesforce format, including cleansing and lookup resolution), load (Bulk API 2.0 upsert via External ID). Keep these stages separate to enable re-runs.
Run the migration at least three times — initial mock, performance-validated mock, and full dress rehearsal — before production cutover. First-run surprises during production cutover are avoidable failures.
Design for the delta cutover, not just the initial load. The final cutover load should be only records changed since the initial load — measured in hours, not days, of load time within the freeze window.

Test Your Understanding

1. A migration team estimates a 6-week timeline for migrating 3 million Account records from a legacy CRM. The estimate is based primarily on record count and Bulk API throughput calculations. What critical analysis is missing?

The team should also calculate the network bandwidth required to transfer 3 million records to Salesforce's servers

Data profiling of the source data — without knowing completeness rates, data quality issues, and referential integrity violations, the estimate cannot account for the cleansing and transformation work that commonly doubles migration timelines

The team needs to confirm Salesforce's storage capacity for 3 million Account records before committing to a timeline

2. During a big-bang migration, the load completes but post-load validation reveals that 8% of Opportunity records are missing their linked Account — the Account foreign key is null. The source system has already been frozen. What is the root cause and correct resolution path?

The Bulk API failed to process 8% of the Opportunity records — re-running the Opportunity load job will resolve the missing links

The transformation stage failed to resolve source Account IDs to Salesforce Account IDs for 8% of Opportunities — this is a lookup resolution failure. Fix the ID mapping in the transformation stage and re-run the Opportunity load with corrected records.

Salesforce deleted the Account links due to a deduplication rule — disable duplicate management and re-insert the Opportunity records

3. A phased migration migrates Accounts and Contacts in Phase 1 (completed 4 weeks ago). Phase 2 will migrate Opportunities. During the 4-week gap, users have continued to create new Accounts and Opportunities in the source system. What must the Phase 2 migration plan account for?

Phase 2 can proceed normally — Opportunities only link to pre-migration Accounts which all exist in Salesforce from Phase 1

Opportunities created in the source system during the 4-week gap may reference Accounts that were also created in the source during that period — these new Accounts may not be in Salesforce. Phase 2 must include a delta Account load for new Accounts created since Phase 1 cutover before loading Opportunities.

Opportunities should not be linked to newly created Salesforce Accounts — they should use a placeholder Account until the migration is fully complete
