Legacy System Integration: The Patterns That Actually Work

What you will learn...

What makes a legacy system "legacy" from an integration perspective
The anti-corruption layer pattern: why it exists and how to implement it
File-based integration: the pattern that outlasts everything else
Database-level integration: when and why it's sometimes the only option
The strangler fig pattern for gradually replacing legacy integrations
Operational considerations for integration with systems that cannot be changed

What Makes a System Legacy from an Integration Perspective

A system is "legacy" from an integration perspective when it has one or more of these characteristics: it cannot be modified to add new API endpoints or event publishing capabilities, it runs protocols that modern systems do not natively support (IBM MQ, AS/400 data queues, mainframe CICS transactions, SFTP file drops), it has undocumented data structures that require reverse engineering to map, or it has availability windows that preclude real-time integration. All of these constraints are common. The integration architecture must accommodate the legacy system as it is, not as you wish it were.

The most common legacy integration scenarios in Salesforce projects are: SAP ERP (primarily R/3 or S/4 with IDOCs, RFC, and BAPI interfaces), IBM Mainframe systems (COBOL-based, file-based or MQ-based), Oracle EBS/PeopleSoft (SOAP-heavy, with proprietary message structures), and older Siebel CRM systems (the ironically common scenario of migrating from Siebel to Salesforce while keeping Siebel running in parallel for years). Each has specific integration characteristics that determine the appropriate pattern.

The key principle for legacy integration architecture: minimize the blast radius of the legacy system's constraints. If SAP only supports batch IDOC files, that batch file interface should terminate as early as possible in the integration stack — ideally at a transformation layer that converts IDOCs to standard REST/JSON for the rest of the integration. Allowing SAP-specific data structures to flow all the way through to Salesforce creates a fragile coupling that makes future system replacement expensive.

💡

The legacy system is not your fault, but the integration debt is: Legacy system constraints are inherited technical reality. But the integration architecture you build around them is your choice. A well-designed anti-corruption layer around a mainframe interface isolates the legacy constraints to a single adapter. A poorly designed direct coupling spreads those constraints across every system that touches Salesforce.

The Anti-Corruption Layer Pattern

The anti-corruption layer (ACL) is a translation layer between a legacy system and modern systems that prevents the legacy system's data model and semantics from "corrupting" the clean domain model of the modern system. In Salesforce integration architecture, the ACL sits between the legacy system's interface (IDOC, flat file, SOAP web service) and the Salesforce API, translating in both directions.

An SAP-to-Salesforce ACL does three things: it receives data from SAP in SAP's native format (IDOC XML, typically), it transforms the SAP data model to the Salesforce data model (mapping SAP Business Partner fields to Salesforce Account fields, handling SAP-specific code values and reference tables), and it calls the Salesforce REST API with the transformed data. In the opposite direction, it receives Salesforce data via Platform Events or Apex callouts, transforms it to SAP's expected format, and calls SAP via RFC or writes to SAP's inbound interface queues.

MuleSoft is the most commonly used platform for ACL implementation at enterprise scale — its DataWeave transformation language and SAP connector make the translation layer maintainable. For simpler scenarios, a lightweight Node.js or Python microservice can serve as the ACL without the MuleSoft infrastructure cost. The key requirement is that the ACL is the only system that knows about both the legacy and modern data models — if Salesforce Apex code starts importing SAP-specific data structures, the anti-corruption boundary has been violated.

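// DataWeave ACL transformation: SAP IDOC Customer to Salesforce Account
// lookupCountryCode and mapSapIndustry are assumed to be helper functions
// defined elsewhere (for example, in an imported DataWeave module).
%dw 2.0
output application/json
---
{
  "records": payload.IDOC.E1KNA1M map (customer) -> {
    // Append the second name line only when SAP supplies one
    "Name": customer.NAME1 ++ (
      if (customer.NAME2 != null) " " ++ customer.NAME2 else ""
    ),
    // The SAP customer number becomes an external ID, not a Salesforce field name
    "External_SAP_ID__c": customer.KUNNR,
    "BillingStreet": customer.STRAS,
    "BillingCity": customer.ORT01,
    "BillingPostalCode": customer.PSTLZ,
    // SAP code values are translated here so they never reach Salesforce
    "BillingCountry": lookupCountryCode(customer.LAND1),
    "Phone": customer.TELF1,
    "Industry": mapSapIndustry(customer.BRSCH)
  }
}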
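For the simpler case mentioned above, where a lightweight Python microservice stands in for MuleSoft, the same mapping might look roughly like the sketch below. The lookup tables, their contents, and the function name sap_customer_to_account are illustrative assumptions; only the SAP and Salesforce field names come from the DataWeave mapping.

```python
# Hypothetical reference tables; real values would come from SAP reference data.
COUNTRY_NAMES = {"DE": "Germany", "US": "United States"}
INDUSTRY_MAP = {"0001": "Manufacturing", "0002": "Retail"}


def sap_customer_to_account(segment: dict) -> dict:
    """Translate one SAP customer segment into Salesforce Account fields."""
    name_parts = [segment.get("NAME1") or "", segment.get("NAME2") or ""]
    land1 = segment.get("LAND1")
    return {
        # Join the SAP name lines, skipping any that are missing or empty.
        "Name": " ".join(part for part in name_parts if part),
        "External_SAP_ID__c": segment["KUNNR"],
        "BillingStreet": segment.get("STRAS"),
        "BillingCity": segment.get("ORT01"),
        "BillingPostalCode": segment.get("PSTLZ"),
        # Fall back to the raw SAP code when no mapping is known.
        "BillingCountry": COUNTRY_NAMES.get(land1, land1),
        "Phone": segment.get("TELF1"),
        "Industry": INDUSTRY_MAP.get(segment.get("BRSCH"), "Other"),
    }
```

The returned dictionary contains only Salesforce field names, so nothing downstream of this function needs to know what KUNNR or BRSCH mean.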

File-Based Integration: The Durable Pattern

File-based integration — where data is exchanged as structured files (CSV, XML, JSON, fixed-width) written to and read from a shared location (SFTP server, S3 bucket, network share) — is the oldest integration pattern and still the most widely used in legacy system integration. Its durability is earned: files are visible, auditable, retryable, and independent of the availability of either system at the moment of exchange.

The SFTP-based file drop pattern works as follows: the legacy system generates a file of changed records on a schedule and deposits it on an SFTP server. The integration layer (MuleSoft SFTP connector, a Lambda function, a Python script) picks up the file, transforms it, and loads it into Salesforce via Bulk API. Both the file drop and the transform-and-load step are asynchronous: the two systems never need to be available at the same time, and a failure in the loading stage leaves the file intact for retry.
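The retry property can be sketched in a few lines of Python. This is a minimal illustration, assuming the SFTP directory has already been synced to a local inbox folder; process_inbox, the archive folder, and the load callable (standing in for a Bulk API submission) are hypothetical names, not part of any specific connector.

```python
import csv
import shutil
from pathlib import Path
from typing import Callable


def process_inbox(inbox: Path, archive: Path,
                  load: Callable[[list], None]) -> list:
    """Load each .csv in the inbox; archive on success, leave in place on failure."""
    archive.mkdir(parents=True, exist_ok=True)
    failed = []
    for path in sorted(inbox.glob("*.csv")):
        try:
            with path.open(newline="") as fh:
                rows = list(csv.DictReader(fh))
            load(rows)  # hypothetical loader, e.g. a Bulk API job submission
        except Exception:
            # The file is untouched, so the next scheduled run retries it.
            failed.append(path.name)
            continue
        # Only a successful load moves the file out of the inbox.
        shutil.move(str(path), str(archive / path.name))
    return failed
```

Because a file is moved only after a successful load, a retry may reload rows that were partially accepted, so the loader is assumed to be idempotent (for example, an upsert keyed on an external ID).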

File integrity is the critical design consideration. Files must be delivered completely before the integration layer processes them. The common pattern for atomicity is write-then-rename: the legacy system writes the file with a .tmp extension, and renames it to .csv only when the write is complete. The integration layer processes only .csv files, ignoring .tmp files. This prevents the integration from processing a partially written file that arrives during a legacy system write operation.
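A local-filesystem sketch of write-then-rename, covering both sides, might look like this. The function names are illustrative, and the sketch assumes the producer writes to the same filesystem the consumer reads from; over SFTP the rename would be issued by the SFTP client instead, and its atomicity depends on the server.

```python
import os
from pathlib import Path


def write_atomically(target: Path, data: str) -> None:
    """Producer side: write to a .tmp sibling, then rename into place."""
    tmp = target.with_suffix(".tmp")
    with tmp.open("w", newline="") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp, target)


def ready_files(inbox: Path) -> list:
    """Consumer side: only completed .csv files are eligible for processing."""
    return sorted(p for p in inbox.iterdir() if p.suffix == ".csv")
```

The consumer never needs to guess whether a write has finished: a file either has the .csv name and is complete, or it is not visible to processing at all.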

Database-Level Integration: When and Why

Direct database integration, meaning a connection straight to the legacy system's database (via JDBC for Oracle/SQL Server, via DB2 connectors for IBM systems) rather than through an API, is sometimes the only available interface. Many legacy systems do not expose APIs and cannot be modified to add them. For those systems, the only way to extract or inject data is directly through the database. This pattern is used for read-heavy scenarios (extracting historical data for migration, populating a data warehouse) more than for operational bidirectional sync.

Database-level integration bypasses the legacy application's business logic, validation rules, and audit trail. Writing records directly to a legacy system's database — even if technically possible — creates data that the application layer is unaware of and has not validated. This can break the application's referential integrity, trigger rules, and consistency guarantees. Direct database writes to production legacy systems should be treated as a last resort, never a design pattern, and should always involve the legacy system's vendor or owner to confirm what is safe.

For read-only extraction, direct database access is safer but still carries risks: it adds load to the production database, it exposes the legacy system's internal data model (which may change without notice), and it bypasses the security model of the application layer. Use read replicas (database secondaries) for extraction wherever available. Never run heavy ETL queries against a primary production database during business hours.
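One way to keep extraction load small is to pull only rows changed since the last run and fetch them in bounded batches. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a connection to a read replica; the customers table, its columns, and the updated_at watermark are assumptions, since a real legacy schema would need to be reverse engineered first.

```python
import sqlite3


def extract_changed(conn: sqlite3.Connection, since: str, batch_size: int = 500):
    """Yield batches of rows changed after `since`, oldest change first."""
    # Read-only query: the extraction never writes to the legacy schema.
    cur = conn.execute(
        "SELECT customer_id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    while True:
        batch = cur.fetchmany(batch_size)  # bounded memory per round trip
        if not batch:
            break
        yield batch
```

The caller would persist the largest updated_at value it has seen and pass it as since on the next run. This relies on the legacy table having a trustworthy change timestamp, which is not guaranteed.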

The Strangler Fig: Gradual Replacement

The strangler fig pattern (coined by Martin Fowler) describes a strategy for gradually replacing a legacy system by building new functionality around it until the legacy system can be decommissioned. In Salesforce integration architecture, this often applies when the goal is to replace a legacy CRM or ERP with Salesforce over a multi-year timeline — not in a single big-bang migration, but by incrementally moving capabilities and data to Salesforce while keeping the legacy system running for the functions not yet migrated.

The integration architecture during a strangler fig migration must support a hybrid state: some customer data in Salesforce, some in the legacy system, with a synchronisation layer maintaining consistency between them. The integration complexity during this phase is significant — conflict resolution, data ownership routing (which system is authoritative for which records), and the operational overhead of maintaining two live systems simultaneously. This is the cost of the lower-risk migration approach.
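Data ownership routing can be illustrated with a very small rule: a change is propagated only when it originates in the system that is authoritative for that record. The sketch below assumes ownership is determined by a set of migrated record IDs; the names System, authoritative_system, and route_change are hypothetical, and a real synchronisation layer would need richer rules (field-level ownership, merge policies).

```python
from enum import Enum


class System(Enum):
    SALESFORCE = "salesforce"
    LEGACY = "legacy"


def authoritative_system(record_id: str, migrated_ids: set) -> System:
    """Migrated records are owned by Salesforce; the rest by the legacy system."""
    return System.SALESFORCE if record_id in migrated_ids else System.LEGACY


def route_change(record_id: str, source: System, migrated_ids: set) -> str:
    """Propagate changes from the owner; reject edits made in the other system."""
    owner = authoritative_system(record_id, migrated_ids)
    if source is owner:
        return "propagate"  # sync the change to the non-authoritative copy
    return "reject"  # would overwrite the owner's data; flag for review
```

Rejecting non-authoritative edits outright is the simplest conflict policy. It trades flexibility for predictability, which may or may not suit a given migration.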

The exit criterion for strangler fig is clear: a function is fully moved to Salesforce when both the data and the business process have been migrated, no users are accessing that function in the legacy system, and the integration point for that function can be decommissioned. Migration completeness should be tracked by function rather than by record count: a system is replaced only when its functions have been replaced, not merely when its data has been copied.
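Tracking by function can be as simple as recording the exit conditions per function and counting how many functions satisfy all of them. The following is an illustrative sketch; the FunctionStatus fields mirror the criteria listed above, and the class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FunctionStatus:
    name: str
    data_migrated: bool
    process_migrated: bool
    active_legacy_users: int
    integration_removable: bool

    def is_complete(self) -> bool:
        """A function is done only when every exit condition holds."""
        return (self.data_migrated and self.process_migrated
                and self.active_legacy_users == 0
                and self.integration_removable)


def completeness(functions: list) -> float:
    """Fraction of functions fully moved, independent of record counts."""
    if not functions:
        return 0.0
    return sum(f.is_complete() for f in functions) / len(functions)
```

A function with all of its data migrated but users still working in the legacy system counts as incomplete here, which is the distinction a record-count metric would miss.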

💡

Document the legacy interface contract formally: Legacy systems often have undocumented interfaces that only specific individuals understand. Before designing the integration, document the interface contract formally: what data the legacy system produces, what format it uses, what the timing is, what error conditions it can return, and who owns it. This documentation protects the integration team when the legacy system changes — and it always changes.

Key Takeaways

Legacy systems are defined by integration constraints: no API capability, unsupported protocols (SFTP, IBM MQ, RFC), undocumented data structures, or restricted availability windows. The architecture must accommodate these constraints as-is.
The anti-corruption layer (ACL) pattern isolates legacy data models and semantics from the Salesforce domain model. The ACL is the only component that knows both data models — Salesforce code should never import legacy-specific data structures directly.
File-based integration (SFTP drop, S3, network share) is the most durable pattern for legacy systems that cannot expose real-time APIs. Use write-then-rename file atomicity to prevent partial file processing.
Direct database integration bypasses the legacy application's business logic and is best limited to read-only extraction, run against read replicas where available and kept off the primary production database during business hours. Direct writes to legacy production databases are a last resort, not a design pattern.
The strangler fig pattern enables gradual legacy system replacement without big-bang risk, but requires a hybrid-state integration architecture with conflict resolution and data ownership routing during the transition period.
Document the legacy interface contract formally before implementation — including format, timing, error conditions, and ownership — to protect against undocumented changes and knowledge loss when legacy system owners change.

Test Your Understanding

1. A Salesforce integration with SAP is designed so that SAP's IDOC Customer segment structure is directly mapped to Salesforce custom fields — the Salesforce Account object has fields named "KUNNR", "LAND1", and "BRSCH" to match SAP field names. What architectural problem does this create?

SAP field names exceed Salesforce's 40-character field name limit — this is a technical error that will prevent deployment

The SAP data model has "corrupted" the Salesforce domain model — SAP-specific terminology is now embedded in Salesforce, violating the anti-corruption principle. Any future SAP field rename or SAP decommission requires Salesforce field changes. A clean ACL would map KUNNR to External_SAP_ID__c, LAND1 to BillingCountry, and BRSCH to Industry.

Using SAP field names in Salesforce improves maintainability by making the mapping explicit — this is actually a best practice for SAP-Salesforce integration

2. A legacy system drops CSV files to an SFTP server every hour. The integration team observes that occasionally a partial file is picked up and processed, creating incomplete data in Salesforce. What is the correct fix?

Increase the polling interval on the SFTP connector from 1 minute to 15 minutes to ensure files are fully written before pickup

Implement write-then-rename atomicity: the legacy system writes files with a .tmp extension and renames to .csv only when the write is complete. The integration connector processes only .csv files, eliminating the partial file pickup problem.

Add a record count footer line to the CSV file and validate the count before processing — reject files where the count doesn't match

3. During a strangler fig migration from a legacy CRM to Salesforce, 40% of customer records are now managed in Salesforce and 60% remain in the legacy CRM. A new marketing campaign requires segmenting ALL customers. What does the integration architecture need to support?

Pause the campaign until 100% of customers are migrated to Salesforce — cross-system segmentation is architecturally impractical

A unified view of customers across both systems — either by reading from both Salesforce and the legacy CRM into a shared analytics layer (data warehouse or Data Cloud), or by temporarily importing legacy CRM customer IDs into Salesforce as read-only records with a "legacy" source flag for the duration of the campaign

Run two separate campaigns — one from Salesforce for the 40% migrated customers and one from the legacy CRM for the remaining 60%
