- What makes a legacy system "legacy" from an integration perspective
- The anti-corruption layer pattern: why it exists and how to implement it
- File-based integration: the pattern that outlasts everything else
- Database-level integration: when and why it's sometimes the only option
- The strangler fig pattern for gradually replacing legacy integrations
- Operational considerations for integration with systems that cannot be changed
What Makes a System Legacy from an Integration Perspective
A system is "legacy" from an integration perspective when it has one or more of these characteristics: it cannot be modified to add new API endpoints or event publishing capabilities; it relies on protocols or transports that Salesforce and most cloud platforms do not natively support (IBM MQ, AS/400 data queues, mainframe CICS transactions, SFTP file drops); its data structures are undocumented and must be reverse engineered before they can be mapped; or its availability windows preclude real-time integration. All of these constraints are common. The integration architecture must accommodate the legacy system as it is, not as you wish it were.
The most common legacy integration scenarios in Salesforce projects are: SAP ERP (primarily R/3 or S/4 with IDOCs, RFC, and BAPI interfaces), IBM Mainframe systems (COBOL-based, file-based or MQ-based), Oracle EBS/PeopleSoft (SOAP-heavy, with proprietary message structures), and older Siebel CRM systems (the ironic common scenario of migrating from Siebel to Salesforce while keeping Siebel running in parallel for years). Each has specific integration characteristics that determine the appropriate pattern.
The key principle for legacy integration architecture: minimize the blast radius of the legacy system's constraints. If SAP only supports batch IDOC files, that batch file interface should terminate as early as possible in the integration stack — ideally at a transformation layer that converts IDOCs to standard REST/JSON for the rest of the integration. Allowing SAP-specific data structures to flow all the way through to Salesforce creates a fragile coupling that makes future system replacement expensive.
The Anti-Corruption Layer Pattern
The anti-corruption layer (ACL) is a translation layer between a legacy system and modern systems that prevents the legacy system's data model and semantics from "corrupting" the clean domain model of the modern system. In Salesforce integration architecture, the ACL sits between the legacy system's interface (IDOC, flat file, SOAP web service) and the Salesforce API, translating in both directions.
An SAP-to-Salesforce ACL does three things: it receives data from SAP in SAP's native format (IDOC XML, typically), it transforms the SAP data model to the Salesforce data model (mapping SAP Business Partner fields to Salesforce Account fields, handling SAP-specific code values and reference tables), and it calls the Salesforce REST API with the transformed data. In the opposite direction, it receives Salesforce data via Platform Events or Apex callouts, transforms it to SAP's expected format, and calls SAP via RFC or writes to SAP's inbound interface queues.
MuleSoft is the most commonly used platform for ACL implementation at enterprise scale — its DataWeave transformation language and SAP connector make the translation layer maintainable. For simpler scenarios, a lightweight Node.js or Python microservice can serve as the ACL without the MuleSoft infrastructure cost. The key requirement is that the ACL is the only system that knows about both the legacy and modern data models — if Salesforce Apex code starts importing SAP-specific data structures, the anti-corruption boundary has been violated.
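// DataWeave ACL transformation: SAP IDOC Customer to Salesforce Account
%dw 2.0
output application/json

// Placeholder lookup tables (illustrative values only). In a real ACL these
// come from maintained reference data; SAP industry keys are installation-specific.
var countryNames = { "DE": "Germany", "US": "United States" }
var industryByBrsch = { "Z001": "Manufacturing" }

// Unknown country codes pass through unchanged; unknown industries become "Other"
fun lookupCountryCode(land1) = countryNames[land1 default ""] default land1
fun mapSapIndustry(brsch) = industryByBrsch[brsch default ""] default "Other"
---
{
  // *E1KNA1M selects every customer segment, so map receives an array
  // even when the IDOC contains a single customer
  "records": (payload.IDOC.*E1KNA1M default []) map (customer) -> {
    "Name": customer.NAME1 ++ (
      // isEmpty covers both a missing NAME2 and an empty NAME2 element
      if (!isEmpty(customer.NAME2)) " " ++ customer.NAME2 else ""
    ),
    "External_SAP_ID__c": customer.KUNNR,
    "BillingStreet": customer.STRAS,
    "BillingCity": customer.ORT01,
    "BillingPostalCode": customer.PSTLZ,
    "BillingCountry": lookupCountryCode(customer.LAND1),
    "Phone": customer.TELF1,
    "Industry": mapSapIndustry(customer.BRSCH)
  }
}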
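For the simpler scenarios where a lightweight Python microservice serves as the ACL, the same translation might look like the sketch below. It assumes the IDOC has already been parsed into one dictionary per E1KNA1M segment; the function name, the lookup tables, and the industry key are hypothetical placeholders, not part of any SAP or Salesforce API.

```python
# Hypothetical reference tables; real mappings would come from maintained
# reference data rather than hard-coded literals.
COUNTRY_NAMES = {"DE": "Germany", "US": "United States"}
INDUSTRY_BY_BRSCH = {"Z001": "Manufacturing"}  # placeholder, installation-specific


def sap_customer_to_account(segment: dict) -> dict:
    """Translate one parsed E1KNA1M segment into a Salesforce Account payload."""
    name_parts = [segment.get("NAME1"), segment.get("NAME2")]
    land1 = segment.get("LAND1")
    return {
        # Join NAME1 and NAME2, skipping missing or blank parts.
        "Name": " ".join(p.strip() for p in name_parts if p and p.strip()),
        "External_SAP_ID__c": segment.get("KUNNR"),
        "BillingStreet": segment.get("STRAS"),
        "BillingCity": segment.get("ORT01"),
        "BillingPostalCode": segment.get("PSTLZ"),
        # Unknown country codes pass through unchanged so they stay visible.
        "BillingCountry": COUNTRY_NAMES.get(land1, land1),
        "Phone": segment.get("TELF1"),
        "Industry": INDUSTRY_BY_BRSCH.get(segment.get("BRSCH"), "Other"),
    }
```

The SAP field names (KUNNR, LAND1, BRSCH) appear only on the input side of this function. Everything it returns uses Salesforce field names, which is the boundary the ACL is meant to enforce.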
File-Based Integration: The Durable Pattern
File-based integration — where data is exchanged as structured files (CSV, XML, JSON, fixed-width) written to and read from a shared location (SFTP server, S3 bucket, network share) — is the oldest integration pattern and still the most widely used in legacy system integration. Its durability is earned: files are visible, auditable, retryable, and independent of the availability of either system at the moment of exchange.
The SFTP-based file drop pattern works as follows: the legacy system generates a file of changed records on a schedule and deposits it on an SFTP server. The integration layer (a MuleSoft SFTP connector, a Lambda function, or a Python script) picks up the file, transforms it, and loads it into Salesforce via the Bulk API. The file drop and the transform-and-load step are decoupled in time: the two systems never need to be available at the same moment, and a failure in the loading stage leaves the file intact for retry.
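The pickup, transform, and load cycle can be sketched in a few lines of Python. This is a minimal illustration that assumes the SFTP drop location is visible as a local directory; `process_inbox`, the `transform` callable, and the `load` callable (for example, a wrapper around a Bulk API job) are hypothetical names introduced here.

```python
import csv
import shutil
from pathlib import Path
from typing import Callable


def process_inbox(
    inbox: Path,
    archive: Path,
    transform: Callable[[dict], dict],
    load: Callable[[list], None],
) -> list:
    """Process each .csv file; archive it on success, leave it in place on failure.

    Returns the names of files that failed and will be retried on the next run.
    """
    archive.mkdir(parents=True, exist_ok=True)
    failed = []
    for path in sorted(inbox.glob("*.csv")):
        try:
            with path.open(newline="", encoding="utf-8") as handle:
                records = [transform(row) for row in csv.DictReader(handle)]
            load(records)  # hypothetical loader supplied by the caller
        except Exception:
            # The file has not been moved, so the next scheduled run retries it.
            failed.append(path.name)
            continue
        shutil.move(str(path), str(archive / path.name))
    return failed
```

Because a failed file is reprocessed from the start, retries are only safe when the load step is idempotent, for example an upsert keyed on an external ID field.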
File integrity is the critical design consideration. Files must be delivered completely before the integration layer processes them. The common pattern for atomicity is write-then-rename: the legacy system writes the file with a .tmp extension, and renames it to .csv only when the write is complete. The integration layer processes only .csv files, ignoring .tmp files. This prevents the integration from processing a partially written file that arrives during a legacy system write operation.
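Write-then-rename can be illustrated on a local filesystem as follows. This is a sketch under the assumption that the writer and reader share one directory; `publish_file` and `ready_files` are hypothetical helper names, and over SFTP the rename would be issued through the SFTP client instead of `os.replace`.

```python
import os
from pathlib import Path


def publish_file(directory: Path, name: str, content: str) -> Path:
    """Write to <name>.tmp, then rename to <name> once the write is complete."""
    final_path = directory / name
    tmp_path = directory / (name + ".tmp")
    with tmp_path.open("w", encoding="utf-8", newline="") as handle:
        handle.write(content)
        handle.flush()
        os.fsync(handle.fileno())  # push the bytes to disk before the rename
    # On POSIX the rename is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
    return final_path


def ready_files(directory: Path) -> list:
    """Return only completed files; in-progress .tmp files are ignored."""
    return sorted(p for p in directory.iterdir() if p.suffix == ".csv")
```

A reader that calls `ready_files` never sees a half-written file, because the `.csv` name only comes into existence after the full content has been written.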
Database-Level Integration: When and Why
Direct database integration — connecting to the legacy system's database directly (via JDBC for Oracle/SQL Server, via DB2 connectors for IBM systems) rather than through an API — is sometimes the only available interface. Many legacy systems do not expose APIs and cannot be modified to add them. The only way to extract or inject data is directly through the database. This pattern is used for read-heavy scenarios (extracting historical data for migration, populating a data warehouse) more than for operational bidirectional sync.
Database-level integration bypasses the legacy application's business logic, validation rules, and audit trail. Writing records directly to a legacy system's database — even if technically possible — creates data that the application layer is unaware of and has not validated. This can break the application's referential integrity, trigger rules, and consistency guarantees. Direct database writes to production legacy systems should be treated as a last resort, never a design pattern, and should always involve the legacy system's vendor or owner to confirm what is safe.
For read-only extraction, direct database access is safer but still carries risks: it adds load to the production database, it exposes the legacy system's internal data model (which may change without notice), and it bypasses the security model of the application layer. Use read replicas (database secondaries) for extraction wherever available. Never run heavy ETL queries against a primary production database during business hours.
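One way to keep extraction load low is to open the connection read-only and pull only changed rows in small batches. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for an Oracle, SQL Server, or DB2 replica; the `customers` table, its columns, and the `updated_at` watermark are assumptions made for illustration, and a real legacy schema may not have a usable change timestamp.

```python
import sqlite3
from typing import Iterator


def open_read_only(db_path: str) -> sqlite3.Connection:
    # mode=ro makes SQLite reject any write attempted on this connection.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def extract_changed(
    conn: sqlite3.Connection, since: str, batch_size: int = 500
) -> Iterator[list]:
    """Yield batches of rows changed after `since`, ordered by the watermark."""
    cursor = conn.execute(
        "SELECT customer_id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    while True:
        rows = cursor.fetchmany(batch_size)  # bounded memory per batch
        if not rows:
            break
        yield rows
```

The caller would persist the highest `updated_at` value it has seen and pass it as `since` on the next run, so each run reads only what changed.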
The Strangler Fig: Gradual Replacement
The strangler fig pattern (coined by Martin Fowler) describes a strategy for gradually replacing a legacy system by building new functionality around it until the legacy system can be decommissioned. In Salesforce integration architecture, this often applies when the goal is to replace a legacy CRM or ERP with Salesforce over a multi-year timeline — not in a single big-bang migration, but by incrementally moving capabilities and data to Salesforce while keeping the legacy system running for the functions not yet migrated.
The integration architecture during a strangler fig migration must support a hybrid state: some customer data in Salesforce, some in the legacy system, with a synchronisation layer maintaining consistency between them. The integration complexity during this phase is significant — conflict resolution, data ownership routing (which system is authoritative for which records), and the operational overhead of maintaining two live systems simultaneously. This is the cost of the lower-risk migration approach.
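Data ownership routing can be made explicit with a small registry consulted by the synchronisation layer. The following is a minimal sketch, assuming ownership is assigned per business function and that the authoritative system always wins a conflict; the `OWNERSHIP` table, the function names in it, and the `route` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class System(Enum):
    SALESFORCE = "salesforce"
    LEGACY = "legacy"


@dataclass(frozen=True)
class Change:
    function: str    # business function, e.g. "accounts"
    record_id: str
    source: System   # system in which the change originated


# Hypothetical registry: which system is authoritative for each function.
OWNERSHIP = {"accounts": System.SALESFORCE, "billing": System.LEGACY}


def route(change: Change) -> Optional[System]:
    """Return the system to propagate the change to, or None to reject it."""
    # Functions not yet migrated default to the legacy system as owner.
    owner = OWNERSHIP.get(change.function, System.LEGACY)
    if change.source is not owner:
        return None  # edit made in the non-authoritative system: do not sync
    return System.LEGACY if owner is System.SALESFORCE else System.SALESFORCE
```

Moving a function to Salesforce then becomes a one-line change to the registry rather than a change to the synchronisation logic. Ownership could equally be keyed by record segment instead of by function if that matches how the migration is sliced.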
The exit criterion for strangler fig is clear: a function is fully moved to Salesforce when both the data and the business process have been migrated, no users are accessing that function in the legacy system, and the integration point for that function can be decommissioned. Migration completeness should therefore be tracked by function rather than by record count: a system is replaced only when its functions have been replaced, not merely when its data has been copied.
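The exit criterion can be tracked as a simple per-function checklist. This sketch assumes each function's status is recorded with three facts taken from the criterion above; the `FunctionStatus` fields and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionStatus:
    name: str
    data_migrated: bool
    process_migrated: bool
    legacy_active_users: int  # users still working on this function in legacy


def can_decommission(status: FunctionStatus) -> bool:
    """True when the function's integration point is safe to retire."""
    return (
        status.data_migrated
        and status.process_migrated
        and status.legacy_active_users == 0
    )


def migration_progress(statuses: list) -> float:
    """Fraction of functions fully moved, independent of record counts."""
    if not statuses:
        return 0.0
    return sum(can_decommission(s) for s in statuses) / len(statuses)
```

A function whose data has been copied but whose users still work in the legacy system counts as not migrated here, which is the distinction between function-based and record-based tracking.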
Key Takeaways
- Legacy systems are defined by integration constraints: no API capability, protocols that Salesforce does not natively support (SFTP, IBM MQ, RFC), undocumented data structures, or restricted availability windows. The architecture must accommodate these constraints as-is.
- The anti-corruption layer (ACL) pattern isolates legacy data models and semantics from the Salesforce domain model. The ACL is the only component that knows both data models — Salesforce code should never import legacy-specific data structures directly.
- File-based integration (SFTP drop, S3, network share) is the most durable pattern for legacy systems that cannot expose real-time APIs. Use write-then-rename file atomicity to prevent partial file processing.
- Direct database integration bypasses the legacy application's business logic and should be limited to read-only extraction, run against read replicas wherever available and never as heavy queries against the primary database during business hours. Direct writes to legacy production databases are a last resort, not a design pattern.
- The strangler fig pattern enables gradual legacy system replacement without big-bang risk, but requires a hybrid-state integration architecture with conflict resolution and data ownership routing during the transition period.
- Document the legacy interface contract formally before implementation — including format, timing, error conditions, and ownership — to protect against undocumented changes and knowledge loss when legacy system owners change.
Test Your Understanding
1. A Salesforce integration with SAP is designed so that SAP's IDOC Customer segment structure is directly mapped to Salesforce custom fields — the Salesforce Account object has fields named "KUNNR", "LAND1", and "BRSCH" to match SAP field names. What architectural problem does this create?
2. A legacy system drops CSV files to an SFTP server every hour. The integration team observes that occasionally a partial file is picked up and processed, creating incomplete data in Salesforce. What is the correct fix?
3. During a strangler fig migration from a legacy CRM to Salesforce, 40% of customer records are now managed in Salesforce and 60% remain in the legacy CRM. A new marketing campaign requires segmenting ALL customers. What does the integration architecture need to support?