← Back to Integration & Data
INTG-003 Integration & Data 20 min read For: Salesforce Architects & Tech Leaders

ETL for Salesforce: Talend, Informatica, and the Alternatives

Every organisation that runs Salesforce alongside a data warehouse or ERP system will eventually need an ETL layer. The choice of tool shapes integration architecture for years. This tutorial covers what actually matters when selecting and designing ETL for Salesforce — beyond vendor marketing.

VS

Vishal Sharma

Salesforce Architecture Specialist · Updated May 2026

What you will learn...
  • What ETL means in a Salesforce context versus traditional data warehousing ETL
  • How Informatica and Talend compare for Salesforce integration workloads
  • The emerging ELT pattern and why it matters for Salesforce data pipelines
  • Where simpler alternatives (Fivetran, dbt, custom Python) beat enterprise ETL tools
  • Key architectural decisions: change data capture, delta processing, and error handling
  • The Salesforce API constraints that every ETL design must respect

ETL in a Salesforce Context

ETL (Extract, Transform, Load) in the Salesforce world covers two distinct problems that are often conflated. The first is operational integration — moving data between Salesforce and other operational systems (ERP, order management, billing) on a scheduled or near-real-time basis. The second is analytical integration — extracting Salesforce data to a data warehouse or data lake for reporting and analytics. The tool choice and design patterns differ substantially between these two use cases.

Operational ETL requires bidirectional data flow, low latency (often sub-hourly), robust error handling with business-logic-aware retry, and conflict resolution when the same record is updated in both systems. Analytical ETL is typically unidirectional (Salesforce to warehouse), tolerant of higher latency, and focused on completeness and consistency rather than real-time availability.

The Salesforce API is the boundary condition for all ETL designs. Whether using Informatica, Talend, Fivetran, or a custom script, data must enter and exit Salesforce through one of its APIs — REST, SOAP, Bulk API 2.0, or the Streaming API. API rate limits, governor limits, and API version compatibility are constraints that apply regardless of what ETL tool sits above them. An ETL design that ignores Salesforce API limits will fail in production regardless of how well the ETL tool itself is configured.

💡
ETL tool selection is often the wrong first decision: Organisations spend weeks evaluating ETL tools before defining data volumes, latency requirements, and API limit budgets. Define these constraints first — a 5-million-record nightly sync from Salesforce to Snowflake has completely different tool requirements than a bidirectional real-time sync with an ERP system.

Informatica for Salesforce

Informatica PowerCenter and Informatica Cloud (IICS) are the most widely deployed enterprise ETL tools in Salesforce implementations at large organisations. Informatica's Salesforce connector is mature — it handles the full Salesforce object model, supports bulk API extraction, manages OAuth authentication automatically, and has built-in delta detection using the SystemModstamp field or Change Data Capture.

Informatica IICS (the cloud-hosted version) has largely supplanted PowerCenter for new Salesforce integrations. IICS supports real-time integration through its Application Integration capability (event-triggered mappings) alongside batch ETL through Data Integration. The licensing model is consumption-based in IICS, which can be significantly cheaper than PowerCenter for organisations with moderate data volumes.

Informatica's strengths for Salesforce are its metadata-driven mapping approach (mappings are maintained visually without code), its enterprise-grade lineage and impact analysis, and its handling of Salesforce's relationship model — parent-child objects, lookup fields, and external ID-based upsert operations are handled natively without custom scripting. Its weakness is cost — Informatica licensing is among the most expensive in the ETL market, and the complexity of IICS for simple use cases is often disproportionate to the value it adds.

Talend and the Open Source Alternative

Talend Open Studio is the open-source ETL platform most commonly used as an alternative to Informatica at cost-constrained organisations. Talend's Salesforce connector covers the main API types — SOAP API for small datasets, Bulk API for high volumes — and the component library for transformation is extensive. The Talend Salesforce components handle upsert operations using External ID fields, batch size optimisation for the Bulk API, and OAuth credential management.

Talend's limitation in the Salesforce context is its Java code generation model. Talend compiles each job into executable Java code, which gives it good runtime performance but makes jobs opaque to non-developers during debugging and maintenance. When a Talend job fails during a bulk upsert due to a field validation error on 3% of records, diagnosing and fixing the rejection without Java knowledge is difficult. Organisations using Talend for Salesforce typically need a developer-level resource to maintain the integration, not just an integration administrator.

Talend Cloud (the SaaS-hosted version) has improved the operational experience significantly, adding monitoring, scheduling, and a more modern UI. But even Talend Cloud carries per-user licensing costs that make it non-trivial for smaller Salesforce integrations. The sweet spot for Talend is organisations already standardised on it for non-Salesforce data warehousing that want to extend it to Salesforce without introducing another tool.

// Salesforce Bulk API 2.0 query — what ETL tools use under the hood
// Step 1: Create a query job
POST /services/data/v60.0/jobs/query
{
  "operation": "query",
  "query": "SELECT Id, AccountId, Name, Amount, CloseDate, StageName,
             SystemModstamp FROM Opportunity
             WHERE SystemModstamp > 2026-05-18T00:00:00Z",
  "contentType": "CSV",
  "columnDelimiter": "COMMA",
  "lineEnding": "LF"
}
// Step 2: Poll for completion, then download result batches
// This is the pattern Informatica/Talend abstract into their connectors

The ELT Alternative: Fivetran and dbt

ELT (Extract, Load, Transform) inverts the traditional ETL order — data is extracted from Salesforce and loaded raw into a data warehouse (Snowflake, BigQuery, Databricks, Redshift), and transformation happens inside the warehouse using SQL. This pattern has largely replaced traditional ETL for analytical use cases, and for many Salesforce-to-warehouse scenarios it is architecturally simpler and cheaper than Informatica or Talend.

Fivetran is the dominant ELT connector for Salesforce-to-warehouse. It uses the Salesforce Bulk API to extract objects incrementally using SystemModstamp-based delta queries, handles schema discovery automatically (new fields added to Salesforce objects are detected and added to the warehouse schema), and loads raw data to the warehouse typically within 15-60 minutes of a record change. Fivetran's pricing is per monthly active rows — for many organisations, this is significantly cheaper than Informatica for analytical workloads.

dbt (data build tool) handles the transformation layer on top of Fivetran's raw Salesforce data. dbt models are SQL SELECT statements that transform raw Salesforce tables into dimensional models, business metrics, and reporting-ready entities. The combination of Fivetran + dbt has become a standard analytics stack for Salesforce customers that rivals the functionality of Informatica at a fraction of the cost for analytics-only use cases.

⚠️
ELT is not appropriate for operational integration: Fivetran + dbt is excellent for Salesforce to data warehouse analytics. It is not appropriate for bidirectional operational integration where data must flow back to Salesforce, where real-time latency matters, or where conflict resolution between systems is required. Use the right pattern for the use case — ELT for analytics, ETL or event-driven integration for operational use cases.

Delta Detection and Change Tracking

Every ETL design for Salesforce must solve the delta problem — how to extract only records that have changed since the last extraction, rather than re-extracting the entire dataset. For large orgs with millions of records, full re-extraction is impractical — a full extract of 10 million Opportunity records takes hours and consumes significant API allocations. Delta extraction based on change detection reduces extraction time to minutes and preserves API budget for operational integrations.

Salesforce provides three delta mechanisms: SystemModstamp filtering (extract all records WHERE SystemModstamp > last_run_timestamp), Change Data Capture (subscribe to change events published by Salesforce as records are modified), and the Replication API (a newer capability for efficient bulk extraction with built-in change tracking). SystemModstamp filtering is the most widely supported across ETL tools and the simplest to implement, but it has an edge case: records deleted from Salesforce are not captured by SystemModstamp queries. Deletions require separate handling — either querying the RecycleBin, maintaining a delete-capture mechanism via CDC, or accepting that deletions are not tracked in the warehouse.

Change Data Capture (CDC) is the most comprehensive delta mechanism — it captures creates, updates, deletes, and undeletes as structured events on the Salesforce Streaming API. The challenge is that CDC events are delivered via a long-polling subscription (CometD or Pub/Sub API) that requires a persistent connection, which does not fit the batch-oriented ETL model. Most ETL tools that claim CDC support implement it by polling the event log rather than maintaining a true streaming subscription.

Error Handling and Partial Failure Patterns

Salesforce bulk operations use partial success semantics — if you submit 50,000 records for upsert and 2,000 fail validation, the remaining 48,000 succeed. This is different from all-or-nothing database transaction semantics. ETL designs must handle partial failures explicitly: capturing the failed records, logging the error reasons (which Salesforce returns in the job result files), and routing them to a dead-letter queue or error remediation workflow.

The most common error categories in Salesforce ETL are validation rule failures (data from the source system violates a Salesforce validation rule), duplicate management blocks (a deduplication rule prevents the record from being created), and field-level security failures (the integration user lacks write access to a field that the source record is trying to update). All three categories require different remediation approaches and should be tracked separately in the ETL monitoring layer.

Idempotency is a critical design principle for Salesforce ETL — the ability to re-run a failed job without creating duplicate records. External ID fields are the primary mechanism. If every source record carries a stable External ID value, Salesforce upsert operations using that External ID are idempotent — running the same upsert twice creates one record, not two. Design source-to-Salesforce mappings with an External ID strategy before any ETL implementation begins.

Key Takeaways

  • ETL for Salesforce splits into operational integration (bidirectional, low-latency, conflict-aware) and analytical integration (unidirectional to warehouse, latency-tolerant). Tool selection should follow use case, not brand preference.
  • Informatica IICS is the most mature enterprise ETL for Salesforce but carries high licensing and complexity costs. Best for large organisations already standardised on Informatica or with complex bidirectional integration requirements.
  • Talend is a lower-cost alternative with strong Salesforce connector support, but its Java code generation model requires developer-level maintenance skills — not suitable as a no-code integration tool.
  • Fivetran + dbt has become the standard ELT stack for Salesforce analytics. Much cheaper than Informatica for analytics-only workloads but not appropriate for operational bidirectional integration.
  • Delta detection must use SystemModstamp filtering, CDC, or the Replication API — with explicit handling for deletes, which SystemModstamp filtering does not capture.
  • Idempotency via External ID fields and partial failure handling via dead-letter queues are non-negotiable design requirements for any Salesforce ETL implementation.

Test Your Understanding

1. An organisation needs to sync Salesforce Opportunity data to Snowflake for dashboards. Data freshness within 1 hour is acceptable. They have no existing ETL tooling. What is the most architecturally appropriate and cost-effective approach?

Deploy Informatica IICS with a Salesforce-to-Snowflake mapping for enterprise-grade reliability and full metadata lineage
Use Fivetran for Salesforce-to-Snowflake ELT with dbt for transformation — this is the appropriate pattern for analytics-only, latency-tolerant sync at significantly lower cost than enterprise ETL
Write a custom Python script using the Salesforce REST API and a Snowflake connector, deployed on a cron schedule

2. An ETL job upserts 100,000 Account records to Salesforce. The job completes but the ETL tool reports 3,200 failures. What is the most important first diagnostic step?

Re-run the entire job immediately — transient Salesforce errors often resolve on retry
Download the Bulk API job result files for the failed records — Salesforce returns specific error codes and field-level messages for each failed record that identify whether failures are validation rules, duplicate blocks, or field-level security issues
Check the Salesforce API limit dashboard to confirm the failures were caused by rate limiting

3. A nightly ETL job extracts changed Accounts from Salesforce using SystemModstamp filtering. The data warehouse team notices that Accounts deleted in Salesforce are still appearing in warehouse reports. What is the cause and solution?

Salesforce does not expose deleted records via any API — the only solution is to prohibit record deletion in Salesforce
SystemModstamp filtering does not capture deletions — deleted records must be tracked separately by querying the Recycle Bin, subscribing to Account CDC delete events, or maintaining a soft-delete flag on Account records
The ETL tool has a bug in its delta detection — upgrade to the latest version to fix the deletion capture gap

Discussion & Feedback