- How Bulk API 2.0 differs from the original Bulk API 1.0 and when to use each
- The complete job lifecycle: create, upload, close, poll, and download results
- DML operations supported: insert, update, upsert, delete, and hardDelete
- Performance tuning: batch sizing, parallel job limits, and throughput optimisation
- Trigger, workflow, and process automation behavior during bulk operations
- Query jobs and how to use Bulk API for large-scale data extraction
Bulk API 2.0 vs 1.0 — What Changed
The original Bulk API 1.0 was introduced in 2008 and uses an XML-based job management protocol in which the client creates a job, splits the data into batches (CSV, XML, or JSON), and tracks each batch itself. Bulk API 2.0, introduced with API version 41.0, simplified the interface substantially: CSV data format only, a cleaner REST job lifecycle, and automatic batch management (the API handles batch size internally rather than requiring the client to manage it). For new integrations, Bulk API 2.0 is the correct choice. Bulk API 1.0 should only be used for legacy system compatibility or when a 1.0-only capability, such as serial concurrency mode, is required.
The fundamental architecture of both versions is the same: data operations are processed asynchronously in Salesforce's background processing infrastructure, separate from the synchronous execution context and its per-transaction governor limits. This is why a Bulk API job can process millions of records when Apex DML is limited to 10,000 rows per transaction: Bulk API jobs run outside the synchronous transaction model entirely.
Bulk API 2.0 uses the standard OAuth 2.0 Bearer Token for authentication (same token used for REST API calls). The job lifecycle involves creating a job, uploading CSV data, closing the job to signal completion, polling for status, and downloading success and error result files. This lifecycle is straightforward to implement in any HTTP client and is well-supported by all major ETL and integration tools.
The Complete Job Lifecycle
A Bulk API 2.0 ingest job follows a deterministic lifecycle that integrations must implement correctly. The five phases are: create the job (specifying the object, operation, and content type), upload data (PUT the CSV content to the job), close the job (set the state to UploadComplete to signal that processing should begin), poll the job until its state is JobComplete, Failed, or Aborted, and download the successfulResults and failedResults CSVs.
Job creation returns a job ID. Data is uploaded to the job's batches URL in a single CSV upload. Salesforce accepts up to 150 MB per job after base64 encoding, which can inflate the payload by roughly 50%, so keep the raw CSV at or below about 100 MB. If the dataset exceeds that, split it across multiple jobs. The CSV must be UTF-8 encoded, use field API names as headers, and include an Id column for update, delete, and hardDelete operations or the External ID column named in externalIdFieldName for upsert. Omitting a required header causes the job to fail, often with an unhelpful error message.
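Splitting an oversized dataset across several jobs can be sketched as below. This is a minimal illustration, not a definitive implementation: the function name split_csv_for_jobs and the MAX_JOB_BYTES constant are hypothetical, the 100 MB budget is an assumption taken from the guidance above, and rows are assumed to arrive as lists of strings.

```python
import csv
import io
from typing import Iterable, Iterator, List

MAX_JOB_BYTES = 100 * 1024 * 1024  # assumed raw-size budget per job


def split_csv_for_jobs(header: List[str], rows: Iterable[List[str]],
                       max_bytes: int = MAX_JOB_BYTES) -> Iterator[str]:
    """Yield CSV documents that each repeat the header and stay within max_bytes (UTF-8)."""
    def encode(row: List[str]) -> str:
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)  # LF matches "lineEnding": "LF"
        return buf.getvalue()

    header_line = encode(header)
    header_size = len(header_line.encode("utf-8"))
    parts, size = [header_line], header_size
    for row in rows:
        line = encode(row)
        line_size = len(line.encode("utf-8"))
        if header_size + line_size > max_bytes:
            raise ValueError("a single row exceeds the per-job size budget")
        if size + line_size > max_bytes:
            # Current chunk is full: emit it and start a new one with the header.
            yield "".join(parts)
            parts, size = [header_line], header_size
        parts.append(line)
        size += line_size
    if len(parts) > 1:  # emit the final chunk only if it holds at least one data row
        yield "".join(parts)
```

Each yielded string would then be uploaded to its own job. Measuring size in encoded bytes rather than characters matters because non-ASCII data takes more than one byte per character in UTF-8.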
After closing the job, Salesforce processes records asynchronously. Processing time depends on org load, record complexity (triggers, workflows, formula fields, roll-up summaries), and whether the operation is an insert (no ID lookup) or upsert (requires ID resolution for each record). Jobs against heavily-automated objects take significantly longer than against simple custom objects with no automation.
// Bulk API 2.0 — complete ingest job lifecycle
// Step 1: Create job
POST /services/data/v60.0/jobs/ingest
{
"object": "Account",
"operation": "upsert",
"externalIdFieldName": "External_ID__c",
"contentType": "CSV",
"lineEnding": "LF"
}
// Response: { "id": "7503g00000A9XXXXX", "state": "Open", ... }
// Step 2: Upload CSV data
PUT /services/data/v60.0/jobs/ingest/7503g00000A9XXXXX/batches
Content-Type: text/csv
External_ID__c,Name,BillingCity,BillingCountry,Industry
EXT-001,Acme Corp,San Francisco,US,Technology
EXT-002,Globex Corp,Springfield,US,Manufacturing
// Step 3: Close job to start processing
PATCH /services/data/v60.0/jobs/ingest/7503g00000A9XXXXX
{ "state": "UploadComplete" }
// Step 4: Poll for completion
GET /services/data/v60.0/jobs/ingest/7503g00000A9XXXXX
// Response: { "state": "JobComplete", "numberRecordsProcessed": 2,
// "numberRecordsFailed": 0 }
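// Step 5: Download results (each endpoint returns CSV)
GET /services/data/v60.0/jobs/ingest/7503g00000A9XXXXX/successfulResults
GET /services/data/v60.0/jobs/ingest/7503g00000A9XXXXX/failedResults
// Records the job never attempted (for example after a Failed or Aborted job)
GET /services/data/v60.0/jobs/ingest/7503g00000A9XXXXX/unprocessedrecords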
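The HTTP steps above could be driven from client code roughly as follows. This is a sketch under stated assumptions: send is a hypothetical transport hook (a real one would prepend the instance URL and attach the OAuth Bearer token), run_ingest_job is an illustrative name, and the polling interval and poll cap are arbitrary defaults.

```python
import json
import time
from typing import Callable, Optional

# Hypothetical transport hook: send(method, path, body, content_type) -> parsed JSON
# dict, or {} for responses with no body. Authentication is left to the caller.
Send = Callable[[str, str, Optional[bytes], Optional[str]], dict]

BASE = "/services/data/v60.0/jobs/ingest"
TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}


def run_ingest_job(send: Send, job_spec: dict, csv_data: str,
                   poll_seconds: float = 5.0, max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Create, upload, close, and poll one ingest job; return the final job info."""
    # Step 1: create the job and remember its ID.
    job = send("POST", BASE, json.dumps(job_spec).encode("utf-8"), "application/json")
    job_path = f"{BASE}/{job['id']}"
    # Step 2: upload the CSV payload.
    send("PUT", f"{job_path}/batches", csv_data.encode("utf-8"), "text/csv")
    # Step 3: close the job so processing starts.
    close_body = json.dumps({"state": "UploadComplete"}).encode("utf-8")
    send("PATCH", job_path, close_body, "application/json")
    # Step 4: poll with a hard cap so a stuck job cannot block forever.
    for _ in range(max_polls):
        status = send("GET", job_path, None, None)
        if status["state"] in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job['id']} did not finish after {max_polls} polls")
```

Downloading successfulResults and failedResults would follow once the returned state is JobComplete. Injecting send and sleep keeps the flow testable without a live org.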
Triggers, Automation, and AllOrNone Semantics
Bulk API 2.0 fires all Apex triggers, Flow record triggers, workflow rules, and Process Builder processes on the records it ingests. This is a critical architectural consideration: an org with complex automation on Account or Opportunity may find that a 500,000-record bulk upsert pushes every one of those records through that automation (Apex triggers run in chunks of up to 200 records) and causes significant background processing load that affects real-time users.
Bulk API 1.0 lets the client choose between parallel (the default) and serial concurrency modes. Bulk API 2.0 has no such switch: Salesforce splits the upload into internal batches and processes them in parallel, so record-lock contention on shared parent records cannot be tuned away with a mode setting. Triggers are not disabled in either mode. If automation must be bypassed during bulk operations, the correct pattern is a custom setting or feature flag, checked in the trigger or flow, that the integration user activates before bulk processing and deactivates afterward. Do not expect or try to emulate all-or-none behaviour for large operations: the Bulk API uses partial-success semantics by design, and an error in one record does not roll back thousands of successfully processed records.
Validation rules, duplicate management rules, and field-level security all apply during bulk processing exactly as they do for interactive users. This is by design — the Bulk API is a data operation against the production org, not a backdoor. Test bulk operations against a sandbox with production-equivalent automation before running in production to identify automation-driven failures before they affect production data.
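When a sandbox run does produce failures, grouping the failedResults file by error code makes automation-driven problems easy to spot. The sketch below assumes the failedResults CSV carries an sf__Error column whose text begins with a status code followed by a colon; summarise_failures is a hypothetical helper name, and the sample error strings in the usage note are illustrative.

```python
import csv
import io
from collections import Counter


def summarise_failures(failed_results_csv: str) -> Counter:
    """Count failed records per leading error code in a failedResults CSV."""
    counts: Counter = Counter()
    reader = csv.DictReader(io.StringIO(failed_results_csv))
    for row in reader:
        # Assumed format: "STATUS_CODE:message:fields". Keep only the code.
        error_text = row.get("sf__Error") or "UNKNOWN"
        counts[error_text.split(":", 1)[0]] += 1
    return counts
```

A result such as Counter({'FIELD_CUSTOM_VALIDATION_EXCEPTION': 11800, 'DUPLICATES_DETECTED': 200}) would point at a validation rule rather than at the data load itself.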
Query Jobs for Large-Scale Extraction
Bulk API 2.0 query jobs enable large-scale data extraction using SOQL without paging through the synchronous REST query endpoint, which returns at most 2,000 records per response. A query job accepts a SOQL SELECT statement and returns results as paginated CSV downloads. The SOQL can select from any accessible object and include WHERE clauses and child-to-parent field traversal. Aggregate functions, GROUP BY, OFFSET, and parent-to-child subqueries are not supported in bulk query mode, and Salesforce's documentation advises against ORDER BY and LIMIT in bulk queries.
Query job results are returned as one or more result pages. Each response carries a Sforce-Locator header whose value is passed as the locator query parameter to fetch the next page. For very large objects (50+ million records), result sets can be gigabytes in size. Integration designs should stream result sets directly to the destination (data warehouse, file storage) rather than loading them entirely into memory. Most enterprise ETL tools handle Bulk API query result pagination automatically.
// Bulk API 2.0 Query Job
POST /services/data/v60.0/jobs/query
{
"operation": "query",
"query": "SELECT Id, Name, AnnualRevenue, Industry, BillingCountry, OwnerId, CreatedDate, LastModifiedDate FROM Account WHERE LastModifiedDate >= 2026-05-01T00:00:00Z",
"contentType": "CSV"
}
// After job completes, paginate through results:
GET /services/data/v60.0/jobs/query/{jobId}/results?maxRecords=50000
// Response header Sforce-Locator holds the locator for the next page; the value "null" means no more pages
GET /services/data/v60.0/jobs/query/{jobId}/results?locator={locator}
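Following the locator from page to page while streaming to a destination might look like the sketch below. It rests on several assumptions: fetch_page is a hypothetical callable that performs the GET and returns the CSV body together with the Sforce-Locator header value, every page is assumed to repeat the CSV header row, and lines are assumed to end in LF.

```python
from typing import Callable, Iterable, Iterator, Optional, TextIO, Tuple

# Hypothetical hook: fetch_page(locator) -> (csv_text, sforce_locator_header_value).
# Passing None requests the first page.
FetchPage = Callable[[Optional[str]], Tuple[str, Optional[str]]]


def iter_result_pages(fetch_page: FetchPage) -> Iterator[str]:
    """Yield one CSV page at a time so only a single page is held in memory."""
    locator: Optional[str] = None
    while True:
        body, next_locator = fetch_page(locator)
        yield body
        # A missing header or the literal string "null" marks the last page.
        if not next_locator or next_locator == "null":
            return
        locator = next_locator


def write_combined_csv(pages: Iterable[str], out: TextIO) -> int:
    """Write pages to one stream, keeping the header row from the first page only."""
    pages_written = 0
    for index, page in enumerate(pages):
        if index > 0:
            # Assumption: later pages repeat the header; drop their first line.
            page = page.split("\n", 1)[1] if "\n" in page else ""
        out.write(page)
        pages_written += 1
    return pages_written
```

Because iter_result_pages is a generator, the destination stream (a file, an object-store upload, a warehouse loader) receives each page as soon as it is fetched.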
Performance Tuning and Operational Limits
The Bulk API processes records at approximately 500-2,000 records per second for simple objects with minimal automation. Complex objects (Opportunity, Case) with multiple triggers and roll-up summary fields process at 200-500 records per second. A 5-million-record upsert against a complex object can take 3-7 hours. Plan maintenance windows accordingly and design integrations to run incrementally (daily deltas of thousands of records) rather than large catch-up batches wherever possible.
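The 3-7 hour figure follows directly from the throughput range, and the same arithmetic is useful when sizing a maintenance window. A small sketch, with hypothetical function names and the rates above treated as rough planning inputs rather than guarantees:

```python
from typing import Tuple


def estimate_hours(record_count: int, records_per_second: float) -> float:
    """Hours needed to process record_count at a steady records_per_second."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return record_count / records_per_second / 3600


def estimate_window(record_count: int, slow_rps: float, fast_rps: float) -> Tuple[float, float]:
    """Return (best_case_hours, worst_case_hours) for a throughput range."""
    return (estimate_hours(record_count, fast_rps),
            estimate_hours(record_count, slow_rps))


best, worst = estimate_window(5_000_000, slow_rps=200, fast_rps=500)
print(f"{best:.1f} to {worst:.1f} hours")  # 2.8 to 6.9 hours
```

At 500 records per second, 5 million records take 10,000 seconds; at 200 per second they take 25,000 seconds, which is where the range comes from.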
Salesforce limits concurrent Bulk API work per org. The limit is typically cited as 5 concurrent batch jobs per org for Bulk API 1.0, and Bulk API 2.0 jobs count against the overall concurrent ingest limit; confirm the current figures for your edition in the Salesforce limits documentation. Running several Bulk API jobs against the same org at once can cause job queuing and significantly longer processing times. Coordinate bulk job scheduling across integration teams to avoid concurrent job conflicts, especially in complex org environments.
Monitor bulk job health by polling the job status endpoint. The BulkApiResultEvent object in Real-Time Event Monitoring records when Bulk API results are downloaded, which makes it useful for auditing rather than as a completion signal. Integrate job monitoring into the operational dashboard: a bulk job that silently fails and retries indefinitely without alerting is a common source of data inconsistency in production integrations. Set maximum retry counts and alerting thresholds for job failure rates above acceptable levels.
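An alerting threshold on failure rate can be computed from the job info returned by the status endpoint. The sketch below is illustrative: needs_alert and the 1% default are hypothetical choices, and it assumes numberRecordsProcessed counts failed records as well as successful ones.

```python
def failure_rate(job_info: dict) -> float:
    """Fraction of processed records that failed, or 0.0 when nothing was processed."""
    processed = int(job_info.get("numberRecordsProcessed", 0))
    failed = int(job_info.get("numberRecordsFailed", 0))
    return failed / processed if processed else 0.0


def needs_alert(job_info: dict, max_failure_rate: float = 0.01) -> bool:
    """True when the job ended abnormally or its failure rate exceeds the threshold."""
    if job_info.get("state") in ("Failed", "Aborted"):
        return True
    return failure_rate(job_info) > max_failure_rate
```

For a job that reports JobComplete with 12,000 failures out of 200,000 records, failure_rate gives 0.06 and needs_alert returns True, even though the job state alone looks healthy.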
Key Takeaways
- Bulk API 2.0 replaces Bulk API 1.0 for all new integrations — CSV format, cleaner REST lifecycle, automatic batch management. Use Bulk API for any recurring operation processing more than ~2,000 records.
- The job lifecycle is: create → upload CSV → close (UploadComplete) → poll until JobComplete → download successfulResults and failedResults. Implement all five phases — skipping result download means failures go undetected.
- All org automation (triggers, Flows, workflow rules) fires during Bulk API processing. Complex automation significantly reduces throughput — test with production-equivalent automation in sandbox before go-live.
- Use a custom setting flag pattern to optionally bypass automation during bulk operations rather than disabling automation in the org — this preserves automation for interactive users while allowing bulk operations to bypass it when required.
- Query jobs support SOQL-based extraction of millions of records with paginated CSV results — this is the correct extraction pattern for large Salesforce datasets, not repeated REST API calls.
- Concurrent job limits and per-org throughput constraints require scheduling coordination across integration teams — plan bulk job windows to avoid contention.
Test Your Understanding
1. A bulk upsert job of 200,000 Opportunity records completes with state "JobComplete" but the integration team notices that 12,000 records were not updated in Salesforce. What is the most likely cause and how should it be diagnosed?
2. An organisation runs three concurrent Bulk API 2.0 ingest jobs simultaneously — one against Account, one against Contact, and one against Opportunity. Jobs that normally complete in 30 minutes are taking 3 hours. What is the most likely cause?
3. An integration needs to extract all Cases modified in the last 24 hours (potentially 300,000 records). The current implementation uses a loop of REST API SOQL queries with LIMIT 2000 OFFSET X. What is wrong with this approach?