JSON and XML Processing in Apex: Performance Trade-offs

What you will learn...

The three JSON parsing approaches in Apex: JSON.deserialize, JSON.deserializeUntyped, and JSONParser
When each approach is appropriate and the heap implications of each
Typed vs untyped parsing: the class-based deserialization pattern and its trade-offs
XML processing: Dom.Document vs XmlStreamReader and when streaming beats DOM
Heap limit strategies for large payloads: chunking, streaming, and selective field extraction
Common patterns that cause heap and CPU limit violations in integration Apex

The Heap Constraint in Integration Apex

Salesforce Apex has a 6MB synchronous heap limit (12MB for asynchronous contexts). When parsing a large API response, the heap is consumed by both the raw string and the parsed data structure simultaneously during deserialization. A 2MB JSON response body string occupies roughly 2MB of heap. Deserializing it into a typed Apex object graph creates a second copy of the data in the parsed structure, which as a rough guide can consume a further 2-4MB, because Apex objects carry overhead beyond the raw data size. For large responses, the combined total can approach or exceed the 6MB synchronous limit.

CPU limits are the other constraint for XML and JSON parsing. Parsing is computationally expensive relative to other Apex operations. A 1MB XML document parsed using Dom.Document consumes significantly more CPU milliseconds than the equivalent JSON deserialization because XML has a more complex token structure, optional attributes, namespaces, and a more verbose node model. The 10-second CPU limit in synchronous Apex can be exhausted by parsing very large XML documents, particularly those with deep nesting and many attributes.

The practical guidance for integration Apex: always measure heap and CPU consumption of parsing operations in a realistic sandbox before deploying to production. An API response that is 200KB in sandbox load testing may grow to 800KB in production for records with more related data. Parsers that are fine at sandbox scale can fail in production when payloads are larger than anything tested.
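The measurement advice above can be sketched with the standard Limits methods. This is a minimal illustration, not a benchmarking harness: the class name ParseMetrics and the returned map keys are made up for this example, and Limits.getHeapSize() reports an approximate figure, so treat the numbers as indicative only.

```apex
public class ParseMetrics {
    // Parses the body with JSON.deserializeUntyped and reports how much
    // heap (bytes) and CPU time (ms) the transaction gained during the parse.
    public static Map<String, Integer> measureUntypedParse(String body) {
        Integer heapBefore = Limits.getHeapSize();
        Integer cpuBefore = Limits.getCpuTime();

        Object parsed = JSON.deserializeUntyped(body);

        Map<String, Integer> metrics = new Map<String, Integer>();
        metrics.put('heapBytes', Limits.getHeapSize() - heapBefore);
        metrics.put('cpuMs', Limits.getCpuTime() - cpuBefore);

        System.debug('Body length (chars): ' + body.length());
        System.debug('Heap added by parse: ' + metrics.get('heapBytes')
            + ' of ' + Limits.getLimitHeapSize() + ' bytes');
        System.debug('CPU used by parse: ' + metrics.get('cpuMs')
            + ' of ' + Limits.getLimitCpuTime() + ' ms');
        return metrics;
    }
}
```

Running this against the largest payload you can realistically expect, rather than a small sandbox sample, gives a more useful picture of the remaining headroom.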

⚠️ Deserializing into a generic Map<String,Object> is often more heap-expensive than typed classes: A common assumption is that untyped deserialization is "lighter" than typed class deserialization. The opposite is often true. Map<String,Object> structures carry more overhead per element than strongly-typed Apex classes because each Map entry has a String key and an Object value with type metadata. For large flat structures, typed deserialization into inner classes is typically more heap-efficient.

Three JSON Parsing Approaches in Apex

JSON.deserialize(jsonString, ApexType.class) is the most developer-friendly approach: define an Apex class that mirrors the JSON structure, and JSON.deserialize instantiates the class from the JSON string. This is strongly typed, supports nested objects and lists, and produces clean Apex code. The match does not have to be exact. When the target is an Apex class, extra fields in the JSON that are not in the class are silently ignored (JSON.deserializeStrict throws instead), and keys missing from the JSON leave the corresponding fields unset rather than raising an error. A value that cannot be converted to the declared field type causes a System.JSONException. Field names in the Apex class must match JSON keys (matching is case-insensitive), which becomes a limitation when a key is an Apex reserved word or is not a valid identifier.

JSON.deserializeUntyped(jsonString) returns a Map<String,Object> for JSON objects, a List<Object> for JSON arrays, and primitives for scalar values. This is flexible for dynamic structures where the JSON schema is not fully known at compile time, but it requires explicit type casting at every access point. Accessing a nested field requires multiple casts, making the code verbose and error-prone for deeply nested structures.

JSONParser (System.JSONParser) is a streaming parser that processes the JSON token-by-token without building a full in-memory object graph. It is the most memory-efficient approach for large payloads because the document is never materialized as a parsed structure: the raw JSON string still occupies heap, but beyond that only the values you choose to keep are added. The trade-off is code complexity: streaming parsers require explicit state management, making them difficult to write and maintain for complex nested structures.

// Typed deserialization — clean for known schemas
public class PaymentResponse {
    public String transactionId;
    public String status;
    public Decimal amount;
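    // `currency` is a reserved word in Apex and cannot be used as a field name.
    // This sketch assumes the value arrives under a `currencyCode` key; a payload
    // key named `currency` would need renaming in the body or untyped parsing.
    public String currencyCode;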
    public PaymentDetails details;

    public class PaymentDetails {
        public String cardBrand;
        public String lastFour;
        public String authCode;
    }
}

// Usage (assumes `req` is an HttpRequest that has already been configured):
Http http = new Http();
HttpResponse res = http.send(req);
PaymentResponse payment = (PaymentResponse)
    JSON.deserialize(res.getBody(), PaymentResponse.class);
System.debug('Transaction: ' + payment.transactionId);
System.debug('Auth: ' + payment.details.authCode);

// Untyped for dynamic structures:
Map<String,Object> parsed =
    (Map<String,Object>) JSON.deserializeUntyped(res.getBody());
String txId = (String) parsed.get('transactionId');
Map<String,Object> details =
    (Map<String,Object>) parsed.get('details');
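// 'details' may be absent from the payload; guard before reading nested keys
String authCode = (details == null) ? null : (String) details.get('authCode');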
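For comparison, here is a hedged sketch of the third approach, JSONParser, applied to the same payment payload. The class name PaymentStreamReader is illustrative, and the code assumes the payload shape used in the examples above (a top-level transactionId and an authCode inside details). It keeps only those two values and skips everything else.

```apex
public class PaymentStreamReader {
    // Returns transactionId and details.authCode without building an object graph.
    public static Map<String, String> extract(String body) {
        Map<String, String> out = new Map<String, String>();
        JSONParser parser = JSON.createParser(body);
        while (parser.nextToken() != null) {
            if (parser.getCurrentToken() != JSONToken.FIELD_NAME) {
                continue;
            }
            String name = parser.getText();
            parser.nextToken(); // advance to the field's value
            if (name == 'transactionId' || name == 'authCode') {
                out.put(name, parser.getText());
            } else if (name != 'details') {
                // Skips a whole nested object or array; does nothing for scalars.
                // 'details' is not skipped, so the loop descends into it.
                parser.skipChildren();
            }
        }
        return out;
    }
}
```

The state management here is trivial because only two fields are needed. The more nesting levels the integration cares about, the more of this bookkeeping the code has to carry, which is the complexity trade-off described above.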

XML Processing: DOM vs Streaming

Apex provides two XML processing models. Dom.Document loads the entire XML document into a DOM (Document Object Model) tree in memory, allowing random access to any element by traversal. This is the familiar, developer-friendly model — you can navigate forward and backward through the document, access parent elements from child elements, and query elements by tag name. The cost is that the entire document is loaded into heap simultaneously, and the DOM tree occupies more heap than the raw XML string due to the node object overhead.

XmlStreamReader is a forward-only streaming parser that reads one XML token at a time (START_ELEMENT, END_ELEMENT, CHARACTERS). It is significantly more memory-efficient than Dom.Document for large XML documents because no node tree is built: the source string stays in heap, but beyond that only the current token and whatever values you copy out are held. XmlStreamReader is appropriate for large SOAP responses (SAP IDocs can be hundreds of kilobytes of XML), long HL7 messages, and any XML document that approaches or exceeds 100KB.

The decision is straightforward: use Dom.Document for small XML documents (under 100KB) where developer productivity matters more than heap conservation. Use XmlStreamReader for large XML documents where heap constraints are a real concern. Mixing the two models in a single integration adds unnecessary complexity — choose one based on the expected maximum document size and apply it consistently.

// XmlStreamReader: memory-efficient streaming for large XML
public static List<Map<String,String>> parseSoapResponse(String xmlBody) {
    List<Map<String,String>> results = new List<Map<String,String>>();
    XmlStreamReader reader = new XmlStreamReader(xmlBody);
    // Return each text node as a single block; otherwise long text can arrive
    // as several CHARACTERS events and only the first one would be kept.
    reader.setCoalescing(true);
    Map<String,String> current = null;

    while (reader.hasNext()) {
        if (reader.getEventType() == XmlTag.START_ELEMENT) {
            if (reader.getLocalName() == 'Customer') {
                // A new record begins
                current = new Map<String,String>();
            } else if (current != null) {
                // Child of <Customer>: look at the next event for its text
                String fieldName = reader.getLocalName();
                reader.next();
                if (reader.getEventType() == XmlTag.CHARACTERS) {
                    current.put(fieldName, reader.getText());
                } else {
                    // Empty element or nested child: re-examine this event
                    // on the next pass instead of skipping over it
                    continue;
                }
            }
        } else if (reader.getEventType() == XmlTag.END_ELEMENT) {
            if (reader.getLocalName() == 'Customer' && current != null) {
                // Record complete
                results.add(current);
                current = null;
            }
        }
        reader.next();
    }
    return results;
}
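For documents small enough for the DOM model, the same extraction might look like the sketch below. The class name CustomerDomReader is illustrative, and the code assumes the same structure as the streaming example: Customer elements somewhere below the root, each with simple text-only children.

```apex
public class CustomerDomReader {
    public static List<Map<String, String>> parse(String xmlBody) {
        List<Map<String, String>> results = new List<Map<String, String>>();
        Dom.Document doc = new Dom.Document();
        doc.load(xmlBody); // builds the entire node tree in heap
        collect(doc.getRootElement(), results);
        return results;
    }

    // Walks the tree below `node`, turning each Customer element into a map.
    private static void collect(Dom.XmlNode node, List<Map<String, String>> results) {
        for (Dom.XmlNode child : node.getChildElements()) {
            if (child.getName() == 'Customer') {
                Map<String, String> row = new Map<String, String>();
                for (Dom.XmlNode field : child.getChildElements()) {
                    row.put(field.getName(), field.getText());
                }
                results.add(row);
            } else {
                // Descend through wrapper elements such as an envelope or body
                collect(child, results);
            }
        }
    }
}
```

The code is shorter and needs no state variable, which is the productivity argument for DOM. The cost is the doc.load call, which holds the full tree in heap alongside the source string.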

Heap and CPU Strategies for Large Payloads

When a single API response exceeds what can be safely parsed within heap limits, the integration architecture must change. The most effective strategy is pagination at the source — rather than fetching a response with 500 records, fetch pages of 50 records with 10 API calls. Each call parses a smaller response and completes before the next call begins, preventing heap accumulation across calls. Most REST APIs support pagination via page size parameters or cursor-based pagination tokens.
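A cursor-based pagination loop could be sketched as follows. Everything specific here is an assumption for illustration: the PageSource interface stands in for whatever performs the HTTP callout, and the page shape (a records array plus a nextCursor value) is hypothetical and will differ per API.

```apex
public class PagedFetcher {
    // Hypothetical abstraction: returns the raw JSON for one page.
    // A real implementation would wrap Http.send() and pass the cursor along.
    public interface PageSource {
        String fetchPage(String cursor);
    }

    // Assumed page shape: {"records":[...], "nextCursor":"..."} with a null
    // nextCursor on the last page. Returns the number of records seen.
    public static Integer processAll(PageSource source, Integer maxPages) {
        Integer total = 0;
        String cursor = null;
        for (Integer i = 0; i < maxPages; i++) {
            Map<String, Object> page =
                (Map<String, Object>) JSON.deserializeUntyped(source.fetchPage(cursor));
            List<Object> records = (List<Object>) page.get('records');
            total += (records == null) ? 0 : records.size();
            // Handle this page's records here, before fetching the next page,
            // so only one page's parsed structure is held at a time.
            cursor = (String) page.get('nextCursor');
            if (cursor == null) {
                break;
            }
        }
        return total;
    }
}
```

The maxPages cap matters because a single transaction is limited to 100 callouts; a result set needing more pages than that has to continue in a later transaction.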

Selective field extraction is another heap reduction strategy. If a 500KB JSON response contains 50 fields per record but the integration only needs 5 fields, extract only those 5. A JSONParser loop that reads just the required field names keeps peak heap lowest, because the unwanted values are never materialized. Untyped deserialization followed by selective extraction still builds the full map first, so it lowers only the heap retained afterwards, once the map goes out of scope. Avoid holding on to fields that will never be used; they consume heap for no benefit.
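One low-effort variation on selective extraction, not spelled out above, relies on the fact that JSON.deserialize ignores JSON keys with no matching field in the target Apex class. Declaring a slim class with only the needed fields means the resulting object graph holds only those fields. The class and field names below are hypothetical, and the parser still reads the whole document, so the saving is in the parsed structure rather than in parse time.

```apex
public class OrderSummaryReader {
    // Only the fields the integration needs; all other keys are dropped.
    public class OrderSummary {
        public String orderNumber;
        public String status;
        public Decimal total;
    }

    // Assumed payload shape: a JSON array of order objects.
    public static List<OrderSummary> parse(String body) {
        return (List<OrderSummary>) JSON.deserialize(body, List<OrderSummary>.class);
    }
}
```

If the unwanted part of each record is large (long nested arrays, for instance), a JSONParser loop with skipChildren() may still be worth the extra code.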

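For asynchronous Apex (Queueable, Batch, @future), the heap limit doubles to 12MB. Moving large payload processing to an asynchronous context is a straightforward architectural decision. For outbound integrations, make the callout from inside the Queueable so that the response is received and parsed under the 12MB limit. For integrations triggered by inbound requests, receive the request synchronously, store the payload temporarily, and process it in a Queueable. Choose the temporary store with its own size limits in mind: a long text area field on a custom object holds at most 131,072 characters and a single Platform Cache entry is limited to about 100KB, so larger payloads are better kept as a file (ContentVersion). This approach handles payloads that would fail in synchronous context without changing the parsing logic.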
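One way to get the 12MB limit, assuming the integration can fetch the payload itself, is to perform the callout inside a Queueable. The sketch below is illustrative: LargePayloadJob, its handle method, and the named credential in the usage line are hypothetical names.

```apex
public class LargePayloadJob implements Queueable, Database.AllowsCallouts {
    private final String endpoint;

    public LargePayloadJob(String endpoint) {
        this.endpoint = endpoint;
    }

    public void execute(QueueableContext ctx) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(endpoint);
        req.setMethod('GET');
        HttpResponse res = new Http().send(req);
        // Both the body string and the parsed map live under the async heap limit
        handle(res.getBody());
    }

    // Parsing step kept separate so it can be exercised without a callout.
    // Assumes a JSON object at the top level; returns its number of keys.
    public static Integer handle(String body) {
        Map<String, Object> parsed =
            (Map<String, Object>) JSON.deserializeUntyped(body);
        System.debug('Top-level keys: ' + parsed.keySet());
        return parsed.size();
    }
}
```

A caller would enqueue it with something like System.enqueueJob(new LargePayloadJob('callout:Billing_API/invoices')), where Billing_API is a placeholder named credential.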

Common Limit Violation Patterns

The most common heap violation pattern in integration Apex is building large collections during parsing. If a JSON response contains 1,000 order items and the parser creates an Apex Order object for each one with its associated List<OrderLine>, the in-memory object graph can exceed the heap limit before the collection is complete. Fix: process and insert records in batches during parsing rather than accumulating all parsed records in memory before any DML.
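The batch-as-you-parse fix might look like the following sketch. It uses Account as a stand-in record type and assumes a payload that is a flat JSON array of objects with a name key; the class name and the flush size of 200 are choices made for this example.

```apex
public class AccountStreamLoader {
    private static final Integer FLUSH_SIZE = 200;

    // Assumed payload shape: [{"name":"..."}, ...]. Any key called "name",
    // at any depth, is treated as a record. Returns the number of rows inserted.
    public static Integer load(String body) {
        Integer inserted = 0;
        List<Account> buffer = new List<Account>();
        JSONParser parser = JSON.createParser(body);
        while (parser.nextToken() != null) {
            if (parser.getCurrentToken() == JSONToken.FIELD_NAME
                    && parser.getText() == 'name') {
                parser.nextToken(); // move to the value
                buffer.add(new Account(Name = parser.getText()));
                if (buffer.size() >= FLUSH_SIZE) {
                    insert buffer;   // one DML statement per flush
                    inserted += buffer.size();
                    buffer.clear();  // release the flushed records
                }
            }
        }
        if (!buffer.isEmpty()) {
            insert buffer;
            inserted += buffer.size();
        }
        return inserted;
    }
}
```

Two limits shape the flush size: each flush costs one of the 150 DML statements allowed per transaction, and a transaction can process at most 10,000 DML rows. Note also that a callout cannot follow uncommitted DML in the same transaction, so the body should be fully fetched before loading starts.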

The most common CPU violation pattern is repeated string operations during XML parsing. Concatenating XML strings in a loop — building an XML document by string concatenation — is O(n²) in CPU time as the string grows. Use a StringBuilder-equivalent pattern (collect string segments in a List<String> and join at the end with String.join()) rather than repeated string concatenation in loops. This applies to both building request XML and processing response strings.
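The list-then-join pattern described above can be sketched as follows. ItemXmlBuilder and the Items wrapper element are illustrative names; escapeXml() is added so that item names containing characters such as & do not produce malformed XML.

```apex
public class ItemXmlBuilder {
    // Builds <Items><Item>..</Item>...</Items> with a single join at the end
    // instead of growing one string inside the loop.
    public static String build(List<String> itemNames) {
        List<String> parts = new List<String>{ '<Items>' };
        for (String name : itemNames) {
            parts.add('<Item>' + name.escapeXml() + '</Item>');
        }
        parts.add('</Items>');
        return String.join(parts, '');
    }
}
```

For example, build(new List<String>{ 'A', 'B&C' }) returns <Items><Item>A</Item><Item>B&amp;C</Item></Items>.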

Key Takeaways

Heap in synchronous Apex is 6MB — during JSON/XML parsing, both the raw string and the parsed structure consume heap simultaneously. Measure heap consumption for realistic payload sizes, not sandbox minimums.
JSON.deserialize into typed classes is developer-friendly and often more heap-efficient than untyped Map<String,Object> for structured data. Use untyped deserialization for dynamic schemas only.
JSONParser (streaming) is the most memory-efficient JSON approach for large payloads — reads token-by-token without building a full in-memory structure. Higher code complexity is the trade-off.
Dom.Document (DOM) is appropriate for XML documents under ~100KB. XmlStreamReader (streaming) is the better choice for larger XML documents where heap conservation matters: SAP IDocs, SOAP responses, HL7 messages.
For payloads that exceed synchronous heap limits, move processing to asynchronous Apex (Queueable, Batch) where the heap limit is 12MB, or implement pagination at the source to reduce per-call payload size.
Avoid building large in-memory collections during parsing — process and persist records in batches as they are parsed. String concatenation in loops is O(n²) in CPU — use List<String> + String.join() instead.

Test Your Understanding

1. An Apex callout receives a 1.5MB JSON response and deserializes it into a Map<String,Object>. The transaction fails with a heap limit exception. What is the most effective architectural fix?

Switch from JSON.deserializeUntyped to JSON.deserialize with a typed class — typed deserialization always uses less heap than untyped

Implement pagination at the source API: fetch 10 pages of 150KB each rather than one 1.5MB payload. If pagination is not available, move the callout and parsing into a Queueable that implements Database.AllowsCallouts, so that the response is received and parsed under the 12MB asynchronous heap limit.

Request the API vendor compress the response with gzip — Salesforce Apex automatically decompresses gzip responses and the compressed size counts against the heap limit

2. An Apex method builds a SOAP request XML by repeatedly concatenating strings in a loop: xmlBody += '<Item>' + item.Name + '</Item>'; for 500 items. The method fails with CPU limit exceeded. What is the fix?

Move the loop to a Batch Apex job — batch context provides a higher CPU limit per execute() call

Replace string concatenation with a List: List<String> parts = new List<String>(); parts.add('<Item>' + item.Name + '</Item>'); then use String.join(parts, '') at the end. String concatenation in a loop is O(n²) in CPU; list-then-join is O(n).

Reduce the number of fields per XML item to reduce the string length — shorter strings process faster

3. An integration receives a 400KB SAP SOAP IDOC response containing 2,000 Customer elements. Processing using Dom.Document occasionally throws heap limit exceptions in production. What is the appropriate solution?

Increase the Apex heap limit for the integration user's profile in Salesforce Setup

Replace Dom.Document with XmlStreamReader — for a 400KB XML document with 2,000 elements, streaming parsing reads one token at a time without loading the full document into a DOM tree, dramatically reducing heap consumption

Split the SOAP call into multiple requests, each requesting a subset of customers — Dom.Document works fine for payloads under 400KB
