- How the SOQL query engine evaluates WHERE clauses and chooses between index seek and table scan
- What selectivity means and how to calculate it for a given query
- Which fields are automatically indexed and how to request custom indexes
- Using EXPLAIN to diagnose a slow query before it hits production
- Common SOQL anti-patterns: non-selective filters, text search on long fields, formula field filters
- Query optimisation techniques that consistently improve performance
How the SOQL Engine Decides: Seek vs Scan
Understanding SOQL performance starts with understanding the two modes the query engine can operate in: an index seek (fast) and a table scan (slow).
An index seek navigates a B-tree index structure to find matching rows directly — it reads a small fraction of the data to answer the query. A table scan reads every row in your org's data partition for the object and evaluates the WHERE clause against each row. On a 10-million-record object, a table scan reads 10 million rows; an index seek might read a few thousand.
The query engine chooses based on selectivity. If the WHERE clause filter is selective (returns a small percentage of total records), the engine uses an available index. If the filter is non-selective (would return most records), the engine determines that a full scan is actually faster — index overhead is not worth it when most of the data must be read anyway.
Automatically Indexed Fields
Salesforce automatically creates indexes for a specific set of fields on every object. Knowing which fields are indexed helps you write queries that can use them.
Always indexed: Id (record ID), Name (on standard objects), OwnerId, CreatedDate, SystemModstamp, LastModifiedDate, RecordTypeId, and all fields marked as External ID or Unique. These fields can support selective queries without requesting custom indexes.
Standard relationship fields: Lookup and master-detail relationship fields (AccountId on Contact, CaseId on CaseComment) are indexed. Filtering by parent ID is one of the most efficient query patterns in SOQL.
Not automatically indexed: Custom fields that are not External ID or Unique, standard fields not in the above list (Amount, StageName, Status), and formula fields (formula fields cannot be indexed at all).
-- Efficient: filters on indexed fields
SELECT Id, Name, Amount FROM Opportunity
WHERE AccountId = :accountId -- indexed relationship
AND CreatedDate = LAST_N_DAYS:30 -- indexed standard field
-- Problematic: filter on non-indexed custom field
SELECT Id, Name FROM Order__c
WHERE Customer_Segment__c = 'Enterprise' -- not indexed
-- On 5M records this triggers a table scan
-- Request custom index for Customer_Segment__c via Salesforce Support
-- After indexing, the query becomes selective and fast
Using EXPLAIN to Diagnose Slow Queries
SOQL EXPLAIN (available via the Tooling API and the Developer Console query plan tool) reveals how the query engine intends to execute a query — before you run it against production data.
The EXPLAIN output includes a cardinality estimate (how many rows the engine thinks will be returned), a cost value (relative execution cost — lower is better), and the leading operation type (Index, TableScan, or others). A TableScan operation on a large object is the diagnostic signal you are looking for.
-- Using EXPLAIN via Tooling API
GET /services/data/v60.0/query?explain=
SELECT+Id,Name,Amount
FROM+Opportunity
WHERE+StageName='Closed+Won'
AND+CloseDate=THIS_YEAR
-- Response includes:
{
"plans": [{
"cardinality": 42000,
"fields": ["StageName"],
"leadingOperationType": "TableScan", // bad
"relativeCost": 4.2 // high cost
}]
}
-- Vs a well-indexed query:
{
"plans": [{
"cardinality": 180,
"fields": ["AccountId"],
"leadingOperationType": "Index", // good
"relativeCost": 0.02 // low cost
}]
}
Common SOQL Anti-Patterns
These are the query patterns that cause the most performance problems in production Salesforce orgs.
Non-selective status filters: WHERE Status__c = 'Active' on an object where 90% of records are Active. The filter is technically correct but non-selective — it returns most records. Add a date constraint (AND CreatedDate = LAST_N_DAYS:90) to make the filter set selective.
Negative filters on large objects: WHERE Stage != 'Closed Won'. Negative filters (!=, NOT IN, NOT LIKE) are inherently non-selective because they potentially match most records. Never use negative filters as the primary filter on a large object — pair them with a selective positive filter first.
Filter on formula fields: Formula fields are not stored in the database — they are calculated at query time. A WHERE clause on a formula field forces Salesforce to evaluate the formula for every row before filtering. This is always a full table scan regardless of the result size. Move formula logic to a stored field (populated by a trigger or flow) if the field must be queried.
LIKE with leading wildcard: WHERE Name LIKE '%Smith%'. A leading wildcard defeats index lookup — the engine cannot use the Name index to find 'Smith' anywhere in the string. LIKE 'Smith%' (trailing wildcard only) can use the index. LIKE '%Smith%' cannot.
Query Optimisation Techniques
When a query is slow, these techniques consistently improve performance.
Add a date range as the primary filter: CreatedDate or LastModifiedDate are indexed on every object. Adding a date range as the first filter dramatically reduces the scan range even on objects without other selective fields.
Filter by parent ID before filtering by fields: WHERE AccountId IN :accountIds AND Status__c = 'Active' is faster than WHERE Status__c = 'Active' AND AccountId IN :accountIds — the relationship filter (indexed) narrows the result set before the non-indexed field filter is evaluated.
Reduce the field list: SELECT only the fields you need. Wide queries (many fields) consume more heap and take longer to serialise. Never SELECT * equivalent — always name specific fields.
Use LIMIT strategically: If you only need the first N results (e.g., "get the most recent 10 cases for this account"), add ORDER BY + LIMIT to stop the engine as soon as it has enough results rather than fetching all matching rows.
-- Optimised query combining multiple techniques
SELECT Id, Subject, Status, CreatedDate
FROM Case
WHERE AccountId IN :accountIds -- 1. selective relationship filter
AND CreatedDate = LAST_N_DAYS:90 -- 2. date range narrows result
AND Status != 'Closed' -- 3. negative filter after selective positive
ORDER BY CreatedDate DESC -- 4. sorted
LIMIT 200 -- 5. bounded result set
Key Takeaways
- SOQL performs an index seek (fast) when the WHERE filter is selective (<5-10% of rows), or a table scan (slow) when non-selective. The query engine decides which to use based on selectivity estimates.
- Automatically indexed fields include Id, OwnerId, CreatedDate, External ID fields, Unique fields, and relationship fields. All other fields are unindexed unless you request a custom index via Salesforce support.
- SOQL EXPLAIN (via Tooling API or Developer Console) reveals whether a query will use an index or a table scan before the query runs against production data — use it for any new query on large objects.
- Critical anti-patterns: non-selective status filters (most records match), negative filters as primary filters, formula field filters (always a full scan), and LIKE with leading wildcards.
- Optimisation techniques: add date range as primary filter, filter by parent relationship ID first, select only required fields, and use LIMIT to bound result sets.
- Formula fields cannot be indexed — if a formula field must be queried frequently, store its value in a real field populated by automation, and query the stored field instead.
Test Your Understanding
1. A SOQL query filters on a custom field "Region__c" on a 2-million-record object. The field is not an External ID. EXPLAIN shows "leadingOperationType": "TableScan". What is the most appropriate fix?
2. A developer writes WHERE Description__c LIKE '%invoice%' to find records mentioning "invoice" in a long text field. What is wrong with this query on a large object?
3. You have a query: WHERE StageName = 'Closed Won' on Opportunity with 5 million records, 60% of which are Closed Won. The query is slow. Adding AND CreatedDate = THIS_YEAR reduces matching records to 200,000. Will this likely fix the performance problem?
Discussion & Feedback