- What Salesforce data storage costs and when archiving becomes a financial priority
- The four archiving patterns: soft delete, External Objects, data warehouse, and BigObjects
- How to define archiving eligibility criteria and build the archive selection logic
- Protecting report accuracy during and after archiving
- Compliance considerations: what cannot be deleted and how to handle it
- The BigObjects option for keeping archived data query-accessible within Salesforce
The Salesforce Storage Problem
Salesforce charges for data storage beyond the included allocation (10GB + 20MB per user for Enterprise Edition). When an org exceeds its allocation, additional storage must be purchased in 500MB or 1GB increments at significant cost — typically $125-$300 per 500MB per month depending on edition and contract. For large orgs with millions of legacy records, annual storage overage costs can run to hundreds of thousands of dollars. Archiving is a cost reduction initiative as much as a data management practice.
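As a rough worked example of the arithmetic above, the sketch below estimates the included allocation and the monthly overage for a hypothetical org. The allocation figures and the $125 to $300 price range come from this section. The function names, the org size, and the treatment of 1 GB as 1,024 MB are illustrative assumptions, and actual pricing depends on the contract.

```python
import math

BASE_ALLOCATION_MB = 10 * 1024   # 10 GB included (assumes 1 GB = 1024 MB)
PER_USER_MB = 20                 # 20 MB per user
BLOCK_MB = 500                   # overage purchased in 500 MB blocks


def included_allocation_mb(user_count: int) -> int:
    """Included data storage for an org with the given number of users."""
    return BASE_ALLOCATION_MB + PER_USER_MB * user_count


def monthly_overage_cost(used_mb: float, user_count: int,
                         price_per_block: float) -> float:
    """Monthly cost of the extra 500 MB blocks needed to cover usage."""
    overage_mb = max(0.0, used_mb - included_allocation_mb(user_count))
    blocks = math.ceil(overage_mb / BLOCK_MB)
    return blocks * price_per_block


# Hypothetical org: 500 users with 60 GB of data storage in use.
used = 60 * 1024
low = monthly_overage_cost(used, 500, 125.0)
high = monthly_overage_cost(used, 500, 300.0)
print(f"Monthly overage: ${low:,.0f} to ${high:,.0f}")
print(f"Annual overage:  ${low * 12:,.0f} to ${high * 12:,.0f}")
```

Under these assumptions the org is about 41 GB over its allocation, which needs 83 blocks and costs roughly $124,500 to $298,800 per year.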
Beyond direct storage cost, data volume affects Salesforce performance. Reports and SOQL queries that scan large tables take longer as the record count grows. Lookup searches that return results from a 10-million-record Account table are slower than the equivalent search against a 1-million-record table. Large data volumes also affect backup processing, sandbox refresh times, and the duration of platform maintenance events. Active archiving reduces all of these operational impacts.
The challenge is that archiving must not break existing functionality. Reports that show "closed cases year-to-date" depend on those cases existing in Salesforce. Dashboards tracking historical opportunity pipeline depend on historical opportunity records. Users who need to reference a 3-year-old case to understand a customer's issue history depend on that case being accessible. The archiving strategy must define what "accessible" means for archived records — read-only in Salesforce, queryable in an external archive, or retrievable on request.
The Four Archiving Patterns
Soft delete (status flag archiving) marks records as archived using a custom field (Is_Archived__c = true) and excludes them from standard list views, searches, and reports using filter criteria. Records remain in Salesforce storage but are invisible in normal operations. This is the simplest pattern and preserves all Salesforce functionality for archived records — they can be unarchived by clearing the flag. The downside: records still consume storage, so this pattern doesn't reduce storage costs.
External Objects archiving moves data to an external archive system (a separate database or data warehouse) and creates External Object definitions in Salesforce that allow users to query the archived data via Salesforce Connect. Archived records are removed from Salesforce storage but remain accessible through the Salesforce UI via External Object related lists. The storage cost is eliminated, but archived records have the External Object limitations (no search, limited SOQL, callout latency).
Data warehouse archiving exports records to a data warehouse (Snowflake, BigQuery, Redshift) and deletes them from Salesforce. The data warehouse becomes the archive for historical analysis — reports requiring historical data are built in the BI layer against the warehouse rather than directly in Salesforce reports. This is the most economical approach for pure storage reduction but requires reporting infrastructure investment and user training to access archived data outside Salesforce.
BigObjects archiving moves records into Salesforce BigObjects, a native Salesforce high-scale storage option designed for archival data. BigObjects are queryable within Salesforce using SOQL (with limitations) and do not count against standard data storage limits. They have separate storage pricing that is significantly cheaper than standard data storage. BigObjects provide the best compromise: data stays in Salesforce, storage cost is reduced, and the data remains queryable via Apex and custom components. Standard reports and list views do not run directly against BigObjects, so any reporting on archived data needs a custom path.
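The trade-offs between the four patterns can be summarised as a small lookup. This is only a sketch of the comparison made in this section: the `Pattern` fields and the `candidate_patterns` helper are hypothetical names, and the true/false values simplify the nuances described above (for example, External Objects are accessible in Salesforce but with callout latency).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pattern:
    name: str
    reduces_storage_cost: bool
    accessible_in_salesforce: bool
    needs_external_system: bool


PATTERNS = [
    Pattern("soft delete", False, True, False),
    Pattern("External Objects", True, True, True),
    Pattern("data warehouse", True, False, True),
    Pattern("BigObjects", True, True, False),
]


def candidate_patterns(must_reduce_cost: bool,
                       must_stay_in_salesforce: bool,
                       external_system_allowed: bool) -> List[str]:
    """Return the patterns that satisfy the stated requirements."""
    return [
        p.name for p in PATTERNS
        if (p.reduces_storage_cost or not must_reduce_cost)
        and (p.accessible_in_salesforce or not must_stay_in_salesforce)
        and (external_system_allowed or not p.needs_external_system)
    ]


# Cost must drop, users must see the data in Salesforce, no external system.
print(candidate_patterns(True, True, False))
```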
Defining Eligibility and Building Archive Logic
Archiving eligibility criteria define which records can be safely moved out of active storage. Eligibility typically has three components: age threshold (records older than X months/years), status threshold (records in a terminal status: Closed, Resolved, Lost, Cancelled), and no-active-dependency check (no active related records that reference the candidate, such as an open Case that references a Closed Opportunity via a custom lookup).
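The three eligibility components can be expressed as a single predicate. The sketch below is a minimal illustration that works on plain dictionaries rather than real Salesforce records. The field names, the `is_eligible` helper, and the 3-year default are assumptions, and the count of open dependents is expected to come from a separate lookup.

```python
from datetime import date

TERMINAL_STATUSES = {"Closed", "Resolved", "Lost", "Cancelled"}


def years_before(day: date, years: int) -> date:
    """Return the same calendar day the given number of years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28.
        return day.replace(year=day.year - years, day=28)


def is_eligible(record: dict, open_dependents: int, today: date,
                min_age_years: int = 3) -> bool:
    """Apply the age, terminal-status, and no-active-dependency checks."""
    closed = record.get("closed_date")
    old_enough = closed is not None and closed < years_before(today, min_age_years)
    terminal = record.get("status") in TERMINAL_STATUSES
    return old_enough and terminal and open_dependents == 0


today = date(2025, 6, 1)
case = {"status": "Closed", "closed_date": date(2021, 1, 15)}
print(is_eligible(case, open_dependents=0, today=today))  # passes all three checks
print(is_eligible(case, open_dependents=1, today=today))  # blocked by a dependency
```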
The archive selection query must be idempotent — running it twice should not double-archive records. Using a cursor-based approach with an archive timestamp or a batch ID ensures that the archive job can be resumed if it fails mid-run without re-processing already-archived records. External ID fields on the archive target should use the Salesforce Record ID as the archive key, enabling lookups from the archive back to the original record metadata if needed.
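// Archive selection query: Closed Cases older than 3 years
// Step 1: select candidates that are closed, past the age threshold,
// and not yet archived. LIMIT keeps each run to a bounded batch
// to avoid timeouts.
SELECT Id, CaseNumber, Subject, Status, AccountId, ClosedDate,
       Description, Priority, Origin, SystemModstamp
FROM Case
WHERE Status = 'Closed'
  AND ClosedDate < :Date.today().addYears(-3)
  AND Is_Archived__c = false
ORDER BY ClosedDate ASC
LIMIT 50000

// Step 2: find candidates that still have open child Cases and drop
// them from the batch. SOQL does not allow a NOT IN subquery on the
// same object as the main query, so this check runs as its own query.
// :candidateIds holds the Ids returned by step 1.
SELECT ParentId
FROM Case
WHERE ParentId IN :candidateIds
  AND Status != 'Closed'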
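The idempotency and resume-on-failure requirements can be sketched with in-memory dictionaries standing in for Salesforce and the archive target. Everything here is hypothetical: the `run_archive_batch` function, the placeholder Ids, and the `fail_after` switch that simulates a crash. The point it illustrates is that keying the archive by the Salesforce record Id turns a repeated write into an overwrite, and setting the archived flag only after the copy exists makes a rerun pick up exactly where the failed run stopped.

```python
from typing import Dict, Optional


def run_archive_batch(source: Dict[str, dict], archive: Dict[str, dict],
                      batch_id: str, batch_size: int = 200,
                      fail_after: Optional[int] = None) -> int:
    """Copy up to batch_size unarchived records into the archive.

    fail_after simulates a crash after that many records (for testing only).
    Returns the number of records archived in this run.
    """
    pending = sorted(
        (rid for rid, rec in source.items() if not rec["is_archived"]),
        key=lambda rid: source[rid]["closed_date"],
    )[:batch_size]

    processed = 0
    for rid in pending:
        if fail_after is not None and processed >= fail_after:
            raise RuntimeError("simulated mid-run failure")
        copy = {k: v for k, v in source[rid].items() if k != "is_archived"}
        copy["archive_batch_id"] = batch_id
        archive[rid] = copy                  # upsert keyed by the record Id
        source[rid]["is_archived"] = True    # flag only after the copy exists
        processed += 1
    return processed


# Placeholder Ids and dates, for illustration only.
source = {
    "500A": {"closed_date": "2020-01-10", "is_archived": False},
    "500B": {"closed_date": "2020-03-05", "is_archived": False},
    "500C": {"closed_date": "2021-07-22", "is_archived": False},
}
archive: Dict[str, dict] = {}

try:
    run_archive_batch(source, archive, "batch-001", fail_after=2)
except RuntimeError:
    pass  # the job stopped after two records

resumed = run_archive_batch(source, archive, "batch-001")  # picks up the rest
```

A further run after the resume finds nothing pending and archives zero records, which is the behaviour the selection query's `Is_Archived__c = false` filter is meant to guarantee.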
Protecting Report Accuracy
The most common archive failure is a report that used to show correct historical totals and now shows different numbers after records are archived. This happens when reports count or sum records that are no longer in Salesforce. The architecture fix is to ensure that all metrics that span the archive boundary are computed before archiving and stored as summary records that remain in Salesforce even after the detail records are archived.
For example, before archiving Closed Cases from fiscal year 2022, compute and store the annual case volume and resolution metrics by Account as summary records in a Custom Object (Annual_Case_Summary__c). Reports showing year-over-year case trends can query the summary records for historical years and the live Case data for the current year, without depending on 3-year-old individual Case records being present in Salesforce.
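A minimal sketch of that pre-archive rollup follows. It aggregates closed cases into one summary row per Account for a given year. The dictionary keys, the `build_annual_summaries` helper, and the choice of average days to close as the resolution metric are assumptions for illustration, and the sketch treats the fiscal year as the calendar year.

```python
from collections import defaultdict
from datetime import date


def build_annual_summaries(cases, year):
    """Aggregate cases closed in `year` into one summary row per account."""
    totals = defaultdict(lambda: {"case_count": 0, "total_days_open": 0})
    for c in cases:
        if c["closed_date"].year != year:
            continue
        row = totals[c["account_id"]]
        row["case_count"] += 1
        row["total_days_open"] += (c["closed_date"] - c["created_date"]).days
    return [
        {
            "account_id": account_id,
            "year": year,
            "case_count": t["case_count"],
            "avg_days_to_close": t["total_days_open"] / t["case_count"],
        }
        for account_id, t in sorted(totals.items())
    ]


# Placeholder data: three cases closed in 2022 and one in 2023.
cases = [
    {"account_id": "001A", "created_date": date(2022, 1, 1), "closed_date": date(2022, 1, 11)},
    {"account_id": "001A", "created_date": date(2022, 3, 1), "closed_date": date(2022, 3, 21)},
    {"account_id": "001B", "created_date": date(2022, 5, 1), "closed_date": date(2022, 5, 6)},
    {"account_id": "001B", "created_date": date(2023, 2, 1), "closed_date": date(2023, 2, 3)},
]
summaries = build_annual_summaries(cases, 2022)
```

Each row in `summaries` corresponds to one record that would be written to the summary object before the detail records are removed.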
An alternative is to ensure that all historical reporting is served from the data warehouse rather than from Salesforce reports directly. If the reporting strategy moves historical data access to the warehouse (which retains the full history), Salesforce reports only need to cover the operational window (current quarter, rolling 12 months), and archiving older records has no impact on the reporting layer. This approach requires BI tooling investment but is the cleanest long-term architecture for large-scale archiving programmes.
Compliance Constraints and BigObjects
Some records cannot be deleted outright because of compliance requirements. Financial transaction records, medical records, legal case records, and certain communication records have mandatory retention periods defined by regulation, for example HIPAA (6 years), SOX (7 years), FINRA (6 years), and the UK Companies Act (6 years). Records within their mandatory retention period cannot be deleted, only moved to a compliant archive system. The archiving strategy must categorise records by their compliance retention status before any deletion occurs.
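The categorisation step can be sketched as a retention check that runs before any delete. The retention periods are the ones quoted above. The `retention_status` helper is hypothetical, and the date that starts the retention clock differs between regulations, so `retention_start` is an assumption the compliance team has to supply for each record type.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = {"HIPAA": 6, "SOX": 7, "FINRA": 6, "UK Companies Act": 6}


def add_years(day: date, years: int) -> date:
    """Return the same calendar day the given number of years later."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28.
        return day.replace(year=day.year + years, day=28)


def retention_status(regime: Optional[str], retention_start: date,
                     today: date) -> dict:
    """Report when retention ends and whether deletion is allowed today."""
    if regime is None:
        # No mandatory retention applies to this record category.
        return {"retention_ends": None, "deletable": True}
    retention_ends = add_years(retention_start, RETENTION_YEARS[regime])
    return {"retention_ends": retention_ends, "deletable": today > retention_ends}


today = date(2025, 6, 1)
sox = retention_status("SOX", date(2020, 3, 31), today)      # still in retention
finra = retention_status("FINRA", date(2017, 5, 10), today)  # retention has passed
```

Records that come back with `deletable` set to false can still be moved to a compliant archive; they just cannot be destroyed yet.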
Salesforce BigObjects provide a compelling option for compliance archiving. BigObjects are built to hold very large volumes (on the order of a billion records per object), use index-based access, and are priced significantly below standard data storage. A BigObject designed to mirror the Case schema can store the full compliance archive of closed cases while remaining queryable within Salesforce via Apex. SOQL against BigObjects has restrictions: no aggregate functions, no OFFSET, and filters only on the fields that make up the object's index. Users can access archived cases through custom Apex-backed components that query the BigObject rather than the Case object.
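Index-based access shapes how the archive BigObject has to be designed, because queries must filter on the index fields starting from the first one and without gaps, with equality on every field except the last one used. The sketch below is a simplified model of that rule, not Salesforce's own validation. The index field names are hypothetical, and it assumes one condition per field, listed in index order.

```python
from typing import List, Tuple

# Operators allowed only on the last index field used in a query.
LAST_FIELD_OPS = {"<", ">", "<=", ">=", "IN"}


def is_valid_index_filter(index_fields: List[str],
                          filters: List[Tuple[str, str]]) -> bool:
    """Check (field, operator) pairs against the index-prefix rule."""
    if not filters or len(filters) > len(index_fields):
        return False
    for position, (field, op) in enumerate(filters):
        if field != index_fields[position]:
            return False  # skipped an index field or used a non-index field
        is_last = position == len(filters) - 1
        if op == "=":
            continue
        if not (is_last and op in LAST_FIELD_OPS):
            return False  # only the last field used may take a range or IN
    return True


# Hypothetical index on an archived-case BigObject.
INDEX = ["Account__c", "Closed_Date__c", "Case_Number__c"]

by_account = is_valid_index_filter(INDEX, [("Account__c", "=")])
by_account_and_date = is_valid_index_filter(
    INDEX, [("Account__c", "="), ("Closed_Date__c", ">=")])
by_date_only = is_valid_index_filter(INDEX, [("Closed_Date__c", ">=")])
```

With this index, looking up an account's archived cases in a date range works, but a query by closed date alone does not, so the index order should follow the way users actually look up archived records.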
Key Takeaways
- Salesforce data storage has direct cost implications beyond the included allocation. Storage audits should identify the top 3-5 objects consuming storage before designing an archiving strategy.
- Four archiving patterns: soft delete (simplest, no storage reduction), External Objects (data accessible in Salesforce UI via callout), data warehouse archiving (maximum cost reduction, requires BI infrastructure), and BigObjects (Salesforce-native archive at lower storage cost, queryable via Apex).
- Archive eligibility requires three criteria: age threshold, terminal status, and no-active-dependency check. Eligibility queries must be idempotent to support safe batch processing and resume-on-failure.
- Protect report accuracy by pre-computing summary records for metrics that span the archive boundary before archiving detail records. Alternatively, migrate historical reporting to the data warehouse and limit Salesforce reports to the operational window.
- Compliance-mandated retention records (HIPAA, SOX, FINRA) cannot be deleted within their retention period; they can only be moved to a compliant archive. Categorise records by compliance status before any deletion.
- Archive ContentDocument files alongside records — files typically consume more storage than the records themselves. An archiving programme that doesn't include files has limited storage cost impact.
Test Your Understanding
1. An org implements soft-delete archiving by setting Is_Archived__c = true on Cases older than 3 years. After 6 months of operation, the storage usage report shows no reduction in data storage. What is the cause?
2. A Salesforce report shows "Total Cases Closed — Last 5 Years" by Account. The team plans to archive Cases older than 3 years to a data warehouse and delete them from Salesforce. What must be done before deletion to protect this report?
3. A healthcare organisation wants to archive Salesforce Case records from 2018 (8 years old) to reduce storage costs. The Cases contain patient interaction records subject to HIPAA. What must the archiving strategy account for?