INTG-020 · Integration & Data · 20 min read · For: Salesforce Architects & Tech Leaders

Data Archiving Strategy: Moving Records Out Without Breaking Reports

Salesforce storage costs money, and large orgs accumulate years of data that nobody accesses but everybody is afraid to delete. Archiving is the answer — but it requires careful architecture to ensure that historical reports, compliance lookups, and user expectations remain intact after records are removed from the active org.

Vishal Sharma

Salesforce Architecture Specialist · Updated May 2026

What you will learn...
  • What Salesforce data storage costs and when archiving becomes a financial priority
  • The four archiving patterns: soft delete, External Objects, data warehouse, and BigObjects
  • How to define archiving eligibility criteria and build the archive selection logic
  • Protecting report accuracy during and after archiving
  • Compliance considerations: what cannot be deleted and how to handle it
  • The BigObjects option for keeping archived data query-accessible within Salesforce

The Salesforce Storage Problem

Salesforce charges for data storage beyond the included allocation (10GB + 20MB per user for Enterprise Edition). When an org exceeds its allocation, additional storage must be purchased in 500MB or 1GB increments at significant cost — typically $125-$300 per 500MB per month depending on edition and contract. For large orgs with millions of legacy records, annual storage overage costs can run to hundreds of thousands of dollars. Archiving is a cost reduction initiative as much as a data management practice.
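The overage arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: the function names are illustrative, the allocation formula and the $125 per 500MB monthly price are the figures quoted above, and actual pricing depends on edition and contract.

```python
import math

BASE_ALLOCATION_MB = 10 * 1024   # 10GB base allocation quoted above
PER_USER_MB = 20                 # 20MB per user (Enterprise Edition figure)
BLOCK_MB = 500                   # assumption: overage is bought in 500MB blocks


def included_storage_mb(user_count: int) -> int:
    """Data storage included with the org, in MB."""
    return BASE_ALLOCATION_MB + PER_USER_MB * user_count


def annual_overage_cost(used_mb: float, user_count: int,
                        price_per_block_month: float = 125.0) -> float:
    """Estimated yearly cost of storage used beyond the included allocation."""
    overage_mb = max(0.0, used_mb - included_storage_mb(user_count))
    blocks = math.ceil(overage_mb / BLOCK_MB)   # partial blocks are billed whole
    return blocks * price_per_block_month * 12


# Hypothetical org: 1,000 users with 150GB of data in use
print(included_storage_mb(1000))               # 30240
print(annual_overage_cost(150 * 1024, 1000))   # 370500.0
```

Running the same function with the expected post-archive usage gives a first estimate of the saving an archiving programme could deliver.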

Beyond direct storage cost, data volume affects Salesforce performance. Reports and SOQL queries that scan large tables take longer as the record count grows. Lookup searches that return results from a 10-million-record Account table are slower than the equivalent search against a 1-million-record table. Large data volumes also affect backup processing, sandbox refresh times, and the duration of platform maintenance events. Active archiving reduces all of these operational impacts.

The challenge is that archiving must not break existing functionality. Reports that show "closed cases year-to-date" depend on those cases existing in Salesforce. Dashboards tracking historical opportunity pipeline depend on historical opportunity records. Users who need to reference a 3-year-old case to understand a customer's issue history depend on that case being accessible. The archiving strategy must define what "accessible" means for archived records — read-only in Salesforce, queryable in an external archive, or retrievable on request.

💡
Storage audit before archiving: Before designing an archiving strategy, run a storage usage analysis to identify where the storage is actually consumed. Salesforce's storage detail report (Setup > Storage Usage) shows storage by object type. Often, 70-80% of storage is consumed by three to five objects (typically Cases, Activities, Attachments/Files). Focus archiving efforts on the largest consumers for maximum cost reduction impact.
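A minimal sketch of the audit step, assuming the per-object figures have been copied out of the Storage Usage page into a dictionary. The object names and sizes below are made-up sample values, and `top_storage_consumers` is an illustrative helper, not a Salesforce API.

```python
def top_storage_consumers(usage_mb: dict, coverage: float = 0.8) -> list:
    """Return the largest objects that together reach `coverage` of total storage."""
    total = sum(usage_mb.values())
    selected, running = [], 0.0
    for name, mb in sorted(usage_mb.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break                      # remaining objects add little to the total
        selected.append((name, mb))
        running += mb
    return selected


# Hypothetical storage figures in MB
usage = {"Case": 4200, "Task": 3100, "EmailMessage": 2500,
         "Account": 600, "Contact": 450, "Opportunity": 350}

for name, mb in top_storage_consumers(usage):
    print(f"{name}: {mb} MB")
```

With these sample numbers, three objects account for more than 80% of usage, which is the shortlist an archiving design would start from.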

The Four Archiving Patterns

Soft delete (status flag archiving) marks records as archived using a custom field (Is_Archived__c = true) and excludes them from standard list views, searches, and reports using filter criteria. Records remain in Salesforce storage but are invisible in normal operations. This is the simplest pattern and preserves all Salesforce functionality for archived records — they can be unarchived by clearing the flag. The downside: records still consume storage, so this pattern doesn't reduce storage costs.

External Objects archiving moves data to an external archive system (a separate database or data warehouse) and creates External Object definitions in Salesforce that allow users to query the archived data via Salesforce Connect. Archived records are removed from Salesforce storage but remain accessible through the Salesforce UI via External Object related lists. The storage cost is eliminated, but archived records inherit the External Object limitations: search and SOQL support are restricted and depend on the adapter, and every access incurs callout latency.

Data warehouse archiving exports records to a data warehouse (Snowflake, BigQuery, Redshift) and deletes them from Salesforce. The data warehouse becomes the archive for historical analysis — reports requiring historical data are built in the BI layer against the warehouse rather than directly in Salesforce reports. This is the most economical approach for pure storage reduction but requires reporting infrastructure investment and user training to access archived data outside Salesforce.

BigObjects archiving moves records into Salesforce BigObjects, a native high-scale storage option designed for archival data. BigObjects are queryable within Salesforce using SOQL (with limitations) and do not count against standard data storage limits. They have their own storage allocation, priced significantly below standard data storage. BigObjects offer a workable compromise: data stays in Salesforce, storage cost is reduced, and the data remains queryable via Apex. Standard reports and dashboards do not query BigObjects directly, so reporting on archived data usually means a custom component or copying a subset into a reportable object.

Defining Eligibility and Building Archive Logic

Archiving eligibility criteria define which records can be safely moved out of active storage. Eligibility typically has three components: age threshold (records older than X months/years), status threshold (records in a terminal status: Closed, Resolved, Lost, Cancelled), and no-active-dependency check (no active related records that reference the candidate, such as an open Case that references a Closed Opportunity via a custom lookup).
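The three checks can be expressed as a single predicate. The sketch below is illustrative only: `Candidate`, its fields, and the three-year default are assumptions chosen to match the Case example in this article.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

TERMINAL_STATUSES = {"Closed", "Resolved", "Lost", "Cancelled"}


@dataclass
class Candidate:
    record_id: str
    status: str
    closed_on: Optional[date]
    open_dependents: int   # active related records that still reference this one


def years_before(day: date, years: int) -> date:
    """Same calendar day `years` earlier (29 February falls back to the 28th)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def is_eligible(rec: Candidate, today: date, min_age_years: int = 3) -> bool:
    old_enough = (rec.closed_on is not None
                  and rec.closed_on < years_before(today, min_age_years))
    terminal = rec.status in TERMINAL_STATUSES
    no_dependents = rec.open_dependents == 0
    return old_enough and terminal and no_dependents
```

Keeping the three conditions as named booleans makes it easy to log why a record was rejected, which helps when business owners question the selection.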

The archive selection query must be idempotent — running it twice should not double-archive records. Using a cursor-based approach with an archive timestamp or a batch ID ensures that the archive job can be resumed if it fails mid-run without re-processing already-archived records. External ID fields on the archive target should use the Salesforce Record ID as the archive key, enabling lookups from the archive back to the original record metadata if needed.

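// Archive selection (Apex with inline SOQL): Closed Cases older than 3 years
// with no open child Cases.
// SOQL does not allow a semi-join or anti-join subquery on the same object as
// the outer query, so parents of open child Cases are collected first and
// excluded with a bind variable.
Set<Id> parentsWithOpenChildren = new Set<Id>();
for (Case child : [SELECT ParentId FROM Case
                   WHERE Status != 'Closed' AND ParentId != null]) {
    parentsWithOpenChildren.add(child.ParentId);
}

Datetime cutoff = Datetime.now().addYears(-3);
List<Case> candidates = [
    SELECT Id, CaseNumber, Subject, Status, AccountId, ClosedDate,
           Description, Priority, Origin, SystemModstamp
    FROM Case
    WHERE Status = 'Closed'
      AND ClosedDate < :cutoff
      AND Id NOT IN :parentsWithOpenChildren
      AND Is_Archived__c = false
    ORDER BY ClosedDate ASC
    LIMIT 10000  // both queries count toward the per-transaction row limit;
                 // use Batch Apex for larger volumes
];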
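The idempotency requirement can be illustrated with a small in-memory model. This is a sketch under assumptions, not the archive job itself: the `archive` dictionary stands in for the external archive target, keyed by Salesforce record ID as suggested above, and `archive_batch` is a hypothetical helper.

```python
def archive_batch(candidates: list, archive: dict, batch_id: str) -> list:
    """Copy records into `archive`, keyed by Salesforce record ID.

    Returns the IDs newly written in this run. Re-running with the same
    candidates writes nothing new, which is what makes the job resumable.
    """
    newly_archived = []
    for rec in candidates:
        key = rec["Id"]
        if key in archive:        # already written by an earlier, possibly failed, run
            continue
        archive[key] = {**rec, "archive_batch_id": batch_id}
        newly_archived.append(key)
    return newly_archived


archive_store = {}
records = [{"Id": "500A", "Subject": "Login issue"},
           {"Id": "500B", "Subject": "Billing query"}]

first_run = archive_batch(records, archive_store, "batch-001")
second_run = archive_batch(records, archive_store, "batch-002")
print(first_run)    # ['500A', '500B']
print(second_run)   # []
```

In a real job, the source record would be flagged or deleted only after the archive write is confirmed, so a failure between the two steps leaves the record selectable on the next run.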

Protecting Report Accuracy

The most common archive failure is a report that used to show correct historical totals and now shows different numbers after records are archived. This happens when reports count or sum records that are no longer in Salesforce. The architecture fix is to ensure that all metrics that span the archive boundary are computed before archiving and stored as summary records that remain in Salesforce even after the detail records are archived.

For example, before archiving Closed Cases from fiscal year 2022, compute and store the annual case volume and resolution metrics by Account as summary records in a Custom Object (Annual_Case_Summary__c). Reports showing year-over-year case trends can query the summary records for historical years and the live Case data for the current year, without depending on 3-year-old individual Case records being present in Salesforce.
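A sketch of the pre-computation step, assuming Case rows have been exported as dictionaries with `AccountId`, `CreatedDate`, and `ClosedDate`. The output field names are hypothetical stand-ins for fields on Annual_Case_Summary__c, and calendar year is used in place of fiscal year for simplicity.

```python
from collections import defaultdict
from datetime import date


def build_annual_summaries(cases: list) -> list:
    """Aggregate closed cases into one summary row per (account, year)."""
    totals = defaultdict(lambda: {"count": 0, "resolution_days": 0})
    for c in cases:
        key = (c["AccountId"], c["ClosedDate"].year)
        totals[key]["count"] += 1
        totals[key]["resolution_days"] += (c["ClosedDate"] - c["CreatedDate"]).days
    return [
        {"AccountId": account, "Year": year, "CaseCount": t["count"],
         "AvgResolutionDays": t["resolution_days"] / t["count"]}
        for (account, year), t in sorted(totals.items())
    ]


# Hypothetical sample data
sample_cases = [
    {"AccountId": "001A", "CreatedDate": date(2022, 1, 1), "ClosedDate": date(2022, 1, 5)},
    {"AccountId": "001A", "CreatedDate": date(2022, 3, 1), "ClosedDate": date(2022, 3, 3)},
    {"AccountId": "001B", "CreatedDate": date(2022, 6, 1), "ClosedDate": date(2022, 6, 11)},
]
for row in build_annual_summaries(sample_cases):
    print(row)
```

The summary rows should be written and verified against the live report totals before any detail records are removed, since they cannot be recomputed from Salesforce afterwards.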

An alternative is to ensure that all historical reporting is served from the data warehouse rather than from Salesforce reports directly. If the reporting strategy moves historical data access to the warehouse (which retains the full history), Salesforce reports only need to cover the operational window (current quarter, rolling 12 months), and archiving older records has no impact on the reporting layer. This approach requires BI tooling investment but is the cleanest long-term architecture for large-scale archiving programmes.

Compliance Constraints and BigObjects

Some records cannot be deleted because of compliance requirements. Financial transaction records, medical records, legal case records, and certain communication records have mandatory retention periods defined by regulation. Commonly cited figures are HIPAA (6 years), SOX (7 years), FINRA (6 years), and the UK Companies Act (6 years), but the exact period depends on the record type and jurisdiction and should be confirmed with legal or compliance teams. Records within their mandatory retention period cannot be deleted, only moved to a compliant archive system. The archiving strategy must categorise records by their compliance retention status before any deletion occurs.
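The categorisation step might look like the sketch below. The retention periods are the figures quoted above and should be treated as placeholders until confirmed for the specific record type and jurisdiction; the regime keys and disposition labels are illustrative names.

```python
from datetime import date
from typing import Optional

# Placeholder retention periods in years, taken from the figures quoted above
RETENTION_YEARS = {"HIPAA": 6, "SOX": 7, "FINRA": 6, "UK_COMPANIES_ACT": 6}


def retention_end(record_date: date, regime: str) -> date:
    """Date on which the mandatory retention period ends."""
    target_year = record_date.year + RETENTION_YEARS[regime]
    try:
        return record_date.replace(year=target_year)
    except ValueError:               # 29 February in a non-leap target year
        return record_date.replace(year=target_year, day=28)


def disposition(record_date: date, regime: Optional[str], today: date) -> str:
    """Classify a record before any deletion is considered."""
    if regime is None:
        return "no_mandated_retention"
    if today < retention_end(record_date, regime):
        return "retain_in_compliant_archive"
    return "review_for_deletion"


print(disposition(date(2021, 3, 1), "SOX", date(2026, 5, 1)))
print(disposition(date(2018, 3, 1), "HIPAA", date(2026, 5, 1)))
```

The third outcome is deliberately a review state and not an automatic delete, because legal holds or contractual terms can extend retention beyond the regulatory minimum.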

Salesforce BigObjects provide a compelling option for compliance archiving. BigObjects are designed to hold a billion records or more per object, use index-based access, and are priced significantly below standard data storage. A BigObject designed to mirror the Case schema can store the full compliance archive of closed cases while remaining queryable within Salesforce via Apex. SOQL against BigObjects is restricted: queries must filter on the index fields in their defined order, and aggregate functions and OFFSET are not available. Users can access archived cases through custom Apex-backed components that query the BigObject instead of running standard SOQL against the Case object.
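To see why index-based access shapes the archive design, the following conceptual model stores rows under a composite key and only answers lookups that start from the leading key field. It is a plain Python illustration, not the Salesforce API; the class name and the key order (account, closed date, case ID) are assumptions.

```python
import bisect


class CompositeIndexStore:
    """Rows sorted by (account_id, closed_date, case_id); dates are ISO strings."""

    def __init__(self):
        self._keys = []    # sorted list of composite keys
        self._rows = {}    # composite key -> payload

    def insert(self, account_id: str, closed_date: str, case_id: str, payload: dict):
        key = (account_id, closed_date, case_id)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = payload          # re-inserting the same key overwrites

    def by_account(self, account_id: str,
                   closed_from: str = "", closed_to: str = "\uffff") -> list:
        """Range scan on the index prefix: account first, then closed date."""
        lo = bisect.bisect_left(self._keys, (account_id, closed_from))
        hi = bisect.bisect_right(self._keys, (account_id, closed_to, "\uffff"))
        return [self._rows[k] for k in self._keys[lo:hi]]


store = CompositeIndexStore()
store.insert("001A", "2020-05-01", "500X", {"Subject": "Login issue"})
store.insert("001A", "2021-03-02", "500Y", {"Subject": "Billing query"})
store.insert("001B", "2020-01-01", "500Z", {"Subject": "Outage"})

print(store.by_account("001A"))
print(store.by_account("001A", "2021-01-01", "2021-12-31"))
```

The practical consequence is that the index should be chosen from the lookups users actually perform, such as all archived cases for an account, because a question that does not start from the leading index field cannot be answered efficiently.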

💡
Archive attachments and files first: Files (ContentDocument) and Attachments count against file storage, which Salesforce allocates separately from data storage, and in many orgs they are the larger problem. A Case with five 2MB attachments consumes 10MB of file storage, while the Case record itself counts as roughly 2KB of data storage. An archiving programme that removes old Case records but leaves the attached files in Salesforce will have limited impact on total storage. Archive files alongside records.

Key Takeaways

  • Salesforce data storage has direct cost implications beyond the included allocation. Storage audits should identify the top 3-5 objects consuming storage before designing an archiving strategy.
  • Four archiving patterns: soft delete (simplest, no storage reduction), External Objects (data accessible in Salesforce UI via callout), data warehouse archiving (maximum cost reduction, requires BI infrastructure), and BigObjects (Salesforce-native archive at lower storage cost, queryable via Apex).
  • Archive eligibility requires three criteria: age threshold, terminal status, and no-active-dependency check. Eligibility queries must be idempotent to support safe batch processing and resume-on-failure.
  • Protect report accuracy by pre-computing summary records for metrics that span the archive boundary before archiving detail records. Or migrate historical reporting to the data warehouse and limit Salesforce reports to the operational window.
  • Compliance-mandated retention records (HIPAA, SOX, FINRA) cannot be deleted within their retention period; they can only be moved to a compliant archive. Categorise records by compliance status before any deletion.
  • Archive ContentDocument files alongside records — files typically consume more storage than the records themselves. An archiving programme that doesn't include files has limited storage cost impact.

Test Your Understanding

1. An org implements soft-delete archiving by setting Is_Archived__c = true on Cases older than 3 years. After 6 months of operation, the storage usage report shows no reduction in data storage. What is the cause?

Soft-delete archiving requires a 30-day grace period before storage is released — the reduction will appear after 30 days
Soft-delete archiving does not remove records from Salesforce storage — it only hides them from views and reports by filtering on the archived flag. Records still occupy storage. To reduce storage costs, records must be actually deleted from Salesforce or moved to BigObjects.
The archiving job failed — Is_Archived__c was not actually set on the old Cases due to a permission issue

2. A Salesforce report shows "Total Cases Closed — Last 5 Years" by Account. The team plans to archive Cases older than 3 years to a data warehouse and delete them from Salesforce. What must be done before deletion to protect this report?

Nothing — the report filter "Last 5 Years" will automatically switch to query the data warehouse for years 3-5 after the records are deleted
Pre-compute and store the annual/quarterly case count totals by Account as summary records in Salesforce before deleting the detail records. Update the report to use these summary records for historical years, or migrate this historical reporting to the data warehouse BI layer.
Change the report to filter "Last 3 Years" before deletion — this ensures the report only queries data that remains in Salesforce

3. A healthcare organisation wants to archive Salesforce Case records from 2018 (8 years old) to reduce storage costs. The Cases contain patient interaction records subject to HIPAA. What must the archiving strategy account for?

HIPAA records must be deleted immediately after they are 7 years old — retaining them beyond 7 years creates compliance risk
HIPAA's six-year rule covers required compliance documentation; retention periods for medical records themselves are generally set by state law and are often 6 to 7 years or longer. 2018 records are 8 years old and may be eligible for deletion, but the archive strategy must confirm the specific retention requirement, ensure the archive system is HIPAA-compliant, and maintain a certificate of destruction if records are permanently deleted.
Healthcare records can never be archived or deleted from Salesforce — they must remain in active storage indefinitely
