- Examine the structural differences between Data Storage and File Storage allocations.
- Understand the financial mechanics of Salesforce storage overage blocks and billing tags.
- Analyse relational archiving patterns using Heroku Connect and AWS S3 architectures.
- Deploy production-ready Batch Apex pruning logic with complete recycle bin purging.
- Conduct a multi-year TCO financial analysis to evaluate native storage vs off-platform archiving.
The Economics of Salesforce Storage: Data vs File Allocations
For data architects and CFOs alike, managing the storage footprint within Salesforce represents a critical element of platform governance. Salesforce divides storage allocations into two distinct categories: **Data Storage** and **File Storage**. Each operates under different scaling parameters, technical structures, and commercial terms. Understanding the boundaries between these allocations is vital for ensuring system performance and preventing unexpected overage costs.
Data Storage is used for structured database records: standard and custom objects (e.g., Accounts, Contacts, Leads, custom SObjects and Task records). For most objects, Salesforce calculates data storage usage at a flat **2 KB per record**, regardless of how many fields are populated or the record's physical byte size on disk. A small number of objects are exceptions, for example Articles (4 KB), Campaigns (8 KB) and Campaign Members (1 KB). This 2 KB figure is a logical accounting unit rather than a physical storage measurement, so even a sparse record containing little more than its ID still consumes 2 KB of your entitlement.
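Because of the flat per-record rule, logical Data Storage can be estimated from record counts alone. The following is a minimal sketch that assumes a hypothetical helper class named `DataStorageEstimator`. It treats every record as 2 KB unless a different logical size is passed in for one of the exception objects.

```apex
// Hypothetical helper (not a Salesforce API): estimates logical Data Storage
// from record counts, assuming the flat 2 KB-per-record rule described above.
// Pass a different size for objects that Salesforce charges differently.
public class DataStorageEstimator {
    // Logical size charged for most records, in KB.
    public static final Long DEFAULT_RECORD_KB = 2;

    // Logical KB consumed by recordCount records of the given logical size.
    public static Long estimateKb(Long recordCount, Long kbPerRecord) {
        return recordCount * kbPerRecord;
    }

    // Overload for objects charged at the default 2 KB per record.
    public static Long estimateKb(Long recordCount) {
        return estimateKb(recordCount, DEFAULT_RECORD_KB);
    }
}
```

For example, `DataStorageEstimator.estimateKb(5000000L)` returns 10,000,000 KB. That is roughly 10 GB in the decimal units used in this article's tables. A single sparse record still returns 2.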
File Storage, on the other hand, is dedicated to unstructured binary files: attachments, documents, images and ContentVersion records. Unlike Data Storage, File Storage is calculated from the actual byte size of the uploaded files. An organisation's default File Storage allocation is significantly larger and far cheaper than its Data Storage counterpart. This reflects the lower infrastructure cost of hosting static files in object storage versus low-latency relational database tables.
The Storage Allocation Discrepancy: By default, an Enterprise or Unlimited Edition Salesforce org is provisioned with a baseline of 10 GB of Data Storage and 10 GB of File Storage. As the user count grows, Data Storage increases by 20 MB or 120 MB per user, depending on the edition (20 MB for Enterprise, 120 MB for Unlimited). File Storage grows by a far more generous 2 GB per user. Because structured relational storage is heavily optimised for low-latency queries across millions of rows, Salesforce places a premium price on extra Data Storage capacity.
This dynamic creates a significant commercial risk for high-transaction environments. A business processing millions of automated integration records, system log records or temporary staging records can quickly consume its 10 GB Data Storage baseline. The result is severe overage exposure while its File Storage capacity remains largely unused.
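The asymmetry is easiest to see by computing both entitlements for the same user count. This sketch uses a hypothetical `StorageEntitlement` class and the figures quoted above. It works in decimal units (1 GB = 1,000 MB) to match this article's tables. Actual contractual figures may differ, so treat the constants as assumptions.

```apex
// Hypothetical helper: default storage entitlements from user counts.
// Constants are the figures quoted in this article, in decimal MB; verify them
// against your own contract before relying on the output.
public class StorageEntitlement {
    public static final Long BASE_MB = 10000;          // 10 GB baseline for each storage type
    public static final Long FILE_MB_PER_USER = 2000;  // 2 GB of File Storage per user

    // Data Storage in MB; perUserDataMb is 20 or 120, as described above.
    public static Long dataStorageMb(Integer users, Long perUserDataMb) {
        return BASE_MB + users * perUserDataMb;
    }

    // File Storage in MB.
    public static Long fileStorageMb(Integer users) {
        return BASE_MB + users * FILE_MB_PER_USER;
    }
}
```

Consider an org with 100 users at 20 MB per user. It is entitled to 12,000 MB (about 12 GB) of Data Storage but 210,000 MB (about 210 GB) of File Storage.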
Storage Overage Billing Mechanics: Understanding the High-Cost Overage Tax
When an organisation exceeds 100% of its contractual Data Storage allocation, Salesforce does not immediately halt platform operations. Relational write operations generally continue to execute, avoiding business disruption. However, system administrators receive automated warning alerts, and the org can be flagged for a compliance review. The Salesforce Account Executive (AE) will then typically issue an overage invoice or require the purchase of additional storage blocks.
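Rather than waiting for the warning emails, administrators can track usage themselves. The sketch below assumes a hypothetical `StorageUsageMonitor` class. It also assumes that `System.OrgLimits` exposes a limit named `'DataStorageMB'`, mirroring the REST API `/limits` resource; confirm this in your org. The percentage calculation is kept separate so it can be tested without an org.

```apex
// Hypothetical monitor: reports Data Storage usage as a percentage of the allocation.
// Assumes OrgLimits exposes a 'DataStorageMB' entry in your org and API version.
public with sharing class StorageUsageMonitor {
    // Whole-number percentage in use (integer division rounds down).
    // A result of 100 or more means the org is in overage.
    public static Integer usagePercent(Integer usedMb, Integer allocatedMb) {
        if (usedMb == null || allocatedMb == null || allocatedMb <= 0) {
            return 0;
        }
        return (usedMb * 100) / allocatedMb;
    }

    // Reads live Data Storage usage; only meaningful when run inside an org.
    public static Integer currentDataStoragePercent() {
        System.OrgLimit dataLimit = OrgLimits.getMap().get('DataStorageMB');
        if (dataLimit == null) {
            return null; // limit name not exposed in this org/API version
        }
        return usagePercent(dataLimit.getValue(), dataLimit.getLimit());
    }
}
```

For example, `StorageUsageMonitor.usagePercent(10500, 10000)` returns 105, i.e. 5% over the allocation. A scheduled job could compare the live value against an illustrative threshold such as 90 and alert administrators before the overage point is reached.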
Native Salesforce Data Storage is expensive relative to commodity cloud storage. Standard list pricing for Salesforce Data Storage is approximately **£250 per month for a 500 MB block**. This translates to £3,000 per year (12 × £250) for just 500 MB of relational storage. For high-volume orgs requiring an extra 50 GB (100 blocks) of native storage, the annual overage cost can reach **£300,000 at full list price**.
In contrast, extra File Storage is relatively inexpensive, priced at roughly £5 per month for a 10 GB block. This massive price difference is often referred to by IT directors as the "Salesforce Storage Tax", and it highlights the financial necessity of off-platform data archiving strategies.
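Normalising both list prices to pounds per GB per month makes the gap explicit. The following sketch uses a hypothetical `StoragePriceComparison` class and the approximate prices quoted above, which are not official rates. Sizes are in decimal units (1 GB = 1,000 MB).

```apex
// Hypothetical comparison of the approximate list prices quoted in this article.
public class StoragePriceComparison {
    // Pounds per GB per month for a block of blockSizeMb priced at monthlyPrice.
    public static Decimal poundsPerGbMonth(Decimal monthlyPrice, Integer blockSizeMb) {
        // Scale of 4 decimal places is enough for these price levels.
        return (monthlyPrice * 1000).divide(blockSizeMb, 4);
    }
}
```

With these inputs, Data Storage works out at £500 per GB-month (`poundsPerGbMonth(250, 500)`). File Storage works out at £0.50 per GB-month (`poundsPerGbMonth(5, 10000)`), a 1,000-fold difference.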
Let us examine a typical multi-year data-growth trajectory and cost profile for a fast-scaling enterprise organisation that relies solely on native Salesforce storage expansion:
| Contract Year | Projected Active Records | Calculated Storage Required | Contractual Entitlement | Overage Deficit | Annual Storage Cost (Native List Price) |
|---|---|---|---|---|---|
| Year 1 | 5,000,000 records | 10 GB | 10 GB baseline | 0 GB | £0 |
| Year 2 | 15,000,000 records | 30 GB | 10 GB baseline | 20 GB (40 blocks) | £120,000 |
| Year 3 | 35,000,000 records | 70 GB | 10 GB baseline | 60 GB (120 blocks) | £360,000 |
| Year 4 | 75,000,000 records | 150 GB | 10 GB baseline | 140 GB (280 blocks) | £840,000 |
As this trajectory demonstrates, a transactional org that grows to 75 million records over four years faces a Year 4 storage bill of £840,000 if it relies entirely on native storage expansion. Negotiated volume discounts may reduce the list price by 30% to 50%, but the expense remains high compared with commodity cloud storage. Active commercial planning is required to mitigate this risk through technical archiving architectures.
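The projection table can be reproduced with a few lines of arithmetic. The sketch below assumes a hypothetical `OverageProjection` class and the table's own assumptions. These are 2 KB per record, a flat 10 GB entitlement, and 500 MB blocks at £250 per month, all in decimal units.

```apex
// Hypothetical projection reproducing the multi-year table above.
public class OverageProjection {
    public static final Long KB_PER_RECORD = 2;
    public static final Long ENTITLEMENT_MB = 10000;          // 10 GB baseline
    public static final Long BLOCK_MB = 500;                  // size of one storage block
    public static final Long BLOCK_PRICE_PER_YEAR = 250 * 12; // £3,000 per block per year

    // Number of 500 MB blocks needed to cover the deficit, rounded up.
    public static Long blocksNeeded(Long activeRecords) {
        Long requiredMb = activeRecords * KB_PER_RECORD / 1000;
        Long deficitMb = Math.max(0L, requiredMb - ENTITLEMENT_MB);
        return (deficitMb + BLOCK_MB - 1) / BLOCK_MB;
    }

    // Annual list-price storage cost in pounds.
    public static Long annualCost(Long activeRecords) {
        return blocksNeeded(activeRecords) * BLOCK_PRICE_PER_YEAR;
    }
}
```

Running `annualCost` for 5, 15, 35 and 75 million records returns £0, £120,000, £360,000 and £840,000, matching the four rows of the table.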
Archiving Patterns: Implementing Off-Platform Data Storage Architectures (Heroku, AWS)
To bypass the high cost of native storage, enterprise architects must design robust off-platform archiving patterns. The goal is to move cold data (historical records no longer required for daily operations but needed for compliance or analytical reporting) out of the Salesforce production database while maintaining seamless accessibility for business users.
The two primary patterns for off-platform archiving are **Relational Data Archiving via Heroku Connect** and **Unstructured File Archiving via AWS S3 and Files Connect**.
Pattern A: Relational Data Archiving using Heroku Connect & Heroku Postgres
Heroku Connect provides bidirectional data synchronisation between Salesforce and a Heroku Postgres database. Once a synchronisation mapping is configured, historical records are automatically mirrored to the Postgres database. A scheduled Apex job inside Salesforce can then purge those records from the core platform database, reclaiming the 2 KB data storage quota per record. Because Heroku Connect is designed to mirror changes, confirm how it handles deletions before the purge runs. One safeguard is to copy synchronised rows into a separate archive table that Heroku Connect does not manage, so the purge cannot remove the archived copy. Business users can still view the archived records inside Salesforce using **Salesforce Connect and External Objects**, which query the Heroku Postgres database in real time via OData endpoints.
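Once the archive is published as an External Object, Apex can read it back with ordinary SOQL. The following is a sketch only. The object name `Archived_Task__x` and the fields `Subject__c` and `What_Id__c` are hypothetical and depend on how the external data source is configured. `ExternalId` is the standard field every External Object exposes.

```apex
// Hypothetical reader for archived Tasks exposed through Salesforce Connect.
// Archived_Task__x, Subject__c and What_Id__c are illustrative names only.
public with sharing class ArchivedTaskReader {
    public static final String EXTERNAL_OBJECT = 'Archived_Task__x';

    // Builds the SOQL; ':parentIdText' binds to the local variable in fetchForParent.
    public static String buildQuery(Integer rowLimit) {
        return 'SELECT ExternalId, Subject__c, What_Id__c FROM ' + EXTERNAL_OBJECT
            + ' WHERE What_Id__c = :parentIdText LIMIT ' + rowLimit;
    }

    // Each query is resolved live against the OData endpoint, so keep row limits small.
    public static List<SObject> fetchForParent(Id parentId, Integer rowLimit) {
        String parentIdText = String.valueOf(parentId);
        return Database.query(buildQuery(rowLimit));
    }
}
```

Dynamic SOQL is used here so that the class still compiles in orgs where the External Object has a different name. In a real deployment, static SOQL against the actual object name would give compile-time checking.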
Pattern B: File Archiving via Amazon S3 and Salesforce Files Connect
For organisations processing large volumes of customer attachments, storing those files as native `ContentVersion` records is inefficient. Instead, files are uploaded directly to an **Amazon S3** bucket using the standard AWS REST APIs or custom LWC upload components. Salesforce **Files Connect** is then configured to expose those S3 files as external file links inside the Salesforce UI. This pattern lets the organisation benefit from S3's low storage rates (roughly £0.02 per GB per month for standard storage) while keeping user access within the Salesforce platform.
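Where the upload is driven from Apex rather than from the browser, one option is an HTTP PUT through a Named Credential. The class `S3ArchiveUploader` and the Named Credential `S3_Archive` below are hypothetical. The sketch assumes `S3_Archive` points at the bucket endpoint and is configured for AWS Signature Version 4 authentication, so Salesforce signs each callout.

```apex
// Hypothetical uploader. Assumes a Named Credential 'S3_Archive' whose URL is the
// bucket endpoint (e.g. https://my-bucket.s3.eu-west-2.amazonaws.com) and which
// signs requests with AWS Signature Version 4. objectKey must already be URL-safe.
public with sharing class S3ArchiveUploader {
    // Builds an S3 PutObject request (HTTP PUT to /<objectKey>).
    public static HttpRequest buildPutRequest(String objectKey, Blob body, String contentType) {
        HttpRequest req = new HttpRequest();
        req.setMethod('PUT');
        req.setEndpoint('callout:S3_Archive/' + objectKey);
        req.setHeader('Content-Type', contentType);
        req.setBodyAsBlob(body);
        req.setTimeout(120000); // 120 s, the maximum Apex callout timeout
        return req;
    }

    // Sends the request; S3 answers 200 OK once the object is stored.
    public static Boolean upload(String objectKey, Blob body, String contentType) {
        HttpResponse res = new Http().send(buildPutRequest(objectKey, body, contentType));
        return res.getStatusCode() == 200;
    }
}
```

Apex heap and callout size limits apply (on the order of 6 MB synchronous and 12 MB asynchronous). Large files are therefore usually better sent straight from the browser or an integration service, which is why custom LWC upload components are also an option.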
Here is an architectural overview of these two primary off-platform archiving flows:
```text
+----------------------------------------------------------------+
|                    Salesforce Core Platform                    |
|                                                                |
|   +------------------+                  +------------------+   |
|   | Active SObjects  |                  |  ContentVersion  |   |
|   | (Account, etc.)  |                  |  (Attachments)   |   |
|   +------------------+                  +------------------+   |
+------------|-------------------------------------|-------------+
             ^                                     ^
             | Sync via Heroku Connect             | Upload to S3 via API
             | Read back via External Objects      | Read back via Files Connect
             v                                     v
    +------------------+                  +------------------+
    | Heroku Postgres  |                  |    Amazon S3     |
    | (Archived        |                  | (Unstructured    |
    |  Relational)     |                  |  Files)          |
    +------------------+                  +------------------+
```
By deploying these architectures, organisations can maintain a lean Salesforce database footprint, which supports query performance, speeds up sandbox refreshes and greatly reduces exposure to overage charges.
Building an Automated Data Pruning and Lifecycle Policy
To operate an off-platform archiving system successfully, administrators must implement an automated data pruning and lifecycle policy. Simply deleting records from the Salesforce UI is insufficient. Deleted records are placed in the Salesforce Recycle Bin, where they **continue to count against your Data Storage limits** for up to 15 days, or until the Recycle Bin is emptied.
A production-ready data lifecycle policy requires an automated Batch Apex class that performs three steps. It identifies records older than a specific threshold (e.g., Tasks or Cases older than 36 months). It deletes them in manageable batches to stay within governor limits. Finally, it explicitly purges them from the Recycle Bin using `Database.emptyRecycleBin()`, releasing the storage allocation without waiting for the 15-day retention period.
Here is a complete, production-grade Apex Batch Class designed to automate data pruning and storage reclamation:
```apex
// Deletes closed Tasks older than 36 months and hard-deletes them from the Recycle Bin.
// Also implements Schedulable so that System.schedule(...) can launch it directly.
// 'without sharing' lets the job see every qualifying record, whoever schedules it.
public without sharing class DataLifecyclePruner
        implements Database.Batchable<SObject>, Database.Stateful, Schedulable {

    // Running total across batches (preserved between execute calls by Database.Stateful).
    private Integer totalPurgedRecords = 0;

    // Schedulable entry point: start the batch with a scope of 200 records.
    public void execute(SchedulableContext sc) {
        Database.executeBatch(new DataLifecyclePruner(), 200);
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Threshold: closed tasks created more than 3 years ago
        Datetime archivingThreshold = System.now().addMonths(-36);
        return Database.getQueryLocator([
            SELECT Id FROM Task
            WHERE IsClosed = true
            AND CreatedDate < :archivingThreshold
        ]);
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        if (scope.isEmpty()) {
            return;
        }
        try {
            // Step 1: standard delete (records move to the Recycle Bin)
            delete scope;
            // Step 2: hard-delete them from the Recycle Bin to release the allocation now
            Database.emptyRecycleBin(scope);
            // Count only after both steps have succeeded
            totalPurgedRecords += scope.size();
            System.debug('Successfully pruned and purged ' + scope.size() + ' task records.');
        } catch (DmlException ex) {
            System.debug(LoggingLevel.ERROR, 'DML failure during pruning batch: ' + ex.getMessage());
        }
    }

    public void finish(Database.BatchableContext bc) {
        System.debug('DataLifecyclePruner completed execution.');
        System.debug('Total tasks permanently purged: ' + totalPurgedRecords);
        // Optional: send a compliance confirmation email to the Salesforce admin distribution group
        notifyComplianceTeam(totalPurgedRecords);
    }

    private void notifyComplianceTeam(Integer purgedCount) {
        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        mail.setToAddresses(new String[] {'compliance-officer@organisation.com'});
        mail.setSubject('Salesforce Data Pruning Automation Report');
        mail.setPlainTextBody('The automated DataLifecyclePruner batch job completed successfully.\n\n' +
            'Total Task records permanently deleted and purged from the Recycle Bin: ' + purgedCount + '\n' +
            'Approximate Data Storage released (2 KB per record): ' + (purgedCount * 2) + ' KB.\n\n' +
            'This run complies with the corporate data retention policy.');
        // Emails cannot be sent from test context
        if (!Test.isRunningTest()) {
            Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });
        }
    }
}
```
To run this class automatically at midnight at the start of every Sunday (that is, late on Saturday night), schedule it by executing the following anonymous Apex in the Developer Console:
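```apex
// Schedule pruning for every Sunday at 00:00
// (cron fields: seconds minutes hours day-of-month month day-of-week)
// System.schedule only accepts a Schedulable, so DataLifecyclePruner must implement Schedulable.
String cronPattern = '0 0 0 ? * SUN';
System.schedule('Weekly Storage Data Pruning', cronPattern, new DataLifecyclePruner());
```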
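Scheduled job names must be unique, so running the script above a second time (for example after a redeployment) fails. The sketch below makes rescheduling repeatable by aborting any existing job with the same name first. It assumes a hypothetical `PruningScheduler` class and that the job class passed in implements `Schedulable`.

```apex
// Hypothetical helper: schedules the pruning job, replacing any earlier schedule.
public with sharing class PruningScheduler {
    public static final String JOB_NAME = 'Weekly Storage Data Pruning';
    // Seconds Minutes Hours Day-of-month Month Day-of-week: 00:00 every Sunday
    public static final String CRON_SUNDAY_MIDNIGHT = '0 0 0 ? * SUN';

    // Returns the ID of the newly scheduled job.
    public static String reschedule(Schedulable job) {
        // Abort any existing CronTrigger registered under the same job name.
        for (CronTrigger ct : [
            SELECT Id FROM CronTrigger WHERE CronJobDetail.Name = :JOB_NAME
        ]) {
            System.abortJob(ct.Id);
        }
        return System.schedule(JOB_NAME, CRON_SUNDAY_MIDNIGHT, job);
    }
}
```

From anonymous Apex, `PruningScheduler.reschedule(new DataLifecyclePruner());` can then be re-run safely.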
By enforcing this scheduled pruning logic, the organisation ensures that transactional data volume does not accumulate unchecked, keeping the core database footprint stable and within its contracted storage allocation.
Cost-Benefit Analysis: Native Salesforce Storage Expansion vs Off-Platform Archiving Systems
Before initiating any archiving project, data architects and CFOs must conduct a rigorous financial Total Cost of Ownership (TCO) analysis. While off-platform archiving architectures (Heroku, AWS) eliminate the "Salesforce Storage Tax", they introduce upfront developer costs, infrastructure maintenance costs, and integration subscription fees.
To evaluate these options fairly, we must model a 3-year TCO comparison for a high-volume Salesforce instance growing by **50 GB of relational data per year**.
Option A: Expand Native Salesforce Storage
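- Requires purchasing native 500 MB storage blocks at £250/month.
- To support a 50 GB expansion, the organisation must purchase 100 storage blocks.
- Annual Cost: 100 blocks × £3,000 per block per year = £300,000.
- Three-Year Cumulative Cost: **£900,000** (assuming the overage is held flat at 50 GB). If the org instead adds another 50 GB each year, the annual cost rises to £300,000, £600,000 and £900,000, or £1,800,000 over three years.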
Option B: Implement an Off-Platform Archiving System (Heroku Connect & AWS S3)
- Heroku Connect sync and Postgres database subscription: £30,000 per year.
- Amazon S3 storage and Files Connect API fees: £2,400 per year.
- Upfront implementation cost (50 developer hours at £150/hour): £7,500.
- Annual maintenance and administration hours: £5,000 per year, from Year 2 onwards.
- Three-Year Cumulative Cost: **£114,700**.
Let us examine the comprehensive financial cost-benefit comparison over a three-year period:
| Cost Category | Option A: Native Storage Expansion | Option B: Off-Platform Archiving System | Net Savings via Archiving |
|---|---|---|---|
| Upfront Build / Integration Cost | £0 | £7,500 | -£7,500 |
| Year 1 Subscription / Hosting Fees | £300,000 | £32,400 | £267,600 |
| Year 2 Subscription / Hosting Fees | £300,000 | £37,400 (inc maintenance) | £262,600 |
| Year 3 Subscription / Hosting Fees | £300,000 | £37,400 (inc maintenance) | £262,600 |
| Three-Year Cumulative TCO | £900,000 | £114,700 | £785,300 |
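The comparison above can be recomputed with different inputs using a small model. This sketch assumes a hypothetical `ArchivingTcoModel` class. As in the table, Option A is a flat annual overage cost. Option B charges the upfront build once, the subscription every year, and maintenance from a configurable start year (Year 2 in the table).

```apex
// Hypothetical three-year TCO model using the figures listed above.
public class ArchivingTcoModel {
    // Option A: the same annual overage cost every year.
    public static Long nativeTco(Integer years, Long annualOverageCost) {
        return years * annualOverageCost;
    }

    // Option B: upfront build + yearly subscription + maintenance from a given year.
    public static Long archivingTco(Integer years, Long upfront, Long annualSubscription,
                                    Long annualMaintenance, Integer maintenanceStartYear) {
        Long total = upfront;
        for (Integer year = 1; year <= years; year++) {
            total += annualSubscription;
            if (year >= maintenanceStartYear) {
                total += annualMaintenance;
            }
        }
        return total;
    }
}
```

`nativeTco(3, 300000L)` returns £900,000, and `archivingTco(3, 7500L, 32400L, 5000L, 2)` returns £114,700, a difference of £785,300. Starting maintenance in Year 1 instead raises Option B to £119,700.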
The financial conclusion is clear. Native Salesforce storage expansion requires zero technical setup effort, but the cumulative overage costs represent a poor use of corporate capital. By investing £7,500 in implementation work and deploying a Heroku-based archiving architecture, the organisation secures **£785,300 in net savings** over three years in this model. For any enterprise operating a high-volume Salesforce org, designing and executing a proactive off-platform data lifecycle policy is more than an architectural best practice; it is a commercial necessity.
Key Takeaways
- For most objects, Salesforce calculates structured Data Storage at a flat 2 KB logical allocation per record, regardless of how many fields are populated.
- Extra native Data Storage is expensive (approximately £250/month per 500 MB block at list price), whereas File Storage is comparatively cheap.
- Relational archiving using Heroku Connect and Salesforce Connect exposes off-platform data seamlessly via External Objects.
- Organisations must build scheduled Apex batch jobs with explicit `Database.emptyRecycleBin()` calls to reclaim storage without waiting for the 15-day Recycle Bin retention period.
- In the modelled three-year TCO scenario, an off-platform archiving architecture yields £785,300 in net savings.
Checkpoint: Test Your Understanding
1. How does Salesforce calculate Data Storage consumption for standard and custom object records?
2. Why is running a standard delete command in Apex insufficient to immediately reclaim your Data Storage quota?
3. Which off-platform archiving pattern is recommended to move relational data while keeping it visible inside Salesforce?