- How CRM Analytics stores data: the dataset model and its columnar storage architecture
- Live connections via Salesforce Direct and the SAQL vs direct SOQL query model
- Dataflow and Recipe-based ETL: importing, transforming, and joining data into datasets
- When to use live data vs imported datasets for different analytics use cases
- Data freshness guarantees: schedule-based sync limitations and near-real-time options
- External data ingestion: bringing non-Salesforce data into CRM Analytics
The CRM Analytics Data Architecture
CRM Analytics (CRMA) is a native Salesforce analytics platform that operates on a separate analytical data engine within the Salesforce platform. Its primary data storage model is the Dataset — a columnar, compressed, indexed snapshot of data imported into CRMA's analytical engine. Datasets are the foundation of CRMA dashboards and lenses; they are what makes CRMA fast at aggregations, filtering, and cross-object queries that Salesforce reports cannot efficiently handle.
Columnar storage in CRMA means data is stored column-by-column (all values for "Amount" together, all values for "StageName" together) rather than row-by-row. This is highly optimised for analytical query patterns — when a dashboard filters by StageName and aggregates Amount, CRMA reads only those two columns from storage. This physical storage optimisation is why CRMA dashboards handle millions of rows with sub-second response times that would be impossible with SOQL against the transactional Salesforce database.
CRMA supports a second data access mode: Live connections that query Salesforce data directly without importing it into CRMA datasets. Live connections use SOQL-like queries that execute against the live Salesforce org in real-time. This provides always-current data but gives up CRMA's columnar query performance — live connection queries have the same performance characteristics (and governor limits) as SOQL queries against the standard Salesforce database.
Dataflows and Recipes: Importing Data into CRMA
The primary method of getting Salesforce data into CRMA datasets is through Dataflows (the original CRMA ETL mechanism) or Recipes (the newer, more visual approach). Both tools extract data from Salesforce objects, apply transformation steps (joins, aggregations, field calculations, filters), and output the result to one or more CRMA datasets.
A typical Dataflow for a Sales Analytics dashboard might join Opportunity, Account, User, and Quota records into a single denormalised dataset that the dashboard queries with a single SAQL query rather than traversing relationships at query time. This denormalisation is intentional — CRMA's columnar engine excels at flat, wide datasets that precompute the joins that would otherwise happen at query time in SOQL.
Recipes provide a visual, drag-and-drop interface for building the same ETL logic that Dataflows implement in JSON configuration. Recipes support all the same operations — joins, aggregations, formulas, filters, buckets — with a more accessible UX. The key operational difference: Recipes run on the newer CRMA compute infrastructure and generally have better performance for complex multi-step transformations than equivalent Dataflows on large datasets.
// CRMA SAQL query against a pre-built dataset
// This queries an Opportunity dataset (denormalized, pre-joined)
q = load "opportunities_v2";
q = filter q by 'CloseDate_Year' == "2026";
q = group q by 'Account.Industry', 'StageName';
q = foreach q generate
'Account.Industry' as 'Industry',
'StageName' as 'Stage',
sum('Amount') as 'TotalAmount',
count() as 'OpportunityCount';
q = order q by 'TotalAmount' desc;
q = limit q 100;
// Fast sub-second response on 5M+ rows because of columnar storage
Live Connections: When Real-Time Beats Performance
Live connections (Salesforce Direct) query Salesforce data in real-time without any import step. A CRMA dashboard built on a live connection always reflects the current state of Salesforce — a manager dashboard showing open opportunities by rep would show changes made moments ago, whereas an imported dataset would show data from the last scheduled sync (typically several hours old).
The performance trade-off is significant. A live connection query against 500,000 Opportunity records executes as a SOQL query against the Salesforce transactional database, taking 5-30 seconds or more depending on the query complexity and org load. The same query against a CRMA dataset takes under 1 second. For dashboards with sub-second performance requirements or very large datasets, live connections are not appropriate.
Live connections are appropriate for: small datasets where query performance is acceptable (executive-level dashboards showing totals across thousands rather than millions of records), use cases where data freshness within minutes is a business requirement (real-time operations dashboards for call center managers), and data that changes too frequently to justify the overhead of a scheduled sync (hourly or more frequent changes).
Data Freshness: Schedule-Based Sync and Its Limits
The primary limitation of imported CRMA datasets is data freshness. Dataflows and Recipes run on schedules — typically nightly, every few hours, or at minimum intervals of 1 hour on most CRMA configurations. Dashboard users see data that is hours old, not minutes old. For most executive and management analytics use cases (pipeline reviews, forecasting, historical trend analysis), this latency is acceptable. For operational use cases (queue management, real-time SLA monitoring), it is not.
CRMA's incremental sync capability reduces dataset refresh time for large objects by extracting only records changed since the last sync, using the SystemModstamp field as the delta cursor. An incremental Recipe for a 10-million-record Opportunity dataset that runs hourly might only process 5,000-10,000 changed records per run — taking minutes rather than hours. Incremental sync requires careful Recipe design to handle deletes correctly (deleted records are not captured by SystemModstamp incremental queries) and to correctly merge incremental updates into the full dataset.
For near-real-time CRMA dashboards that must show data within minutes of a Salesforce change, the architecture requires a streaming ingestion path: publishing Platform Events or CDC events to an external processing layer that pushes updates to CRMA via the CRMA External Data API. This is a complex architecture that is only justified for operational dashboards with strict freshness requirements.
External Data Ingestion for Multi-Source Analytics
One of CRMA's significant advantages over standard Salesforce reports is its ability to blend Salesforce data with data from external sources. The External Data API allows non-Salesforce CSV data to be uploaded into CRMA datasets. Financial data from an ERP, product performance data from a marketing platform, web analytics from Google Analytics — all can be imported into CRMA and joined with Salesforce data in Recipes to build unified cross-system dashboards.
The architecture for external data ingestion is: extract data from the external system, transform it to CRMA's expected CSV format, and upload via the External Data API. The upload creates or updates a CRMA dataset. This dataset can then be joined with Salesforce-sourced datasets in a Recipe to produce the blended analytical dataset. The join key is typically a Salesforce Account ID or External ID that exists in both datasets.
CRMA's AI/ML capabilities (Einstein Discovery) work against any dataset — including those containing external data. A predictive model that combines Salesforce opportunity characteristics with external product usage metrics and financial health indicators from an ERP can produce significantly more accurate predictions than a model trained only on Salesforce CRM data. This multi-source dataset capability is the key architectural advantage of CRMA over standard Salesforce reporting.
Key Takeaways
- CRMA datasets use columnar storage optimised for analytical aggregations — this is why CRMA handles millions of rows with sub-second response times that SOQL against the transactional database cannot match.
- Dataflows and Recipes are CRMA's ETL mechanisms for importing Salesforce data into datasets. Denormalising related objects into a single wide dataset at import time enables efficient single-query analytics without relationship traversal at query time.
- Live connections query Salesforce data in real-time (always current) but at SOQL performance — 5-30+ seconds for large datasets. Use live connections for small datasets or high-freshness requirements; use imported datasets for performance-sensitive large-scale analytics.
- Imported datasets are typically hours old due to scheduled sync. Incremental sync via SystemModstamp reduces sync duration for large objects. Near-real-time freshness requires a streaming ingestion architecture via Platform Events and the External Data API.
- Incremental sync does not capture deletes — design Recipes to handle deleted records separately (typically via a full refresh at lower frequency alongside incremental updates).
- External Data API enables non-Salesforce data (ERP, marketing platforms, web analytics) to be joined with Salesforce data in CRMA Recipes, creating unified cross-system analytics dashboards that standard Salesforce reporting cannot produce.
Test Your Understanding
1. A call center manager needs a real-time operations dashboard showing open Case queue by priority and agent, updated within 2 minutes of any Case status change. Should this dashboard use a live connection or an imported dataset?
2. A CRMA Recipe uses incremental sync based on SystemModstamp to update the Opportunity dataset hourly. After running for 6 months, a data analyst notices that Opportunity records deleted in Salesforce are still appearing in CRMA dashboards. What is the cause?
3. A sales leader wants a CRMA dashboard showing Opportunity win rate alongside product usage metrics from a SaaS platform (not in Salesforce). What CRMA capability enables this?
Discussion & Feedback