Managing Large Data Volumes in Salesforce

EmpowerCodes

Oct 28, 2025

As enterprises scale digitally, Salesforce often becomes the central hub for customer operations, case management, and sales intelligence. But as data grows, performance challenges emerge. Companies migrating from older CRMs or importing data over many years can suddenly face millions of records across Accounts, Contacts, Opportunities, and custom objects.

Managing large data volumes in Salesforce isn’t simply about storage—it’s about maintaining platform performance, ensuring data quality, avoiding governor limits, and enabling reliable reporting. In 2025, Salesforce continues to introduce automation enhancements, Data Cloud updates, and more efficient APIs, but core data management fundamentals remain essential. This guide will help you navigate strategies, patterns, and tools to handle large data volumes intelligently.

What Counts as Large Data Volume?

There is no single hard threshold for LDV (Large Data Volumes). As a rule of thumb, an org is in LDV territory when it shows one or more of the following:

5+ million records per object
Complex relationships across objects
High transactional load (e.g., constant updates)

Even at 2–3 million records, admins can begin noticing:

Slower SOQL queries
Longer report run times
Batch Apex failures
Indexing issues

LDV is about both size and behavior.

Challenges of Large Data Volumes

When data grows, organizations face:

1. Performance Degradation

Queries slow down; users wait longer for screens to load.

2. Increased Storage Cost

Salesforce storage is expensive compared to external databases.

3. Governor Limits

Bulk DML operations can quickly hit limits.

4. Reporting Slowdowns

Dashboards time out, requiring filters or extracts.

5. Data Quality Issues

Duplicate records multiply faster.

Understanding these challenges is the first step to solving them.

Core Strategies for Managing Large Data Volumes

Modern Salesforce implementations rely on a blend of data modeling, indexing, query optimization, and archiving. Let’s explore these approaches in detail.

1. Data Modeling Optimization

Poor schema decisions often surface when data grows.

Denormalization Might Help

Unlike in a traditional relational database, performance in Salesforce often improves when:

Data is flattened
Lookups are minimized
Wide objects are avoided

Excessive relationships slow your queries dramatically.

Use Skinny Tables

Salesforce Support can enable skinny tables, which:

Contain frequently queried fields
Improve performance by avoiding joins

This is worth considering for heavily used Account or Opportunity pages, list views, and reports.

2. Use Selective SOQL Queries

Highly selective filters drastically improve query time.

A selective query narrows down results using:

indexed fields
specific ranges
equality checks

Avoid:

!=
NOT LIKE
OR filters without indexing

The 2025 enhancements to indexing rules better support formula fields, but indexes on native fields still perform better.
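To make "selective" concrete: Salesforce's query optimizer guidance describes a filter on a standard index as selective when it matches no more than 30% of the first million records plus 15% of the rest (capped at 1 million rows), and a filter on a custom index when it matches no more than 10% of the first million plus 5% of the rest (capped at 333,333 rows). The Python sketch below only does that arithmetic. The function names are illustrative and not part of any Salesforce API.

```python
def selectivity_threshold(total_records: int, custom_index: bool = False) -> int:
    """Largest number of matching rows for which a filter is still treated as selective."""
    # Percentages and caps follow the thresholds quoted in the paragraph above.
    if custom_index:
        first_pct, rest_pct, cap = 10, 5, 333_333
    else:
        first_pct, rest_pct, cap = 30, 15, 1_000_000
    first = min(total_records, 1_000_000)      # records in the first million
    rest = max(total_records - 1_000_000, 0)   # records beyond the first million
    return min(first * first_pct // 100 + rest * rest_pct // 100, cap)


def is_selective(matching_rows: int, total_records: int, custom_index: bool = False) -> bool:
    """True when the filter's estimated row count stays within the threshold."""
    return matching_rows <= selectivity_threshold(total_records, custom_index)
```

For an object with 2 million records, a filter on a custom-indexed field would need to match 150,000 rows or fewer; on a standard index the ceiling is 450,000.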

3. Leverage Indexing Strategies

Indexes are your best friend at scale. Salesforce provides:

Standard Indexes on:

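RecordTypeId
OwnerId
CreatedDate
SystemModstamp (LastModifiedDate)

Custom Indexes can be requested from Salesforce Support for fields such as:

Email fields
Number fields
Text fields

Lookup and master-detail fields are already indexed by default, and fields marked External ID or Unique receive an index automatically.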

Indexed fields drastically speed up:

List views
Reports
Trigger-based queries

4. Archiving Old or Inactive Records

Not every record must live in Salesforce forever.

Archiving strategies include:

Moving data to Heroku Postgres
Offloading to external data lakes
Using Big Objects for history storage

Archival candidates:

Closed Opportunities older than 5 years
Cases older than 3 years
Completed Tasks/Events

Archiving reduces load and improves responsiveness.
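The archival candidates above translate into date cutoffs. Here is a minimal Python sketch that builds the corresponding SOQL strings for a given run date. It assumes the standard fields IsClosed and CloseDate on Opportunity and IsClosed and ClosedDate on Case; the helper names are made up for illustration, and the 5-year and 3-year windows are simply the examples from the list.

```python
from datetime import date


def years_before(today: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def archive_queries(today: date) -> dict[str, str]:
    """Build SOQL strings that select archival candidates as of `today`."""
    # CloseDate is a Date field, so the literal is a plain YYYY-MM-DD.
    opp_cutoff = years_before(today, 5).isoformat()
    # ClosedDate is a DateTime field, so the literal needs a time component.
    case_cutoff = years_before(today, 3).isoformat() + "T00:00:00Z"
    return {
        "Opportunity": (
            "SELECT Id FROM Opportunity "
            f"WHERE IsClosed = true AND CloseDate < {opp_cutoff}"
        ),
        "Case": (
            "SELECT Id FROM Case "
            f"WHERE IsClosed = true AND ClosedDate < {case_cutoff}"
        ),
    }
```

Both queries filter on a date range, which keeps the extract job itself from becoming an unfiltered scan.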

5. Use Big Objects for Historical Data

Big Objects are built for massive scale:

Billions of records
Append-only storage
Optimized querying via indexes

Perfect for:

IoT event history
Order logs
Case audits
Platform event retention

Big Objects are cost-friendly compared to standard storage.

6. Leverage Salesforce Data Cloud

With Salesforce investing heavily in Data Cloud (formerly CDP), 2025 offers deeper integration with:

Unified customer profiles
Event streaming
External data querying

Salesforce Data Cloud can:

Store huge datasets
Query externally without import
Stream engagement signals

This removes the need to push everything into your core org.

7. Optimize Reports & Dashboards

Reports struggle with LDV when:

Filters are broad
Custom summary formulas are heavy

Best practices:

Always filter by indexed fields
Avoid cross-object filters when possible
Use bucket fields sparingly

Scheduling reports during off-peak hours helps reduce load.

8. Batch Apex is Essential

When processing millions of records, use:

Batch Apex for async execution
Database.QueryLocator in start() when the record count is large or unknown (it can iterate up to 50 million records)
Database.Stateful for progress tracking

Batch Apex avoids timeouts while respecting governor limits.

In 2025, Salesforce improved observability on batches, making debugging easier.
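Batch Apex itself is written in Apex, but the pattern is easy to model. The Python sketch below is a stand-in, not Salesforce code: it splits a record stream into chunks of 200 (the default Batch Apex scope size), processes each chunk independently, and keeps counters across chunks the way Database.Stateful preserves member variables. The class and function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(records: Iterable[dict], scope: int = 200) -> Iterator[List[dict]]:
    """Yield successive lists of at most `scope` records."""
    chunk: List[dict] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == scope:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


class BatchJob:
    """Models start/execute/finish; instance attributes persist across chunks."""

    def __init__(self, process: Callable[[List[dict]], None]) -> None:
        self.process = process
        self.processed = 0       # running total of successfully processed records
        self.failed_chunks = 0   # chunks whose processing raised an error

    def run(self, records: Iterable[dict], scope: int = 200) -> dict:
        for chunk in chunked(records, scope):
            try:
                self.process(chunk)          # analogous to execute()
                self.processed += len(chunk)
            except Exception:
                # A failing chunk does not stop the remaining chunks.
                self.failed_chunks += 1
        # Analogous to finish(): report the accumulated state.
        return {"processed": self.processed, "failed_chunks": self.failed_chunks}
```

Because each chunk is handled separately, one bad chunk costs at most `scope` records of rework rather than the whole job.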

9. Partitioning Data

If records are region-based or year-based:

Partition by territory
Partition by fiscal year
Use separate list views per segment

Filtering on the partition field keeps queries selective and makes caching more effective.

10. Use External Objects

Rather than importing everything, you can reference external systems:

External Objects (via OData)
External data search
Salesforce Connect

This reduces storage costs and sidesteps many governor limit issues.

Great for:

Order histories
Invoice archives
Shipping logs

The application layer queries only what's needed.
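To give a feel for "only what's needed", here is a small Python sketch of the kind of OData request an external-object lookup boils down to: a few selected fields, a filter, and a row cap. Salesforce Connect generates such requests itself; the base URL, entity set, and field names here are invented for illustration.

```python
from urllib.parse import quote, urlencode


def odata_url(base: str, entity_set: str, fields: list[str],
              filter_expr: str, top: int) -> str:
    """Build an OData query URL using the $select, $filter and $top options."""
    params = {
        "$select": ",".join(fields),  # return only these columns
        "$filter": filter_expr,       # restrict rows on the remote side
        "$top": str(top),             # cap the number of rows returned
    }
    # Keep "$", "," and "'" readable; encode spaces as %20.
    query = urlencode(params, quote_via=quote, safe="$,'")
    return f"{base.rstrip('/')}/{entity_set}?{query}"


# Hypothetical endpoint and fields, for illustration only.
url = odata_url(
    "https://erp.example.com/odata",
    "Orders",
    ["OrderId", "Total"],
    "CustomerId eq 'C-001'",
    50,
)
```

The remote system does the filtering, so only the matching rows ever cross the wire.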

11. Avoid Trigger Bottlenecks

Triggers on high-volume objects must follow strict patterns:

No SOQL inside loops
No DML inside loops
Bulkify everything
Queue chained operations

Use Platform Events or Async Apex for multi-step processes.
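The difference between querying inside a loop and bulkifying is easiest to see by counting queries. The Python sketch below uses a fake in-memory "database" that counts calls; it is a model of the pattern, not Apex, and every name in it is hypothetical.

```python
class FakeDatabase:
    """In-memory stand-in that counts how many queries are issued."""

    def __init__(self, accounts: dict) -> None:
        self.accounts = accounts  # maps account id -> account name
        self.query_count = 0

    def query_accounts(self, ids: set) -> dict:
        self.query_count += 1
        return {i: self.accounts[i] for i in ids if i in self.accounts}


def names_query_in_loop(db: FakeDatabase, contacts: list) -> list:
    """Anti-pattern: one query per record."""
    names = []
    for contact in contacts:
        found = db.query_accounts({contact["AccountId"]})
        names.append(found.get(contact["AccountId"]))
    return names


def names_bulkified(db: FakeDatabase, contacts: list) -> list:
    """Bulkified: collect ids first, query once, then look up from the map."""
    ids = {contact["AccountId"] for contact in contacts}
    found = db.query_accounts(ids)
    return [found.get(contact["AccountId"]) for contact in contacts]
```

With a 200-record trigger batch, the loop version issues 200 queries and the bulkified version issues one, which is the difference between failing and passing a per-transaction query limit.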

12. Deduplication is Critical

Duplicates amplify performance problems.

Tools that can automatically merge or block duplicates include:

Duplicate Management Rules
DemandTools
Data Loader IO Dedup

Clean data = faster queries.
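Before handing records to a dedup tool, it often helps to see how many duplicates an extract contains. A minimal Python sketch, assuming records exported as dictionaries with Id and Email keys and treating a case-insensitive, trimmed email as the match key (real matching rules are usually richer):

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different values match."""
    return email.strip().lower()


def find_duplicates(records: list) -> dict:
    """Map each normalized email to the record ids sharing it (two or more only)."""
    groups = defaultdict(list)
    for record in records:
        email = record.get("Email")
        if email:  # skip records with no email at all
            groups[normalize_email(email)].append(record["Id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

The output is a candidate list for review or merging; it does not decide which record should survive.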

13. Implement Data Retention Policies

Data retention isn’t just technical—it’s compliance-driven.

Policies define:

What data stays
When data gets archived
Legal holding periods

Documenting retention prevents performance surprises.

14. Use Bulk-Optimized Tools for Data Loading

When importing data:

Disable validation rules temporarily
Turn off workflow notifications
Pause Process Builder processes and record-triggered flows

For the load itself:

Use APIs optimized for bulk (Bulk API 2.0)
Break files into smaller chunks

Loading data intelligently preserves performance.
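Breaking a file into smaller chunks is simple to script. The Python sketch below splits a CSV stream into chunks of a fixed number of data rows and repeats the header in each chunk so every piece can be uploaded on its own. The chunk size and the LF line endings are choices made for this example; the function name is hypothetical.

```python
import csv
import io
from typing import Iterator, List, TextIO


def _render(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a header plus rows back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(source: TextIO, rows_per_chunk: int) -> Iterator[str]:
    """Yield CSV text chunks, each with the header and up to `rows_per_chunk` rows."""
    reader = csv.reader(source)
    header = next(reader)  # first line is assumed to be the header
    rows: List[List[str]] = []
    for row in reader:
        rows.append(row)
        if len(rows) == rows_per_chunk:
            yield _render(header, rows)
            rows = []
    if rows:  # trailing partial chunk
        yield _render(header, rows)
```

Because the function is a generator, only one chunk is held in memory at a time, which matters when the source file has millions of rows.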

15. Monitoring Tools

Use tools like:

Salesforce Event Monitoring
Salesforce Optimizer
Debug Logs
Query Plan Tool (in the Developer Console)

The Query Plan Tool shows:

Selectivity
Query cost
Index usage

Essential for LDV tuning.

Governor Limits to Watch Closely

When dealing with huge datasets:

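CPU time limit (10 seconds synchronous, 60 seconds asynchronous)
Heap size limit (6 MB synchronous, 12 MB asynchronous)
SOQL query limits (100 queries synchronous, 200 asynchronous; 50,000 rows retrieved per transaction)
DML row limits (10,000 rows per transaction)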

Batch Apex and Queueable Apex are built to work within these constraints.
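A quick back-of-the-envelope calculation shows why a large update cannot run as one transaction. The Python sketch below assumes the per-transaction DML row limit of 10,000 and the default Batch Apex scope of 200; the function name is illustrative.

```python
DML_ROWS_PER_TRANSACTION = 10_000  # per-transaction DML row limit
DEFAULT_BATCH_SCOPE = 200          # default Batch Apex chunk size


def min_transactions(total_rows: int, rows_per_txn: int = DML_ROWS_PER_TRANSACTION) -> int:
    """Smallest number of transactions needed to write `total_rows` rows."""
    # Integer ceiling division avoids floating-point rounding.
    return (total_rows + rows_per_txn - 1) // rows_per_txn
```

Updating 5 million rows needs at least 500 transactions at the DML row limit, or 25,000 executions at the default batch scope, which is why asynchronous chunking is the normal route at this scale.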

2025 Trends in Large Data Volume Management

Emerging trends include:

AI-Assisted Data Alerts

Einstein now flags low-performing queries automatically.

Metadata-Driven Partitioning

Tools suggest the best partition keys.

Event-Driven Offloading

CDC triggers archival automations.

Data Cloud Federation

Query remote warehouses instantly.

These trends reduce reliance on the core org’s storage.

When to Consider a Second Salesforce Org

Some enterprise cases require:

Regional org separation
Performance isolation
Data privacy segmentation

Multi-org strategies must include:

MDM governance
Integration layer
Identity management

Common Mistakes to Avoid

Storing logs inside Salesforce
Querying without filters
Ignoring indexes
Allowing multiple triggers per object
Not archiving closed records
Storing large attachments inside the org

These mistakes create scalability nightmares.

Conclusion

Managing large data volumes in Salesforce is a balancing act between performance, storage cost, and data quality. With the right combination of indexing strategies, selective filtering, archival policies, Big Objects, Batch Apex, and Data Cloud integrations, you can operate with massive datasets while preserving user experience.

As data-driven decision-making accelerates in 2025, Salesforce’s evolving platform—combined with intelligent architectural choices—ensures your org scales gracefully without sacrificing speed or reliability.

Organizations that invest early in LDV planning experience fewer surprises, faster query execution, and more stable automations. In the end, it’s not about how much data you store—it’s about how efficiently you can use it.