Salesforce Data Cloud Primer

As the first post in a Salesforce Data Cloud series, this primer explains what Data Cloud is, what it is not, and why it’s a game changer.

Contents

An Evolutionary Platform
How Salesforce Data Cloud Works
Data Lake
Data Warehouse
Data Lakehouse
CRM
MDM
System of Record
Data Cloud: Next Generation Architecture

An Evolutionary Platform

Salesforce Data Cloud is arguably the most significant platform to be released in the history of Salesforce. However, its journey to market has been more evolutionary than revolutionary. Salesforce started on this journey over five years ago, when they first announced Customer 360 at Salesforce Connections in June 2019. Then at Dreamforce in the same year, the platform was renamed Customer 360 Audiences, which was then renamed to Salesforce CDP in May 2021, then Genie at Dreamforce in 2022, before settling on Data Cloud in February this year. It’s been quite hard to keep up.

Salesforce Data Cloud Timeline

How Data Cloud Works

Data Cloud is a purpose-built platform that enables organizations to connect all their customer data at scale. Whether that data resides in a mobile app, data lake, data warehouse or Salesforce platform, or is collected from user interactions on your website, Data Cloud can ingest it from multiple sources, either as a batch process or in real time. It then harmonizes the data into a structured, canonical data model and applies reconciliation rules to unify individual records into a single profile that adapts to a customer’s activity and behavior.

Then you can analyze, segment and activate this data, automating customer experiences across the complete suite of Salesforce Customer 360 products. Whether it’s injecting a Contact into a Marketing Cloud journey, converting a Lead in Sales Cloud, or creating a Case in Service Cloud, Data Cloud can fulfill marketing, sales and service automation use cases.
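The harmonize and unify steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Data Cloud API: the source names, field mappings and the single match rule (exact match on a normalized email address) are all assumptions made for the example, and real reconciliation rules are considerably richer.

```python
from collections import defaultdict

# Hypothetical mapping from each source's field names to a canonical model.
FIELD_MAPS = {
    "crm": {"Email": "email", "FirstName": "first_name", "Phone": "phone"},
    "web": {"user_email": "email", "last_page": "last_page_viewed"},
}


def harmonize(source, record):
    """Rename source-specific fields to canonical names and normalize the email."""
    mapping = FIELD_MAPS[source]
    canonical = {mapping[k]: v for k, v in record.items() if k in mapping}
    if "email" in canonical:
        canonical["email"] = canonical["email"].strip().lower()
    canonical["source"] = source
    return canonical


def unify(records):
    """Group harmonized records by a match key (assumed here: normalized email)
    and merge them into one profile per individual. Later values overwrite
    earlier ones, so the profile reflects the most recent record seen."""
    profiles = defaultdict(lambda: {"sources": []})
    for rec in records:
        key = rec.get("email")
        if key is None:
            continue  # no match key: the record stays unmatched in this sketch
        profile = profiles[key]
        for field, value in rec.items():
            if field == "source":
                if value not in profile["sources"]:
                    profile["sources"].append(value)
            else:
                profile[field] = value
    return dict(profiles)


# Two records for the same person arriving from different (made-up) sources.
raw = [
    ("crm", {"Email": " Ana@Example.com ", "FirstName": "Ana", "Phone": "555-0100"}),
    ("web", {"user_email": "ana@example.com", "last_page": "/pricing"}),
]
profiles = unify(harmonize(src, rec) for src, rec in raw)
```

Here the CRM record and the web interaction collapse into one profile keyed by the normalized email, and the profile picks up the web activity as soon as it is ingested. That is the sense in which a unified profile adapts to behavior rather than being fixed at creation.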

How Salesforce Data Cloud Works

Some organizations assume that they don’t need Data Cloud, as they already use data platforms. Perhaps they’re already storing customer data in a data lake, and using a data warehouse for business intelligence activities. But it’s important to understand that Data Cloud is neither of those platforms, yet it combines characteristics of both.

Data Lake

A data lake provides a convenient repository to store data quickly, where you can deposit raw copies of structured, unstructured or semi-structured data files, without needing to perform data modeling at time of ingestion. But the problem with data lakes is they can quickly become data swamps, flooded with outdated and incomplete data. The net result is that it can be hard to extract data from data lakes, which aren’t optimized for querying at scale.

Data Warehouse

On the other hand, a data warehouse, also referred to as ‘Enterprise Data Warehouse’ or ‘EDW’, provides a central data repository which is specifically optimized for analytics and reporting purposes. Data warehouses provide fast and efficient data analysis, while also enhancing data quality through data cleansing, data organization and data governance processes. Data warehouses tend to be more performant than data lakes, but they can be more expensive and limited in their ability to scale. Additionally, they can form data silos, which means they are often incompatible with other data sets. That makes it hard for users in other parts of an organization to access and use the data.

Data Lakehouse

Data Cloud, however, is built on a data lakehouse. A data lakehouse is a data platform that merges the benefits of data lakes and data warehouses into a single data architecture. Data teams can accelerate their data processing because they no longer need to straddle two disparate data systems to complete and scale advanced analytics, like machine learning.

CRM

But what about a CRM? After all, isn’t that a data platform? Well, yes, it stores customer data, but that’s where the similarities end. CRMs are used for managing customer relationships, sales engagements, pipelines, customer interactions and business transactions, and for facilitating sales and service processes. And by design, a CRM is built for storing known customer data. If a customer is unknown, then they simply don’t exist in the CRM. Also, traditionally, CRM platforms store data in a transactional database that’s optimized for data integrity. But to use this data at scale, for tasks like analytics or machine learning, it’s necessary to copy this data to another system to process it.

MDM

Data Cloud isn’t a Master Data Management (or MDM) platform either. MDM platforms are enterprise software products that create and manage a central, persisted system of record for master data, through a semantic reconciliation process. While Data Cloud provides data normalization, it doesn’t provide a golden record. Rather, it creates a unified customer profile that changes and adapts based on an individual’s activity.

System of Record

Data Cloud is not a substitute for a CRM, MDM or any other platform, and it’s never the first touchpoint in a data lifecycle. You still need a platform (or platforms) that generate a system of record (or unique identifier) to register that first entry point for your data — whether it’s an order, support case, or customer record. Once this identifier has been established, then your data can be ingested into Data Cloud, which in turn provides a fabric layer that orchestrates all your data from different sources. And unlike transactional data stores, records in Data Cloud are fluid and designed to provide that moment-in-time insight into an individual’s profile, their intent and behavior — all of which can change at any time.

Data Cloud: Next Generation Architecture

Salesforce has built one of the first, and arguably the most successful, cloud CRM platforms of all time. At its core, Salesforce uses a transactional data store that follows a single logical operation sequence to provide atomicity of record operations. This approach ensures that the database can cancel, or undo, a transaction or operation that is not completed appropriately. While transactional databases provide a high degree of data integrity, the downside is that these databases are designed for processing transactions, not analysis or transformations. In short, they don’t process or scale big data well.
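To make the idea of atomicity concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It is a stand-in for illustration only and says nothing about how Salesforce’s own data store is implemented; the pipeline table, the owner names and the transfer_amount function are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a small, made-up pipeline table."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute(
            "CREATE TABLE pipeline (owner TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO pipeline VALUES (?, ?)", [("alice", 500), ("bob", 200)]
        )
    return conn


def amounts(conn):
    """Return the current amount per owner as a dict."""
    return dict(conn.execute("SELECT owner, amount FROM pipeline ORDER BY owner"))


def transfer_amount(conn, from_owner, to_owner, amount):
    """Move an amount between owners as one atomic unit of work.

    Using the connection as a context manager commits if the block
    succeeds and rolls back every statement in it if an exception is raised.
    """
    try:
        with conn:
            conn.execute(
                "UPDATE pipeline SET amount = amount - ? WHERE owner = ?",
                (amount, from_owner),
            )
            cur = conn.execute(
                "UPDATE pipeline SET amount = amount + ? WHERE owner = ?",
                (amount, to_owner),
            )
            if cur.rowcount == 0:
                # The second step failed, so the first step must be undone too.
                raise ValueError(f"unknown owner: {to_owner}")
        return True
    except ValueError:
        return False
```

Calling transfer_amount with an owner that does not exist leaves the table exactly as it was, even though the first UPDATE had already run. That guarantee is what makes transactional stores trustworthy for record operations, and it is also part of why they are tuned for many small writes rather than large analytical scans.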

Additionally, Salesforce has been a serial acquirer for almost two decades. Platforms like ExactTarget (now Marketing Cloud Engagement), Pardot (Marketing Cloud Account Engagement) and Demandware (Commerce Cloud), to mention just a few, all use different database architectures and platforms. While these platforms have effectively been integrated into Sales and Service Cloud, integration isn’t seamless and data has to be replicated across platforms.

Data Cloud addresses both of these challenges by decoupling existing Salesforce platforms and harmonizing data into a normalized, canonical data model. Users can then run analysis and predictions across the enterprise on a highly scalable microservices architecture that handles thousands of requests per second and stores billions of profiles, while also providing a petabyte-scale analytics environment.

Data Cloud is set to form the foundation of the next generation cloud architecture for Salesforce and is set to be a game changer for both Salesforce and their customers.
