As its name implies, Salesforce Data Cloud is a data platform — before you can begin using the platform, you first need to get data into it. And while Data Cloud provides various options for importing data, it’s important to select an optimal method to integrate data sources into the platform. Unfortunately in data architecture, integrations are often implemented in a poorly considered way, which can compromise data accessibility, quality and scalability. This post in the Data Cloud Primer series describes the different ingestion methods for importing data into Data Cloud and considerations for designing robust data pipelines.
The Salesforce Interactions SDK (or Software Development Kit) enables developers to capture interactions for both known and pseudonymous website visitors and store these events in Data Cloud. In turn, this data can be used to build behavioral profiles and audience segmentation. The SDK is implemented by a client-side script on the source website that provides various methods to send event payloads to Data Cloud. The following data types can be captured in Data Cloud using the SDK:
- Profile data for an individual, like user identity, phone and email
- eCommerce data including cart and cart items, orders and product catalog entries
- Consent tracking to capture when a user provides consent (the SDK sends events only if a customer has consented to tracking)
Engagement Mobile SDK
Formerly known as the Marketing Cloud Mobile SDK, this SDK is used to send in-app messaging, inbox (push) messaging and location-based messaging to mobile apps using Marketing Cloud MobilePush. Similar to the Interactions SDK, the Engagement Mobile SDK has been extended with a Data Cloud module that enables profile, eCommerce and consent data events (as supported by the Interactions SDK) to be tracked in Data Cloud, in addition to mobile messaging and app-related behavioral events.
The Engagement Mobile SDK is available for both iOS and Android mobile platforms and can be used without requiring MobilePush integration.
Data Cloud includes a set of connectors that enables data from Salesforce products to be ingested into Data Cloud using a configurable interface, without requiring custom development. Platform connectors include:
- B2C Commerce Connector for importing product and customer order data.
- Marketing Cloud Connector for importing engagement data for email, SMS and mobile push events — additionally, you can import up to 20 data extensions per account (across all business units).
- Marketing Cloud Personalization Connector for importing website events, including user profiles, behavioral data (like page views) and order data — up to 5 Personalization datasets can be imported.
- Salesforce CRM Connector for importing records from standard and custom objects in one or more Sales and Service Cloud orgs.
All connectors include ‘bundles’, which are sets of predefined data sets from the source platform that align with common use cases. Bundles not only determine the source data sets and fields, but also automatically map them to the respective standard Data Model Objects (DMOs) and fields in Data Cloud. These predefined data mappings can also be customized.
MuleSoft Anypoint Connector
MuleSoft Anypoint is an integration platform that enables API integration across different databases, SaaS platforms, storage resources, and network services, through hundreds of pre-built connectors. The MuleSoft Anypoint Connector for Data Cloud enables data to be ingested from other Anypoint Connectors, either as a streaming or bulk process. Additionally, the Connector can be used to publish insights from Data Cloud into upstream platforms.
Cloud Storage Connectors
Data Cloud supports bulk importing and exporting data from and to popular object storage services including:
- Amazon S3
- Microsoft Azure Storage
- Google Cloud Storage
These connectors are well suited to batch ingestion of voluminous datasets, as data files can be up to 200GB in size, with a maximum of 1,000 batch files for each scheduled run. Storage connectors provide a simple and convenient method for transferring data to Data Cloud on a scheduled basis, particularly for organizations that already run platforms and manage their data on these popular cloud computing services.
Secure File Transfer Protocol (or SFTP) is an industry-standard network protocol for securely transferring large data files. Data Cloud can import CSV files from SFTP servers and supports files up to 4.5GB in size in a single data stream. The vast majority of enterprise platforms support exporting CSV files, which, when used in combination with a file transfer process or platform, makes SFTP a ubiquitous method for bulk importing data into Data Cloud.
While various “out-of-the-box” connectors enable declarative-style integration with Data Cloud without requiring any custom development, there are scenarios where data needs to be loaded into the platform programmatically, either in near real-time or as a batch process. The Data Cloud Ingestion API fulfills both requirements by supporting both streaming and bulk data imports.
Using the Streaming API, developers can build a JSON-formatted payload that aligns with the data schema defined in a deployed data stream. The API follows a “fire and forget” approach: a response is returned immediately and the imported data is processed asynchronously by the platform in near real-time, approximately every 3 minutes. It is best suited for small batches of records (with payloads not exceeding 200KB). Use cases include:
- Visitors signing up on a website that triggers a database change
- An order fulfillment platform, where an order or shipment status changes
- A website chatbot conversation that is initiated by a website visitor
- Hotel or travel purchases completed on an online booking platform
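To make the streaming flow concrete, the sketch below builds one of these event payloads and enforces the 200KB limit before sending. The field names, source and object names, and the endpoint path in the comment are illustrative assumptions — the payload must match the schema of your deployed data stream, and the exact endpoint should be confirmed against the current Ingestion API documentation.

```python
import json

STREAMING_PAYLOAD_LIMIT = 200 * 1024  # 200KB streaming limit noted above

def build_streaming_payload(records):
    """Wrap records in the {"data": [...]} envelope used by the Ingestion API
    and verify the serialized payload stays under the 200KB streaming limit."""
    payload = {"data": records}
    body = json.dumps(payload)
    if len(body.encode("utf-8")) > STREAMING_PAYLOAD_LIMIT:
        raise ValueError("Payload exceeds the 200KB streaming limit; use the Bulk API")
    return body

# Example: an order-status change event (field names are assumptions that
# would need to match your data stream's schema).
body = build_streaming_payload([
    {"orderId": "ORD-1001", "status": "Shipped", "eventDateTime": "2024-01-15T09:30:00Z"}
])

# The POST itself (not executed here; tenant, source and object names are
# placeholders) would look roughly like:
#   requests.post(
#       f"https://{tenant}.c360a.salesforce.com/api/v1/ingest/sources/{source}/{obj}",
#       headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
#       data=body,
#   )
```

Because the API is fire and forget, the size check belongs on the client: an oversized payload is rejected, and the asynchronous processing means errors surface later in monitoring rather than in the response.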
The Bulk Ingestion API allows large data sets to be created, updated or deleted in Data Cloud, where CSV files with a file size of up to 150MB (and up to 100 files per job) can be imported. This API follows a similar multi-step process to the Salesforce Bulk API, where a job is first programmatically created, then CSV data is uploaded to the job, then the job is closed and the uploaded data is enqueued for processing. This API is best suited for transferring large amounts of data at a regular interval, for example, daily or weekly. Possible use cases include:
- Daily customer transactional data from a financial service provider
- Point-of-sale data generated from in-store customer transactions
- Customer loyalty status or points balances from a loyalty management system
- Subscriber engagement data from a third-party messaging platform
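A sketch of how a batch job might be validated against the limits above before any upload begins — the validation function is runnable, while the job lifecycle is summarized in comments with illustrative endpoint paths that should be checked against the current Ingestion API reference rather than taken as authoritative.

```python
MAX_FILE_BYTES = 150 * 1024 * 1024   # 150MB per CSV file
MAX_FILES_PER_JOB = 100              # up to 100 files per job

def validate_bulk_batches(file_sizes):
    """Check a planned upload (mapping of file name -> size in bytes) against
    the Bulk Ingestion API limits described above. Returns a list of
    human-readable violations (empty when the plan is valid)."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_JOB:
        problems.append(
            f"{len(file_sizes)} files exceeds the {MAX_FILES_PER_JOB}-file job limit"
        )
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is larger than the 150MB per-file limit")
    return problems

# The multi-step job lifecycle itself (paths are illustrative):
#   1. POST   /api/v1/ingest/jobs               -> create a job for the target object
#   2. PUT    /api/v1/ingest/jobs/{id}/batches  -> upload CSV data (Content-Type: text/csv)
#   3. PATCH  /api/v1/ingest/jobs/{id}          -> close the job so the uploaded
#                                                  data is enqueued for processing
```

Validating file counts and sizes up front means a nightly job fails fast in your pipeline, rather than partway through a multi-step upload.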
Identifying an appropriate connector, protocol, SDK or API is just the first stage in designing an integration to Data Cloud. How that data is then prepared and transferred for import — referred to as a ‘data pipeline’ — is equally important, as poor pipeline architecture can undermine the integration and worse still, the integrity of your data.
Anti-patterns often surface in pipeline architectures. An anti-pattern resembles a pattern in that it may appear to be a working solution, but it is in fact counter to best practice. Anti-patterns typically arise when an integration is implemented without any planning, design or documentation. For example, a Data Cloud user may configure the Amazon S3 Connector to import membership data from an S3 bucket. Data is exported from a source system to the S3 bucket, but the user is unaware of how long the data export process takes and there is no validation of the exported data. The data stream runs on a predefined schedule, before the data has finished copying to the bucket — and even when the data file is available, the import fails because required fields are missing.
When building data pipelines for Data Cloud, quality is key. It is recommended to establish processes that validate required fields and data schemas prior to file import, then report on exceptions, so they can be remediated. Additionally, the platform has Guidelines and Limits for ingesting data. Ensure that data file properties and operations fall within these defined thresholds. Also, monitor the Data Stream Refresh History for errors. There are several scenarios when data stream refresh may fail, and the Refresh History page can be used to identify and troubleshoot errors, as they occur.
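As a minimal sketch of that pre-import validation step, the function below checks a CSV export for required columns and empty required values, returning exceptions that can be reported and remediated before the file ever reaches a data stream. The field names are hypothetical stand-ins for whatever your schema actually requires.

```python
import csv
import io

# Hypothetical required fields for a membership data export.
REQUIRED_FIELDS = {"member_id", "email", "joined_date"}

def validate_csv(text, required=REQUIRED_FIELDS):
    """Validate that a CSV export contains the required columns and that every
    row populates them; returns a list of exceptions to report before import."""
    reader = csv.DictReader(io.StringIO(text))
    errors = []
    missing_cols = required - set(reader.fieldnames or [])
    if missing_cols:
        errors.append(f"missing columns: {sorted(missing_cols)}")
        return errors
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        empty = sorted(f for f in required if not (row.get(f) or "").strip())
        if empty:
            errors.append(f"row {line_no}: empty required fields {empty}")
    return errors

sample = "member_id,email,joined_date\n42,a@example.com,2024-01-01\n43,,2024-01-02\n"
issues = validate_csv(sample)  # flags the row with the missing email
```

Running a check like this as a gate in the pipeline, and only triggering the data stream refresh once it passes, avoids the anti-pattern described above where a scheduled import runs against incomplete or invalid data.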
If you would like to learn more about how to implement and integrate Salesforce Data Cloud in your organization following best practices, reach out today.