Configuring Data Feeds

Data feeds are enabled by Customer Care and delivered using FTP, sFTP, or Amazon S3.

This section provides an overview of data feed options and the one-time configuration process.

FTP File Delivery

Data feed data can be delivered to an Adobe-hosted or customer-hosted FTP location.

If you select to have data uploaded to your FTP server, you must provide Adobe with the appropriate username, password, and upload path. You must implement your own process to manage disk space on the server, as Adobe does not delete any data from the server.

sFTP File Delivery

Data feed data can be delivered to an Adobe-hosted or customer-hosted sFTP location.

If you select to have data uploaded to your sFTP server, you must provide Adobe with the appropriate username and upload path. You must implement your own process to manage disk space on the server, as Adobe does not delete any data from the server.

Amazon S3 File Delivery

If you don't want to worry about managing disk space or encrypting your data, you can have your files delivered to an Amazon S3 bucket instead. Amazon automatically encrypts the data at rest (on the Amazon servers) and decrypts it automatically when you download it.

If you select to have data uploaded via Amazon S3, you must provide Adobe Customer Care with a bucket name, an access key ID, a secret key, and a folder name.
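
Before handing these details to Customer Care, it can be worth confirming that the credentials you plan to share can actually create objects under the delivery folder. The following is a minimal sketch using boto3; the bucket name, folder, and credential values are placeholders, not values used by Adobe.

```python
# Minimal sketch: confirm that the access key you plan to give Adobe can write
# to the delivery folder. Bucket, folder, and credential values are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",        # access key ID you will share
    aws_secret_access_key="...",        # secret key you will share
)

bucket = "my-datafeed-bucket"           # hypothetical bucket name
folder = "analytics-feed"               # hypothetical folder (key prefix)

# Write and then remove a small test object under the delivery prefix.
s3.put_object(Bucket=bucket, Key=f"{folder}/connectivity-test.txt", Body=b"ok")
s3.delete_object(Bucket=bucket, Key=f"{folder}/connectivity-test.txt")
print(f"Credentials can create objects under s3://{bucket}/{folder}/")
```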

BucketOwnerFullControl Setting for Amazon S3 Data Feeds

The common use case for Amazon S3 is that the Amazon Web Services (AWS) account owner creates a bucket, creates a user that has permission to create objects in that bucket, and then provides that user's credentials to Adobe. In this case, the objects that the user creates belong to the same account, and the account owner implicitly has full control of each object (read, delete, and so on). This is similar to how FTP delivery works.
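
In the same-account case, the permission grant is typically an IAM policy attached to the upload user. The following is a minimal sketch of such a policy, attached inline with boto3; the user name, policy name, and bucket name are hypothetical.

```python
# Minimal sketch of the same-account setup: attach an inline IAM policy that
# lets an upload user create objects in the bucket. All names are hypothetical.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject"],
        "Resource": "arn:aws:s3:::my-datafeed-bucket/*",   # objects in the bucket
    }],
}

iam.put_user_policy(
    UserName="datafeed-uploader",          # user whose credentials are shared
    PolicyName="AllowDataFeedUploads",
    PolicyDocument=json.dumps(policy),
)
```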

AWS also makes it possible for a user to create objects in a bucket that belongs to a completely different AWS account. For example, suppose two AWS users, userA and userB, belong to different AWS accounts, but userB wants to create objects in userA's bucket. If userA creates a bucket, say bucketA, userA can create a bucket policy that explicitly allows userB to create objects in bucketA even though userB does not own the bucket. This can be advantageous because it does not require userA and userB to exchange credentials. Instead, userB provides userA with their account number, and userA creates a bucket policy that essentially says "let userB create objects in bucketA".
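
A bucket policy along those lines might look like the following sketch, applied by userA (the bucket owner); the account ID, user name, and bucket name are hypothetical.

```python
# Minimal sketch of a cross-account bucket policy: userA applies this to
# bucketA so that userB, who belongs to a different AWS account, can create
# objects in it. The account ID, user name, and bucket name are hypothetical.
import json
import boto3

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowUserBToUpload",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::222222222222:user/userB"},
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::bucketA/*",
    }],
}

s3 = boto3.client("s3")   # run with userA's (bucket owner's) credentials
s3.put_bucket_policy(Bucket="bucketA", Policy=json.dumps(bucket_policy))
```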

This cross-account arrangement is where the BucketOwnerFullControl ACL comes in. If userB uploads an object to userA's bucket, userB still "owns" that object, and by default userA is not granted any permissions to it even though userA owns the bucket, because objects do not inherit permissions from the parent bucket. Since userB is still the object's owner, userB must explicitly grant userA permissions. For this cross-account upload, AWS provides the BucketOwnerFullControl canned ACL: when userB specifies this ACL at upload time, the bucket owner (userA) is granted full permissions to the object (read, write, delete, and so on), even though the object is still "owned" by userB.
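
In practice, the uploader includes this canned ACL with each upload. The following minimal sketch shows userB uploading an object to bucketA with the bucket-owner-full-control ACL so that userA, the bucket owner, receives full control of the object; the bucket, key, and file names are hypothetical.

```python
# Minimal sketch: userB uploads an object to userA's bucket and grants the
# bucket owner full control of it via the canned ACL. Names are hypothetical.
import boto3

s3 = boto3.client("s3")   # run with userB's credentials

with open("example-feed-file.tsv.gz", "rb") as data:
    s3.put_object(
        Bucket="bucketA",
        Key="feeds/example-feed-file.tsv.gz",
        Body=data,
        ACL="bucket-owner-full-control",   # grants userA (bucket owner) full control
    )
```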

Delivery Formats and Contents

Daily: Data for each day is delivered after it is processed, either in a single zipped file or in multiple zipped files that each contain approximately 2 GB of uncompressed data. You receive a single delivery for each day.

Hourly: Data for each hour is delivered in a single zipped file that contains all data received during that hour. You receive 24 separate deliveries for each day, with each file delivered after the data for that hour is processed.

Note: Due to the potential size of data feed zip files, make sure your ETL process uses a 64-bit zip utility.
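
If your ETL step is written in Python, the standard zipfile module already supports ZIP64, so archives with more than 4 GB of uncompressed data extract cleanly. A minimal sketch, with a hypothetical archive name:

```python
# Minimal sketch: extract a delivered feed archive. Python's zipfile module
# supports ZIP64, so very large archives are not a problem. The archive and
# output directory names are hypothetical.
import zipfile

with zipfile.ZipFile("datafeed-delivery.zip") as archive:
    archive.extractall("extracted/")
```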

Hourly Data Feeds

It is important to understand that the term “hourly” describes the time frame of the data that is sent with each individual data export, and not the time frame in which the delivery occurs. Hourly data feeds are processed and delivered in a best-effort fashion. However, there are several factors that can impact the delivery time of an hourly data feed including:

  • Report suite latency (for example, an unannounced spike in traffic)
  • Upstream processing
  • Peak and non-peak hours
  • Internet connection speeds

For hourly data feeds, the expectation is that 95% of the time the feed is delivered within 12 hours of the close of that hour's worth of data. Data feeds for report suites with high traffic volume may take longer to process and deliver.

Receiving an hourly data feed is different from receiving a daily feed with multiple file delivery. With hourly data feeds, the data for each day is split into 24 files based on the data collected during each hour, and each file is delivered as soon as it is available. A daily feed that is delivered in multiple files is delivered once per day after the previous day's data is processed, and is split into approximately 2 GB increments based on the amount of data collected.

Data Backfills for Hourly Data Feeds

If you request data for earlier dates when setting up a new hourly data feed, data for dates more than 60 days ago might be delivered in daily format instead of hourly.

In this case, you will not receive 24 separate deliveries for these days. Instead, you will receive a single delivery with a midnight timestamp that contains all of the data for that day. If you are requesting this type of backfill, make sure your ETL process is configured to process daily deliveries.

Multiple File Delivery

You can select single file or multiple file delivery when the data feed is created. When setting up a daily feed, we recommend selecting multiple file delivery, due to the significant performance increases gained when compressing and uncompressing files that are larger than 2 GB. Multiple file delivery makes it easier to process data in parallel. Data files are always split on a complete record and can be easily concatenated after extraction.
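
Because each part ends on a complete record, reassembling a multiple-file daily delivery is simply a matter of appending the extracted parts in order. A minimal sketch, with hypothetical file names:

```python
# Minimal sketch: after extracting a multiple-file daily delivery, concatenate
# the parts back into a single file. Parts are split on complete records, so
# appending them in order is safe. File names here are hypothetical.
import shutil
from pathlib import Path

parts = sorted(Path("extracted/").glob("part-*.tsv"))   # e.g. part-01.tsv, part-02.tsv, ...

with open("hit_data_full.tsv", "wb") as combined:
    for part in parts:
        with open(part, "rb") as src:
            shutil.copyfileobj(src, combined)   # stream each part without loading it fully
```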

One-Time Configuration Process

Task: Select data columns
Performed by: Customer
Description: Review the clickstream data columns and determine the data you would like to receive. Adobe also provides a recommended column set that can be selected.

Task: (FTP only) Select an FTP location
Performed by: Customer
Description: Select an FTP location where Adobe should deliver data feed files. Adobe can provide FTP hosting for the files if preferred.

Task: (Amazon S3 only) Set up an S3 bucket
Performed by: Customer
Description: Create an Amazon S3 bucket where Adobe should deliver data feed files, along with credentials that allow objects to be created in it.

Task: Contact Adobe Customer Care to configure the data feed
Performed by: Customer
Description: Contact Customer Care through your Supported User and provide:

  • The report suite that contains the data you want in the feed.
  • The columns you want in the data set.
  • Daily or hourly data delivery. If daily, select single file or multiple file delivery (multiple recommended).
  • (FTP only) FTP hostname, credentials, and path.
  • (Amazon S3 only) Bucket name, access key ID, secret key, and folder name.

Delivery Process

Task: Data collection
Performed by: Adobe
Description: Server calls are collected and processed on Adobe data collection servers.

Task: Feed generation
Performed by: Adobe
Description: After data is processed for the delivery period (previous hour or previous day), the data is exported to the data feed. The feed is stored in delimited format and compressed.

Task: Delivery to customer
Performed by: Adobe
Description: The compressed data is transferred to either Amazon S3 or a customer-hosted or Adobe-hosted FTP site. When the transfer is complete, a manifest file (or .fin file for older feeds) is transferred to indicate that the delivery is complete.

Task: Data download
Performed by: Customer
Description: The customer monitors S3 or the FTP location for the manifest file. This file contains details on all files that were delivered.

Task: Manifest file processing
Performed by: Customer
Description: The manifest file is read and each listed file is downloaded (see the sketch after this table).

Task: Data is uncompressed and processed
Performed by: Customer
Description: Downloaded files are uncompressed and processed.
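
The customer-side steps above can be automated. The following is a minimal sketch under stated assumptions: it polls an S3 delivery folder for new manifest files, treats any line containing a colon as a possible file entry, and downloads every compressed data file the manifest references. The bucket and folder names, the ".txt" manifest suffix, and the line parsing are illustrative assumptions; consult Data Feed Contents for the actual manifest format.

```python
# Minimal sketch: watch an S3 delivery folder for manifest files and download
# the data files they reference. Bucket/folder names, the ".txt" manifest
# suffix, and the line parsing are assumptions for illustration only.
import time
import boto3

BUCKET = "my-datafeed-bucket"        # hypothetical
PREFIX = "analytics-feed/"           # delivery folder (key prefix)
s3 = boto3.client("s3")
processed = set()

def list_keys(prefix):
    """Return all object keys under the given prefix."""
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys

while True:
    for key in list_keys(PREFIX):
        if not key.endswith(".txt") or key in processed:   # assume manifests end in .txt
            continue
        manifest = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read().decode()
        for line in manifest.splitlines():
            # Assume entries look like "Data-File: 01-suite_2024-01-01.tsv.gz".
            if ":" in line:
                filename = line.split(":", 1)[1].strip()
                if filename.endswith(".gz"):
                    s3.download_file(BUCKET, PREFIX + filename, filename)
        processed.add(key)
    time.sleep(300)   # poll every five minutes
```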

After you have configured your data feed, continue to Data Feed Contents to understand what files you will receive.