Data Feed Contents

This section describes the files found in a data feed delivery.

Manifest File

The manifest file contains the following details about each file that is part of the uploaded data set:

  • file name
  • file size
  • MD5 hash
  • number of records contained in the file
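
As an illustration of how these details can be used, the following is a minimal sketch that checks a delivered file against the MD5 hash and file size listed for it. The function names `file_md5` and `matches_manifest` are hypothetical and not part of any Adobe tooling; the expected values are assumed to have been read from the manifest already.

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path, expected_md5, expected_size):
    """Check a delivered file against the manifest's MD5 hash and size."""
    # Compare the cheap value (size) first, then the digest.
    if os.path.getsize(path) != expected_size:
        return False
    return file_md5(path) == expected_md5.lower()
```

Reading in chunks keeps memory use flat even for data files that are several gigabytes in size.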

The manifest file follows the same format as a Java JAR manifest file.

The manifest file is always delivered last as a separate .txt file, so that its existence indicates that the complete data set for that request period has already been delivered. Manifest files are named according to the following:

<report_suite_id>_YYYY_MM_DD.txt

A typical manifest file contains data similar to the following:

Datafeed-Manifest-Version: 1.0
Lookup-Files: 1
Data-Files: 1
Total-Records: 611

Lookup-File: bugzilla_2012-09-09-lookup_data.tar.gz
MD5-Digest: af6de42d8b945d4ec1cf28360085308
File-Size: 63750

Data-File: 01-bugzilla_2012-09-09.tsv.gz
MD5-Digest: 9c70bf783cb3d0095a4836904b72c991
File-Size: 122534
Record-Count: 611

Every manifest file begins with a header that gives the number of lookup files, the number of data files, and the total number of records across all data files. The header is followed by one section for each file included in the data feed delivery.
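
The header-plus-sections layout can be read with a few lines of code. The following is a minimal sketch, assuming that every entry is a single `Key: Value` line and that blank lines separate the header from each file section; it does not handle the continuation lines that the JAR manifest format allows. The function name `parse_manifest` is hypothetical.

```python
def parse_manifest(text):
    """Split manifest text into a header dict and a list of per-file dicts."""
    blocks = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the block being read.
            if current:
                blocks.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    if not blocks:
        return {}, []
    # The first block is the header; the rest describe individual files.
    return blocks[0], blocks[1:]
```

All values are returned as strings, so counts such as `Total-Records` need to be converted with `int()` before they are compared with anything.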

Some feeds are configured to receive an rsid_YYYY-MM-DD.fin file instead of a .txt manifest. The .fin file indicates that the upload is complete, but it contains no metadata about the upload.

Lookup Files

Lookup files do not contain hit data. They are supplemental files that provide the column headers for the hit data and translate the IDs found in the data feed to actual values. For example, a value of "497" in the browser column indicates that the hit came from "Microsoft Internet Explorer 8".
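
The ID translation can be sketched as follows. This example assumes that a lookup file such as browser.tsv holds one tab-separated ID and value per line, with no header row; that layout is an assumption here rather than something this section states. The function name `load_lookup` and the sample row are illustrative only.

```python
import csv
import io


def load_lookup(lines):
    """Build an ID -> value dict from tab-separated lookup rows."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Rows with fewer than two fields are skipped rather than raising.
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Hypothetical one-line excerpt of browser.tsv, using the example above.
sample = io.StringIO("497\tMicrosoft Internet Explorer 8\n")
browsers = load_lookup(sample)
print(browsers.get("497", "unknown"))  # prints "Microsoft Internet Explorer 8"
```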

Note that column_headers.tsv and event_lookup.tsv are specific to the data feed and report suite. Other files, such as browser.tsv, are generic.

The lookup files are delivered together in a single compressed archive named according to the following:

<report_suite_id>_<YYYY-mm-dd>-<HHMMSS>-lookup_data.<compression_suffix>

The archive contains the following files:

  • column_headers.tsv (customized for this data feed)
  • browser.tsv
  • browser_type.tsv
  • color_depth.tsv
  • connection_type.tsv
  • country.tsv
  • javascript_version.tsv
  • languages.tsv
  • operating_systems.tsv
  • plugins.tsv
  • resolution.tsv
  • referrer_type.tsv
  • search_engines.tsv
  • event_lookup.tsv (customized for this data feed)

For hourly delivery, lookup files are delivered only with the data for the first hour of each day.

Hit Data Files

Hit data is provided in a hit_data.tsv file. The amount of data in this file is determined by the delivery format (hourly or daily, and single or multiple files). This file contains only hit data. The column headers are delivered separately with the lookup files. Each row in this file contains a single server call.
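
Because the column headers arrive separately, they have to be joined with the hit data before rows can be read by name. The following is a minimal sketch, assuming that column_headers.tsv holds a single tab-separated row of column names and that both files use the ISO-8859-1 encoding noted under Delivery Contents. It does not handle any escaping inside field values, and the function name `read_hits` is hypothetical.

```python
import csv


def read_hits(headers_path, hits_path, encoding="iso-8859-1"):
    """Yield each hit (one server call per row) as a dict keyed by column name."""
    with open(headers_path, encoding=encoding, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        columns = next(reader)
    with open(hits_path, encoding=encoding, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield dict(zip(columns, row))
```

The function is a generator, so a large hit_data.tsv file is streamed row by row instead of being loaded into memory at once.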

Delivery Contents

Note: The files are encoded using ISO-8859-1.

The actual files delivered by Adobe vary based on the type of data feed that you have configured. Find the configuration that matches your data feed in the following table for a description of the delivered files.

The time (HHMMSS) indicated in a file name always indicates the beginning of the date range for the data in the file, regardless of when the file was produced or uploaded.

Delivery Format Description
Daily, single file

After data is collected for a day, you will receive a delivery that contains the following:

  • A single compressed data file.
  • A manifest file.

The data file is delivered with the following name:

<report_suite>_<YYYY-mm-dd>.<compression_suffix>

Where <compression_suffix> is either tar.gz or zip.

When extracted, the data file contains a single hit_data.tsv file with all data for that day, as well as the compressed lookup files described above.

The hit data file size varies greatly depending on the number of variables actively used and the amount of traffic on the report suite. However, on average, a row of data is approximately 500 B (compressed) or 2 KB (uncompressed). Multiplying this by the number of server calls gives a rough estimate of how large a data feed file will be.
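
The estimate is a single multiplication, sketched below for a hypothetical day with one million server calls. The per-row averages come from the paragraph above; treating 2 KB as 2,000 bytes is an assumption made for round numbers.

```python
COMPRESSED_BYTES_PER_ROW = 500       # average row size, compressed
UNCOMPRESSED_BYTES_PER_ROW = 2000    # "2 KB", taken here as 2,000 bytes


def estimate_feed_size(server_calls):
    """Return rough (compressed, uncompressed) size estimates in bytes."""
    return (
        server_calls * COMPRESSED_BYTES_PER_ROW,
        server_calls * UNCOMPRESSED_BYTES_PER_ROW,
    )


compressed, uncompressed = estimate_feed_size(1_000_000)
print(f"{compressed / 1e6:.0f} MB compressed, {uncompressed / 1e9:.0f} GB uncompressed")
# prints "500 MB compressed, 2 GB uncompressed"
```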

Daily, multiple file

After data is collected for a day, you will receive a delivery that contains the following:

  • One or more compressed data files, broken into 2 GB chunks.
  • A manifest file.

Each data file is delivered with the following name:

<index>-<report_suite>_<YYYY-mm-dd>.<compression_suffix>

Where <index> is an incrementing file index from 1 to n, given n files, and <compression_suffix> is either tar.gz or zip.

When extracted, each data file contains a single hit_data.tsv that contains approximately 2 GB of uncompressed data, as well as the compressed lookup files described above.

Hourly, single file

After data is collected for an hour, you will receive a delivery that contains the following:

  • A single data file.
  • A manifest file.

The data file is delivered with the following name:

<report_suite>_<YYYY-mm-dd>-<HHMMSS>.<compression_suffix>

Where <compression_suffix> is either tar.gz or zip.

When extracted, the data file contains a single hit_data.tsv file with all data for that hour. The compressed lookup files described above are delivered only with the data for the first hour of each day.

Hourly, multiple file

After data is collected for an hour, you will receive a delivery that contains the following:

  • One or more compressed data files, broken into 2 GB chunks.
  • A manifest file.

Each data file is delivered with the following name:

<index>-<report_suite>_<YYYY-mm-dd>-<HHMMSS>.tsv.<compression_suffix>

Where <index> is an incrementing file index from 1 to n, given n files, and <compression_suffix> is either gz or zip.

When extracted, each data file contains a single hit_data.tsv that contains approximately 2 GB of uncompressed data. The compressed lookup files described above are delivered only with the data for the first hour of each day.
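
The four naming patterns above differ only in whether an index and a time are present, so one regular expression can cover them all. The following is a minimal sketch; the pattern and the function name `parse_data_file_name` are illustrative, and it assumes that the report suite ID does not itself begin with digits followed by a hyphen, which would be mistaken for a file index. The time taken from the name marks the beginning of the date range for the data, and daily files are treated as starting at midnight.

```python
import re
from datetime import datetime

DATA_FILE_NAME = re.compile(
    r"^(?:(?P<index>\d+)-)?"           # optional file index (multiple file)
    r"(?P<report_suite>.+)_"           # report suite ID
    r"(?P<date>\d{4}-\d{2}-\d{2})"     # YYYY-mm-dd
    r"(?:-(?P<time>\d{6}))?"           # optional HHMMSS (hourly)
    r"\.(?P<suffix>.+)$"               # e.g. tar.gz, zip, tsv.gz
)


def parse_data_file_name(name):
    """Split a data file name into its parts, or return None if it does not match."""
    match = DATA_FILE_NAME.match(name)
    if match is None:
        return None
    time_part = match.group("time") or "000000"
    start = datetime.strptime(
        match.group("date") + " " + time_part, "%Y-%m-%d %H%M%S"
    )
    index = match.group("index")
    return {
        "index": int(index) if index is not None else None,
        "report_suite": match.group("report_suite"),
        "start": start,
        "suffix": match.group("suffix"),
    }
```

Lookup archives, whose names end in -lookup_data, do not match this pattern and return None, so they can be filtered out before the data files are processed.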