Outbound Hadoop Sequence Files

Export data from Audience Manager into your own Hadoop instance using a native binary Hadoop Sequence File format (SEQ).

Advantages

In addition to text files, the Hadoop ecosystem supports Hadoop Sequence Files (SEQ). SEQ files are flat files that consist of serialized key-value pairs in a binary format. Binary SEQ files have the following advantages over text files:

  • Binary SEQ files are more compact than text files.
  • Binary SEQ files can be split and processed in parallel.
  • In text files, each line needs to be parsed. Binary SEQ files do not have this limitation.
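Because SEQ files are binary, a delivered file cannot be inspected by eye the way a text file can. The sketch below is one quick sanity check: Hadoop Sequence Files begin with the three bytes "SEQ" followed by a one-byte format version, so reading the first four bytes shows whether a file at least looks like a SEQ file. The function name and the stand-in file are hypothetical and are not part of Audience Manager.

```python
import os
import tempfile

SEQ_MAGIC = b"SEQ"  # Hadoop Sequence File headers start with these three bytes


def looks_like_sequence_file(path):
    """Return True if the file starts with the SEQ magic bytes plus a version byte."""
    with open(path, "rb") as handle:
        header = handle.read(4)  # 3 magic bytes + 1 format-version byte
    return len(header) == 4 and header[:3] == SEQ_MAGIC


# Demo with a stand-in file (magic bytes plus filler), not a real export.
with tempfile.NamedTemporaryFile(suffix=".sync.seq", delete=False) as tmp:
    tmp.write(b"SEQ\x06" + b"filler standing in for the header and records")
print(looks_like_sequence_file(tmp.name))
os.remove(tmp.name)
```

This check only confirms the header prefix. Reading the key-value pairs themselves still requires a Hadoop-aware reader.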

File Name Format

File Name Elements

Outbound binary Hadoop SEQ file names contain the following required and optional elements:

Note: The style elements (monospace text, italics, brackets [ ] ( ), etc.) in this document indicate code elements and options. See Style Conventions for Code and Text Elements for more information.

SYNC-TYPE_DID_MASTER-DPID_[PID-ALIAS]_SYNC-MODE_TIMESTAMP[-SPLIT_NUMBER].sync.seq

File Name Element Description

SYNC-TYPE

Identifies the data transfer method. Transfer methods include:

  • FTP - Transfer using SFTP
  • Amazon S3 - Transfer to an Amazon S3 location (shown as S3 in the file name examples)

DID

Destination ID.

In Audience Manager, a destination is the instance of the integration where you can map your targetable segments. Customers can have multiple destinations, depending on the business requirement.

MASTER-DPID

Data provider or data source ID. This ID identifies the type of user ID present in the file content. The most common user ID keys are:

  • 20914 - Google Advertiser ID (raw, unhashed)
  • 20915 - Apple ID for Advertisers (raw, unhashed)
  • Vendor ID - 3rd party user IDs (web/cookie)

PID-ALIAS

Optional. The customer identifier from the 3rd party platform.

SYNC-MODE

Sync mode is a macro placeholder that adds a label to the file name based on the synchronization type. The synchronization type is either incremental or full, and it appears in the file name as iter or full, respectively.

  • iter: Indicates an "iterative" or incremental synchronization. An incremental file contains only new data collected since the last synchronization.
  • full: Indicates a "full" synchronization. A fully synchronized file contains old data and any new data collected since the last synchronization.

TIMESTAMP

A 13-digit UNIX timestamp in milliseconds, in the UTC time zone.

[-SPLIT_NUMBER]

Optional. An integer. Identifies part of a file that's been split into multiple parts to improve processing times. The number indicates which part of the original file the data belongs to.

The original file does not have a split number. Split files are numbered starting from 1. See examples below.

.seq

Identifies the file as a Hadoop Sequence File.
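The TIMESTAMP element described above is expressed in milliseconds, while many date utilities expect seconds. The following minimal sketch shows one way to convert it to a readable UTC date in Python. The function name is hypothetical, and the input value is taken from the file name examples in this document.

```python
from datetime import datetime, timezone


def timestamp_to_utc(ms_text):
    """Convert a 13-digit millisecond UNIX timestamp string to an aware UTC datetime."""
    if len(ms_text) != 13 or not ms_text.isdigit():
        raise ValueError(f"expected a 13-digit timestamp, got {ms_text!r}")
    # Divide by 1000 because datetime.fromtimestamp() expects seconds.
    return datetime.fromtimestamp(int(ms_text) / 1000, tz=timezone.utc)


print(timestamp_to_utc("1486140844000").isoformat())
# prints 2017-02-03T16:54:04+00:00
```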

File Name Examples

The following files are sent to an Amazon S3 location, with PID-ALIAS="XYZCustomer" and Google Advertiser IDs in the file content.

File Type Example
Incremental
  • S3_1234_20914_XYZCustomer_iter_1486140844000.sync.seq
  • S3_1234_20914_XYZCustomer_iter_1486140844000-1.sync.seq
  • S3_1234_20914_XYZCustomer_iter_1486140844000-10.sync.seq
Full
  • S3_1234_20914_XYZCustomer_full_1486140844000.sync.seq
  • S3_1234_20914_XYZCustomer_full_1486140844000-1.sync.seq
  • S3_1234_20914_XYZCustomer_full_1486140844000-10.sync.seq
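If you process delivered files programmatically, the naming convention above can be split into its elements with a regular expression. The following is a minimal sketch and not an official parser. It assumes that DID and MASTER-DPID are numeric, that PID-ALIAS contains no underscores, and that a name without a PID-ALIAS simply omits that element and its underscore. The pattern and function names are hypothetical.

```python
import re

# Assumed pattern, derived from the file name template and examples above.
FILE_NAME_RE = re.compile(
    r"(?P<sync_type>[A-Za-z0-9]+)"
    r"_(?P<did>\d+)"
    r"_(?P<master_dpid>\d+)"
    r"(?:_(?P<pid_alias>[^_]+))?"      # optional PID-ALIAS (assumed: no underscores)
    r"_(?P<sync_mode>iter|full)"
    r"_(?P<timestamp>\d{13})"
    r"(?:-(?P<split_number>\d+))?"     # optional split number
    r"\.sync\.seq"
)


def parse_outbound_name(name):
    """Split an outbound SEQ file name into its elements (hypothetical helper)."""
    match = FILE_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an outbound SEQ file name: {name!r}")
    parts = match.groupdict()
    split = parts["split_number"]
    # Unsplit files carry no split number, so keep None for them.
    parts["split_number"] = int(split) if split is not None else None
    return parts


info = parse_outbound_name("S3_1234_20914_XYZCustomer_iter_1486140844000-10.sync.seq")
print(info["sync_mode"], info["pid_alias"], info["split_number"])
# prints iter XYZCustomer 10
```

All elements except the split number are returned as strings, so leading zeros or non-standard identifiers are not silently altered.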

File Contents: Sample Line and Parameters

This section describes the fields, syntax, and conventions used to organize information in a Hadoop Sequence File.

Example: Basic File Format

A properly formatted line in a SEQ file could look similar to the sample below. This file entry indicates that user 00131685864660975100567715905662003423 qualified for segments 872123, 856456, and 853789 at UNIX timestamp 1491187665. The \N fields represent empty placeholders and have no significance for the file transfer.

00131685864660975100567715905662003423    \N    et:outbound    \N    872123,856456,853789
00131685864660975100567715905662008143    d_mid:00131685864660975100567715905662003423    \N   
\N    \N    \N    \N    \N    \N    \N        \N    \N    \N    1491187665    1491187665    
1491187665    \N    0    \N    \\N    00131685864660975100567715905662003423    0
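As an illustration of reading such an entry once it has been decoded to text, the sketch below extracts the user ID and the segment IDs. It rests on assumptions inferred only from the sample above: fields are separated by whitespace, the first field is the user ID, the fifth field is the comma-separated segment list, and \N marks an empty field. The function name is hypothetical, and the remaining fields are left uninterpreted because this document does not describe them.

```python
EMPTY = r"\N"  # placeholder used for empty fields in the sample


def parse_entry(entry):
    """Return (user_id, segment_ids) from a decoded entry (assumed field layout)."""
    fields = entry.split()  # assumption: whitespace-delimited fields
    if len(fields) < 5:
        raise ValueError("expected at least five fields")
    user_id = fields[0]
    raw_segments = fields[4]
    if raw_segments == EMPTY:
        segment_ids = []
    else:
        segment_ids = [int(value) for value in raw_segments.split(",")]
    return user_id, segment_ids


# First five fields of the sample entry above.
sample = r"00131685864660975100567715905662003423    \N    et:outbound    \N    872123,856456,853789"
print(parse_entry(sample))
# prints ('00131685864660975100567715905662003423', [872123, 856456, 853789])
```

The user ID is kept as a string so that its leading zeros are preserved.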

Start using SEQ files

There is no UI control in the Audience Manager interface to enable Outbound SEQ file transfers. Talk to your Audience Manager consultant or Customer Care to set up Outbound SEQ file transfers.