XML Decoder Groups

Processing XML files as log sources requires that you define decoders within a Log Processing Dataset Include file to extract data from the XML file.

Note: Defining XML decoder groups for XML log sources requires knowledge of the XML file's structure and contents, the data to be extracted, and the fields in which that data is stored. This section provides basic descriptions of the parameters that you can specify for decoders. The manner in which you use any decoder depends on the XML file that contains your source data.

For information about format requirements for XML log sources, see XML Log Sources. For assistance with defining XML decoders, contact Adobe.

The top level of an XML decoder is a decoder group (XMLDecoderGroup), which is a set of decoder tables that you use to extract data from an XML file of a particular format. If you have XML files of different formats, then you must define a decoder group for each format. Each decoder group consists of one or more decoder tables.

The following table describes the Tables parameter and all of the subparameters that you must specify to define an XML decoder group.

XMLDecoderGroup
Parameter Description
Tables

Each table in a decoder group represents one level of data to be extracted from the XML file. For example, if you want to extract data about visitors, then you would create a decoder table that consists of the information you want to extract for each visitor. You can also create decoder tables within decoder tables (see Children).

To add a table to a decoder group
  • Right-click Tables and click Add new > XMLDecoderTable.
Fields

The extended fields (for example, x-trackingid, x-email) in which the data is stored. The data to be stored in the field is determined by the Path and/or Operation subfields.

The Path is the field's level within the structured XML file. A field's path is relative to the path of the table in which it is defined. Examples include tag.tag.tag or tag.tag.tag.@attribute. Note that paths are case-sensitive.

An Operation is applied to each line in the specified path to produce an output. The following operations are available:
  • LAST: The field takes the value of the path's last occurrence in the XML file.
  • RANDOM: Assigns a random value to the field. This operation is useful if you need to generate a unique id, such as for the x-trackingid field.
  • INHERIT: The defined field inherits its value from the parent table's corresponding field.
  • "constant": The constant must be enclosed in quotation marks. You can use a constant operation to check for the existence of a particular path; if the path exists, then the field is assigned the constant's value.

To add a field to a decoder table

  • Right-click Fields, then click Add new > XMLDecoderField. Define Field, Operation and Path as appropriate.
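
The effect of the four operations can be illustrated with a short Python sketch. This is not how the product implements them: the function name apply_operation, its arguments, and the use of a UUID for RANDOM are assumptions made for illustration. The sketch also assumes that the values found at the field's Path have already been collected into a list, and that a field with no match is left empty.

```python
import uuid


def apply_operation(operation, matches, parent_value=None):
    """Return the value for one decoder field (illustrative only).

    operation    -- "LAST", "RANDOM", "INHERIT", or a quoted constant such as '"1"'
    matches      -- values found at the field's Path, in document order
    parent_value -- value of the corresponding field in the parent table, if any
    """
    if operation == "LAST":
        # Take the last occurrence found at the path, if there is one.
        return matches[-1] if matches else None
    if operation == "RANDOM":
        # Assumption: any unique token will do; a UUID stands in for it here.
        return uuid.uuid4().hex
    if operation == "INHERIT":
        # Copy the value from the parent table's corresponding field.
        return parent_value
    if len(operation) >= 2 and operation[0] == '"' and operation[-1] == '"':
        # Constant: assigned only when the path exists (at least one match).
        return operation[1:-1] if matches else None
    raise ValueError(f"unsupported operation: {operation!r}")


print(apply_operation("LAST", ["old@example.com", "new@example.com"]))
print(apply_operation('"1"', ["registered"]))
print(apply_operation("INHERIT", [], parent_value="1"))
```

The constant branch shows why a constant can double as an existence check: the field receives the constant only when the path produced at least one match.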
Path

The level within the structured XML file for which the decoder table contains information. For a child XML decoder table, the path is relative to the parent table's path. Note that paths are case-sensitive.

For example, if your XML file contains the structure:

<logdata>
  <visitor>
    ...
  </visitor>
</logdata>

then the path would be logdata.visitor.
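
To make the path rules concrete, the following Python sketch resolves dotted, case-sensitive paths against a small XML document by using the standard xml.etree.ElementTree module. The helper names descend and resolve_table, the id attribute, and the contents of the sample document are assumptions for illustration only; the sketch does not reflect the product's internal logic.

```python
import xml.etree.ElementTree as ET


def descend(nodes, path):
    """Follow a dotted, case-sensitive path downward from each node in nodes.

    A final segment that starts with "@" selects an attribute value
    instead of an element.
    """
    segments = [s for s in path.split(".") if s]
    attribute = None
    if segments and segments[-1].startswith("@"):
        attribute = segments.pop()[1:]
    for tag in segments:
        # Keep only direct children whose tag matches exactly.
        nodes = [child for node in nodes for child in node if child.tag == tag]
    if attribute is not None:
        return [node.attrib[attribute] for node in nodes if attribute in node.attrib]
    return nodes


def resolve_table(root, table_path):
    """Resolve a top-level table path such as logdata.visitor from the root."""
    first, _, rest = table_path.partition(".")
    if first != root.tag:
        return []
    return descend([root], rest)


# Hypothetical document; only logdata and visitor come from the example above.
doc = ET.fromstring(
    '<logdata><visitor id="1">'
    "<contact><email>foo@bar.com</email></contact>"
    "</visitor></logdata>"
)

visitors = resolve_table(doc, "logdata.visitor")
print(len(visitors))
print([e.text for e in descend(visitors, "contact.email")])  # relative to the table
print(descend(visitors, "@id"))
print(resolve_table(doc, "logdata.Visitor"))  # wrong case, so nothing matches
```

Field paths and child table paths are passed to descend together with the nodes matched by the enclosing table, which mirrors the rule that those paths are relative.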

Table

The value of this parameter should always be "Log Entry."

Note: Do not change this value without consulting Adobe.
Children

Optional. One or more embedded decoder tables. Each child includes the Fields, Path, and Table parameters described above.

To add a child to a decoder table

  • Right-click Children and click Add new > XMLDecoderTable. Define Field, Operation and Path as appropriate.
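
As a mental model, the nesting of group, tables, fields, and children can be sketched with Python dataclasses. The class and attribute names below only mirror the parameter names described in this section; they are assumptions for illustration and do not represent the syntax of the Log Processing Dataset Include file.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XMLDecoderField:
    field_name: str              # extended field, for example x-email
    operation: str               # LAST, RANDOM, INHERIT, or a quoted constant
    path: Optional[str] = None   # relative to the enclosing table's path


@dataclass
class XMLDecoderTable:
    path: str                    # relative to the parent table's path, if any
    table: str = "Log Entry"
    fields: List[XMLDecoderField] = field(default_factory=list)
    children: List["XMLDecoderTable"] = field(default_factory=list)


@dataclass
class XMLDecoderGroup:
    name: str
    tables: List[XMLDecoderTable] = field(default_factory=list)


def absolute_paths(table, prefix=""):
    """Yield the full path of a table and of every table nested under it."""
    full = f"{prefix}.{table.path}" if prefix else table.path
    yield full
    for child in table.children:
        yield from absolute_paths(child, full)


group = XMLDecoderGroup(
    name="Example group",
    tables=[
        XMLDecoderTable(
            path="logdata.visitor",
            fields=[XMLDecoderField("x-trackingid", "RANDOM")],
            children=[
                XMLDecoderTable(
                    path="pageview",
                    fields=[XMLDecoderField("x-trackingid", "INHERIT")],
                )
            ],
        )
    ],
)

print(list(absolute_paths(group.tables[0])))
```

The absolute_paths helper shows how a child table's relative path combines with its parent's path to identify a level in the XML file.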

To use an XML file as a log source for a dataset, XML decoder groups and tables must be defined to extract the information that is to be processed into the dataset. In this example, you can see how to define decoder groups and tables for a sample XML log source for a web dataset.

The following XML file contains information about a website visitor, including a visitor ID, email address, physical address, and information about the visitor's page views.

Since we have a single XML file, we need only one decoder group, which we name "Sample XML Format." This decoder group applies to any other XML files of the same format as this file. To begin constructing XML decoder tables within this decoder group, we must first determine what information we want to extract and the fields in which the data will be stored.

In this example, we extract information about the visitor and the page views associated with that visitor. To do this, we create a top-level (parent) XML decoder table with information about the visitor and an embedded (child) XML decoder table with information about that visitor's page views.

Information for the parent (visitor) table is as follows:
  • A data type identifier for each row of data in the XML file. We use "VISITOR" as our identifier so that we can quickly identify rows of data pertaining to the visitor and not to the page views. We store this value in the x-rowtype field.
  • The visitor's ID, which we store in the x-trackingid field.
  • The visitor's email address (contact.email), which we store in the x-email field.
  • The visitor's registration status. If the visitor is a registered user, then we store the value "1" in the x-is-registered field.
  • The Path value is logdata.visitor, and the Table value is "Log Entry." For information about these parameters, see the XMLDecoderGroup table above.
Information for the child (page views) table is as follows:
  • A data type identifier for each row of data in the XML file. We use "PAGEVIEW" as our identifier so that we can quickly identify rows of data pertaining to the visitor's page views and not to the visitor only. We store this value in the x-rowtype field.
  • The visitor's ID. This value is inherited from the parent table and is stored in the x-trackingid field.
  • The timestamp of each page view, which is stored in the x-event-time field.
  • The URI of each page view, which is stored in the cs-uri-stem field.
  • The Path value is pageview, and the Table value is "Log Entry." For information about these parameters, see the XMLDecoderGroup table above.

The following screen capture shows a portion of the Log Processing Dataset Include file with the resulting XML decoder group for the sample XML file, based on the structure of the parent and child XML decoder tables discussed above.

A table showing the output of this decoder for our sample XML file looks something like the following:

x-rowtype  cs-uri-stem  x-email      x-is-registered  x-event-time         x-trackingid
VISITOR                 foo@bar.com  1                                     1
PAGEVIEW   /index.html                                2006-01-01 08:00:00  1
PAGEVIEW   /                                          2006-01-01 08:00:30  1
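
The way the parent and child tables combine to produce these rows can be approximated in Python. The XML below is a hypothetical stand-in for the sample file: the id attribute and the registered, time, and uri element names are assumptions chosen to match the output values, and the function names are illustrative. Note that ElementTree separates path segments with "/", whereas decoder paths use ".".

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the sample file; names other than logdata,
# visitor, contact, email, and pageview are assumptions.
SAMPLE = """
<logdata>
  <visitor id="1">
    <contact><email>foo@bar.com</email></contact>
    <registered/>
    <pageview><time>2006-01-01 08:00:00</time><uri>/index.html</uri></pageview>
    <pageview><time>2006-01-01 08:00:30</time><uri>/</uri></pageview>
  </visitor>
</logdata>
"""

COLUMNS = ["x-rowtype", "cs-uri-stem", "x-email",
           "x-is-registered", "x-event-time", "x-trackingid"]


def last_text(node, path):
    """LAST: text of the final element matching path under node, or ''."""
    found = node.findall(path)
    return (found[-1].text or "") if found else ""


def decode(xml_text):
    root = ET.fromstring(xml_text)                 # root element is <logdata>
    rows = []
    for visitor in root.findall("visitor"):        # parent table: logdata.visitor
        parent = {
            "x-rowtype": "VISITOR",                # constant row identifier
            "x-trackingid": visitor.get("id", ""),
            "x-email": last_text(visitor, "contact/email"),
            # Constant "1" only if the (assumed) registered element exists.
            "x-is-registered": "1" if visitor.find("registered") is not None else "",
        }
        rows.append(parent)
        for view in visitor.findall("pageview"):   # child table: relative path pageview
            rows.append({
                "x-rowtype": "PAGEVIEW",
                "x-trackingid": parent["x-trackingid"],   # INHERIT from the parent
                "x-event-time": last_text(view, "time"),
                "cs-uri-stem": last_text(view, "uri"),
            })
    return rows


rows = decode(SAMPLE)
for row in rows:
    print(" | ".join(row.get(col, "") for col in COLUMNS))
```

Each visitor element yields one VISITOR row, and each pageview element nested under it yields one PAGEVIEW row that carries the inherited tracking ID.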

You can create a table like the one above in data workbench by using a field viewer interface. For information about the field viewer interface, see Dataset Configuration Tools.