Log Files

This section describes how to integrate event data from flat files that are not .vsl files.

The file containing the event data must meet the following requirements:
  • Each event data record in the file must be represented by one line.
  • The fields within a record, including empty fields, must be separated by an ASCII delimiter. The data workbench server does not require a specific delimiter. You may use any character that is not a line-ending character and does not appear anywhere within the event data itself.
  • Each record in the file must contain:
    • A tracking ID
    • A time stamp
  • To specify start and end times for data processing, each file name must be of the form:
    • YYYYMMDD-SOURCE.log

    where YYYYMMDD is the Greenwich Mean Time (GMT) day of all of the data in the file, and SOURCE is a variable identifying the source of the data contained in the file.

    Note: Please contact Adobe Consulting Services for a review of the log files that you plan to incorporate into the dataset.
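
As an illustration of the requirements above, the following Python sketch checks a file name against the YYYYMMDD-SOURCE.log form and splits one record into fields. It is not part of the data workbench server. The helper names (parse_log_file_name, split_record) and the tab delimiter are assumptions chosen for the example.

```python
import re
from datetime import datetime

# The YYYYMMDD-SOURCE.log form comes from the requirements above;
# the helper names below are hypothetical.
FILE_NAME_RE = re.compile(r"^(\d{8})-(.+)\.log$")

def parse_log_file_name(name):
    """Return (day, source) for a name such as 20240131-web01.log, else None."""
    m = FILE_NAME_RE.match(name)
    if m is None:
        return None
    try:
        # Reject digit runs that are not a real calendar date.
        day = datetime.strptime(m.group(1), "%Y%m%d").date()
    except ValueError:
        return None
    return day, m.group(2)

def split_record(line, delimiter="\t"):
    """Split one record (one line) into fields, keeping empty fields.

    The tab delimiter is an assumption; any character that never
    occurs in the event data could be used instead.
    """
    return line.rstrip("\r\n").split(delimiter)
```

A file whose name does not parse (for example, web01.log) would not satisfy the naming requirement for start and end time processing.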

Parameters

For log file log sources, the parameters in the following table are available.

Note: Processing log file log sources requires additional parameters, which are defined in a Log Processing Dataset Include file. This file contains a subset of the parameters in a Log Processing.cfg file, plus special parameters that define decoders for extracting data from the log file. For information about defining decoders for log file log sources, see Text File Decoder Groups.
Log Processing.cfg: Log Files
Parameter Description
Name The identifier for the log file source.
Log Paths

The directories where the log files are stored. The default location is the Logs directory. A relative path refers to the installation directory of the data workbench server.

You can use wildcard characters to specify which log files to process:
  • * matches any number of characters.
  • ? matches a single character.

For example, the log path Logs\*.log matches any file in the Logs directory ending in .log.
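
The wildcard behavior described above can be approximated with Python's standard fnmatch module. This is only a sketch of the * and ? semantics applied to the file name portion of a path. The function name matches_log_name is hypothetical, and the server's own matcher may differ in details (for example, Python's fnmatch also treats square brackets as special).

```python
from fnmatch import fnmatchcase

def matches_log_name(file_name, pattern):
    """Return True if file_name matches a wildcard pattern.

    * matches any number of characters and ? matches a single
    character. Only the file name is compared here, not the directory.
    """
    return fnmatchcase(file_name, pattern)

# Files that a pattern such as Logs\*.log would select from the Logs directory.
candidates = ["access.log", "access.txt", "server1.log", "server10.log"]
selected = [name for name in candidates if matches_log_name(name, "*.log")]
```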

If you want to search all subdirectories of the specified path, then you must set the Recursive parameter to true.

If the files are to be read from a data workbench server's File Server Unit, then you must enter the appropriate URI(s) in the Log Paths parameter. For example, the URI /Logs/*.log matches any .log file in the Logs directory. See Configuring a Data Workbench Server File Server Unit.

Log Server Information (Address, Name, Port, and so on) necessary to connect to a file server. If there is an entry in the Log Server parameter, the Log Paths are interpreted as URIs. Otherwise, they are interpreted as local paths. See Configuring a Data Workbench Server File Server Unit.
Compressed True or false. This value should be set to true if the log files to be read by the data workbench server are compressed gzip files.
Decoder Group The name of the text file decoder group to be applied to the log file log source. This name must match exactly the name of the corresponding text file decoder group specified in the Log Processing Dataset Include file. See Text File Decoder Groups.
Log Source ID

This parameter's value can be any string. If a value is specified, this parameter enables you to differentiate log entries from different log sources for source identification or targeted processing. The x-log-source-id field is populated with a value identifying the log source for each log entry. For example, if you want to identify log entries from a log file source named LogFile01, you could type "from LogFile01", and that string would be passed to the x-log-source-id field for every log entry from that source.

For information about the x-log-source-id field, see Event Data Record Fields.

Mask Pattern

A regular expression with a single capturing subpattern that extracts a consistent name used to identify the source of a series of log files. Only the file name is considered. The path and extension are not considered for the regular expression matching. If you do not specify a mask pattern, then a mask is generated automatically.

For the files Logs\010105server1.log and Logs\010105server2.log, the mask pattern would be [0-9]{6}(.*). This pattern extracts the string "server1" or "server2" from the file names above.
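
To see what the mask pattern in this example captures, the following Python sketch applies it to the file name with the path and extension removed. The function name extract_mask is hypothetical, and matching from the start of the name with Python's re module is an assumption; the server's regular expression engine may behave differently.

```python
import re
from pathlib import PureWindowsPath

def extract_mask(path, mask_pattern):
    """Apply a mask pattern to the file name only (no path, no extension)."""
    stem = PureWindowsPath(path).stem  # "Logs\\010105server1.log" -> "010105server1"
    m = re.match(mask_pattern, stem)
    # The single capturing subpattern holds the source name.
    return m.group(1) if m else None

masks = [
    extract_mask(p, r"[0-9]{6}(.*)")
    for p in [r"Logs\010105server1.log", r"Logs\010105server2.log"]
]
```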

See Regular Expressions.

Recursive True or false. If this parameter is set to true, all subdirectories of each path specified in Log Paths are searched for files matching the specified file name or wildcard pattern. The default value is false.
Reject File The path and file name of the file containing the log entries that do not meet the conditions of the decoder.
Use Start/End Times

True or false. If this parameter is set to true and Start Time or End Time is specified, then all files for this log source must have file names starting with dates in ISO format (YYYYMMDD). It is assumed that each file contains data for one GMT day (that is, the time range starting at 0000 GMT on one day and ending at 0000 GMT the following day). If the log source's file names do not begin with ISO dates, or if the files contain data that do not correspond to a GMT day, then this parameter must be set to false to avoid incorrect results.

Note: If the naming and time range requirements described above are satisfied for the log files and you set this parameter to true, the specified text file decoder group limits the files read to those whose names have ISO dates that fall between the specified Start Time and End Time. If you set this parameter to false, the data workbench server reads all of the log files during log processing to determine which files contain data within the Start Time and End Time range.
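
The file selection described in this note can be pictured with a short Python sketch. It is an illustration only, not the server's implementation. The function names are hypothetical, and treating the start and end as inclusive whole GMT days is an assumption made for the example.

```python
from datetime import date, datetime

def file_day(file_name):
    """Return the date encoded in the first eight characters, or None."""
    try:
        return datetime.strptime(file_name[:8], "%Y%m%d").date()
    except ValueError:
        return None

def select_files(file_names, start, end):
    """Keep files whose leading ISO date falls within [start, end].

    Inclusive day boundaries are an assumption for this sketch.
    Files without a leading date are skipped.
    """
    selected = []
    for name in file_names:
        day = file_day(name)
        if day is not None and start <= day <= end:
            selected.append(name)
    return selected

names = ["20240130-web.log", "20240131-web.log", "20240201-web.log", "notes.log"]
in_range = select_files(names, date(2024, 1, 31), date(2024, 2, 1))
```

With the parameter set to false, no such shortcut by file name is possible, which is why every file has to be read.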

For information about the Start Time and End Time parameters, see Data Filters.

In this example, the dataset is constructed from two types of log sources.

Log Source 0 specifies log files generated from event data captured by Sensor. This data source points to a directory called Logs and to all of the files in that directory with a .vsl file name extension.

Log Source 1 points to all of the files in the Logs directory with a .txt file name extension. The decoder group for this log source is called “Text Logs.”

You should not delete or move log files after the data sources for a dataset have been defined. Only newly created log files should be added to the directory for the data sources.