Calculating Metrics

Describes how to calculate common metrics using clickstream data feeds.

Pre vs. Post column

Bots

Bots are excluded from data feeds according to the bot rules defined for your report suite.

Date filtering

Include rows from the date range you want included by filtering the date_time field. The date_time field is human readable (for example, YYYY-MM-DD HH:MM:SS) and is adjusted to the time zone of the report suite. For example, date_time starts with "2013-12" includes hits from December 2013.

Event string

The event string in event_list and post_event_list contains a comma-delimited list of events, which may have a value and/or a unique ID. We recommend doing all processing on the post_event_list because it is de-duplicated and has already applied logic to remove duplicate events with the same ID (see Event Serialization).

For event ID to name mapping, see the event lookup delivered with your data feed.

For more information on events, see Events.

Formulas for common metrics

The following table contains instructions to calculate several common metrics.

Metric How to calculate
Page Views

Page views can be calculated by counting when there is either a value in post_pagename or post_page_url.

You can use similar logic to count custom links:

  • post_page_event = 100 to count custom links.
  • post_page_event = 101 to count download links.
  • post_page_event = 102 to count exit links.
Visits
  1. Exclude all rows where exclude_hit > 0.
  2. Exclude all rows with hit_source = 5,7,8,9. 5, 8, and 9 are summary rows uploaded using data sources. 7 represents transaction ID data source uploads that should not be included in visit and visitor counts. See Hit Source Lookup.
  3. Combine post_visid_high, post_visid_low, visit_num, and visit_start_time_gmt*. Count unique number of combinations.

*In rare circumstances, internet irregularities, system irregularities, or the use of custom visitor IDs can result in duplicate visit_num values for the same visitor ID that are not the same visit. To avoid resulting issues, also include visit_start_time_gmt when counting visits.

Visitors
  1. Exclude all rows where exclude_hit > 0.
  2. Exclude all rows with hit_source = 5,7,8,9. 5, 8, and 9 are summary rows uploaded using data sources. 7 represents transaction ID data source uploads that should not be included in visit and visitor counts. See Hit Source Lookup
  3. Combine post_visid_high with post_visid_low. Count unique number of combinations.
Event instances

When an event is set on a hit, post_event_list contains the event. The post_event_list is de-duplicated and is recommended to determine event instances.

For example:

post_event_list = 1,200

Indicates an instance of purchase and event1.

  1. Exclude all rows where exclude_hit > 0.
  2. Exclude all rows with hit_source = 5,8,9. These are summary rows uploaded using data sources. See Hit Source Lookup.
  3. Count the number of times the event lookup value appears in post_event_list.
eVar instances

When an eVar is set on a hit, event_list contains an instance of that eVar.

For example:

post_event_list = 100,101,106

Indicates an instance of eVar1, eVar2, and eVar7. This means that a value for these three eVars was set on the hit.

To calculate instances for eVars, use the same logic explained in Event instances above, but count the number of times the eVar lookup appears in the post_event_list.

Time Spent

To calculate time spent, you must group hits by visit, then order them according to the hit number within the visit.

  1. Exclude all rows where exclude_hit > 0.
  2. Group hits for a visit by concatenating visid_high, visid_low, and visit_num.
  3. Order hits for each visit by visit_page_num.
  4. Using page_event, filter the types of hits you want.
  5. Find hits where the value you want to track time spent is set. For example:
    hit 1: post_prop1=red
    hit 2: post_prop1=blue
  6. Subtract the post_cust_hit_time for hit 1 from the post_cust_hit_time for hit 2 to determine the seconds between these two hits. The result is the time spent for post_prop1=red. If this results in a negative number, it indicates that the hit was received out-of-order and the calculation should be discarded.

This logic can be extended to calculate time spent for other values. When calculating time spent, Analytics calculates time spent based on the time the value was set in a track (page_event=0) or trackLink (page_event=10|11|12) call, to the time of the next page view (track call).

When reporting time spent for a specific period, marketing reports & analytics and ad hoc analysis evaluate hits beyond the reporting period to determine time spent for values within the reporting period, except when the start and/or end date of the time period is on a monthly boundary. Due to the complexity of these calculations, it might be difficult to match the time spent metrics exactly. Data warehouse does not evaluate hits beyond the reporting period.

Revenue, orders, units

Currency conversion is applied to the post_product_list according to the settings for the report suite, so using that column is recommended.

  1. Exclude all rows where exclude_hit > 0.
  2. Exclude all rows with hit_source = 5,8,9. 5-9 represent summary rows uploaded using data sources. See Hit Source Lookup.
  3. Ignore purchase data for rows where duplicate_purchase = 1. This flag indicates that the purchase is a duplicate (meaning that a hit with the same purchaseID was already recorded).
  4. The post_product_list uses the same syntax as s.products, so you can parse this string to calculate metrics. For example:

    ;Cross Trainers;1;69.95,;Athletic Socks;10;29.99

    By parsing this string, you can determine that 1 pair of cross trainers were purchased for $69.95, and that total revenue from this purchase was $99.94.

Note: Analytics allows currency events that contain product revenue to be passed in through the events string, so you might need to account for revenue that is not in the products string. See Numeric/Currency Events in s.events.