Internet Audience Measurement

Status:  Working Document
Author:  Ralf Cornehl (KASRL)
Date:    10.05.2021
Version: 1.6

Version | Date       | Author                    | Comments
1.0     | 21.02.2018 | Ralf Cornehl (KASRL)      | Initial
1.1     | 12.04.2019 | Ralf Cornehl (KASRL)      | Update: Geo-location Attribution
1.2     | 21.05.2019 | Hendrik Vermeylen (KASRL) | Device Type Recognition
1.3     | 18.01.2020 | Ralf Cornehl (KASRL)      | FAQ added
1.4     | 26.10.2020 | Ralf Cornehl (KASRL)      | SDK Development, SDK Testing (QA), SDK Maintenance, SDK Release Cycles
1.5     | 07.12.2020 | Ralf Cornehl (KASRL)      | IP-Address handling, Cookie handling
1.6     | 10.05.2021 | Ralf Cornehl (KASRL)      | Kantar Browser Meter

 

Content

The site-centric and user-centric measurement system we propose is based on our proven Kantar Media Division solution and features:

IAM-SDK

Desktop-Web-SDK

Data acquisition of the site-centric measurement system is facilitated by a state-of-the-art script tag that allows tracking of users on ALL devices
(including mobile devices like iPad/iPhone) beyond the lifetime of their browser cookies, while remaining compliant with ECC privacy guidelines.
The tag is delivered as a JavaScript package that is easily integrated into the web content.
Client-specific dimensions and variables can be defined (such as content areas).

 

<html>
  <head>
  </head>
  <body>
    <script src="spring.js"></script>
    <script language="Javascript">
      var sp_e0 = {
        "s": "testsite",
        "cp": "Path/to/your/page",
        "url": document.location.href
      };
      spring.c(sp_e0);
    </script>
  </body>
</html>

 
The tag can be loaded either synchronously or asynchronously. Asynchronous loading prevents the page load from being delayed, but registers fewer page views.
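
The following is a minimal sketch of an asynchronous integration, reusing the spring.js package and the sp_e0 configuration object from the example above; the loader pattern itself is an illustrative assumption, not the official integration snippet.

<script>
  // Hypothetical asynchronous loader for the measurement tag.
  (function () {
    var s = document.createElement('script');
    s.src = 'spring.js';
    s.async = true;                 // do not block page rendering
    s.onload = function () {
      // Fire the page-view event once the library has loaded.
      var sp_e0 = {
        "s": "testsite",
        "cp": "Path/to/your/page",
        "url": document.location.href
      };
      spring.c(sp_e0);
    };
    document.body.appendChild(s);
  })();
</script>

Because the library is only requested after page parsing has started, a user who leaves the page very early may not be counted, which is why asynchronous loading registers fewer page views.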

Mobile-Web-SDK

The sensor that facilitates the measurement of mobile browser usage is built as a “cascade” of technologies to allow for the most accurate (re-)identification
of users visiting mobile-enabled websites, even on non-cooperative devices. The system uses a combination of technologies for this.
The sensor is deployed to websites as a JavaScript package.

Mobile-App-SDK

The sensor for measuring mobile applications consists of a number of SDKs to be integrated into the mobile applications of participating publishers.
Currently the system extensively supports iOS and Android applications. The libraries are able to track the life-cycle events of a mobile application
such as "started", "foreground", "background" and "closed".
Additionally, Internet content rendered into a "Webview" of the mobile application can be tagged and subsequently reported.
To be compliant with European standards in terms of data protection and privacy, the libraries also offer an opt-out option that can easily be triggered from the applications.

Integration

The integration is typically completed in less than one working day.

VAM-SDK

The streaming measurement system we propose is based on our proven Kantar Media Division solution and features an easily integrable in-player SDK for all modern players.
It provides accurate measurement and reporting of streaming usage (audio or video) for live, recorded or downloaded content (offline mode).

Data-Flow of Data Acquisition - System Schematics: (Web-Video-Player)

 

The measurement SDK can be integrated with ANY player and the measurement system supports the following technologies:

Desktop-Player-SDK

  • HTML 5

  • Flash (unsupported since end of 2020)

  • Flash Action Script 3 (unsupported since end of 2020)

  • Flash OSMF (unsupported since end of 2020)

  • Brightcove

  • Silverlight

Mobile-App-Player-SDK

  • iOS

  • Android

Special Platform-Player-SDK

  • Apple TV (TVJS/TVML)

  • Apple TV (tvOS)

  • Chromecast

  • Cordova

  • Electron

  • LibJscript

  • Playstation 3 and 4

  • Roku

  • TAL (SmartTV)

  • Xbox

  • Other technologies (like proprietary players) can be integrated via a generic JS interface (adapter)

Special SDK or Plugin Development

On special occasions Kantar can develop and provide a standalone measurement SDK or a special plugin in cooperation with broadcasters when their current video-player framework
is not covered by the standard SDKs mentioned above and thus needs separate treatment in software development and implementation.

Integration

The integration is typically completed in less than one working day.

SDK Development

Kantar’s software development for SDKs follows an agile approach using the Scrum method [https://en.wikipedia.org/wiki/Scrum_(software_development)].
The development sprints are regularly set to a 3-week cycle. To cover special development needs (e.g. initial development for platform support),
development sprints can be extended up to 4 weeks, keeping the highest flexibility for possible development iterations and enabling constant developer feedback loops.
Respecting the availability of development resources during sprint planning, together with an integrated development buffer, allows urgent SDK issues that surface during running sprints to be handled ad hoc in a timely manner.

SDK Testing (QA)

The Kantar QA process for SDK products is integrated into the overall agile development cycle. For all our products, Kantar maintains sets of regression scenarios, which are the basis for any release test.
These sets are extended or adjusted whenever new features or changes to the current business logic appear in the latest development iteration.

As for the Kantar QA workflow, once the initial implementation of all requirements has been completed, the QA department receives a feature-complete SDK artifact (RC = release candidate) from the implementing team,
together with a brief description of any new features or adjustments.
The QA department then adopts any updates into its set of regression scenarios.
Based on these scenarios and the received RC, the integration test is performed and documented.
If the RC meets all requirements, the QA team declares it accepted, which subsequently triggers the build of the final release artifact (SDK production release), identical to the latest RC.
If any findings or defects appear during testing of an RC, the corresponding SDK artifact is declared rejected and the development team is informed about the findings or defects.
In this case, the development team adjusts the current SDK artifact with regard to the findings and builds a new RC, which is again sent to the QA department for testing.
This interaction between the QA department and the corresponding development team is cyclic until an RC is declared accepted.

Technology-wise, we divide QA into internal and integration testing.

Internal tests cover the internal details of an SDK product – granular components such as algorithms, classes or functions.
These tests are implemented as unit tests, using the unit-testing framework of the corresponding programming language.
As a result, they are fully automated and integrated into our build pipelines, which ensures that every software artifact produced by a development team passes these tests.

Integration tests cover cases in which the whole software artifact is integrated into the environment in which it will run in production.
This covers any interaction with the environment, e.g. the operating system, as well as the complex orchestration of various sub-components or algorithms.
Kantar relies heavily on automation in this area as well, in order to ensure consistency. For front-end products Kantar uses state-of-the-art automation frameworks
like Selenium (https://www.selenium.dev/) and Appium (http://appium.io/) to implement the automation of these tests. To cover as many platforms as possible during these tests,
we use the cloud service Sauce Labs (https://saucelabs.com), which provides access to a wide range of real devices as well as device simulations through the cloud.
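
As an illustration only, the following is a minimal sketch of how such an automated browser check could be started with Selenium’s JavaScript bindings (selenium-webdriver); the page URL is a placeholder, and the assertions on captured measurement requests that a real integration test would perform are omitted.

// Minimal Selenium WebDriver sketch: open a tagged test page in Chrome.
const { Builder } = require('selenium-webdriver');

(async function run() {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    // Placeholder URL: a real test would load a page carrying the measurement tag
    // and then verify the requests sent to the counting domain.
    await driver.get('https://example.com/tagged-test-page');
    const title = await driver.getTitle();
    console.log('Loaded page:', title);
  } finally {
    await driver.quit();
  }
})();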

Kantar has also enabled a more abstract layer on top of that which encapsulates the technical details and provides a human-readable specification of an integration scenario.
This behaviour-driven approach reduces the communication barriers between market stakeholders and the Kantar QA team and ensures
that integration testing provides valuable information on how the product will behave in a customer scenario.

For the processing product, Kantar will in 2021 use Azure big-data technologies and the Azure test procedures with which they are implemented.

SDK Maintenance

Kantar constantly and regularly reviews browser and operating-system (OS) vendors to determine the changes that will come with their new releases.
Impact assessments of those vendor announcements on Kantar SDKs, and/or broadcaster feedback on SDKs given via Kantar’s own service desk, lead to development input
for Scrum sprint planning, building up work packages for initial software development and/or bug fixing while respecting Kantar’s available development resources.

SDK Release Cycles

By following up on the main technology vendors, such as Apple for iOS and Google for Android, and by constantly reviewing browser vendors
and video-player frameworks, Kantar observes the upcoming release cycles for the respective platforms.
Major updates and/or upgrades to them are adopted into Kantar’s SDK release-cycle planning.

Most common time patterns for individual platforms have been established by vendors over the course of the past years.
Google is releasing new major versions of Android mostly during summertime (July/August).
Apple is doing so for iOS by announcing and introducing new smartphone or tablet models in late summer or autumn (August/September).

To cover the major releases of those vendors, Kantar plans major SDK releases within the same time frames.
Internal code reviews and broadcaster feedback on SDKs may lead to bugfix or hotfix scenarios, which are provided in the form of intermediate minor releases on an ad-hoc basis.

FAQ

What does a tracking request look like?

The Kantar Media Division requests are HTTP GET requests. They contain all the information required for measurement.

Example for a website measurement request:

Example
GET 302 (redirect to /blank.gif): http://test.tns-cs.net/j0=,,,r=https%3A%2F%2Fwww.google.com;+,cp=test+url=www.test.com;;;?lt=ig36v70v&x=1600x900x24
GET 200 (image/gif): http://test.tns-cs.net/blank.gif

See the explanation of the request elements in the below table.

Element of the request         | Description
http://test.tns-cs.net         | call to counting domain
r=https%3A%2F%2Fwww.google.com | r-variable = referrer
cp=test                        | cp-variable = content path set to unify content
url=www.test.com               | url-variable = URL where the content is placed
lt=ig36v70v                    | lt-variable = random parameter to avoid browser caching
x=1600x900x24                  | x-variable = screen resolution
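
As an illustration of the parameters listed above, the following minimal sketch shows how such values could be gathered in the browser; it is not the actual spring.js code, and the exact request encoding (the j0=,,, notation) is handled by the library itself.

// Illustrative collection of the documented request values (not the real tag code).
var trackingValues = {
  r:   document.referrer,                                            // referrer
  cp:  'test',                                                       // content path
  url: document.location.href,                                       // page URL
  lt:  Math.random().toString(36).substring(2, 10),                  // cache buster
  x:   screen.width + 'x' + screen.height + 'x' + screen.colorDepth  // screen resolution
};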

Example for a video measurement request

Heartbeats sent from the SDK

This section briefly explains what the heartbeats sent from the libraries should look like. A concrete example of a viewing session is used.

Content Stream is started and the first Request transmitted


First Request Statement

The actual record output should look similar to below:
First play state: 0+0+mbeswh

http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+1+mbeswj;;+dur=1501+vt=2;;;

Please use the record layout descriptions below for reference.

Variable        | Description
counting domain | "tns-cs.net" = Norwegian counting domain; e.g. "sitename" = "example" for a site
pl              | player = own player name (set by the broadcaster)
plv             | player version = own player version (set by the broadcaster)
sx              | width of the stream window
sy              | height of the stream window
stream          | stream name (set by the broadcaster)
cq              | content-ID = broadcaster's content ID (set by the broadcaster)
uid             | unique ID of the view sequence
pst             | play state = list of viewing intervals on the stream
dur             | stream length in seconds (set by the broadcaster)
vt              | view time in seconds (time of visual contact with the stream)

After Viewing 2 min of the Stream

The output records should look similar to the records below; note the times of the “heartbeat” records (play-state records) at 21, 41, 61, ... seconds.

http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+21+mbeswj;;+dur=1501+vt=22;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+41+mbeswj;;+dur=1501+vt=42;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+61+mbeswj;;+dur=1501+vt=62;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+83+mbeswj;;+dur=1501+vt=84;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+105+mbeswj;;+dur=1501+vt=106;;;

Stopping the Stream after 2:00 min

Last play state: 1+121+mbeswj = 120 sec playtime.

http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+121+mbeswj;;+dur=1501+vt=124;;;
  • Note that the "uid" (uid=3f3tv5p) and stream name "stream" (stream=od) remained the same during the whole view sequence.
    This should always be the case when the implementation is correct.

  • If the "uid" or "stream" (=stream name) changes during the observed view sequence, there is something wrong with the implementation
    and as a consequence more than one stream view is being counted for this single view sequence.

  • The above example is a generic one.

  • For the streaming project, when continuously viewing a stream you should see heartbeats sent out at 0, 1, 20, 40, 60, 80, 100, 120, ... seconds.

There may be 1 or 2 seconds added to every heartbeat due to the internal workings of the library.
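
The timing behaviour described above can be pictured with the following minimal sketch; it is purely hypothetical and not the SDK’s actual implementation.

// Hypothetical 20-second heartbeat loop as described above.
// sendPlayState is a placeholder for whatever transmits the pst value.
function startHeartbeat(sendPlayState) {
  sendPlayState(0);                                      // initial play state at 0 s
  setTimeout(function () { sendPlayState(1); }, 1000);   // early confirmation at ~1 s
  var elapsed = 0;
  return setInterval(function () {
    elapsed += 20;
    sendPlayState(elapsed);                              // heartbeats at 20, 40, 60, ... s
  }, 20000);
}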

How can I control those requests?

Step-by-step Instructions for running an HTTP-Request-Analyser:

  • On a standard laptop / desktop device → Open the browser

  • Using Chrome: press “CTRL+SHIFT+i”. This will bring up the developer-tools panel along the bottom of the browser screen
    (on other browsers, you might need specific browser plug-ins)

  • The info panel contains several tabs across the page, but the one that matters is “Network”. Select “Network”

  • In the main browser window → Open the website being tested, i.e. your web page

  • As soon as you load the web page, you will notice a stream of events occurring in the information screen along the bottom of the screen

  • Click in the information screen to make sure it has focus, now click on the filter icon.
    This will allow you to enter a value in the search box

  • Enter “tns-cs.net” (= the receiving server, see the example above) and click the “filter” option.
    You will now see only the requests going to and coming from the website-tracking project systems

  • If you return focus to the website you can now test the tracking function 
    and watch the results in the HTTP-request data scrolling along the information screen

  • This data is NOT captured automatically, so you MUST copy and paste ALL HTTP requests after the test has been completed;
    this data should be shared with us in order for the implementation to be signed off.

Why is a 302 redirect used?

When a request hits our servers, it is measured and then answered with an HTTP 302 redirect response (“moved temporarily”).
This 302 redirect forces caching mechanisms, such as proxies or the browser cache, to request the resource anew from the server.

However, the related RFC 2616 is not completely implemented here. When replying with the redirect URL to the client, only the URI part is sent (the host name of the server is omitted).
This leads to higher performance of the system and a reduction of transfer volumes.
For repeated requests, the saving is about 10 bytes per response compared to full responses.

Additionally, in the local client only one copy of the blank.gif is allocated and processed.

How is the IP-Address used?

Kantar only uses the IP address in truncated form (last octet removed) at run time (for milliseconds) in RAM on our measurement boxes; after that it is discarded.
Our measurement boxes do not even have physical storage capacity for this purpose.
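
The truncation step can be illustrated with the following minimal sketch; it covers IPv4 only and is an illustrative assumption, not Kantar’s actual implementation.

// Illustrative truncation of the last IPv4 octet before geolocation lookup.
function truncateIp(ip) {
  var parts = ip.split('.');
  if (parts.length !== 4) return null;   // this sketch handles IPv4 only
  parts[3] = '0';                        // drop the last octet
  return parts.join('.');
}

// truncateIp('192.0.2.123') -> '192.0.2.0'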

The IP-address is needed to execute the HTTP-traffic operations (= Internet communication) and to allow for geolocation attribution.

To be GDPR compliant, the IP addresses are never used in any data processing after measurement, nor are they stored elsewhere.

This procedure of IP-address handling has already been audited by local data-privacy organizations in Germany when we, formerly operating as “spring”, were still part of the AGOF measurement,
which has since been taken over by a company called INFOnline using our technology approach.
The same approach is also in production for Switzerland (Netmetrix) and Austria (ÖWA), and in our current measurement projects across the globe.
All the countries explicitly mentioned above are known for their strict GDPR regulations.
This approach already applies a “privacy by design” model in the measurement context.

How are cookies used?

Data acquisition of the site-centric measurement system is facilitated on browsers by a state-of-the-art script tag that allows tracking of users on desktop and mobile devices
(including mobile devices like tablets and smartphones) beyond the lifetime of their browser identifiers, while remaining compliant with ECC privacy guidelines.
The “Tag” is delivered as a JavaScript package that is easily integrated into the web-player context.

The broadcaster’s content (video or audio) is typically hosted on a specific domain, which is considered as the content domain = “First Party Domain”.
The measurement requests are sent to a different domain, which is considered as the measurement domain = “Third Party Domain” in the broadcaster’s context.     

Available browser identifiers are used to uniquely identify a video viewing sequence (session handling) by the device’s browser.

The i00-HTTP Cookie

The i00 HTTP cookie content consists of (in hexadecimal form):

- box-id (first 4 digits)
- time stamp (8 digits)
- counter (4 digits)
- serial (4 digits; irrelevant for the HTTP context)
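
Based on the layout described above, a cookie value could be split into its components as in the following minimal sketch; the field offsets are an assumption derived from this description, not from the production code.

// Illustrative split of an i00 cookie value into its hexadecimal components.
function parseI00(cookieValue) {
  return {
    boxId:     cookieValue.substring(0, 4),                 // box-id (4 hex digits)
    timestamp: parseInt(cookieValue.substring(4, 12), 16),  // time stamp (8 hex digits)
    counter:   cookieValue.substring(12, 16),               // counter (4 hex digits)
    serial:    cookieValue.substring(16, 20)                // serial (4 hex digits; not used in the HTTP context)
  };
}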

The HTML-5 Local Storage Cookie

The c value (HTML5 cookie) is only passed through and parsed by our boxes; it is set by the JavaScript (16 hexadecimal characters):

if (!this.nlso) try {
  var l = localStorage.getItem('i00');
  if (l) return '&c=' + l;
  var ta = '0000',
      id = ta + Math.ceil((new Date()).getTime() / 1000).toString(16)
              + (0x8000 | Math.random() * 0xffff).toString(16) + ta;
  localStorage.setItem('i00', id);
  return '&c=' + id;                 // assumed: the newly created id is returned as well
} catch (e) { /* localStorage unavailable; no HTML5 cookie is set */ }

 

The difference:

- the i00-cookie is set by the box as active component
- the HTML5-cookie is NOT set by the box (=passive behavior)

For cookie handling and processing see chapter: Client-Resolving

How to define a page view?

A PageImpression/PageView describes the call of one webpage by the user.
The following requirements have to be met before a PageImpression/PageView can be counted and attributed to a specific counting ID:

  • The page has to match the FQDN (https://en.wikipedia.org/wiki/Fully_qualified_domain_name) of the website (or an alias/redirect).

  • The page has to belong to the site, either in look and feel or by a clear and obvious optical ID.

  • Each call of the page may be counted only once.

  • The call of the page has to be user-induced.

The following examples describe user-induced actions and substantial changes, which could either be counted or not counted.

User-induced actions: (counted)

  • Call of a new page or new parts of the page, caused by mouse click or keyboard entry.

  • Call of same page or same parts of the page (reloads), caused by mouse click or keyboard entry.

  • Open a browser.

Non user-induced actions: (not counted)

  • Call of a new page or new parts of the page by automatic forwarding (besides redirects and aliases).

  • Call of the same page or the same parts of the page by automatic reload (e.g. news ticker).

  • The call of a page by closing a window.

  • The call via robots/spiders and similar.

Substantial change: (counted)

  • Changes of text passages, whose context is in the main focus of the page.

  • Changes of visual, multimedia contents, whose context is in the main focus of the page.

  • Asking a new question in quiz games/surveys.

  • Loading of new picture within a picture gallery (slide-show).

Non substantial change: (not counted)

  • Changes of the page by crossing with the mouse (mouse-over)

  • Shift of monitor contents by aid of mouse or keyboard

  • Entry of single characters, where the content change merely represents the input characters

  • Selection of monitor contents by aid of mouse or keyboard (e.g. select box)

  • Scrolling with mouse or keyboard within one page

  • Change of color (text, picture, background etc.)

  • Change of layout of the page, without changing the content

Conclusion: user-induced means every action of a user that is geared toward calling a page in order to cause a substantial change of the site content.

How is the last page view handled?

When a client has more than one activity within a session, the "duration" (page viewtime) of the current activity is assumed to end with the next activity.
The viewtime is therefore not computable for all activities, because the last activity in a session has no successor in time.

To display figures for all activities in the system, a projection is used:

  • the average duration of all measurable activities (i.e. with successors) is computed

  • and that average duration is projected to all activities.

Assumption: the missing viewtimes are well represented by the existing ones.

Example:

  • a certain webpage has 100 pageviews and a measurable viewtime for 90 pageviews is counted

  • for 10 pageviews the viewtime is missing

  • the system collected 180 seconds for those 90 pageviews

The average viewtime per measurable pageview is then 180/90 seconds = 2 seconds,
which is - by assumption - also the average of all 100 pageviews.

Conclusion: The total viewtime (sum) for the 100 pageviews is 200 seconds.
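
The projection from the example can be written as a minimal sketch (assumed naming, not the production code):

// Illustrative viewtime projection for pageviews without a measurable duration.
function projectTotalViewtime(totalPageviews, measuredPageviews, measuredSeconds) {
  var average = measuredSeconds / measuredPageviews;   // e.g. 180 / 90 = 2 s
  return average * totalPageviews;                     // e.g. 2 * 100 = 200 s
}

// projectTotalViewtime(100, 90, 180) -> 200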

How is a client defined?

In the first step, the presence of a cookie in the request is checked. Two types of cookies are possible: either a conventional third-party HTTP cookie (i00) or
a cookie passed via URL parameter. The i00 cookie has the higher priority, i.e. it will always be used when present, while the cookie set via URL is intended
to cope with low third-party cookie acceptance. It will be set if possible (e.g. as an HTML5 cookie) and used if no i00 cookie is available.
If no cookie can be set, an "ident" serving as a browser "fingerprint" (combination of IP address and user-agent info) is created.

Two session containers are created, one for "idents" without cookies and one for those with a cookie. A client may move between the two types.
Keywords in this context are the evolution of a session (first contact without a cookie, followed by contacts with a set cookie) and session uniqueness.

  • If no valid cookie is contained, a new session is established.

  • If a session that is assigned to a certain "ident" can be found, the event is added to this session.

  • Otherwise a new session is created and returned.

Conclusion: Client = the same cookie or same browser "fingerprint" (combination of IP-address and user agent info)
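
The priority described above can be illustrated with the following minimal sketch; the request structure and helper names are assumptions for illustration only.

// Illustrative client identification: i00 cookie first, URL/HTML5 cookie second, fingerprint last.
function resolveClient(request) {
  if (request.cookies && request.cookies.i00) {
    return { type: 'cookie', id: request.cookies.i00 };      // highest priority
  }
  if (request.params && request.params.c) {
    return { type: 'url-cookie', id: request.params.c };     // cookie passed via URL parameter
  }
  // Fallback: "ident" fingerprint from IP address and user agent.
  return { type: 'ident', id: request.ip + '|' + request.userAgent };
}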

How is a session defined?

The metric “session” is used in our measurement system.

Difference to other systems that count “visits”: the metric "session" is counted in every single hour, whereas the metric "visit" is counted only once, in the hour in which the usage begins.

A session is a collection of events (like page impressions) with the same cookie or, if the cookie is missing, the same fingerprint (combination of IP address and user-agent info).

Sessions can be computed:

  • Within a single web site (the standard)

  • Across all web sites (networks)

There is no logout, so a timeout is used. Within a single session the time between two events is less than 30 minutes,
so after an idle time of 30 minutes a new session is initiated (following international standards).
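
A minimal sketch of this 30-minute timeout rule (assumed data structure, not the production code):

// Illustrative session assignment with a 30-minute idle timeout.
var SESSION_TIMEOUT_MS = 30 * 60 * 1000;

function assignToSession(sessions, clientId, eventTime) {
  var session = sessions[clientId];
  if (!session || eventTime - session.lastEventTime > SESSION_TIMEOUT_MS) {
    session = { startTime: eventTime, lastEventTime: eventTime, pageImpressions: 0 };
    sessions[clientId] = session;            // new session after 30 minutes of idle time
  }
  session.lastEventTime = eventTime;
  session.pageImpressions += 1;
  return session;
}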

The following information is provided for every session:

  • the start time of session,

  • the time of the last event in the session,

  • the signature of the session, formed as an MD5 hash of the user agent and the IP-address,

  • the cookie of the session, if available

  • the number of page impressions

  • the external referrer at beginning of the session

  • a list of (dimensional) properties of events (such as the pixel code)

  • a list of properties of the session (references to dimensions, which are stored once per session only, such as user agent, geo-location, screen resolution)

  • a list of dates of events

Conclusion: The session context is kept across the change of the hour as long as the idle time between user interactions is not longer than 30 minutes.

How will Adblock or similar tools influence the tracking?

What it is and how it operates: