Internet Audience Measurement
| Status | Working Document |
|---|---|
| Author | @Ralf Cornehl (KASRL) (Unlicensed) |
| Date | 10.05.2021 |
| Version | 1.6 |
| Version | Date | Author | Comments |
|---|---|---|---|
| 1.0 | 21.02.2018 | @Ralf Cornehl (KASRL) (Unlicensed) | Initial |
| 1.1 | 12.04.2019 | @Ralf Cornehl (KASRL) (Unlicensed) | Update Geo-location Attribution |
| 1.2 | 21.05.2019 | @Hendrik Vermeylen (KASRL) (Unlicensed) | Device Type Recognition |
| 1.3 | 18.01.2020 | @Ralf Cornehl (KASRL) (Unlicensed) | FAQ added |
| 1.4 | 26.10.2020 | @Ralf Cornehl (KASRL) (Unlicensed) | SDK Development |
| 1.5 | 07.12.2020 | @Ralf Cornehl (KASRL) (Unlicensed) | IP-Address handling |
| 1.6 | 10.05.2021 | @Ralf Cornehl (KASRL) (Unlicensed) | |
- 1 Content
- 2 IAM-SDK
- 2.1 Desktop-Web-SDK
- 2.2 Mobile-Web-SDK
- 2.3 Mobile-App-SDK
- 2.4 Integration
- 3 VAM-SDK
- 4 SDK Development
- 5 SDK Testing (QA)
- 6 SDK Maintenance
- 7 SDK Release Cycles
- 8 FAQ
- 8.1 What does a tracking request look like?
- 8.2 How can I control those requests?
- 8.3 Why is a 302 redirect used?
- 8.4 How is the IP-Address used?
- 8.5 How are cookies used?
- 8.6 How to define a page view?
- 8.7 How is the last page view handled?
- 8.8 How is a client defined?
- 8.9 How is a session defined?
- 8.10 How will Adblock or similar tools influence the tracking?
- 9 Fraud Prevention
- 10 Client-Resolving
- 10.1 Internet-Client-Resolving
- 10.2 App-Client-Resolving
- 10.2.1 Identifying mobile iOS Users
- 10.2.1.1 Advertising ID
- 10.2.1.2 ID for Vendors
- 10.2.1.3 Mac Address (deprecated)
- 10.2.2 Identifying mobile Android Users
- 10.2.2.1 Android ID
- 10.2.2.2 Google Advertising ID
- 10.2.2.3 UDID
- 10.2.3 Mobile ID Priority a.k.a. Client Cascade
- 11 Geo-location Attribution
- 12 Device Type Recognition
- 13 Kantar Browser Meter
- 13.1 Purpose of the Kantar Browser Meter (a.k.a KBM)
- 13.2 General Use Case
- 13.3 YouTube Use Case
- 13.4 KBM Configuration Object
- 13.5 KBM Language Settings
- 13.6 KBM Life-Cycle Events
- 13.6.1 Sleep Mode
- 13.6.2 Input Panel
- 13.6.3 OptOut / OptIn
- 13.7 KBM Deployment and Installation
- 14 User Centric Measurement
- 15 Hybrid Measurement
- 16 Collection Server (Box)
- 17 Data Processing (Cluster)
- 18 Data Export
- 18.1 Aggregated Data
- 18.1.1 Export Report Data
- 18.2 Target Data
- 18.3 Session-Based Data
- 18.4 Panel Data
- 18.4.1 Panelists data
- 18.4.2 Usage Data
- 19 Display Tools
- 20 Glossary
Content
The site centric and user centric measurement system we propose is based on our proven Kantar - Media Division solution and features:
Site Centric Measurement (IAM-Sensor)
Mobile/Tablet Site Centric Measurement (Mobile-Web-Sensor)
Mobile/Tablet App Measurement (Mobile-App-Sensor)
Web-Player Streaming Measurement (Desktop-Player-Sensor)
Mobile/Tablet Streaming App Measurement (Mobile-App-Player-Sensor)
Special Platform Streaming Measurement (Special Platform-Player-Sensor)
IAM-SDK
Desktop-Web-SDK
Data acquisition for the site centric measurement system is facilitated by a state-of-the-art script tag that can track users on ALL devices
(including mobile devices like iPad/iPhone) beyond the lifetime of their browser cookies, while remaining compliant with ECC privacy guidelines.
The tag is delivered as a JavaScript package that is easily integrated into the web content.
Client specific dimensions and variables can be defined (like content areas).
<html>
  <head>
  </head>
  <body>
    <script src="spring.js"></script>
    <script type="text/javascript">
      var sp_e0 = {
        "s": "testsite",
        "cp": "Path/to/your/page",
        "url": document.location.href
      };
      spring.c(sp_e0);
    </script>
  </body>
</html>
The tag can be used either synchronously or asynchronously. Asynchronous loading prevents the page load from being delayed, but registers fewer page views.
Mobile-Web-SDK
The sensor that facilitates the measurement of mobile browser usage is built as a “cascade” of technologies to allow for the most accurate (re-)identification
of users visiting mobile-enabled websites, even on non-cooperative devices. The system uses a combination of technologies for this.
The sensor is deployed to websites as a JavaScript package.
Mobile-App-SDK
The sensor for measuring mobile applications consists of a number of SDKs to be integrated into the mobile applications of participating publishers.
Currently the system extensively supports iOS and Android applications. The libraries are able to track the life-cycle events of a mobile application,
like "started", "foreground", "background" and "closed".
Additionally, Internet content rendered into a "WebView" of the mobile application can be tagged and subsequently reported.
To be compliant with European standards in terms of data protection and privacy, the libraries also offer an opt-out option that can be easily triggered from the applications.
Integration
The integration is typically done in less than a work day.
VAM-SDK
The streaming measurement system we propose is based on our proven Kantar Media Division solution and features an easily integrable in-player SDK for all modern players.
It gives accurate measurement and reporting of the streaming usage (audio or video) of live, recorded or downloaded content (offline mode).
Data-Flow of Data Acquisition - System Schematics: (Web-Video-Player)
The measurement SDK can be integrated with ANY player and the measurement system supports the following technologies:
Desktop-Player-SDK
HTML 5
Flash (unsupported since end of 2020)
Flash Action Script 3 (unsupported since end of 2020)
Flash OSMF (unsupported since end of 2020)
Brightcove
Silverlight
Mobile-App-Player-SDK
iOS
Android
Special Platform-Player-SDK
Apple TV (TVJS/TVML)
Apple TV (tvOS)
Chromecast
Cordova
Electron
LibJscript
Playstation 3 and 4
Roku
TAL (SmartTV)
Xbox
Other technologies (like proprietary players) can be integrated via a generic JS interface (adapter).
Special SDK or Plugin Development
On special occasions, Kantar can develop and provide a standalone measurement SDK or special plugin in cooperation with broadcasters when their current video player framework
is not covered by the above-mentioned standard SDKs and thus needs separate treatment in software development and implementation.
Integration
The integration is typically done in less than a work day.
SDK Development
Kantar’s SDK development follows an agile approach using the Scrum method [https://en.wikipedia.org/wiki/Scrum_(software_development)].
Development sprints regularly run on a 3-week cycle. To cover special development needs (e.g. initial development for platform support),
sprints can be extended to up to 4 weeks, keeping the highest flexibility for possible development iterations and maintaining constant developer feedback loops.
Respecting the availability of development resources during sprint planning, together with an integrated development buffer, allows urgent SDK issues surfacing during running development sprints to be handled ad hoc in a timely manner.
SDK Testing (QA)
The Kantar QA process for SDK products is integrated into the overall agile development cycle. For all our products, Kantar maintains sets of regression scenarios, which are the basis for any release test.
These sets are extended or adjusted whenever new features or changes to current business logic appear in the latest development iteration.
As for the Kantar QA workflow, the QA department will receive a feature complete SDK artifact (RC = release candidate) from the implementing team
together with a brief description of any new features or adjustments, once the initial implementation of all requirements has been reached.
The QA department then will adopt any updates into their set of regression scenarios.
Based on these scenarios and the received RC the integration test will be performed and documented.
If the RC meets all requirements, it is declared accepted by the QA team, which subsequently triggers the build of the final release artifact (the SDK production release), identical to the latest RC.
If any findings or defects appear during testing of an RC, the corresponding SDK artifact is declared rejected and the development team is informed about the findings or defects.
In this case, the development team needs to adjust the current SDK artifact with regard to the findings and build a new RC, which is then again sent to the QA department for testing.
This interaction between the QA department and the corresponding development team is cyclic, until an RC is declared accepted.
Technology-wise, we divide QA into internal and integration testing.
Internal tests cover the internal details of an SDK product – granular components such as algorithms, classes or functions.
These tests are implemented as unit tests, using the unit testing framework of the corresponding programming language.
As a result, they are fully automated and integrated into our build pipelines, which ensures that every software artifact produced by a development team passes these tests.
Integration tests cover cases in which the whole software artifact is integrated into the environment in which it will run in production.
This covers any interaction with the environment, e.g. the operating system, as well as complex orchestration of various sub-components or algorithms.
Kantar relies heavily on automation in this area as well, in order to ensure consistency. For front-end products Kantar uses state of the art automation frameworks
like Selenium (https://www.selenium.dev/) and Appium (http://appium.io/) to implement the automation of these tests. To cover as many platforms as possible during these tests,
we are using the cloud service Saucelabs (https://saucelabs.com), which provides the possibility to access a wide range of real devices as well as device simulations through the cloud.
Kantar has also added a more abstract layer on top of this, which encapsulates the technical details and provides a human-readable specification of an integration scenario.
This behaviour-driven approach reduces the communication barriers between market stakeholders and the Kantar QA team and ensures
that integration testing provides valuable information on how the product will behave in a customer scenario.
For the processing product, Kantar will use the Azure big data technologies in 2021, together with the Azure test procedures with which they are implemented.
SDK Maintenance
Kantar constantly and regularly reviews browser and operating system (OS) vendors to determine the changes that new releases will introduce.
Impact assessments of those vendor announcements on Kantar SDKs, and broadcaster feedback on SDKs given via Kantar’s own service desk, lead to development inputs
for Scrum sprint planning, building up work packages for initial software development and/or bug fixing while respecting Kantar’s available development resources.
SDK Release Cycles
By following the main technology vendors, such as Apple for iOS and Google for Android, and constantly reviewing browser vendors
and video player frameworks, Kantar observes the upcoming release cycles of each respective platform.
Major updates and/or upgrades are adopted into Kantar’s SDK release cycle planning.
Most common time patterns for individual platforms have been established by vendors over the course of the past years.
Google is releasing new major versions of Android mostly during summertime (July/August).
Apple is doing so for iOS by announcing and introducing new smartphone or tablet models in late summer or autumn (August/September).
To cover the major releases of those vendors, Kantar plans its own major SDK releases within the same time frames.
Internal code reviews and broadcaster feedback on SDKs may lead to bugfix or hotfix scenarios, which are provided in the form of intermediate minor releases on an ad-hoc basis.
FAQ
What does a tracking request look like?
The Kantar Media Division requests are HTTP-GET-requests. They contain all the information required for measurement.
Example for a website measurement request:
Example
GET 302 Redirect to: /blank.gif http://test.tns-cs.net/j0=,,,r=https%3A%2F%2Fwww.google.com;+,cp=test+url=www.test.com;;;?lt=ig36v70v&x=1600x900x24
GET 200 image/gif http://test.tns-cs.net/blank.gif
See the explanation of the request elements in the table below.
| Element of the request | Description |
|---|---|
| test.tns-cs.net | call to counting domain |
| r=https%3A%2F%2Fwww.google.com | r-variable = referrer |
| cp=test | cp-variable = content path set to unify content |
| url=www.test.com | url-variable = URL where the content is placed |
| lt=ig36v70v | lt-variable = random parameter to avoid browser caching |
| x=1600x900x24 | x-variable = screen resolution |
Example for a video measurement request
Heartbeats sent from the SDK
This section briefly explains what the heartbeats sent from the libraries should look like. A concrete example of a viewing session is used.
Content Stream is started and the first Request transmitted
Please use the record layout descriptions below for reference.
First Request Statement
The actual record output should look similar to below:
First play state: 0+0+mbeswh
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+1+mbeswj;;+dur=1501+vt=2;;;
| Variable | Description |
|---|---|
| counting domain | "tns-cs.net" = Norwegian counting domain |
| pl | player = own player name (set by the broadcaster) |
| plv | player version = own player version (set by the broadcaster) |
| sx | width of the stream window |
| sy | height of the stream window |
| stream | stream name (set by the broadcaster) |
| cq | content-ID = broadcaster's content ID (set by the broadcaster) |
| uid | unique ID of the view sequence |
| pst | play state = list of viewing intervals on the stream |
| dur | stream length in seconds (set by the broadcaster) |
| vt | view time in seconds (time of visual contact with the stream) |
After Viewing 2 min of the Stream
The output records should look similar to the records below; note the times of the “heartbeat” records (= play state records) at 21, 41, 61, ... seconds.
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+21+mbeswj;;+dur=1501+vt=22;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+41+mbeswj;;+dur=1501+vt=42;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+61+mbeswj;;+dur=1501+vt=62;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+83+mbeswj;;+dur=1501+vt=84;;;
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+105+mbeswj;;+dur=1501+vt=106;;;
Stopping the Stream after 2:00 min
Last play state: 1+121+mbeswj = 120 sec playtime.
http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;+,stream=od+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+121+mbeswj;;+dur=1501+vt=124;;;
Note that the "uid" (uid=3f3tv5p) and the stream name "stream" (stream=od) remained the same during the whole view sequence.
This should always be the case when the implementation is correct. If the "uid" or "stream" (= stream name) changes during the observed view sequence, something is wrong with the implementation
and, as a consequence, more than one stream view is counted for this single view sequence. The above example is a generic one.
For the streaming project, when continuously viewing a stream you should see heartbeats sent out at 0, 1, 20, 40, 60, 80, 100, 120, ... seconds.
One or two seconds may be added to every heartbeat due to the internal workings of the library.
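As a rough illustration, the variables described above can be pulled out of a heartbeat request with a small helper. This is a hypothetical sketch for inspecting captured requests, not part of the Kantar SDK; only the variable layout is taken from the examples above.

```javascript
// Hypothetical helper, not part of the Kantar SDK: extract a single
// variable (e.g. "uid", "stream", "vt") from a heartbeat request string
// in the format shown in the examples above.
function getRequestVar(request, name) {
  // variables in the request format above are delimited by "+" and ";"
  var match = request.match(new RegExp(name + '=([^+;]+)'));
  return match ? match[1] : null;
}

var req = 'http://example.tns-cs.net/j0=,,,pl=jwplayer+plv=version1+sx=640+sy=517;' +
          '+,stream=od+cq=123456789+uid=3f3tv5p+pst=,,0+0+mbeswh;+,1+121+mbeswj;;' +
          '+dur=1501+vt=124;;;';
getRequestVar(req, 'uid');    // "3f3tv5p"
getRequestVar(req, 'stream'); // "od"
getRequestVar(req, 'vt');     // "124"
```

A helper like this makes it easy to verify that "uid" and "stream" stay constant across all heartbeats of one view sequence, as required above.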
How can I control those requests?
Step-by-step instructions for running an HTTP request analyser:
On a standard laptop/desktop device, open the browser.
Using Chrome: press "CTRL+SHIFT+i". This brings up the information screen along the bottom of the browser window. (On other browsers, you might need specific browser plug-ins.)
The info screen contains several tabs across the page, but the one that matters is "Network". Select "Network".
In the main browser window, open the website being tested, i.e. your web page.
As soon as you load the web page, you will notice a stream of events occurring in the information screen along the bottom of the screen.
Click in the information screen to make sure it has focus, then click on the filter icon. This allows you to enter a value in the search box.
Enter "tns-cs.net" (= the receiving server, see the example above) and click the option "filter". You will now see only the requests going to and coming from the website tracking project systems.
If you return focus to the website, you can now test the tracking function and watch the results in the HTTP request data scrolling through the information screen.
This data is NOT captured automatically, so you MUST copy and paste ALL HTTP requests after the test has been completed;
this data should be shared with us in order for the implementation to be signed off.
Why is a 302 redirect used?
When a request hits our servers, it is measured and then answered back with a http-302-redirect response (“temporarily moved”).
This 302-redirect forces caching mechanisms such as proxies or browser-cache, to request the resource anew from the server.
However, the related RFC 2616 is not completely implemented here: in the redirect reply to the client, only the URI part is sent (the host name of the server is omitted).
This leads to a higher performance of the system and a reduction of the transfer volumes.
For repeated requests, the saving is about 10 bytes per response compared to full responses.
Additionally in the local client, only one copy of the blank.gif is allocated and processed.
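The reply described above can be sketched as follows. This is a hypothetical illustration of the response shape, not Kantar's actual server code.

```javascript
// Hypothetical sketch of the counting endpoint's reply, not Kantar's
// actual server code: the hit is measured first, then a 302 redirect to
// the transparent pixel is returned with a URI-only Location header.
function measurementReply() {
  return {
    statusCode: 302,       // "temporarily moved": forces caches to re-request
    headers: {
      // only the URI part is sent; the server's host name is omitted,
      // saving roughly 10 bytes per repeated response
      Location: '/blank.gif'
    }
  };
}

measurementReply(); // { statusCode: 302, headers: { Location: '/blank.gif' } }
```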
How is the IP-Address used?
Kantar only uses the IP address in a truncated form (last octet removed), at run time (for milliseconds) in RAM on our measurement boxes; after that it is discarded.
Our measurement boxes do not even have any physical storage capacity for this purpose.
The IP address is needed to execute the HTTP traffic operations (= Internet communication) and to allow for geolocation attribution.
To be GDPR compliant, IP addresses are never used afterwards in any of our data processing after measurement, nor are they stored elsewhere.
This procedure of IP-address handling was already audited by local data privacy organizations in Germany when we, formerly operating as “spring”, were still part of the AGOF measurement,
which has since been taken over by a company called INFOnline using our technology approach.
The same approach is also in production for Switzerland (Netmetrix) and Austria (ÖWA), and in our current measurement projects across the globe.
All the countries explicitly mentioned above are known for their strict GDPR regulations.
This approach already implements a “privacy by design” model in the measurement context.
How are cookies used?
Data acquisition for the site centric measurement system on browsers is facilitated by a state-of-the-art script tag that can track users on desktop and mobile devices
(including mobile devices like tablets and smartphones) beyond the lifetime of their browser identifiers, while remaining compliant with ECC privacy guidelines.
The “Tag” is delivered as a JavaScript package that is easily integrated into the web-player context.
The broadcaster’s content (video or audio) is typically hosted on a specific domain, which is considered as the content domain = “First Party Domain”.
The measurement requests are sent to a different domain, which is considered as the measurement domain = “Third Party Domain” in the broadcaster’s context.
Available browser identifiers are being used to uniquely identify a video viewing sequence (session handling) by the device’s browser.
The i00-HTTP Cookie
The i00 HTTP-cookie content consists of: (in hexadecimal form)
- box-id (first 4 digits)
- time stamp (8 digits)
- counter (4 digits)
(- serial (4 digits) irrelevant for the HTTP-context)
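Given that layout, an i00 cookie value can be decomposed as follows. The parser is a hypothetical helper for illustration; only the field layout is taken from the list above, and the sample value is made up.

```javascript
// Hypothetical parser for the i00 cookie layout described above:
// 4 hex digits box-id, 8 hex digits time stamp, 4 hex digits counter,
// 4 hex digits serial (irrelevant in the HTTP context).
function parseI00(cookie) {
  return {
    boxId: cookie.slice(0, 4),
    timestamp: parseInt(cookie.slice(4, 12), 16), // Unix time in seconds
    counter: parseInt(cookie.slice(12, 16), 16),
    serial: cookie.slice(16, 20)
  };
}

// made-up sample value: box-id 00a1, time stamp 60962400, counter 0003, serial c0ff
parseI00('00a1609624000003c0ff').counter; // 3
```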
The HTML-5 Local Storage Cookie
The c (HTML5 cookie) is only parsed by our boxes, not set by them; it is set by the JavaScript (16-digit hexadecimal form):
if (!this.nlso) {
    try {
        var l = localStorage.getItem('i00');
        if (l) return '&c=' + l;
        var ta = '0000',
            id = ta + Math.ceil((new Date()).getTime() / 1000).toString(16)
                    + (0x8000 | Math.random() * 0xffff).toString(16) + ta;
        localStorage.setItem('i00', id);
    } catch (e) {
        // localStorage unavailable (e.g. disabled or private browsing): no HTML5 cookie is set
    }
}
The difference:
- the i00-cookie is set by the box as active component
- the HTML5-cookie is NOT set by the box (=passive behavior)
For cookie handling and processing see chapter: Client-Resolving
How to define a page view?
A PageImpression/PageView describes the call of one Webpage by the user.
The following requirements have to be met before a PageImpression/PageView can be counted and assigned to a specific counting ID:
The page has to be served under the FQDN (https://en.wikipedia.org/wiki/Fully_qualified_domain_name) of the website (or an alias/redirect).
The page has to belong to the site, either in look and feel or by a clear and obvious optical ID.
Each call of the page may be counted only once.
The call of the page has to be user-induced.
The following examples describe user-induced actions and substantial changes, which could either be counted or not counted.
User-induced actions: (counted)
Call of a new page or new parts of the page, caused by mouse click or keyboard entry.
Call of same page or same parts of the page (reloads), caused by mouse click or keyboard entry.
Open a browser.
Non user-induced actions: (not counted)
Call of a new page or new parts of the page by automatic forwarding (besides redirects and aliases).
Call of the same page or the same parts of the page by automatic reload (e.g. news ticker).
The call of a page by closing a window.
The call via robots/spiders and similar.
Substantial change: (counted)
Changes of text passages, whose context is in the main focus of the page.
Changes of visual, multimedia contents, whose context is in the main focus of the page.
Asking a new question in quiz games/surveys.
Loading of new picture within a picture gallery (slide-show).
Non substantial change: (not counted)
Changes of the page by crossing with the mouse (mouse-over)
Shift of monitor contents by aid of mouse or keyboard
Entry of single characters, where the content change merely represents the input characters
Selection of monitor contents by aid of mouse or keyboard (e.g. select box)
Scrolling with mouse or keyboard within one page
Change of color (text, picture, background etc.)
Change of layout of the page, without changing the content
Conclusion: User-induced means every action of a user that is geared to calling a page in order to cause a substantial change of the site content.
How is the last page view handled?
When a client has more than one activity within a session, the "duration" (page viewtime) of the current activity is assumed to end with the next activity.
The viewtime is therefore not computable for all activities, because the last activity in a session has no successor in time.
To display figures for all activities in the system, a projection is used:
the average duration of all measurable activities (i.e. with successors) is computed
and that average duration is projected to all activities.
Assumption: the missing viewtimes are well represented by the existing ones.
Example:
a certain webpage has 100 pageviews, and a measurable viewtime is counted for 90 of them
for 10 pageviews the viewtime is missing
the system collected 180 seconds for those 90 pageviews
The average viewtime per measurable pageview is then 180/90 seconds = 2 seconds.
Which is, by assumption, the average of all 100 pageviews too.
Conclusion: The total viewtime (sum) for the 100 pageviews is 200 seconds.
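The projection above reduces to simple arithmetic. The following sketch only restates the described rule; the function name and signature are made up for illustration.

```javascript
// Sketch of the last-page-view projection described above
// (function name and signature are made up for illustration).
function projectedTotalViewtime(totalPageviews, measuredPageviews, measuredSeconds) {
  // average duration of all measurable pageviews (those with a successor)
  var avg = measuredSeconds / measuredPageviews;
  // by assumption, this average also represents pageviews without a successor
  return avg * totalPageviews;
}

// the example from the text: 100 pageviews, 90 measurable, 180 seconds collected
projectedTotalViewtime(100, 90, 180); // 200 seconds
```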
How is a client defined?
In the first step, the presence of a cookie in the request is checked. Two types of cookies are possible, either a conventional, third-party HTTP-cookie (i00) or
a cookie passed via URL-Parameter. The i00-cookie has a higher priority, i.e. it will always be used when present while the cookie set via URL is intended
to cope with low third-party cookie acceptance. It will be set if possible (e.g. html5 cookie) and used if no i00-cookie is available.
If no cookie can be set, an "ident" is created as a browser "fingerprint" (a combination of IP address and user agent information).
Two session containers are created, one for "idents" without cookies and one for those with a cookie. A client may move between the two types.
Keywords in this context are the evolution of a session (first contact without a cookie, followed by contacts with a set cookie) and session uniqueness.
If no valid cookie is contained, a new session is established.
If a session that is assigned to a certain "ident" can be found, the event is added to this session.
Otherwise a new session is created and returned.
Conclusion: Client = the same cookie or same browser "fingerprint" (combination of IP-address and user agent info)
How is a session defined?
The metric “session” is used in our measurement system.
Difference to other systems that count “visits”: the metric "session" is counted in every single hour, whereas the metric "visit" is counted only once, for the particular hour in which the usage begins.
A session is a collection of events (like page impressions) with the same cookie or fingerprint (combination of IP-address and user agent info), if the cookie is missing.
Sessions can be computed:
Within a single web site (the standard)
Across all web sites (networks)
There is no logout, so a timeout is used: within a single session, the time between two events is less than 30 minutes.
After an idle time of 30 minutes, a new session is initiated (following international standards).
The following information is provided for every session:
the start time of session,
the time of the last event in the session,
the signature of the session, formed as an MD5 hash of the user agent and the IP-address,
the cookie of the session, if available
the number of page impressions
the external referrer at beginning of the session
a list of (dimensional) properties of events (such as the pixel code)
a list of properties of the session (references to dimensions, which are stored once per session only, such as user agent, geo-location, screen resolution)
a list of dates of events
Conclusion: The session context is kept over the switch of the hour as long as the idle time between user interactions is not longer than 30 minutes
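The 30-minute rule above can be sketched as a single comparison. This is a hypothetical helper (names made up); timestamps are in milliseconds.

```javascript
// Sketch of the session timeout rule described above: two consecutive
// events belong to the same session if less than 30 minutes of idle time
// lie between them, regardless of hour boundaries. (Hypothetical helper.)
var SESSION_TIMEOUT_MS = 30 * 60 * 1000;

function sameSession(previousEventMs, nextEventMs) {
  return (nextEventMs - previousEventMs) < SESSION_TIMEOUT_MS;
}

sameSession(Date.parse('2021-05-10T10:50:00Z'),
            Date.parse('2021-05-10T11:10:00Z')); // true: 20 min idle, kept across the hour switch
sameSession(Date.parse('2021-05-10T10:50:00Z'),
            Date.parse('2021-05-10T11:25:00Z')); // false: 35 min idle, a new session starts
```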
How will Adblock or similar tools influence the tracking?
What it is and how it operates: