The Web never forgets: Persistent tracking mechanisms in the wild

The Web never forgets: Persistent tracking mechanisms in the wild is the first large-scale study of three advanced web tracking mechanisms - canvas fingerprinting, evercookies and use of "cookie syncing" in conjunction with evercookies.

Read the paper »

About

The study is a collaboration between researchers Gunes Acar (1), Christian Eubank (2), Steven Englehardt (2), Marc Juarez (1), Arvind Narayanan (2), and Claudia Diaz (1).
(1) KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium {gunes.acar, marc.juarez, claudia.diaz}@esat.kuleuven.be
(2) Princeton University {cge,ste,arvindn}@cs.princeton.edu

Reference: G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The Web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of CCS 2014, Nov. 2014. (Forthcoming)

Results

Canvas Fingerprinting

Background

Canvas fingerprinting is a browser or device fingerprinting technique first presented by Mowery and Shacham in 2012. The authors found that, by using the Canvas API of modern browsers, one can exploit subtle differences in how the same text is rendered to extract a consistent fingerprint, which can be obtained in a fraction of a second without the user's awareness.
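
To make the idea concrete, here is a minimal Python sketch of the last step only: turning the image a script reads back from the canvas into a compact identifier. It assumes the script obtains the rendering as a base64 data URL (as the Canvas API's toDataURL method returns) and hashes it. The function names, the stand-in image bytes and the choice of SHA-256 are illustrative assumptions, not details taken from the study.

```python
import base64
import hashlib


def fingerprint_from_data_url(data_url: str) -> str:
    """Return a hex digest of the image bytes carried in a base64 data URL."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64-encoded data URL")
    image_bytes = base64.b64decode(payload)
    # Assumption: SHA-256 stands in for whatever hash a real script might use.
    return hashlib.sha256(image_bytes).hexdigest()


def make_data_url(image_bytes: bytes) -> str:
    """Demo helper: wrap raw bytes in the data URL form used for PNG images."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Two stand-in "renderings" of the same text that differ in a single byte,
# mimicking subtle per-device rendering differences (not real PNG data).
device_a = make_data_url(b"\x89PNG-demo-pixels-0")
device_b = make_data_url(b"\x89PNG-demo-pixels-1")

print(fingerprint_from_data_url(device_a) == fingerprint_from_data_url(device_a))
print(fingerprint_from_data_url(device_a) == fingerprint_from_data_url(device_b))
```

The same rendering always maps to the same digest, while a one-byte difference in the rendered image yields a different one, which is what makes the value usable as a fingerprint.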

Results

By crawling the homepages of the top 100,000 sites we found that more than 5.5% of the crawled sites include canvas fingerprinting scripts. Although the overwhelming majority (95%) of the scripts belong to a single provider (addthis.com), we discovered a total of 20 canvas fingerprinting provider domains, active on 5542 of the top 100,000 sites.

Figure: a collage of the images drawn to the canvas by the various fingerprinting scripts discovered during the study. The images were intercepted using a modified browser (by instrumenting the toDataURL method). Some blank space was cropped from the images to save space.
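
As a rough illustration of how logs from such an instrumented browser might be filtered, the Python sketch below flags scripts that both draw text to a canvas and read the canvas back. The log format, the function name and the example URLs are hypothetical, and the rule shown is a simplified assumption rather than the study's exact detection criteria.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Canvas API methods that write text and that read the image back.
WRITE_METHODS = {"fillText", "strokeText"}
READ_METHODS = {"toDataURL"}


def find_candidate_scripts(call_log: Iterable[Tuple[str, str]]) -> List[str]:
    """Return script URLs that both draw text and read the canvas back.

    call_log holds (script_url, method_name) pairs, a hypothetical log
    format for calls recorded by an instrumented browser.
    """
    methods_by_script: Dict[str, Set[str]] = defaultdict(set)
    for script_url, method in call_log:
        methods_by_script[script_url].add(method)
    return sorted(
        url
        for url, methods in methods_by_script.items()
        if methods & WRITE_METHODS and methods & READ_METHODS
    )


# Hypothetical log entries for three scripts.
demo_log = [
    ("https://tracker.example/fp.js", "fillText"),
    ("https://tracker.example/fp.js", "toDataURL"),
    ("https://charts.example/plot.js", "fillText"),
    ("https://photos.example/export.js", "toDataURL"),
]

print(find_candidate_scripts(demo_log))
```

Only the first script is reported: the charting script never reads the canvas back, and the export script never draws text. A real pipeline would need further checks to rule out benign uses.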


Canvas Fingerprinting Scripts

The table below summarizes the canvas fingerprinting scripts found on the homepages of the top 100K Alexa sites.

Full list of sites using Canvas Fingerprinting »

Fingerprinting script | Number of including sites | Text drawn into the canvas
ct1.addthis.com/static/r07/core130.js (and 17 others) | 5282 | Cwm fjordbank glyphs vext quiz
i.ligatus.com/script/fingerprint.min.js | 115 | http://valve.github.io
src.kitcode.net/fp2.js | 68 | http://valve.github.io
admicro1.vcmedia.vn/fingerprint/figp.js | 31 | http://admicro.vn/
amazonaws.com/af-bdaz/bquery.js | 26 | Centillion
*.shorte.st/js/packed/smeadvert-intermediate-ad.js | 14 | http://valve.github.io
stat.ringier.cz/js/fingerprint.min.js | 4 | http://valve.github.io
cya2.net/js/STAT/89946.js | 3 | ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789+/
images.revtrax.com/RevTrax/js/fp/fp.min.jsp | 3 | http://valve.github.io
pof.com | 2 | http://www.plentyoffish.com
*.rackcdn.com/mongoose.fp.js | 2 | http://api.gonorthleads.com
9 others* | 9 | (Various)
TOTAL | 5559 (5542 unique; see note 1) |

*: Some URLs are truncated or omitted for brevity.
1: Some sites include canvas fingerprinting scripts from more than one domain.

Evercookies & Respawning

Background

Evercookies are designed to overcome the "shortcomings" of traditional tracking mechanisms. By utilizing multiple storage vectors that are less transparent to users and may be more difficult to clear, evercookies provide an extremely resilient tracking mechanism, and have been found to be used by many popular sites to circumvent deliberate user actions [1, 2, 3].

Results

We detected respawning by Flash cookies on 10 of the 200 most popular sites and found that 33 different Flash cookies were used to respawn over 175 HTTP cookies on 107 of the top 10,000 sites. The table below shows the 10 top-ranked websites found to include respawning based on Flash cookies.
Country: the country where the website is based.
3rd*: domains that differ from the first party but are registered to the same company in the WHOIS database.

Global rank | Site | Country | Respawning (Flash) domain | Flash cookie name | 1st/3rd Party
16 | sina.com.cn | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd*
17 | yandex.ru | Russia | kiks.yandex.ru | fuid01.sol | 1st
27 | weibo.com | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd*
41 | hao123.com | China | ar.hao123.com | $hao123$.sol | 1st
52 | sohu.com | China | tv.sohu.com | vmsuser.sol | 1st
64 | ifeng.com | Hong Kong | y3.ifengimg.com | www.ifeng.com.sol | 3rd*
69 | youku.com | China | irs01.net | mt_adtracker.sol | 3rd
178 | 56.com | China | irs01.net | mt_adtracker.sol | 3rd
196 | letv.com | China | irs01.net | mt_adtracker.sol | 3rd
197 | tudou.com | China | irs01.net | mt_adtracker.sol | 3rd
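
One way to picture a respawning check is sketched below in Python. It assumes two crawls: a "seeded" crawl that starts with cleared HTTP cookies but with Flash storage carried over from an earlier crawl, and a "control" crawl with a completely fresh profile. An HTTP cookie in the seeded crawl whose value equals a carried-over Flash value, and which does not also appear in the control crawl, is a respawning candidate. All names, data structures and the length threshold are hypothetical and do not reproduce the study's actual pipeline.

```python
from typing import Dict, List, Tuple

CookieKey = Tuple[str, str]  # (domain, cookie name)


def find_respawned_cookies(
    flash_values: Dict[str, str],
    seeded_http_cookies: Dict[CookieKey, str],
    control_http_cookies: Dict[CookieKey, str],
    min_length: int = 8,
) -> List[Tuple[str, str, str]]:
    """Return (domain, cookie name, flash key) for likely respawned cookies."""
    # Values also seen with a fresh profile are unlikely to be user-specific.
    control_values = set(control_http_cookies.values())
    respawned = []
    for (domain, name), value in sorted(seeded_http_cookies.items()):
        # Assumption: very short values are unlikely to be identifiers.
        if len(value) < min_length or value in control_values:
            continue
        for flash_key, flash_value in sorted(flash_values.items()):
            if value == flash_value:
                respawned.append((domain, name, flash_key))
    return respawned


# Hypothetical data: Flash storage carried over from an earlier crawl ...
flash = {"demo.sol/uid": "a1b2c3d4e5f6"}
# ... HTTP cookies seen after clearing cookies but keeping Flash storage ...
seeded = {
    ("video.example", "uid"): "a1b2c3d4e5f6",
    ("video.example", "lang"): "en",
}
# ... and HTTP cookies seen with a completely fresh profile.
control = {
    ("video.example", "uid"): "ffff00001111",
    ("video.example", "lang"): "en",
}

print(find_respawned_cookies(flash, seeded, control))
```

Here the uid cookie is flagged because its value came back from Flash storage, while the lang cookie is ignored as too short to be an identifier.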

Cookie Syncing

Background

Cookie synchronization or cookie syncing is the practice of tracker domains passing pseudonymous IDs associated with a given user, typically stored in cookies, amongst each other.
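
The following Python sketch illustrates one simple way such ID sharing could be spotted in crawl data: look for a known ID cookie value appearing as a query parameter in a request to a different party. The function names, the input format and the example domains are hypothetical, and the suffix-based party check is a simplification (a real analysis would likely need a public suffix list and would also look beyond query strings).

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, urlsplit


def same_party(host: str, domain: str) -> bool:
    """Simplified check: host equals the domain or is one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def find_synced_ids(
    id_cookies: List[Tuple[str, str]],
    request_urls: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (owner domain, receiving host, ID) for IDs sent to other parties.

    id_cookies holds (owner_domain, id_value) pairs; request_urls holds the
    URLs requested during a crawl. Both formats are assumptions.
    """
    shares = []
    for url in request_urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        param_values = {value for _, value in parse_qsl(parts.query)}
        for owner, id_value in id_cookies:
            if id_value in param_values and not same_party(host, owner):
                shares.append((owner, host, id_value))
    return shares


# Hypothetical ID cookie and crawl requests.
demo_cookies = [("tracker-a.example", "u-12345678")]
demo_requests = [
    "https://sync.tracker-b.example/match?partner=a&uid=u-12345678",
    "https://cdn.tracker-a.example/pixel?uid=u-12345678",
    "https://static.site.example/logo.png",
]

print(find_synced_ids(demo_cookies, demo_requests))
```

Only the request to sync.tracker-b.example is reported, since the second request stays within the party that owns the ID.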

Read the blog post that explains cookie syncing and our findings with animated diagrams: The hidden perils of cookie syncing (Freedom to Tinker)

Results

The table below shows the number of IDs known by the top 10 parties involved in cookie syncing under two policies: allowing all cookies and blocking third-party cookies.

Full list of domains involved in Cookie Syncing »

Domain (all cookies allowed) | # IDs | Domain (no 3P cookies) | # IDs
gemius.pl | 33 | gemius.pl | 36
doubleclick.net | 32 | 2o7.net | 27
2o7.net | 27 | omtrdc.net | 27
rubiconproject.com | 25 | cbsi.com | 26
omtrdc.net | 24 | parsely.com | 16
cbsi.com | 24 | marinsm.com | 14
adnxs.com | 22 | gravity.com | 14
openx.net | 19 | cxense.com | 13
cloudfront.net | 18 | cloudfront.net | 10
rlcdn.com | 17 | doubleclick.net | 10

The table below compares high-level cookie syncing statistics when allowing and disallowing third-party cookies (top 3,000 Alexa domains).

Statistic | Third-party cookies allowed | Third-party cookies blocked
# IDs | 1308 | 938
# ID cookies | 1482 | 953
# IDs in sync | 435 | 347
# ID cookies in sync | 596 | 353
# (First*) Parties in sync | (407) 730 | (321) 450
# IDs known per party | 1 / 2.0 / 1 / 33 | 1 / 1.8 / 1 / 36
# Parties knowing an ID | 2 / 3.4 / 2 / 43 | 2 / 2.3 / 2 / 22

The format of the bottom two rows is minimum / mean / median / maximum.
*Here we define a first party as a site which was visited in the first-party context at any point in the crawl.
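
For readers who want to see how rows in the minimum / mean / median / maximum format can be derived, here is a small Python sketch. It assumes a mapping from each party to the set of IDs it knows; the mapping, the party names and the helper names are made up for illustration and are not the study's data.

```python
from statistics import mean, median
from typing import Dict, List, Set


def summarize(counts: List[int]) -> str:
    """Format counts as 'minimum / mean / median / maximum'."""
    return f"{min(counts)} / {mean(counts):.1f} / {median(counts):g} / {max(counts)}"


def invert(ids_by_party: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a party -> IDs mapping into an ID -> parties mapping."""
    parties_by_id: Dict[str, Set[str]] = {}
    for party, ids in ids_by_party.items():
        for id_value in ids:
            parties_by_id.setdefault(id_value, set()).add(party)
    return parties_by_id


# Hypothetical toy data: which IDs each party knows.
ids_by_party = {
    "a.example": {"id1", "id2", "id3"},
    "b.example": {"id1"},
    "c.example": {"id2"},
    "d.example": {"id1"},
}

ids_known_per_party = summarize([len(ids) for ids in ids_by_party.values()])
parties_knowing_an_id = summarize(
    [len(parties) for parties in invert(ids_by_party).values()]
)

print(ids_known_per_party)    # 1 / 1.5 / 1 / 3
print(parties_knowing_an_id)  # 1 / 2.0 / 2 / 3
```

The two summaries are computed from the same mapping viewed from either side: counts of IDs per party, and counts of parties per ID.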

Data

Due to the size of the files, data is available by request. Please feel free to email the authors for your requests. In the meantime, you can download a sample database.

Databases available for download

(DO = Digital Ocean, EC2 = Amazon EC2)

Name | Size | Machine # - Location (Provider) | # of sites | Flash enabled? | Cookie setting | Data from previous crawls (Exp. #) - Data loaded | Continuous profile | Comments
P01_alexa10k_05012014_fresh | 114M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile
P04_alexa10k_05032014_fresh | 306M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile
P06_alexa3k_05062014_fresh | 84M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes |
P08_alexa3k_05062014_fresh | 84M | 2 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes |
P09_alexa3k_05072014_flash | 84M | 2 - N. California (EC2) | 3k | yes | Allow all | (P6) - Flash | yes | loaded Flash from P6
P10_alexa3k_05072014_localStorage | 77M | 3 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - localStorage | yes | loaded localStorage from P6
P11_alexa3k_05072014_HTTP_cookies | 90M | 4 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - HTTP Cookies | yes | loaded cookies.sqlite from P6
P14_alexa3k_05122014_DNT | 76M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes | DNT enabled
P15_alexa3k_05122014_DNT | 81M | 2 - N. California (EC2) | 3k | yes | Allow all | no | yes | DNT enabled
P16_alexa3k_05122014_no3Pcookies | 55M | 4 - N. Virginia (EC2) | 3k | yes | Allow 1st party | no | yes | Block third-party cookies
P17_alexa3k_05122014_no3Pcookies | 55M | 3 - N. Virginia (EC2) | 3k | yes | Allow 1st party | no | yes | Block third-party cookies
P21_alexa3k_06132014_opt-out | 60M | 5 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes | Loaded opt-out from: NAI, DAA, EDAA
P22_alexa3k_06132014_opt-out | 64M | 6 - N. California (EC2) | 3k | yes | Allow all | no | yes | Loaded opt-out from: NAI, DAA, EDAA
L03_alexa10k_05032014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1
L04_alexa10k_05042014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1
L05_alexa10k_05042014_fresh | 289M | 8 - New York (DO) | 10K | yes | Allow all | no | no | fresh profile
L06_alexa100k_flash_no3Pcookies | 2.1G | 9 - Leuven (local machine) | 100K | yes | Allow 1st party | Flash, from pilot crawls | no | Flash from pilot crawls, everything else cleared, no POST data, isolated with chroot.

Code

The code developed during the study can be found at GitHub. This includes crawling infrastructure, modules for analysing browser profile data and crawl databases.

Press

Contact

Gunes Acar gunes.acar@esat.kuleuven.be
Christian Eubank cge@cs.princeton.edu
Steven Englehardt ste@cs.princeton.edu
Marc Juarez marc.juarez@esat.kuleuven.be
Arvind Narayanan arvindn@cs.princeton.edu
Claudia Diaz claudia.diaz@esat.kuleuven.be