Web Trackers Analysis (Alexa Top 800)

Overview

This page analyzes the third-party tracking scripts found on the Alexa Top 800 websites and visualizes the interconnected network of advertising and analytics providers behind them.

Original Visualization

trackers-alexa-800.svg

Original hosted version: p.wal.sh/trackers

Key Findings (2012)

Major tracking networks identified:

  • DoubleClick (Google)
  • Comscore Beacon
  • Google Analytics
  • Facebook Connect
  • Quantcast
  • BlueKai
  • Various ad networks

Data Structure

The visualization was generated using Graphviz. The underlying data structure:

Methodology

  1. Crawled Alexa Top 800 sites
  2. Extracted third-party JavaScript includes
  3. Categorized by tracker type (analytics, advertising, social)
  4. Generated relationship graph using Graphviz
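
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the original crawler: the category map, the function names, and the rule that subdomains of the site count as first-party are all assumptions made for the example, and fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical category map; a real study would use a curated tracker list.
TRACKER_CATEGORIES = {
    "doubleclick.net": "advertising",
    "google-analytics.com": "analytics",
    "connect.facebook.net": "social",
}


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_hosts(site_host, html):
    """Return the set of script hosts that do not belong to site_host."""
    parser = ScriptSrcParser()
    parser.feed(html)
    hosts = set()
    for src in parser.sources:
        host = urlparse(src).hostname  # None for relative URLs
        if host and host != site_host and not host.endswith("." + site_host):
            hosts.add(host)
    return hosts


def categorize(host):
    """Map a host to a category by matching known tracker domain suffixes."""
    for domain, category in TRACKER_CATEGORIES.items():
        if host == domain or host.endswith("." + domain):
            return category
    return "unknown"


def to_dot(edges):
    """Render (site, tracker_host) pairs as a Graphviz DOT digraph."""
    lines = ["digraph trackers {"]
    for site, tracker in sorted(edges):
        label = categorize(tracker)
        lines.append(f'  "{site}" -> "{tracker}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)
```

The string returned by to_dot can be written to a file and rendered with the Graphviz dot command, for example dot -Tsvg.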

Historical Context

This research predates:

  • GDPR (2018)
  • CCPA (2020)
  • Safari ITP (2017)
  • Firefox ETP (2019)
  • Chrome's Privacy Sandbox initiative

The tracking landscape has evolved significantly with:

  • Cookie consent requirements
  • Third-party cookie deprecation plans
  • Privacy-focused browsers
  • Ad blocker adoption

Related Research

Modern Context (2024-2025)

The web tracking landscape has transformed dramatically since this original 2012 research. This section provides current context for understanding how tracking has evolved.

The Death of Third-Party Cookies

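The foundational technology of 2012-era tracking is now restricted or blocked by default in several major browsers:

  • Chrome Privacy Sandbox: Google's multi-year initiative to replace third-party cookies with privacy-preserving APIs (Topics API, Attribution Reporting API, Protected Audience API); in July 2024 Google announced that Chrome would keep third-party cookies and leave the choice to users instead of deprecating them
  • Safari ITP (Intelligent Tracking Prevention): Blocks third-party cookies by default since 2020
  • Firefox ETP (Enhanced Tracking Protection): Blocks cookies from known trackers and isolates other third-party cookies per site (Total Cookie Protection)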
  • Brave, DuckDuckGo browsers: Aggressive blocking of all tracking by default

Modern Tracking Methods

Browser Fingerprinting

Replaces cookies with device/browser characteristics:

  • Canvas fingerprinting (WebGL, 2D canvas)
  • Audio context fingerprinting
  • Font enumeration
  • Screen resolution, timezone, language
  • WebRTC IP leaks
  • Hardware concurrency, device memory
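
To illustrate why these signals can stand in for a cookie, the sketch below combines a set of attributes into one stable identifier. It is a simplified assumption of how a fingerprint might be derived: the function name and the attribute keys are hypothetical, and real fingerprinting scripts collect the values in the browser and often tolerate small changes between visits.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of browser attributes into a stable hex identifier."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two visits that report the same attribute values yield the same identifier without anything being stored on the device, which is why clearing cookies does not reset it.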

First-Party Data Strategies

Companies now prioritize owned data:

  • Customer Data Platforms (CDPs): Segment, mParticle, Tealium
  • Server-side tagging: Google Tag Manager Server-Side, Cloudflare Zaraz
  • First-party cookies set via CNAME cloaking
  • Authenticated user tracking via login walls
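
CNAME cloaking in particular can be checked mechanically once DNS records are known. The sketch below assumes the CNAME targets have already been resolved by some earlier step; the tracker domains and function names are hypothetical and exist only for the example.

```python
# Hypothetical tracker domains, for illustration only.
KNOWN_TRACKER_DOMAINS = {"tracker.example", "metrics-vendor.example"}


def is_subdomain_of(host, domain):
    """True if host equals domain or is a subdomain of it."""
    host = host.rstrip(".").lower()
    return host == domain or host.endswith("." + domain)


def find_cloaked(site_domain, cname_records):
    """Return first-party hosts whose CNAME target is a known tracker.

    cname_records maps hostname -> CNAME target, as already resolved
    by a DNS lookup step that is out of scope here.
    """
    cloaked = {}
    for host, target in cname_records.items():
        if not is_subdomain_of(host, site_domain):
            continue  # only first-party hostnames can be cloaked
        for tracker in KNOWN_TRACKER_DOMAINS:
            if is_subdomain_of(target, tracker):
                cloaked[host] = tracker
    return cloaked
```

A hostname flagged here looks first-party to the browser, so cookies set for it are not treated as third-party cookies.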

Server-Side Tracking

Moves tracking from browser to server:

  • Facebook Conversions API
  • Google Enhanced Conversions
  • TikTok Events API
  • Removes client-side blockers from the equation
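
A server-side event typically carries a hashed identifier instead of a cookie. The sketch below shows the general shape under stated assumptions: the payload field names are hypothetical and do not follow any vendor's schema, and the actual HTTP request to the vendor is omitted.

```python
import hashlib
import json
import time


def normalize_and_hash(email):
    """Trim, lowercase, and SHA-256 hash an email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_server_event(event_name, email, event_time=None):
    """Build a JSON payload a server could POST to a conversions endpoint.

    The field names are hypothetical and not tied to any vendor's schema.
    """
    if event_time is None:
        event_time = time.time()
    payload = {
        "event_name": event_name,
        "event_time": int(event_time),
        "user": {"hashed_email": normalize_and_hash(email)},
        "source": "server",
    }
    return json.dumps(payload, sort_keys=True)
```

Because the request originates from the site's own backend, a crawler or browser extension inspecting page traffic never sees it.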

Probabilistic Identity Resolution

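Links users across sites and devices without third-party cookies, using deterministic signals (such as hashed email addresses) or probabilistic matching: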

  • LiveRamp IdentityLink
  • The Trade Desk Unified ID 2.0
  • ID5 Universal ID
  • Contextual targeting resurgence
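
The difference between a deterministic and a probabilistic match can be sketched as below. The signal names, integer weights, and threshold are invented for the example; production systems fit such parameters from data and use far more signals.

```python
# Hypothetical signal weights (in points); real systems fit these from data.
WEIGHTS = {"ip": 5, "timezone": 2, "language": 1, "os_family": 2}


def match_score(a, b):
    """Sum the weights of signals present and equal in both observations."""
    return sum(
        weight
        for signal, weight in WEIGHTS.items()
        if a.get(signal) is not None and a.get(signal) == b.get(signal)
    )


def same_user(a, b, threshold=7):
    """Deterministic match on a shared hashed ID, else fall back to the score."""
    if a.get("hashed_email") and a.get("hashed_email") == b.get("hashed_email"):
        return True
    return match_score(a, b) >= threshold
```

The deterministic branch is exact but needs a login; the scored branch works without one at the cost of false matches, for example two people behind the same home router.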

Privacy Regulations

GDPR (2016/2018 - EU)

  • Explicit consent required before tracking
  • Right to access, delete personal data
  • Fines up to €20 million or 4% of global annual turnover, whichever is higher
  • Cookie banners became ubiquitous

CCPA/CPRA (2020/2023 - California)

  • "Do Not Sell My Personal Information" requirements
  • Opt-out rights for data sharing
  • Private right of action for data breaches

Other Regulations

  • LGPD (Brazil), POPIA (South Africa), PDPA (Thailand)
  • US State laws: Virginia, Colorado, Connecticut, Utah
  • ePrivacy Regulation (EU; proposal never adopted, and the European Commission moved to withdraw it in 2025)

Google Analytics Evolution

Universal Analytics (2012-2023)

  • Launched in beta in October 2012, around the time of the original research, as the successor to classic Google Analytics (ga.js)
  • Cookie-based, session-oriented
  • Officially sunset July 2023

Google Analytics 4 (GA4)

  • Event-based data model
  • Machine-learning modeling to fill gaps when cookies are blocked or consent is denied
  • Privacy-centric design (no IP storage)
  • Consent Mode integration
  • Server-side GTM recommended

Ad Blockers and Privacy Tools

Mainstream adoption has disrupted tracking:

  • uBlock Origin: Among the most popular blockers (~20% of desktop users by some estimates)
  • AdGuard: Mobile and desktop
  • Pi-hole: Network-level blocking
  • NextDNS: DNS-based blocking
  • Browser built-in: Brave Shields, Firefox ETP, Safari
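
The DNS-based tools in this list share one basic idea: refuse to resolve hostnames that appear on a blocklist. The sketch below assumes that a listed domain also covers its subdomains, which is a simplification; actual tools differ in how they match entries, and the function name is hypothetical.

```python
def is_blocked(hostname, blocklist):
    """Return True if hostname or any parent domain is on the blocklist."""
    labels = hostname.rstrip(".").lower().split(".")
    # Check "a.b.c", then "b.c", then "c".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False
```

A resolver applying this check would answer blocked names with an empty or unroutable address, so the tracker request fails for every device on the network.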

Current Major Trackers (2024-2025)

Compared to 2012, the landscape has consolidated:

Tracker              2012 Status   2024 Status
-------------------  ------------  ---------------------------------------
DoubleClick          Active        Merged into Google Marketing Platform
Google Analytics     Growing       GA4 (Universal Analytics deprecated)
Facebook Pixel       Emerging      Meta Pixel + Conversions API
Comscore             Dominant      Declining market share
Quantcast            Active        Pivoted to AI/ML advertising
BlueKai              Active        Acquired by Oracle (Oracle Data Cloud)
Amazon               Minimal       Major player (Amazon Ads)
TikTok Pixel         N/A           Rapidly growing
Apple SKAdNetwork    N/A           iOS attribution standard

Research Implications

The 2012 visualization represents a simpler era:

  • Third-party cookies worked reliably
  • Browser fingerprinting was rare
  • Regulations were minimal
  • Ad blockers had low adoption

Modern research would need to account for:

  • Server-side tracking invisible to crawlers
  • First-party cookie masquerading
  • Consent management platform (CMP) complexity
  • Regional variation in consent requirements

References

  • Alexa Top Sites (historical, discontinued 2022)
  • Ghostery tracker database
  • EFF Privacy Badger research
  • W3C Privacy Interest Group (PING)
  • IAB Europe Transparency and Consent Framework
  • Google Privacy Sandbox documentation
  • Web Almanac - HTTP Archive annual reports

Notes

  • Original data from October 2012
  • Graphviz 2.12 used for visualization
  • SVG preserved for historical reference
  • Modern context added January 2025

Author: Jason Walsh

j@wal.sh

Last Updated: 2026-01-10 17:13:13
