Web Trackers Analysis (Alexa Top 800)
Table of Contents
- Overview
- Original Visualization
- Key Findings (2012)
- Data Structure
- Methodology
- Historical Context
- Related Research
- Modern Context (2024-2025)
- References
- Notes
Overview
Analysis of third-party tracking scripts present on Alexa Top 800 websites, visualizing the interconnected network of advertising and analytics providers.
Original Visualization
Original hosted version: p.wal.sh/trackers
Key Findings (2012)
Major tracking networks identified:
- DoubleClick (Google)
- Comscore Beacon
- Google Analytics
- Facebook Connect
- Quantcast
- BlueKai
- Various ad networks
Data Structure
The visualization was generated using Graphviz. The underlying data structure:
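The original listing is not reproduced in this document. As a minimal sketch, assuming the data was a mapping from each crawled site to the tracker hosts found on it (an assumption, not the author's confirmed format), Graphviz input could be generated as follows. The site names and the `to_dot` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data: site -> third-party tracker hosts seen on that site.
SITE_TRACKERS: Dict[str, List[str]] = {
    "example-news.test": ["doubleclick.net", "google-analytics.com"],
    "example-shop.test": ["google-analytics.com", "quantserve.com"],
}


def to_dot(site_trackers: Dict[str, List[str]]) -> str:
    """Render a site -> tracker mapping as a Graphviz DOT digraph."""
    lines = ["digraph trackers {"]
    for site in sorted(site_trackers):
        # De-duplicate and sort so the output is deterministic.
        for tracker in sorted(set(site_trackers[site])):
            lines.append(f'  "{site}" -> "{tracker}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(SITE_TRACKERS))
```

The resulting text can be saved to a `.dot` file and rendered with the Graphviz `dot` command, for example `dot -Tsvg trackers.dot -o trackers.svg`.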
Methodology
- Crawled Alexa Top 800 sites
- Extracted third-party JavaScript includes
- Categorized by tracker type (analytics, advertising, social)
- Generated relationship graph using Graphviz
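The extraction and categorization steps above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the original crawler: the `CATEGORIES` map is hypothetical, and `base_domain` uses a naive "last two labels" heuristic that misclassifies suffixes such as `co.uk` (a real study would use the Public Suffix List).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical category map; a real study would use a curated tracker list.
CATEGORIES = {
    "google-analytics.com": "analytics",
    "doubleclick.net": "advertising",
    "facebook.net": "social",
}


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def base_domain(host):
    """Naive registrable-domain guess: the last two DNS labels."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def third_party_scripts(page_url, html):
    """Return {domain: category} for script hosts outside the page's own domain."""
    collector = ScriptCollector()
    collector.feed(html)
    site = base_domain(urlparse(page_url).hostname or "")
    found = {}
    for src in collector.sources:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname
        if not host:
            continue
        domain = base_domain(host)
        if domain != site:
            found[domain] = CATEGORIES.get(domain, "unknown")
    return found
```

A static parse like this only sees scripts present in the initial HTML; scripts injected at runtime would require a headless browser.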
Historical Context
This research predates:
- GDPR (2018)
- CCPA (2020)
- Safari ITP (2017)
- Firefox ETP (2019)
- Chrome's Privacy Sandbox initiative
The tracking landscape has evolved significantly with:
- Cookie consent requirements
- Third-party cookie deprecation plans
- Privacy-focused browsers
- Ad blocker adoption
Related Research
Modern Context (2024-2025)
The web tracking landscape has transformed dramatically since this original 2012 research. This section provides current context for understanding how tracking has evolved.
The Death of Third-Party Cookies
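The foundational technology of 2012-era tracking is now blocked or restricted by default in most major browsers, with Chrome as the main exception:
- Chrome Privacy Sandbox: Google's multi-year initiative to offer privacy-preserving alternatives to third-party cookies (Topics API, Attribution Reporting API, Protected Audience API); in July 2024 Google announced it would not remove third-party cookies from Chrome as previously planned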
- Safari ITP (Intelligent Tracking Prevention): Blocks third-party cookies by default since 2020
- Firefox ETP (Enhanced Tracking Protection): Blocks known cross-site tracking cookies and, with Total Cookie Protection, isolates remaining third-party cookies per site
- Brave, DuckDuckGo browsers: Block most known third-party trackers by default
Modern Tracking Methods
Browser Fingerprinting
Identifies a browser from its device and browser characteristics instead of a stored cookie:
- Canvas fingerprinting (WebGL, 2D canvas)
- Audio context fingerprinting
- Font enumeration
- Screen resolution, timezone, language
- WebRTC IP leaks
- Hardware concurrency, device memory
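How such signals combine into an identifier can be sketched as follows. This is a simplified illustration, not any vendor's algorithm: it assumes the attributes have already been collected in the browser, and the attribute names are hypothetical.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a dict of browser attributes into a stable hex identifier."""
    # Canonical JSON (sorted keys, fixed separators) makes the hash
    # independent of the order in which attributes were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical attribute values for illustration only.
    visitor = {
        "screen": "1920x1080",
        "timezone": "Europe/Berlin",
        "language": "de-DE",
        "hardware_concurrency": 8,
        "fonts": ["Arial", "Helvetica"],
    }
    print(fingerprint(visitor))
```

An exact hash changes whenever any single attribute changes, so practical fingerprinting systems tend to tolerate partial mismatches rather than rely on one digest.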
First-Party Data Strategies
Companies now prioritize owned data:
- Customer Data Platforms (CDPs): Segment, mParticle, Tealium
- Server-side tagging: Google Tag Manager Server-Side, Cloudflare Zaraz
- First-party cookies set via CNAME cloaking
- Authenticated user tracking via login walls
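CNAME cloaking in the list above can be detected by comparing a subdomain's DNS alias chain against known tracker domains. The sketch below assumes the CNAME chain has already been resolved by some DNS client and is passed in as a list; `TRACKER_SUFFIXES` and all hostnames are hypothetical, and the two-label `registrable` heuristic is a simplification.

```python
# Hypothetical tracker domains; a real check would use a maintained list.
TRACKER_SUFFIXES = ("tracker-vendor.example",)


def registrable(host):
    """Naive registrable-domain guess: the last two DNS labels."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def matches_suffix(host, suffix):
    """True if host equals suffix or is a subdomain of it."""
    host = host.lower().rstrip(".")
    return host == suffix or host.endswith("." + suffix)


def is_cname_cloaked(page_host, request_host, cname_chain,
                     tracker_suffixes=TRACKER_SUFFIXES):
    """Flag a same-site request whose CNAME chain points at a tracker domain."""
    if registrable(page_host) != registrable(request_host):
        return False  # ordinary third-party request, not cloaking
    return any(
        matches_suffix(target, suffix)
        for target in cname_chain
        for suffix in tracker_suffixes
    )
```

For example, `is_cname_cloaked("www.shop.example", "metrics.shop.example", ["shop.tracker-vendor.example."])` flags a subdomain that looks first-party to the browser but is served by the hypothetical vendor.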
Server-Side Tracking
Moves tracking from browser to server:
- Facebook Conversions API
- Google Enhanced Conversions
- TikTok Events API
- Bypasses client-side blockers, because events are sent from the site's own server rather than from the browser
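The general shape of a server-side event can be sketched as below. The field names are hypothetical and do not reproduce any vendor's schema; the one detail taken from practice is that such APIs commonly expect identifiers like email addresses to be normalized and SHA-256 hashed before sending.

```python
import hashlib
import json
import time


def hash_identifier(value: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash a user identifier."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def build_server_event(event_name: str, email: str, event_time=None) -> dict:
    """Build an illustrative event payload on the server (hypothetical schema)."""
    return {
        "event_name": event_name,
        "event_time": int(time.time()) if event_time is None else event_time,
        "source": "server",
        "user": {"email_sha256": hash_identifier(email)},
    }


if __name__ == "__main__":
    payload = build_server_event("purchase", " Alice@Example.com ")
    # A real integration would POST this JSON to the vendor's endpoint.
    print(json.dumps(payload, indent=2))
```

Because the request originates from the server, a crawler or browser extension observing page traffic never sees it.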
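Identity Resolution
Links users across devices and sites, either deterministically (for example via hashed email addresses) or probabilistically (by matching signals such as IP address and user agent):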
- LiveRamp IdentityLink
- The Trade Desk Unified ID 2.0
- ID5 Universal ID
- Contextual targeting resurgence
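Probabilistic matching can be illustrated with a weighted comparison of signals between two device records. This is a toy sketch: the signal names, weights, and threshold are hypothetical and are not taken from any of the products listed above.

```python
# Hypothetical integer weights per signal; higher means more identifying.
WEIGHTS = {"ip": 5, "user_agent": 3, "timezone": 2}


def match_score(a: dict, b: dict, weights=WEIGHTS) -> float:
    """Return the weighted fraction of signals on which two records agree."""
    total = sum(weights.values())
    matched = sum(
        weight
        for key, weight in weights.items()
        if a.get(key) is not None and a.get(key) == b.get(key)
    )
    return matched / total


def same_user(a: dict, b: dict, threshold: float = 0.7) -> bool:
    """Guess whether two records belong to the same user."""
    return match_score(a, b) >= threshold
```

Unlike a deterministic join on a shared ID, the result is a guess whose error rate depends on the chosen weights and threshold.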
Privacy Regulations
GDPR (2016/2018 - EU)
- Consent required before most tracking (in combination with the ePrivacy Directive's cookie rules)
- Rights to access and delete personal data
- Fines up to €20 million or 4% of global annual turnover, whichever is higher
- Cookie banners became ubiquitous
CCPA/CPRA (2020/2023 - California)
- "Do Not Sell My Personal Information" requirements
- Opt-out rights for data sharing
- Private right of action for data breaches
Other Regulations
- LGPD (Brazil), POPIA (South Africa), PDPA (Thailand)
- US State laws: Virginia, Colorado, Connecticut, Utah
- ePrivacy Regulation (EU, pending)
Google Analytics Evolution
Universal Analytics (2012-2023)
- Announced in late 2012, around the time of the original research; later became the dominant analytics platform
- Cookie-based, session-oriented
- Standard properties stopped processing data on July 1, 2023
Google Analytics 4 (GA4)
- Event-based data model
- Machine-learning modeling to fill gaps when cookies or consent are unavailable
- Privacy-centric design (IP addresses are not logged or stored)
- Consent Mode integration
- Often deployed alongside server-side Google Tag Manager
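The difference between a session-oriented and an event-based model can be sketched by storing only events and deriving sessions afterwards. This is an illustration, not GA4's implementation: the `Event` class and `sessionize` helper are hypothetical, and the 30-minute inactivity gap is used here as an assumed default.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    name: str
    timestamp: int  # seconds since epoch
    params: Dict[str, object] = field(default_factory=dict)


def sessionize(events: List[Event], gap_seconds: int = 1800) -> List[List[Event]]:
    """Group events into sessions split by periods of inactivity."""
    sessions: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # Continue the current session if the gap since its last event is small.
        if sessions and event.timestamp - sessions[-1][-1].timestamp <= gap_seconds:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions
```

In this model a page view, a scroll, and a purchase are all rows of the same shape, and the session is a derived grouping rather than the primary record.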
Ad Blockers and Privacy Tools
Mainstream adoption has disrupted tracking:
- uBlock Origin: One of the most popular content blockers (the ~20% share of desktop users is a rough estimate)
- AdGuard: Mobile and desktop
- Pi-hole: Network-level blocking
- NextDNS: DNS-based blocking
- Browser built-in: Brave Shields, Firefox ETP, Safari ITP
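Network-level and DNS-based blocking comes down to checking each queried hostname against a blocklist. The sketch below assumes suffix matching, so a listed domain also blocks its subdomains; real tools differ in their matching rules, and the `BLOCKLIST` entries are illustrative.

```python
# Illustrative blocklist; real deployments load large community-maintained lists.
BLOCKLIST = {"doubleclick.net", "ads.example"}


def is_blocked(hostname: str, blocklist=BLOCKLIST) -> bool:
    """True if the hostname or any parent domain is on the blocklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" against the blocklist.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

A resolver using this check would answer blocked names with an unroutable address or an empty response, so the tracker request never leaves the network.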
Current Major Trackers (2024-2025)
Compared to 2012, the landscape has consolidated:
| Tracker | 2012 Status | 2024 Status |
|---|---|---|
| DoubleClick | Active | Merged into Google Marketing Platform |
| Google Analytics | Growing | GA4 (Universal Analytics deprecated) |
| Facebook Connect / Pixel | Active (Facebook Connect) | Meta Pixel + Conversions API |
| Comscore | Dominant | Declining market share |
| Quantcast | Active | Pivoted to AI/ML advertising |
| BlueKai | Active | Acquired by Oracle in 2014 (Oracle Data Cloud); Oracle shut down its advertising business in 2024 |
| Amazon | Minimal | Major player (Amazon Ads) |
| TikTok Pixel | N/A | Rapidly growing |
| Apple SKAdNetwork | N/A | iOS attribution standard |
Research Implications
The 2012 visualization represents a simpler era:
- Third-party cookies worked reliably
- Browser fingerprinting was rare
- Regulations were minimal
- Ad blockers had low adoption
Modern research would need to account for:
- Server-side tracking invisible to crawlers
- Third-party trackers masquerading as first-party (for example via CNAME cloaking)
- Consent management platform (CMP) complexity
- Regional variation in consent requirements
References
- Alexa Top Sites (historical, discontinued 2022)
- Ghostery tracker database
- EFF Privacy Badger research
- W3C Privacy Interest Group (PING)
- IAB Europe Transparency and Consent Framework
- Google Privacy Sandbox documentation
- Web Almanac - HTTP Archive annual reports
Notes
- Original data from October 2012
- Graphviz 2.12 used for visualization
- SVG preserved for historical reference
- Modern context added January 2025