Web Trackers Analysis (Alexa Top 800)
Table of Contents
- 1. Overview
- 2. Original Visualization
- 3. Key Findings (2012)
- 4. Tracking Technology Evolution
- 5. Data Structure (Historical)
- 6. Methodology
- 7. Historical Context
- 8. Related Research
- 9. Modern Context (2024-2025)
- 10. References
- 11. Notes
1. Overview
Analysis of third-party tracking scripts present on Alexa Top 800 websites, visualizing the interconnected network of advertising and analytics providers.
2. Original Visualization
Original hosted version: p.wal.sh/trackers
3. Key Findings (2012)
Major tracking networks identified:
- DoubleClick (Google)
- Comscore Beacon
- Google Analytics
- Facebook Connect
- Quantcast
- BlueKai
- Various ad networks
4. Tracking Technology Evolution
The web tracking landscape has transformed dramatically since 2012. Third-party cookies dominated the original research period; regulatory pressure, browser-level blocking, and ad blocker adoption have since pushed the industry toward server-side and first-party strategies. The diagram below maps that transition from the 2012 mechanisms on the left to the 2024 replacements on the right.
5. Data Structure (Historical)
The original 2012 visualization was generated using Graphviz to show which tracker networks appeared on the top Alexa sites.
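The sketch below shows one way to emit such a graph as Graphviz DOT text. It assumes the crawl results are stored as a mapping from each site to the set of tracker networks found on it. The function names `to_dot` and `quote` and the sample sites are hypothetical, so this is not the original 2012 script or data.

```python
def quote(name: str) -> str:
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(site_trackers: dict[str, set[str]]) -> str:
    """Render an undirected site/tracker graph as DOT source."""
    lines = ["graph trackers {", "  overlap=false;"]
    # Tracker networks are drawn as boxes; sites keep the default ellipse.
    for tracker in sorted({t for ts in site_trackers.values() for t in ts}):
        lines.append(f"  {quote(tracker)} [shape=box];")
    # One edge per (site, tracker) observation.
    for site in sorted(site_trackers):
        for tracker in sorted(site_trackers[site]):
            lines.append(f"  {quote(site)} -- {quote(tracker)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical crawl output, not the 2012 data set.
    crawl = {
        "example.com": {"DoubleClick", "Google Analytics"},
        "example.org": {"Google Analytics", "Quantcast"},
    }
    print(to_dot(crawl))
```

Piping the output through `neato -Tsvg -o trackers.svg` (or `dot`) produces an SVG. A force-directed layout such as `neato` tends to cluster sites around the trackers they share.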
6. Methodology
- Crawled Alexa Top 800 sites
- Extracted third-party JavaScript includes
- Categorized by tracker type (analytics, advertising, social)
- Generated relationship graph using Graphviz
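The extraction and categorization steps can be sketched using only the Python standard library. The sketch assumes pages have already been fetched. The category table is a small hypothetical stand-in for a maintained tracker database. `site_of` approximates a registrable domain by keeping the last two labels, which fails on suffixes such as `co.uk`; a real crawler would consult the Public Suffix List instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical site -> category table; illustrative entries only.
CATEGORIES = {
    "doubleclick.net": "advertising",
    "google-analytics.com": "analytics",
    "quantserve.com": "analytics",
    "facebook.net": "social",
}


class ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def site_of(host: str) -> str:
    """Approximate the registrable domain as the last two DNS labels."""
    return ".".join(host.split(".")[-2:])


def third_party_scripts(page_url: str, html: str) -> dict[str, str]:
    """Map each third-party script host to a category ("unknown" if unlisted)."""
    parser = ScriptSrcParser()
    parser.feed(html)
    page_site = site_of(urlparse(page_url).hostname or "")
    found: dict[str, str] = {}
    for src in parser.srcs:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname or ""
        if not host or site_of(host) == page_site:
            continue  # first-party script
        found[host] = CATEGORIES.get(site_of(host), "unknown")
    return found
```

Inline scripts have no `src` and are skipped. A fuller crawl would also need to catch scripts that other scripts inject at runtime, which a static HTML parse cannot see.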
7. Historical Context
This research predates:
- GDPR (2018)
- CCPA (2020)
- Safari ITP (2017)
- Firefox ETP (2019)
- Chrome's Privacy Sandbox initiative
The tracking landscape has evolved significantly with:
- Cookie consent requirements
- Third-party cookie deprecation plans
- Privacy-focused browsers
- Ad blocker adoption
9. Modern Context (2024-2025)
This section provides current context for how tracking has evolved since the original 2012 research.
9.1. The Death of Third-Party Cookies
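The foundational technology of 2012-era tracking is being restricted or phased out:
- Chrome Privacy Sandbox: Google's multi-year initiative to replace third-party cookies with privacy-preserving APIs (Topics API, Attribution Reporting API, Protected Audience API); in July 2024 Google announced it would no longer remove third-party cookies from Chrome outright and would instead pursue a user-choice approach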
- Safari ITP (Intelligent Tracking Prevention): Blocks third-party cookies by default since 2020
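- Firefox ETP (Enhanced Tracking Protection): Blocks known cross-site tracking cookies by default, and Total Cookie Protection (on by default since 2022) isolates other third-party cookies per site
- Brave, DuckDuckGo browsers: Aggressive blocking of known trackers by default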
9.2. Modern Tracking Methods
9.2.1. Browser Fingerprinting
Replaces cookies with device/browser characteristics:
- Canvas fingerprinting (WebGL, 2D canvas)
- Audio context fingerprinting
- Font enumeration
- Screen resolution, timezone, language
- WebRTC IP leaks
- Hardware concurrency, device memory
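These attributes can replace a cookie because, taken together, they are often stable and distinctive enough to recognise a returning browser. The sketch below shows only the combining step. It assumes the values have already been collected client-side; in a browser they would come from APIs such as `navigator.hardwareConcurrency` or `screen.width`. The attribute names and values are hypothetical. Real fingerprinting scripts use many more signals and fuzzier matching than a single exact hash.

```python
import hashlib
import json


def fingerprint(attributes: dict[str, object]) -> str:
    """Hash a set of browser attributes into a stable identifier.

    Identical attribute values always yield the identical hash, so the
    same browser can be recognised again without storing anything on it.
    """
    # Canonical JSON (sorted keys, no whitespace) makes the hash
    # independent of the order in which attributes were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical values for a single browser.
    browser = {
        "screen": "1920x1080",
        "timezone": "Europe/Berlin",
        "language": "de-DE",
        "hardwareConcurrency": 8,
        "deviceMemory": 8,
        "canvasHash": "3f2a9c",  # placeholder for a rendered-canvas digest
    }
    print(fingerprint(browser))
```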
9.2.2. First-Party Data Strategies
Companies now prioritize owned data:
- Customer Data Platforms (CDPs): Segment, mParticle, Tealium
- Server-side tagging: Google Tag Manager Server-Side, Cloudflare Zaraz
- First-party cookies set via CNAME cloaking
- Authenticated user tracking via login walls
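CNAME cloaking points a subdomain of the publisher's own site (for example `metrics.example.com`) at a tracker's server through a DNS CNAME record, so the browser treats the tracker as first-party. Detecting it therefore means resolving the subdomain and checking where its CNAME chain leads. The sketch below assumes the chain has already been obtained from a DNS lookup (for example with `dig`) and performs only the matching. All domain names are hypothetical.

```python
from typing import Iterable, Optional


def is_within(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def cloaked_tracker(cname_chain: Iterable[str],
                    tracker_domains: set[str]) -> Optional[str]:
    """Return the tracker domain a CNAME chain resolves into, or None."""
    for target in cname_chain:
        # DNS names are case-insensitive and may carry a trailing dot.
        target = target.rstrip(".").lower()
        for domain in sorted(tracker_domains):
            if is_within(target, domain):
                return domain
    return None


if __name__ == "__main__":
    trackers = {"tracker.example.net"}
    # metrics.example.com -> edge.example.org -> a1.tracker.example.net
    chain = ["edge.example.org.", "a1.tracker.example.net."]
    print(cloaked_tracker(chain, trackers))  # tracker.example.net
```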
9.2.3. Server-Side Tracking
Moves tracking from browser to server:
- Facebook Conversions API
- Google Enhanced Conversions
- TikTok Events API
- Bypasses client-side blockers, since the data is sent from the site's own servers
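Server-side APIs like these usually receive customer identifiers that the advertiser has normalized and SHA-256 hashed before sending. The sketch below builds one such event on the server. The field names are loosely modelled on Meta's Conversions API; check them against each provider's documentation. The request itself (endpoint and access token) is deliberately omitted.

```python
import hashlib
import time
from typing import Optional


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash an email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_conversion_event(event_name: str, email: str, value: float,
                           currency: str,
                           event_time: Optional[int] = None) -> dict:
    """Assemble an illustrative server-to-server conversion event."""
    return {
        "event_name": event_name,
        # Unix timestamp in seconds; defaults to "now".
        "event_time": int(time.time()) if event_time is None else event_time,
        "action_source": "website",
        # Only the hash leaves the server, never the raw address.
        "user_data": {"em": [hash_email(email)]},
        "custom_data": {"value": value, "currency": currency},
    }
```

The advertiser's backend sends this event, not a script in the page, so browser extensions and DNS blockers never see the request.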
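9.2.4. Identity Resolution
Links users across sites and devices, either deterministically (hashed emails, logins) or probabilistically (shared signals such as IP address and device attributes):
- LiveRamp IdentityLink (deterministic, built from personal data such as email addresses)
- The Trade Desk Unified ID 2.0 (deterministic, derived from hashed email addresses or phone numbers)
- ID5 Universal ID
- Contextual targeting resurgence, which targets page content rather than user identity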
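A toy linking function illustrates the difference between the two approaches. A shared hashed email produces a deterministic match; email-based IDs such as UID 2.0 are derived from this kind of input. Without one, the function builds a score from coincidental signals. The `Profile` fields and point weights are hypothetical. Production systems use many more signals and statistical models.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical points awarded per matching signal (sum = 100).
SIGNAL_POINTS = {"ip": 60, "timezone": 20, "language": 20}


@dataclass
class Profile:
    hashed_email: Optional[str]  # present only when the user logged in
    ip: str
    timezone: str
    language: str


def link_score(a: Profile, b: Profile) -> int:
    """Score 0-100 for how likely two profiles belong to the same person."""
    # Deterministic: the same hashed email means the same email address.
    if a.hashed_email is not None and a.hashed_email == b.hashed_email:
        return 100
    # Probabilistic: add points for each signal the two profiles share.
    return sum(points for field, points in SIGNAL_POINTS.items()
               if getattr(a, field) == getattr(b, field))
```

For example, a laptop and a phone on the same home network, in the same timezone but with different browser languages, would score 80 here, even though neither device carries a shared identifier.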
9.3. Privacy Regulations
9.3.1. GDPR (2016/2018 - EU)
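- Consent required before most tracking (the ePrivacy Directive requires it for non-essential cookies, and GDPR sets the standard that consent must meet)
- Rights to access and erase personal data
- Fines up to €20 million or 4% of worldwide annual turnover, whichever is higher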
- Cookie banners became ubiquitous
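In practice, a consent management platform enforces these requirements by deciding which tags may load. The sketch below shows that gate. It assumes each tag carries a single purpose label and that strictly necessary tags are exempt. The purpose names are hypothetical simplifications of the numbered purposes defined by frameworks such as the IAB Europe Transparency and Consent Framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    purpose: str  # e.g. "necessary", "analytics", "advertising"


def tags_to_load(tags: list[Tag], consented: set[str]) -> list[str]:
    """Names of tags allowed to run: necessary ones plus consented purposes."""
    return [tag.name for tag in tags
            if tag.purpose == "necessary" or tag.purpose in consented]


if __name__ == "__main__":
    page_tags = [
        Tag("session-cookie", "necessary"),
        Tag("analytics-tag", "analytics"),
        Tag("ad-pixel", "advertising"),
    ]
    # Before the user answers the banner, nothing optional may load.
    print(tags_to_load(page_tags, consented=set()))
    print(tags_to_load(page_tags, consented={"analytics"}))
```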
9.3.2. CCPA/CPRA (2020/2023 - California)
- "Do Not Sell My Personal Information" requirements (expanded by CPRA to "Do Not Sell or Share")
- Opt-out rights for data sharing
- Private right of action for data breaches
9.3.3. Other Regulations
- LGPD (Brazil), POPIA (South Africa), PDPA (Thailand)
- US State laws: Virginia, Colorado, Connecticut, Utah
- ePrivacy Regulation (EU, pending)
9.4. Google Analytics Evolution
9.4.1. Universal Analytics (2012-2023)
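- Announced in late 2012 as the successor to classic Google Analytics (ga.js), the version dominant during the original research
- Cookie-based, session-oriented
- Standard properties stopped processing data on July 1, 2023 (Universal Analytics 360 properties on July 1, 2024)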
9.4.2. Google Analytics 4 (GA4)
- Event-based data model
- Machine learning to fill data gaps when cookies are blocked or declined
- Privacy-centric design (IP addresses are not logged or stored)
- Consent Mode integration
- Server-side GTM recommended
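GA4's Measurement Protocol shows the event-based model clearly. Server-side setups can use it to send events directly to Google Analytics, and each hit is a list of named events with parameters rather than a pageview or session record. The sketch below only builds the request. The measurement ID, API secret and client ID are placeholders. Sending the request would mean POSTing the body to the returned URL.

```python
import json
from urllib.parse import urlencode

MP_COLLECT_URL = "https://www.google-analytics.com/mp/collect"


def build_mp_request(measurement_id: str, api_secret: str, client_id: str,
                     events: list[dict]) -> tuple[str, str]:
    """Return (url, json_body) for a GA4 Measurement Protocol request."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    body = json.dumps({"client_id": client_id, "events": events})
    return f"{MP_COLLECT_URL}?{query}", body


if __name__ == "__main__":
    url, body = build_mp_request(
        "G-XXXXXXXXXX",        # placeholder measurement ID
        "YOUR_API_SECRET",     # placeholder API secret
        "555.1234567890",      # placeholder client ID
        [{"name": "purchase", "params": {"currency": "EUR", "value": 29.99}}],
    )
    print(url)
    print(body)
```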
9.5. Ad Blockers and Privacy Tools
Mainstream adoption has disrupted tracking:
- uBlock Origin: One of the most widely used open-source blockers
- AdGuard: Mobile and desktop
- Pi-hole: Network-level blocking
- NextDNS: DNS-based blocking
- Browser built-in: Brave Shields, Firefox ETP, Safari
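Network-level blockers such as Pi-hole and NextDNS act on DNS queries rather than page content. If a queried name, or one of its parent domains, is on a blocklist, the resolver returns an unroutable answer (commonly `0.0.0.0`, or an NXDOMAIN response) instead of the real address. The sketch below is a simplified version of that decision. The upstream resolver is a stand-in function and the blocklist entries are examples. Each tool has its own matching rules (exact, wildcard or regex), which differ from this parent-domain walk.

```python
from typing import Callable, Optional

BLOCKED_ANSWER = "0.0.0.0"


def matching_rule(qname: str, blocklist: set[str]) -> Optional[str]:
    """Return the blocklist entry covering qname or a parent domain, if any."""
    labels = qname.rstrip(".").lower().split(".")
    # Try a.b.c, then b.c, then c.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in blocklist:
            return candidate
    return None


def resolve(qname: str, blocklist: set[str],
            upstream: Callable[[str], str]) -> str:
    """Answer blocked names locally; forward everything else upstream."""
    if matching_rule(qname, blocklist):
        return BLOCKED_ANSWER
    return upstream(qname)
```

The tracker host never resolves to a real address, so no connection can be opened. This is why DNS blocking also covers apps and devices that cannot run a browser extension.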
9.6. Current Major Trackers (2024-2025)
Compared to 2012, the landscape has consolidated:
| Tracker | 2012 Status | 2024 Status |
|---|---|---|
| DoubleClick | Active (Google-owned since 2008) | Rebranded in 2018 into Google Marketing Platform and Google Ad Manager |
| Google Analytics | Growing | GA4 (Universal Analytics deprecated) |
| Facebook Pixel | Emerging | Meta Pixel + Conversions API |
| Comscore | Dominant | Declining market share |
| Quantcast | Active | Pivoted to AI/ML advertising |
| BlueKai | Active | Acquired by Oracle in 2014; discontinued when Oracle exited advertising in 2024 |
| Amazon | Minimal | Major player (Amazon Ads) |
| TikTok Pixel | N/A | Rapidly growing |
| Apple SKAdNetwork | N/A | iOS attribution standard |
9.7. Research Implications
The 2012 visualization represents a simpler era:
- Third-party cookies worked reliably
- Browser fingerprinting was rare
- Regulations were minimal
- Ad blockers had low adoption
Modern research would need to account for:
- Server-side tracking invisible to crawlers
- First-party cookie masquerading
- Consent management platform (CMP) complexity
- Regional variation in consent requirements
10. References
- Alexa Top Sites (historical, discontinued 2022)
- Ghostery tracker database
- EFF Privacy Badger research
- W3C Privacy Interest Group (PING)
- IAB Europe Transparency and Consent Framework
- Google Privacy Sandbox documentation
- Web Almanac (HTTP Archive annual reports)
11. Notes
- Original data from October 2012
- Graphviz 2.12 used for visualization
- SVG preserved for historical reference
- Modern context added January 2025