Performance Engineering Tools Reference
Tools & Technologies for "Performance Engineering in Practice" (Den Odell, Manning)

1. Introduction

This document catalogs the key tools and technologies relevant to performance engineering as covered in Performance Engineering in Practice by Den Odell (Manning, 2026). Tools are organized by category and mapped to the book's chapters to help practitioners select the right tool for each stage of the performance engineering lifecycle.

The book's "Fast by Default" model emphasizes prevention over reaction: budgets, gates, and automated checks embedded into the development process. The tools below support that model across web, backend, mobile, and desktop platforms.

2. Profiling & Observability

Profiling and observability tools help engineers understand where time and resources are spent. They are central to chapters on identifying critical paths (Ch 4), catching slow code before it ships (Ch 7), catching runtime issues (Ch 9), and owning and observing performance in production (Ch 13).

2.1. Browser Profiling

2.1.1. Chrome DevTools

Chrome DevTools is the primary browser-based profiling tool for web performance engineers. The Performance panel provides CPU profiling, flame charts, network waterfalls, and rendering analysis.

  • Live Metrics Screen: Real-time Core Web Vitals (LCP, INP, CLS) as you interact with the page, without needing to record a trace.
  • CPU Throttling Calibration: Generates device-specific "low-tier mobile" and "mid-tier mobile" presets calibrated to your development machine (Chrome 134+).
  • Individual Request Throttling: Throttle specific network requests to simulate slow third-party resources.
  • Memory Panel: Heap snapshots, allocation timelines, and memory leak detection.
  • Lighthouse Integration: Run Lighthouse audits directly from DevTools.
  • Coverage Tab: Identify unused CSS and JavaScript.
  • AI-Assisted Insights: Automatic labeling and insight generation for performance traces.

Book relevance: Used in Ch 7 (profiling during development), Ch 17 (frontend web recipes for LCP/INP/CLS optimization), and Ch 3 (measuring Core Web Vitals as user-centered goals).

2.1.2. Lighthouse

Lighthouse is Google's automated auditing tool for web page quality. It runs a series of tests against a page and generates a report with scores from 0 to 100 across four categories: Performance, Accessibility, Best Practices, and SEO. (A fifth category, Progressive Web Apps, was removed in Lighthouse 12.)

  • Performance Audits: 38 distinct audits covering Core Web Vitals, render blocking resources, image optimization, JavaScript execution, and more.
  • Stack Packs: Platform-specific recommendations (React, Angular, WordPress, etc.) curated by community experts.
  • Multiple Execution Modes: Run from Chrome DevTools, the command line (Node module), PageSpeed Insights, or as part of CI via Lighthouse CI.
  • Custom Plugins: Extend Lighthouse with domain-specific audits.
  • Budget Assertions: Define performance budgets and assert against them.

Book relevance: Central to Ch 6 (performance budgets), Ch 9 (synthetic monitoring in staging), Ch 12 (CI/CD performance gates), and Ch 17 (frontend optimization recipes).

2.1.3. WebPageTest

WebPageTest provides deep, real-browser performance analysis from globally distributed test locations and is widely regarded as the gold standard for detailed web performance testing.

  • Global Test Locations: Physical test machines worldwide; test from locations close to your actual users.
  • Device Emulation: Simulate mobile device constraints via CPU throttling and network profiles.
  • Filmstrip View: Visual keyframe timeline of page load with up to 9 consecutive test runs.
  • Advanced Metrics: Speed Index, TTFB, FCP, LCP, CLS, TBT, and full Core Web Vitals.
  • Waterfall Charts: Detailed request-level breakdown including DNS, TCP, TLS, and content download.
  • Experiments: Run custom experiments (e.g., block third-party scripts) to measure their performance impact.
  • API & CI Integration: Automate tests and integrate into deployment pipelines via the Catchpoint API.
  • Performance Budgets: Set budgets and alerts for continuous monitoring.

Book relevance: Critical for Ch 11 (testing under realistic conditions from multiple locations), Ch 9 (synthetic monitoring), and Ch 3 (measuring real performance with lab data).

2.2. Backend Profiling

2.2.1. async-profiler (Java)

async-profiler is a low-overhead sampling profiler for Java (HotSpot JVM) that avoids the safepoint bias problem endemic to other Java profilers. It provides accurate CPU and memory profiling for production JVM workloads.

  • CPU Profiling: Sampling modes include CPU, WALL, CTIMER, and ITIMER for different accuracy/overhead tradeoffs.
  • Memory Allocation Profiling: TLAB-driven sampling tracks object allocations without bytecode instrumentation.
  • Lock Profiling: Samples lock contention to identify synchronization bottlenecks.
  • Flame Graph Output: Native flame graph generation for visualizing hot code paths.
  • JFR Output: Compatible with Java Flight Recorder format for analysis in JDK Mission Control.
  • Low Overhead: No bytecode instrumentation; does not inhibit JIT optimizations like escape analysis or allocation elimination.
  • Native & Kernel Frames: Profiles non-Java threads (GC, JIT compiler) and includes native/kernel stack frames.
  • Platform Support: Linux and macOS.

Book relevance: Essential for Ch 7 (profiling during development) and Ch 18 (backend API recipes, database query and service optimization).

2.2.2. py-spy (Python)

py-spy is a sampling profiler for Python programs that operates out-of-process, making it safe for production profiling without code modification or restarts.

  • Out-of-Process: Written in Rust; attaches to a running Python process without injection or modification.
  • Low Overhead: Default 100 Hz sampling rate with minimal performance impact.
  • Viewing Modes: record (flame graphs, speedscope files), top (real-time function-level CPU display), dump (snapshot call stacks for deadlock debugging).
  • Native Extension Profiling: Profile C/C++/Cython extensions with the --native flag.
  • Subprocess Profiling: Include child processes (multiprocessing, gunicorn pools) with --subprocesses.
  • GIL Detection: Track Global Interpreter Lock contention with --gil.
  • Non-blocking Mode: Avoid pausing the target process with --nonblocking.
  • Platform Support: Linux, macOS, Windows, FreeBSD; CPython 2.3-2.7 and 3.3-3.13.

Book relevance: Used in Ch 7 (development profiling without code changes) and Ch 18 (optimizing Python backend services).

2.2.3. perf (Linux)

perf is the Linux kernel's built-in performance analysis tool. It provides access to hardware performance counters, tracepoints, kprobes, and uprobes with very low overhead.

  • Hardware Counters: CPU cycles, instructions, cache misses, branch mispredictions, and other PMU events.
  • Software Events: Page faults, context switches, CPU migrations.
  • Tracepoints: Thousands of kernel tracepoints for scheduler, block I/O, networking, and filesystem events.
  • Dynamic Probes: kprobes (kernel) and uprobes (userspace) for tracing arbitrary function entry/exit.
  • Statistical Profiling: Sample-based profiling of entire system (kernel + userland) with perf record / perf report.
  • Flame Graphs: Generate CPU flame graphs via perf script piped to Brendan Gregg's FlameGraph tools.
  • Cache Analysis: perf c2c for cache-to-cache transfer and false sharing detection (Linux 4.10+).
  • Power Measurement: RAPL support for CPU power consumption profiling (Linux 3.14+).
  • Benchmarking Subsystem: perf bench provides microbenchmarks for scheduler, memory, futex, and other kernel subsystems.

Book relevance: Foundational for Ch 7 (low-level CPU and memory profiling), Ch 9 (runtime performance testing), and Ch 18 (backend optimization at the system level).

2.2.4. eBPF Tools (BCC / bpftrace)

eBPF (extended Berkeley Packet Filter) enables custom programs to run safely inside the Linux kernel for observability without kernel module development. BCC and bpftrace are the two primary frontends.

  • BCC (BPF Compiler Collection): Toolkit with 100+ ready-to-use tools and a Python/Lua frontend for writing custom eBPF programs. Best for complex, reusable tools and daemons.
  • bpftrace: High-level tracing language inspired by awk and DTrace. Best for one-liners and short investigation scripts.
  • Production-Safe Tools: Many tools (execsnoop, biolatency, tcplife, tcpretrans) have low enough overhead for continuous 24/7 use.
  • Comprehensive Observability: CPU scheduling, disk I/O latency, network connections, filesystem operations, memory allocation, and application-level tracing.
  • Flame Graphs: Generate on-CPU and off-CPU flame graphs.
  • Kernel-Level Visibility: Trace kernel functions, syscalls, and hardware events without kernel recompilation.

Book relevance: Mentioned in Ch 7 (advanced profiling), Ch 9 (identifying runtime bottlenecks), Ch 13 (production observability), and Ch 18 (deep backend investigation).

2.3. Distributed Tracing

2.3.1. Jaeger

Jaeger is a CNCF graduated project for end-to-end distributed tracing in microservice architectures. Originally developed at Uber, it maps the flow of requests across service boundaries.

  • Architecture: Client libraries, agent, collector, query service, and UI, all independently scalable.
  • Storage Backends: Cassandra, Elasticsearch, Kafka, Badger, and in-memory.
  • Service Dependency Graphs: Automatic visualization of inter-service dependencies.
  • Trace Search & Analysis: UI for querying traces by service, operation, tags, duration, and time range.
  • Adaptive Sampling: Dynamically adjust sampling rates based on traffic volume.
  • OpenTelemetry Native: Full compatibility with OpenTelemetry SDKs and OTLP protocol.

Book relevance: Used in Ch 4 (identifying critical paths in backend service chains), Ch 13 (production observability with distributed traces), and Ch 18 (debugging cross-service latency).

2.3.2. Zipkin

Zipkin is a distributed tracing system originally developed by Twitter, based on the Google Dapper paper. It collects timing data to troubleshoot latency problems in service architectures.

  • Architecture: Collector (HTTP/Kafka), storage (Cassandra, Elasticsearch, MySQL), query service, and web UI.
  • Trace Visualization: Service graphs showing dependencies; drill into individual spans with timing details.
  • Broad Language Support: Client libraries for Java, JavaScript, Python, Go, Ruby, C#, and more.
  • Configurable Sampling: Control the sampling rate to balance detail against overhead.
  • OpenTelemetry Compatible: Works as an exporter backend for OpenTelemetry instrumented applications.
  • Lightweight: Simpler architecture than Jaeger; good for smaller deployments.

Book relevance: Used in Ch 4 (tracing cross-service request paths), Ch 13 (production trace analysis), and Ch 18 (backend latency investigation).

2.3.3. OpenTelemetry

OpenTelemetry (OTel) is the CNCF standard for observability, unifying metrics, logs, and traces under a single vendor-neutral framework. It has become the default observability standard for cloud-native systems.

  • Unified Telemetry: Single SDK and protocol (OTLP) for metrics, traces, and logs.
  • Language Support: Native SDKs for 12+ languages including Java, Python, Go, JavaScript, .NET, Ruby, Rust, C++, Swift, and Erlang.
  • Vendor Neutral: Instrument once, export to any backend (Jaeger, Zipkin, Datadog, New Relic, Grafana, Prometheus, etc.) without code changes.
  • Auto-Instrumentation: Zero-code instrumentation for popular frameworks and libraries in several languages.
  • Collector: Standalone service for receiving, processing, and exporting telemetry data with filtering, batching, and routing.
  • Context Propagation: Automatic propagation of trace context across service boundaries and message queues.
  • Semantic Conventions: Standardized attribute names for HTTP, database, RPC, messaging, and other domains.

Book relevance: Foundation for Ch 13 (observability), Ch 4 (tracing critical paths), Ch 7 (instrumenting code during development), Ch 15 (future-proofing with vendor-neutral telemetry), and Ch 18 (backend recipes).
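
As a sketch of manual instrumentation with the JavaScript API package (the tracer name and attribute are illustrative; without a registered SDK these calls are safe no-ops, so libraries can instrument unconditionally):

```javascript
// Manual span creation with the OpenTelemetry JavaScript API.
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('checkout-service'); // illustrative name

async function processOrder(orderId) {
  return tracer.startActiveSpan('process-order', async (span) => {
    try {
      // Attributes following semantic-convention style aid cross-service queries
      span.setAttribute('app.order.id', orderId);
      // ... actual work happens here ...
      return { ok: true };
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end(); // spans must be ended explicitly
    }
  });
}
```

Because the span is "active," any instrumented work inside the callback (HTTP calls, database queries) is automatically recorded as child spans, which is what makes the cross-service traces in Jaeger or Zipkin hang together.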

2.4. Application Performance Monitoring (APM)

2.4.1. Datadog

Datadog APM provides AI-powered code-level distributed tracing from browser and mobile applications through backend services and databases.

  • Distributed Tracing: End-to-end request traces with automatic service discovery and dependency mapping.
  • Continuous Profiling: Always-on code-level profiling to identify the most resource-consuming functions.
  • Error Tracking: Automatic exception capture with trace correlation.
  • Service Maps: Visual representation of request flow across architecture.
  • RUM Integration: Correlate backend traces with Real User Monitoring sessions.
  • Ingestion Controls: Sampling rate and retention filter management.
  • CI/CD Quality Gates: Deployment protection rules via Datadog monitors integrated with GitHub Actions.
  • Alerting: APM-specific monitors for hits, errors, and latency measures at service level.

Book relevance: Used in Ch 13 (production observability and alerting), Ch 12 (CI/CD quality gates using Datadog monitors), and Ch 9 (runtime issue detection).

2.4.2. New Relic

New Relic is a full-stack observability platform that combines APM, infrastructure monitoring, log management, browser and mobile monitoring, and synthetic checks.

  • Intelligent Workloads: Automatic discovery and mapping of complex dependencies for 360-degree performance views.
  • AI Insights (NRAI): Transforms telemetry data into plain-language actionable insights.
  • Digital Experience Monitoring: Browser, mobile, and synthetic monitoring including micro-frontend (MFE) support.
  • Distributed Tracing: Cross-service trace analysis with OpenTelemetry compatibility.
  • New Relic Lens: Query external data sources (Postgres, Snowflake) alongside telemetry data.
  • Service Level Management: Define and track SLIs/SLOs with automated alerting.
  • NRQL: Powerful query language for ad hoc performance analysis.

Book relevance: Used in Ch 13 (production observability and SLO tracking), Ch 9 (runtime performance analysis), and Ch 14 (scaling performance culture with shared dashboards).

2.4.3. Grafana (Stack)

Grafana is an open, composable observability platform that unifies metrics (Prometheus/Mimir), logs (Loki), traces (Tempo), and continuous profiling (Pyroscope) with powerful visualization dashboards.

  • Dashboard Visualization: Industry-standard dashboards for metrics, logs, traces, and profiles with rich query editors.
  • Application Observability: Out-of-the-box solution supporting both OpenTelemetry and Prometheus natively.
  • Knowledge Graph: Connects metrics, logs, traces, and profiles into a single intelligent system map.
  • Continuous Profiling: Adaptive profiling that scales data collection based on workload behavior (via Pyroscope).
  • Alerting: Unified alerting with support for Slack, PagerDuty, email, and custom webhooks.
  • AI-Powered Investigation: Grafana Assistant accelerates multi-step incident investigations.
  • k6 Integration: Native integration with Grafana k6 for load test result visualization.
  • Plugin Ecosystem: Hundreds of data source and panel plugins.

Book relevance: Central to Ch 13 (performance dashboards and observability), Ch 12 (CI/CD result visualization), Ch 14 (shared performance dashboards across teams), and Ch 9 (runtime monitoring).

3. Load Testing & Benchmarking

Load testing tools simulate realistic traffic to identify bottlenecks, breaking points, and regression under load. They are central to Ch 9 (runtime performance testing), Ch 11 (testing like the real world), and Ch 18 (backend API recipes).

3.1. Full-Featured Load Testing Frameworks

3.1.1. k6 (Grafana)

k6 is an open-source load testing tool by Grafana Labs with test scripts written in JavaScript and a high-performance engine written in Go.

  • JavaScript/TypeScript Tests: Write load tests as code in a familiar language.
  • Protocol Support: HTTP/1.1, HTTP/2, WebSockets, gRPC, and browser-level testing.
  • Checks & Thresholds: Define pass/fail criteria (e.g., P95 latency < 500ms, error rate < 1%) that integrate with CI.
  • Scenarios: Model complex traffic patterns with ramping VUs, constant arrival rate, and more.
  • Extensions: Plugin system for custom protocols and output backends.
  • Result Export: Output to Prometheus, Datadog, New Relic, InfluxDB, Timescale, and more.
  • Cloud Execution: Scale to 1M concurrent VUs / 5M req/s via Grafana Cloud.
  • CI/CD Integration: First-class GitHub Actions support.

Book relevance: Primary tool for Ch 9 (load testing), Ch 11 (realistic traffic simulation), Ch 12 (CI/CD integration), and Ch 18 (backend API stress testing).
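
A minimal threshold-driven script sketch (run with the k6 binary, not Node; the URL and numbers are placeholders):

```javascript
// script.js — run with `k6 run script.js`. Thresholds make the run
// exit non-zero in CI when latency or error-rate budgets are exceeded.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 20,            // 20 concurrent virtual users
  duration: '30s',
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95th percentile under 500 ms
    http_req_failed: ['rate<0.01'],   // error rate under 1%
  },
};

export default function () {
  const res = http.get('https://example.com/api/health'); // placeholder URL
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // think time between iterations
}
```

The thresholds block is what turns a load test into a CI gate: the same script works for an engineer's laptop run and for a pass/fail pipeline step.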

3.1.2. Locust

Locust is a Python-based load testing framework that defines user behavior in regular Python code and can swarm systems with millions of simultaneous users.

  • Python Test Scripts: Write tests as plain Python classes with full access to the Python ecosystem.
  • Web UI: Real-time dashboard showing throughput, response times, and errors; adjust load while tests run.
  • Distributed Mode: Scale across multiple machines for high-throughput testing.
  • Event-Based: Uses gevent for efficient concurrency; a single process handles thousands of concurrent users.
  • Extensible: Plugin architecture for custom protocols beyond HTTP.
  • CI/CD Support: Headless mode (no UI) for automated pipeline execution.

Book relevance: Listed in Ch 9 as a recommended load testing tool alongside k6 and Gatling; useful for Ch 11 (realistic user journey simulation) and Ch 18 (Python backend load testing).

3.1.3. Gatling

Gatling is a high-performance load testing tool with an expressive DSL available in Java, Kotlin, and Scala, built on an asynchronous non-blocking architecture.

  • Multi-Language DSL: Write simulations in Java, Kotlin, or Scala with IDE support and type safety.
  • Asynchronous Architecture: Non-blocking I/O model capable of simulating large numbers of concurrent users with modest hardware.
  • Protocol Support: HTTP, HTTPS, WebSocket, JMS, SMTP.
  • HTTP Recorder: Capture browser actions or convert HAR files into test scenarios.
  • Detailed Reports: Automatic HTML reports with response time distributions, percentiles, and throughput charts.
  • CI/CD Integration: Maven/Gradle plugins; integrates with Jenkins, GitLab CI, and other pipelines.

Book relevance: Listed in Ch 9 as a recommended load testing tool; strong choice for JVM-based backend services covered in Ch 18.

3.1.4. Artillery

Artillery is a full-stack performance and reliability testing platform that combines load testing with browser-based testing via Playwright integration.

  • YAML Test Definitions: Simple, declarative test configuration in YAML files.
  • Protocol Support: HTTP, WebSocket, Socket.IO, GraphQL, and Playwright (browser-level).
  • Complex Scenarios: Request chains, multi-step transactions, and variable extraction.
  • Playwright Integration: Browser-based load testing for end-to-end user flows.
  • Distributed Execution: Serverless distributed architecture on AWS Fargate or Azure ACI.
  • 20+ Integrations: Monitoring, observability, and CI/CD integrations out of the box.
  • Node.js Ecosystem: Extensible with any Node.js module.

Book relevance: Listed in Ch 9 as a recommended tool; Playwright integration relevant to Ch 11 (browser-level real-world testing) and Ch 12 (CI/CD integration).

3.2. Lightweight HTTP Benchmarking Tools

3.2.1. wrk

wrk is an ultra-high-performance HTTP benchmarking tool written in C, designed for raw throughput measurement and stress testing.

  • Extreme Performance: Generates far higher raw request throughput per machine than heavier frameworks such as Gatling or Artillery, at the cost of their scripting and reporting features.
  • Multi-Threaded: Uses multiple threads and connections for maximum load generation.
  • Lua Scripting: Extend tests with Lua scripts for custom request generation, headers, and body content.
  • Minimal Footprint: Single binary with no dependencies.
  • Use Case: Best for flooding endpoints with high request volumes to test server capacity and connection handling.

Book relevance: Useful for Ch 7 (quick benchmarks during development), Ch 9 (stress testing), and Ch 18 (backend throughput testing).

3.2.2. hey

hey is an HTTP(S) load generator written in Go, serving as a modern replacement for ApacheBench (ab).

  • Simple CLI: Straightforward command-line interface for quick HTTP benchmarks.
  • Latency Histograms: Displays latency distribution including percentiles.
  • Concurrent Requests: Configurable concurrency and total request count.
  • Go-Based: Cross-platform single binary.

Book relevance: Quick development-time benchmarking for Ch 7 and Ch 18.

3.2.3. vegeta

vegeta is an HTTP load testing tool written in Go, designed for constant-rate load generation rather than concurrent-user models.

  • Constant Rate Testing: Maintains precise request rates (requests/second) for consistent, reproducible tests.
  • Detailed Latency Distributions: Captures and reports full latency histograms at all percentiles.
  • Pipeline-Friendly: Command-line first design; reads targets from stdin, outputs results in various formats.
  • CI-Friendly: Lightweight, automatable, integrates cleanly into CI workflows.
  • Low Resource Usage: High throughput with minimal memory consumption.

Book relevance: Ideal for Ch 9 (constant-rate load testing), Ch 12 (CI/CD integration), and Ch 18 (API SLO validation).

3.3. Microbenchmarking Harnesses

3.3.1. JMH (Java Microbenchmark Harness)

JMH is the standard Java microbenchmarking harness from the OpenJDK project, designed to produce reliable benchmark results despite JVM warmup, JIT compilation, and other optimizations.

  • JVM Optimization Control: Handles warmup iterations, JIT compilation effects, dead code elimination, and constant folding.
  • Benchmark Modes: Throughput, average time, sample time, and single-shot time.
  • Parameterization: @Param annotation for testing with multiple input values.
  • Profiler Integration: Built-in profilers for GC, stack sampling, and JFR integration.
  • Warm-up Control: Configurable warmup iterations (@Warmup), measurement iterations (@Measurement), and forking (@Fork).
  • State Management: @Setup and @TearDown annotations with configurable scope (trial, iteration, invocation).
  • Annotation-Driven: Declare benchmarks with @Benchmark; automatic harness generation via Maven.

Book relevance: Essential for Ch 7 (micro-benchmarking hot code paths) and Ch 18 (Java backend optimization).

3.3.2. BenchmarkDotNet (.NET)

BenchmarkDotNet is the standard .NET benchmarking library, transforming methods into benchmarks with reliable, statistically rigorous measurements.

  • Automatic Parameter Tuning: Automatically selects iteration counts, warmup durations, and other parameters for reliable results.
  • Error Detection: Warns about DEBUG mode, attached debuggers, hypervisors, and other environment issues that can skew results.
  • Memory Diagnostics: MemoryDiagnoser tracks GC collections and allocation rates per benchmark.
  • Threading Diagnostics: ThreadingDiagnoser reports lock contention and thread pool statistics.
  • JIT Diagnostics: InliningDiagnoser and DisassemblyDiagnoser for inspecting generated machine code.
  • Multi-Environment: Test across .NET Framework, .NET Core, Mono, and NativeAOT in a single run.
  • Statistical Engine: Uses the perfolizer statistical engine for confidence intervals and outlier detection.
  • Report Generation: Outputs text, HTML, CSV, and Markdown reports.

Book relevance: The .NET equivalent of JMH for Ch 7 (development-time benchmarking) and Ch 18 (backend .NET service optimization).

4. Bundle Analysis & Size Checking

Bundle analysis tools help engineers understand what code ships to users and enforce size budgets. They are central to Ch 8 (catching size problems during builds) and Ch 6 (turning performance goals into budgets).

4.1. Bundle Visualization

4.1.1. webpack-bundle-analyzer

webpack-bundle-analyzer is a webpack plugin and CLI utility that visualizes bundle contents as an interactive, zoomable treemap.

  • Three Size Metrics: Reports stat size (pre-transform), parsed size (post-minification), and gzip size.
  • Three Modes: Server mode (interactive HTTP server), static mode (single HTML file), and JSON mode.
  • Interactive Treemap: Zoom into modules and chunks to identify unexpectedly large dependencies.
  • Chunk Analysis: Shows how code is split across chunks and which modules appear in multiple chunks (duplication).

Book relevance: Primary tool for Ch 8 (bundle analysis and identifying large dependencies) and Ch 17 (JavaScript performance optimization).
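
Enabling the plugin is a small webpack.config.js addition; a sketch (the option values shown are one reasonable CI-friendly combination):

```javascript
// webpack.config.js — attach the analyzer to an existing build.
const { BundleAnalyzerPlugin } = require('webpack-bundle-analyzer');

module.exports = {
  // ...existing webpack configuration...
  plugins: [
    new BundleAnalyzerPlugin({
      analyzerMode: 'static',   // write a standalone report.html instead of serving
      openAnalyzer: false,      // don't pop a browser in CI
      generateStatsFile: true,  // also emit stats.json for the CLI / JSON mode
    }),
  ],
};
```

Static mode is the usual choice for CI, where the HTML report can be archived as a build artifact; server mode suits interactive local investigation.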

4.1.2. source-map-explorer

source-map-explorer analyzes JavaScript bundle composition using source maps, providing accurate attribution of bundle size to original source files.

  • Source Map Based: Uses generated source maps for precise size attribution, even through complex build pipelines.
  • Framework Accurate: More accurate than webpack-bundle-analyzer for frameworks (like Angular) that add build transformations on top of webpack.
  • Treemap Visualization: Interactive visualization similar to webpack-bundle-analyzer.
  • Multiple Output Formats: HTML, JSON, and TSV output.

Book relevance: Used in Ch 8 (bundle analysis) and Ch 17 (frontend optimization), especially for Angular and other framework builds.

4.2. Size Budget Enforcement

4.2.1. bundlesize

bundlesize enforces file size limits on build output, failing CI builds when bundles exceed configured thresholds. Now in maintenance mode; consider bundlewatch as its successor.

  • Gzip Comparison: Compresses files before comparing to configured limits.
  • GitHub PR Status: Creates pass/fail status checks on pull requests.
  • Glob Patterns: Match multiple files with glob syntax.
  • CI Support: Travis CI, CircleCI, Wercker, and Drone.

Book relevance: Directly implements size budgets from Ch 6 and build-time enforcement from Ch 8 and Ch 12.

4.2.2. bundlewatch

bundlewatch is the community-driven successor to bundlesize, providing active maintenance and additional CI platform support.

  • PR Status Checks: Posts build status with file-by-file breakdown on pull requests.
  • CI Support: Travis CI, CircleCI, Wercker, Drone, and GitHub Actions.
  • Configuration: YAML or JSON configuration with per-file size limits.
  • Detailed Breakdown: Results file showing each matched file and its size relative to the budget.

Book relevance: Modern replacement for bundlesize; implements Ch 6 (size budgets), Ch 8 (build-time size checking), and Ch 12 (CI/CD gates).

4.2.3. Lighthouse CI (Budget Assertions)

Lighthouse CI includes budget assertion capabilities that go beyond simple file size checking to encompass full performance metric budgets.

  • Performance Metric Budgets: Assert on LCP, TBT, CLS, Speed Index, and other Lighthouse metrics, not just file sizes.
  • Resource Budgets: Set limits on number of requests, script count, third-party resource count, etc.
  • Size Budgets: Total page weight, JavaScript bytes, image bytes.
  • Historical Comparison: Compare against previous builds to detect regressions.

See the Lighthouse CI entry under CI/CD Integration for full details.

5. Real User Monitoring (RUM)

RUM captures performance data from real users in production, providing field data that complements synthetic/lab testing. It is central to Ch 3 (field data for user-centered goals), Ch 13 (production observability), and Ch 17 (optimizing actual user experience).

5.1. Browser APIs & Libraries

5.1.1. web-vitals Library

The web-vitals library is a tiny (~2 KB Brotli-compressed) JavaScript library from Google for accurately measuring Core Web Vitals on real users, matching Chrome's internal measurement methodology.

  • Core Web Vitals: LCP (Largest Contentful Paint), INP (Interaction to Next Paint), CLS (Cumulative Layout Shift).
  • Additional Metrics: TTFB (Time to First Byte), FCP (First Contentful Paint), FID (First Input Delay, legacy).
  • Accurate Measurement: Matches how metrics are measured by Chrome and reported to CrUX, PageSpeed Insights, and Search Console.
  • Buffered Flag: Uses PerformanceObserver's buffered flag to capture entries that occurred before the library loaded.
  • Attribution Build: Extended build with debugging attribution data to identify the cause of poor scores.
  • Modular: Import only the metrics you need; tree-shakable.
  • Framework Agnostic: Works with any web framework or vanilla JavaScript.

Book relevance: Essential for Ch 3 (measuring field data for user-centered performance goals), Ch 13 (RUM data in production), and Ch 17 (diagnosing Core Web Vitals issues on real users).

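
A usage sketch, following the registration pattern in the library's documentation (browser-only; the '/analytics' endpoint is a placeholder):

```javascript
// Collect Core Web Vitals from real users and beacon them to an endpoint.
// Each metric is reported when its value is final, which is often only at
// tab-hide — hence sendBeacon / keepalive delivery rather than plain fetch.
import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,     // 'LCP' | 'INP' | 'CLS'
    value: metric.value,
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    id: metric.id,         // unique per page load, for deduplication
  });
  // '/analytics' is a placeholder collection endpoint
  (navigator.sendBeacon && navigator.sendBeacon('/analytics', body)) ||
    fetch('/analytics', { body, method: 'POST', keepalive: true });
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
```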

5.1.2. PerformanceObserver API

The PerformanceObserver API is a browser-native interface for observing performance measurement events and receiving notifications as new entries are recorded in the browser's performance timeline.

  • Asynchronous Notifications: Delivered during browser idle time; does not compete with critical rendering work.
  • Buffered Historical Access: Retrieve performance entries that occurred before the observer was created with buffered: true.
  • Entry Types: Navigation timing, resource timing, paint timing (FP, FCP), largest-contentful-paint, long-animation-frames, user timing (marks and measures), and element timing.
  • No Buffer Limits: Entries delivered to an observer are not dropped when the performance timeline's default entry buffer fills, unlike entries retrieved with performance.getEntries().
  • Modern Metric Access: Many modern metrics (LCP, long animation frames) are only available through PerformanceObserver, not the performance object.
  • Browser Support: Available across all major browsers since January 2020.

Book relevance: Underlying API used by the web-vitals library; understanding it directly is relevant to Ch 3 (custom metric measurement), Ch 13 (building custom RUM solutions), and Ch 17 (frontend performance recipes).

5.2. RUM Platforms

5.2.1. SpeedCurve

SpeedCurve combines synthetic monitoring and Real User Monitoring (RUM) with business metric correlation and competitive benchmarking.

  • Synthetic Testing: Scheduled tests with Lighthouse scores, custom browser profiles, and on-demand testing of any URL.
  • RUM (LUX): Field data collection with session-level drill-down, segment by device, geography, and connection type.
  • User Happiness Metric: Composite metric combining multiple performance signals into a user experience score.
  • Business Correlation: Correlate performance data with conversion rate, bounce rate, and other business metrics.
  • Competitive Benchmarking: Test competitor sites alongside your own.
  • Deploy API: CI integration for tracking performance impact between deployments (Jenkins, Travis, CircleCI).
  • Team Dashboards: Shareable dashboards for non-technical stakeholders.

Book relevance: Covers both lab and field data for Ch 3 (user-centered goals), Ch 11 (realistic testing), Ch 13 (production RUM), and Ch 14 (communicating performance to stakeholders).

5.2.2. Calibre

Calibre is a modern performance monitoring platform combining synthetic testing, CrUX data, and real user analytics with performance budget management.

  • Synthetic Monitoring: Scheduled tests across 17 global locations with Lighthouse integration.
  • CrUX Integration: Chrome User Experience Report data without additional tracking scripts.
  • Real User Analytics: Collect RUM data directly on your website.
  • Performance Budgets: Set budgets per metric with alerts when exceeded.
  • Pull Request Reports: Post detailed performance reports on PRs showing scores, changes, and budget status.
  • AI Test Verification: Re-tests results automatically when anomalies are detected.
  • Competitive Analysis: Benchmark against competitor pages.
  • Team Reporting: Scheduled email reports and Slack notifications.

Book relevance: Implements performance budgets (Ch 6), CI integration (Ch 12), production monitoring (Ch 13), and user-centered goal tracking (Ch 3).

6. CI/CD Performance Integration

CI/CD performance integration tools embed performance checks into the build and deployment pipeline, catching regressions before they reach users. They are central to Ch 12 (automate performance in CI/CD), Ch 8 (build-time size checks), and Ch 10 (code review integration).

6.1. CI Performance Testing Platforms

6.1.1. Lighthouse CI

Lighthouse CI (LHCI) is Google's official suite of tools for running Lighthouse audits as part of continuous integration, preventing performance regressions.

  • Automated Audits: Run Lighthouse on every commit with configurable assertions.
  • Performance Gates: Fail builds that violate performance, accessibility, SEO, or best practice thresholds.
  • Multiple Run Aggregation: Run Lighthouse multiple times per test to reduce variance.
  • Visual Diffing: Upload reports to the LHCI server for visual comparison between builds.
  • Historical Tracking: Category score history and trend analysis.
  • PR Comments: Automatic comments on pull requests with performance scores and deltas.
  • CI Platform Support: Travis CI, CircleCI, GitHub Actions, GitLab CI, Jenkins, and any Ubuntu/container-based CI.
  • Budget Assertions: Assert on Lighthouse scores, individual audit scores, and performance budgets.
  • Self-Hosted Server: On-premise server or Docker image for report storage; also offers free public temporary storage.

Book relevance: The primary CI/CD performance tool recommended across Ch 12 (pipeline automation), Ch 6 (budget enforcement), Ch 8 (size budget checks), and Ch 10 (automated PR review comments).
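The assertions described above are declared in a lighthouserc.json file at the project root. A minimal illustrative sketch follows; the URL, run count, and threshold values are placeholders, not recommendations:

```json
{
  "ci": {
    "collect": {
      "url": ["http://localhost:8080/"],
      "numberOfRuns": 3
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["warn", { "maxNumericValue": 0.1 }]
      }
    },
    "upload": { "target": "temporary-public-storage" }
  }
}
```

With this file in place, `lhci autorun` collects three runs, aggregates them to reduce variance, fails the build on any "error"-level assertion, and uploads reports to the free temporary public storage.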

6.1.2. sitespeed.io

sitespeed.io is an open-source web performance analysis platform that tests with real browsers and integrates with time series databases for continuous monitoring.

  • Real Browser Testing: Chrome, Firefox, Edge, and Safari, including Chrome on Android and Safari on iOS.
  • Core Web Vitals: FCP, LCP, CLS, TBT, INP (FID's successor), and many additional metrics.
  • Time Series Integration: Export metrics to Graphite or InfluxDB; visualize with Grafana.
  • Video Generation: Record page load videos for visual analysis.
  • Performance Budgets: Set and enforce budgets with CI pass/fail.
  • Regression Alerts: Email, Slack, and PagerDuty alerts on regressions.
  • Accessibility Testing: Integrated Axe testing.
  • CI/CD Support: JUnit XML and TAP output for Jenkins, CircleCI, GitLab CI, and GitHub Actions.
  • Selenium Scripting: Run Selenium scripts before/after testing for authentication or setup.
  • Mobile Testing: Android and iOS device testing.

Book relevance: Comprehensive tool for Ch 12 (CI/CD performance automation), Ch 11 (real-browser real-world testing), Ch 13 (continuous monitoring), and Ch 9 (runtime testing).

6.2. GitHub Actions for Performance Gates

6.2.1. GitHub Actions + Performance Tools

GitHub Actions provides the workflow automation layer for integrating performance tools into the pull request and deployment lifecycle.

  • Lighthouse CI Action: Official action for running Lighthouse CI on every PR.
  • benchmark-action/github-action-benchmark: Continuous benchmarking action that stores results, tracks trends, and alerts on regressions.
  • k6 Action: Run k6 load tests in CI with threshold-based pass/fail.
  • bundlewatch Action: Enforce bundle size budgets with PR status checks.
  • Deployment Protection Rules: Gate deployments on performance monitor health (e.g., Datadog monitors).
  • Custom Workflows: Compose any combination of performance tools into .github/workflows/performance.yml.
  • PR Comments: Actions can post performance summaries, diffs, and regressions as PR comments for code review (Ch 10).

Book relevance: The implementation layer for Ch 12 (CI/CD performance automation) and Ch 10 (performance feedback in code review via PR comments).
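Assuming an npm-based project with a "build" script and a lighthouserc.json in the repository (both illustrative), a workflow along these lines wires Lighthouse CI into every pull request:

```yaml
# .github/workflows/performance.yml -- illustrative sketch
name: performance
on: pull_request

jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci && npm run build    # assumes an npm "build" script exists
      - run: npm install -g @lhci/cli
      - run: lhci autorun               # reads lighthouserc.json; assertion
                                        # failures fail the job, gating the PR
```

The same job skeleton accepts any of the tools above in place of Lighthouse CI; a k6 or bundlewatch step gates the PR the same way through its exit code.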

7. Caching & CDN

Caching and CDN technologies implement the caching patterns described in Ch 16 (universal performance patterns) and the architecture recommendations in Ch 5 (build performance into architecture). They are also covered in Ch 4 (caching at scale) and Ch 18 (backend caching recipes).

7.1. Application-Level Caches

7.1.1. Redis

Redis is an in-memory data structure store used as a cache, database, message broker, and streaming engine, with sub-millisecond response times.

  • Rich Data Structures: Strings, hashes, lists, sets, sorted sets, streams, bitmaps, HyperLogLogs, and geospatial indexes.
  • Sub-Millisecond Latency: In-memory operations typically complete well under a millisecond, including the network round-trip.
  • Persistence: Optional RDB snapshots and AOF logs for data durability.
  • Pub/Sub & Streams: Real-time messaging and event streaming capabilities.
  • Clustering: Automatic partitioning across multiple nodes.
  • Lua Scripting: Server-side scripting for atomic multi-operation transactions.
  • TTL Support: Per-key expiration for cache invalidation.
  • Versatility: Functions as cache, session store, rate limiter, leaderboard, and more.

Book relevance: Primary application cache in Ch 18 (caching strategies with Redis), Ch 16 (cache-aside pattern), Ch 4 (caching at scale), and Ch 5 (architecture-level caching decisions).

Note: As of March 2025, Redis 8.0 Community Edition moved to AGPLv3. Evaluate license implications for your organization; alternatives include Valkey (Linux Foundation fork, BSD license) and KeyDB.
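The cache-aside pattern that Redis most often implements (Ch 16, Ch 18) can be sketched in a few lines of Python. To keep the snippet runnable anywhere, a small dict-backed class stands in for Redis's GET/SETEX semantics; with the real redis-py client, `cache.get`/`cache.setex` map directly onto the same calls. The `fetch_user_from_db` function, key format, and 300-second TTL are illustrative choices, not prescriptions:

```python
import time

class TTLCache:
    """Minimal in-memory stand-in for Redis GET/SETEX semantics."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiration, as Redis does on access
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

cache = TTLCache()

def fetch_user_from_db(user_id):
    # Placeholder for a slow origin query (database, downstream API, ...).
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl=300):
    """Cache-aside: try the cache, fall back to the origin, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                    # cache hit: no origin round-trip
    value = fetch_user_from_db(user_id)  # cache miss: load from origin
    cache.setex(key, ttl, value)         # populate with a TTL for invalidation
    return value
```

The TTL doubles as the invalidation strategy: stale entries simply age out, trading a bounded window of staleness for simplicity.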

7.1.2. Memcached

Memcached is a high-performance, distributed memory caching system designed for simplicity and speed in key-value caching scenarios.

  • Multi-Threaded: Utilizes multiple CPU cores for high-throughput caching.
  • Simple Key-Value Model: Opaque byte values keyed by strings; optimized for fast read/write of cached values.
  • Consistent Hashing: Distribution across multiple servers via client-side consistent hashing.
  • Sub-Millisecond Latency: In-memory operations with minimal overhead.
  • LRU Eviction: Automatic least-recently-used eviction when memory is full.
  • BSD License: Permissive open-source license with no copyleft restrictions.
  • Volatile by Design: All data is lost on restart; purely a caching layer.

Book relevance: Used alongside Redis in Ch 18 (application-level caching strategies), Ch 16 (simple caching patterns), and Ch 4 (caching at scale). Best for high-throughput, simple key-value caching where Redis's advanced data structures are unnecessary.
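Memcached's client-side distribution model can be sketched as a small consistent-hash ring in Python. The server names and virtual-node count below are illustrative, and production client libraries implement (variations of) this for you; the point is the property the sketch demonstrates: removing a server only remaps the keys that lived on it, leaving the rest of the cache warm.

```python
import bisect
import hashlib

class HashRing:
    """Client-side consistent hashing, as memcached clients do: servers and
    keys hash onto a ring, and each key is served by the next server point
    clockwise, so membership changes remap only a fraction of keys."""

    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # First ring point at or past the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Because the ring points for surviving servers do not move, shrinking the pool from three nodes to two leaves every key that was not on the removed node exactly where it was.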

7.2. Reverse Proxy Caches

7.2.1. Varnish

Varnish Cache is a high-performance HTTP reverse proxy cache that sits in front of web servers to dramatically accelerate content delivery.

  • Extreme Performance: Varnish's documentation cites 300x to 1000x speedups depending on architecture, with deployments observed delivering 20 Gbps on commodity hardware.
  • VCL (Varnish Configuration Language): Flexible, powerful DSL for defining caching policies, request routing, and response manipulation.
  • Grace Mode: Continue serving stale content while fresh content is fetched in the background, or when backends are unhealthy or returning errors (e.g., HTTP 500).
  • Health Checking: Basic backend health checks with automatic failover.
  • Load Balancing: Round-robin and random directors with per-backend weighting.
  • HTTP/1.1 and HTTP/2: Client-facing support for both; TLS termination requires a companion such as Hitch.
  • Monitoring Tools: varnishstat for real-time cache performance (hit rate, miss rate, resource consumption); varnishlog for request logging.
  • Cache Purging: Programmatic invalidation of cached content.

Book relevance: Implements the HTTP caching layer described in Ch 5 (edge caching architecture), Ch 16 (caching patterns), Ch 17 (frontend delivery optimization), and Ch 18 (backend API caching).
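Grace mode is configured in VCL. A minimal sketch follows; the backend address, TTL, and grace window are placeholders to be tuned per deployment:

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";   # origin web server (placeholder)
    .port = "8080";
}

sub vcl_backend_response {
    set beresp.ttl = 5m;     # object is fresh for five minutes...
    set beresp.grace = 6h;   # ...then eligible for stale delivery for six
                             # hours while Varnish revalidates in the
                             # background or the backend is down
}
```

The trade-off mirrors the Redis TTL approach above: a bounded staleness window in exchange for availability when the origin misbehaves.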

7.3. Content Delivery Networks (CDN)

7.3.1. Cloudflare

Cloudflare operates a global edge network that combines CDN, DDoS protection, DNS, and edge compute in an integrated platform.

  • Global Edge Network: Points of presence in 300+ cities across 100+ countries.
  • Low TTFB: Cloudflare's published benchmarks show the lowest P95 time-to-first-byte across the largest share of top-1000 networks.
  • Workers (Edge Compute): Run JavaScript, TypeScript, or WASM at the edge for dynamic content without origin round-trips.
  • Instant Cache Purge: Sub-150ms global median purge latency.
  • Integrated Security: DDoS mitigation, WAF, bot management, and SSL/TLS.
  • Free Tier: Generous free plan for small sites.
  • Argo Smart Routing: Optimized routing across the Cloudflare network for faster origin fetches.

Book relevance: Implements CDN caching strategies in Ch 5 (edge computing architecture), Ch 4 (CDN as a caching layer), Ch 16 (caching patterns), and Ch 11 (geographic distribution testing).

7.3.2. Fastly

Fastly is a developer-focused edge cloud platform known for instant cache purging, real-time analytics, and programmable edge compute.

  • Instant Purge: Sub-150ms global cache invalidation.
  • Compute@Edge: WebAssembly-based edge compute for complex logic at the edge.
  • VCL & Fiddle: Varnish-based configuration with an online testing environment.
  • Real-Time Analytics: Streaming log delivery and real-time metrics.
  • Dynamic Content Support: Strong support for API caching, streaming media, and personalized content.
  • Developer-First: Comprehensive API and CLI for infrastructure-as-code.

Book relevance: Alternative to Cloudflare for Ch 5 (edge architecture), Ch 4 (CDN caching), and Ch 16 (caching patterns). Preferred by organizations needing instant purge and edge compute capabilities.

7.3.3. Akamai

Akamai operates the world's largest and most deeply embedded CDN with 4,100+ edge nodes across 130+ countries, many placed inside ISPs and mobile carriers.

  • Largest Edge Network: 4,100+ points of presence, often co-located with ISPs for minimal last-mile latency.
  • Enterprise Focus: Strict SLAs, dedicated support, and compliance certifications.
  • Media Delivery: Specialized solutions for video streaming and large file distribution.
  • Edge Compute: EdgeWorkers for running JavaScript at the edge.
  • Application Security: WAF, DDoS protection, bot management, and API security.
  • API Acceleration: Optimized routing and caching for API traffic.

Book relevance: Enterprise CDN option for Ch 5 (architecture), Ch 4 (caching at scale), and Ch 16 (multi-layer caching). Best for organizations with demanding media workloads, enterprise compliance requirements, and strict SLA needs.

8. Tool Selection Guide by Book Chapter

This section maps each book chapter to its most relevant tools, helping readers quickly identify which tools to explore for each topic.

Chapter | Title | Primary Tools
Ch 1 | What is Performance Engineering? | (Conceptual; introduces the need for all categories)
Ch 2 | Measure the Real Cost | web-vitals, Datadog, New Relic (correlating perf to revenue)
Ch 3 | Set User-Centered Goals | web-vitals, Chrome DevTools, Lighthouse, SpeedCurve, Calibre
Ch 4 | Identify Critical Paths | Jaeger, Zipkin, OpenTelemetry, Chrome DevTools, Redis, CDNs
Ch 5 | Build Performance into Architecture | Varnish, Redis, Cloudflare, Fastly, Akamai
Ch 6 | Turn Goals into Budgets | Lighthouse, bundlewatch, bundlesize, Calibre
Ch 7 | Catch Slow Code Before Shipping | Chrome DevTools, async-profiler, py-spy, perf, eBPF, JMH, wrk
Ch 8 | Catch Size Problems During Builds | webpack-bundle-analyzer, source-map-explorer, bundlewatch, LHCI
Ch 9 | Catch Runtime Issues Early | k6, Locust, Gatling, Artillery, vegeta, Datadog, Grafana
Ch 10 | Catch Problems in Code Review | Lighthouse CI (PR comments), GitHub Actions, bundlewatch
Ch 11 | Test Like the Real World | WebPageTest, k6, Locust, sitespeed.io, SpeedCurve
Ch 12 | Automate Performance in CI/CD | Lighthouse CI, sitespeed.io, GitHub Actions, k6, bundlewatch
Ch 13 | Own and Observe Performance | OpenTelemetry, Jaeger, Datadog, New Relic, Grafana, web-vitals
Ch 14 | Scale Performance-First Culture | Grafana (dashboards), New Relic, SpeedCurve, Calibre
Ch 15 | Stay Fast as Stack Evolves | OpenTelemetry (vendor-neutral), Lighthouse CI, sitespeed.io
Ch 16 | Universal Performance Patterns | Redis, Memcached, Varnish, CDNs (caching/compression patterns)
Ch 17 | Frontend Web Recipes | Chrome DevTools, Lighthouse, web-vitals, webpack-bundle-analyzer
Ch 18 | Backend & API Recipes | async-profiler, py-spy, perf, k6, Redis, Memcached, JMH
Ch 19 | Mobile & Native Recipes | Xcode Instruments, Android Systrace, Jetpack Benchmark
Ch 20 | Keep It Fast by Default | (Synthesis; references all tools from prior chapters)

9. Tool Comparison Summary

9.1. Load Testing Tools at a Glance

Tool | Language | Protocol Support | Distributed | CI/CD | Best For
k6 | JS/Go | HTTP, WS, gRPC, Browser | Cloud | Strong | General-purpose load testing
Locust | Python | HTTP, extensible | Native | Good | Python teams, custom protocols
Gatling | Java/Scala | HTTP, WS, JMS, SMTP | Enterprise | Strong | JVM teams, high-fidelity sims
Artillery | YAML/JS | HTTP, WS, GraphQL, Playwright | Cloud | Strong | Full-stack + browser testing
wrk | C/Lua | HTTP | No | Manual | Raw throughput benchmarking
hey | Go | HTTP | No | Manual | Quick dev-time benchmarks
vegeta | Go | HTTP | No | Strong | Constant-rate API testing

9.2. Observability Platforms at a Glance

Tool | Type | Traces | Metrics | Logs | Profiling | RUM | License
Datadog | Commercial | Yes | Yes | Yes | Yes | Yes | SaaS
New Relic | Commercial | Yes | Yes | Yes | Yes | Yes | SaaS/Free tier
Grafana Stack | Open/Comm | Yes | Yes | Yes | Yes | No | AGPL/Comm
Jaeger | Open | Yes | No | No | No | No | Apache 2.0
Zipkin | Open | Yes | No | No | No | No | Apache 2.0
OpenTelemetry | Open | Yes | Yes | Yes | No | No | Apache 2.0

9.3. Bundle & Size Tools at a Glance

Tool | Visualization | CI Gate | PR Status | Budget Enforcement
webpack-bundle-analyzer | Yes (treemap) | No | No | No
source-map-explorer | Yes (treemap) | No | No | No
bundlesize | No | Yes | Yes | Yes (gzip)
bundlewatch | No | Yes | Yes | Yes
Lighthouse CI | Yes (report) | Yes | Yes | Yes (multi-metric)

10. Getting Started Recommendations

For teams beginning their performance engineering journey (the book's Performance Engineering Maturity Model, Ch 20), here is a recommended adoption path:

10.1. Level 1: Reactive (Minimum Viable Performance Tooling)

  1. Browser profiling: Chrome DevTools (free, already installed)
  2. Synthetic audits: Lighthouse in Chrome DevTools (free)
  3. Field data: web-vitals library (~2KB, free)
  4. Quick benchmarks: hey or wrk (free CLI tools)

10.2. Level 2: Measured (Adding Tracking and Baselines)

  1. Bundle analysis: webpack-bundle-analyzer or source-map-explorer
  2. Load testing: k6 with basic scenarios
  3. Distributed tracing: OpenTelemetry + Jaeger (open source)
  4. Dashboards: Grafana + Prometheus

10.3. Level 3: Proactive (Budgets and Gates)

  1. CI performance gates: Lighthouse CI in GitHub Actions
  2. Size budgets: bundlewatch with PR status checks
  3. Load test gates: k6 thresholds in CI
  4. Performance budgets: Calibre or SpeedCurve

10.4. Level 4: Embedded (Performance-First Culture)

  1. Full APM: Datadog or New Relic for end-to-end observability
  2. Continuous profiling: async-profiler (Java) / py-spy (Python)
  3. Continuous monitoring: sitespeed.io with Grafana dashboards
  4. CDN optimization: Cloudflare/Fastly with Varnish origin caching
  5. Advanced tracing: eBPF tools for deep system-level analysis

Author: Study Reference


Last Updated: 2026-02-28 16:54:25
