ADS-B Data Pipeline Design

Overview

A robust pipeline for processing ADS-B data from multiple receivers. Its design is guided by "Falsehoods Programmers Believe About Aviation": unusual data is expected, recorded, and flagged rather than silently dropped.

Pipeline Stages

┌─────────────────────────────────────────────────────────────────────────────┐
│                           ADS-B DATA PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────┐  ┌─────────┐                                                   │
│  │Receiver │  │Receiver │   RECEIVERS (1090 MHz)                            │
│  │    A    │  │    B    │                                                   │
│  └────┬────┘  └────┬────┘                                                   │
│       │            │                                                        │
│       └─────┬──────┘                                                        │
│             ▼                                                               │
│  ┌─────────────────────┐                                                    │
│  │  1. INGEST          │  Parse SBS format                                  │
│  │     - Tag receiver  │  Handle connection drops                           │
│  │     - Buffer msgs   │  Use device serial as ID                           │
│  └──────────┬──────────┘                                                    │
│             ▼                                                               │
│  ┌─────────────────────┐  ┌─────────────────────┐                           │
│  │  2. VALIDATE        │──│    QUARANTINE       │                           │
│  │     - Callsign      │  │    - Bad data       │                           │
│  │     - Position      │  │    - For review     │                           │
│  │     - Spoof detect  │  └─────────────────────┘                           │
│  └──────────┬──────────┘                                                    │
│             ▼                                                               │
│  ┌─────────────────────┐                                                    │
│  │  3. NORMALIZE       │  Consistent formats                                │
│  │     - Callsign fmt  │  Handle encoding issues                            │
│  │     - Altitude refs │  Squawk codes                                      │
│  └──────────┬──────────┘                                                    │
│             ▼                                                               │
│  ┌─────────────────────┐  ┌─────────────────────┐                           │
│  │  4. DEDUPLICATE     │──│   CONFLICTS         │                           │
│  │     - Time window   │  │   - Record when     │                           │
│  │     - Same hex_id   │  │     sources differ  │                           │
│  │     - Merge fields  │  └─────────────────────┘                           │
│  └──────────┬──────────┘                                                    │
│             ▼                                                               │
│  ┌─────────────────────┐                                                    │
│  │  5. ENRICH          │  Add metadata                                      │
│  │     - Registration  │  Operator info                                     │
│  │     - Aircraft type │  Flight linking                                    │
│  └──────────┬──────────┘                                                    │
│             ▼                                                               │
│  ┌─────────────────────┐                                                    │
│  │  6. STORE           │  Clean output                                      │
│  │     - SQLite/CSV    │  Audit trail                                       │
│  │     - Never delete  │  Queryable                                         │
│  └─────────────────────┘                                                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
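
As an illustration of the INGEST stage, the following is a minimal sketch of parsing one SBS (BaseStation) line and tagging it with the receiver's device serial. The RawMessage fields and the parse_sbs_line helper are hypothetical names chosen for this example, not the pipeline's actual code. The field positions assume the commonly documented 22-field SBS-1 layout served on port 30003.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawMessage:
    receiver_serial: str            # physical device that heard the message
    hex_ident: str
    callsign: Optional[str]
    altitude_feet: Optional[int]
    latitude: Optional[float]
    longitude: Optional[float]
    raw: str                        # original line, kept for the audit trail


def _opt(value: str, cast):
    """Return cast(value), or None for empty or unparseable fields."""
    value = value.strip()
    if not value:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_sbs_line(line: str, receiver_serial: str) -> Optional[RawMessage]:
    """Parse one SBS 'MSG' line; return None if it is not a full MSG record."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 22 or fields[0] != "MSG":
        return None
    return RawMessage(
        receiver_serial=receiver_serial,
        hex_ident=fields[4].strip().upper(),
        callsign=_opt(fields[10], str),
        altitude_feet=_opt(fields[11], int),
        latitude=_opt(fields[14], float),
        longitude=_opt(fields[15], float),
        raw=line,
    )
```

In this sketch a None result means the line could not be read as a MSG record. A caller would presumably pass such lines to quarantine with the raw text attached rather than drop them.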

Validation Decision Tree

RawMessage
    │
    ▼
┌─────────────┐
│ hex_ident   │──── empty/invalid ──────────────────┐
│ validation  │                                     │
└──────┬──────┘                                     │
       │ valid                                      │
       ▼                                            │
┌─────────────┐                                     │
│ callsign    │──── "NULL", "TEST" ──► weird_but_   │
│ validation  │                         valid       │
└──────┬──────┘                                     │
       │                                            ▼
       ▼                                   ┌─────────────┐
┌─────────────┐                            │ QUARANTINE  │
│ position    │──── lat/lon invalid ──────►│             │
│ validation  │                            │ - reason    │
└──────┬──────┘                            │ - timestamp │
       │                                   │ - raw_data  │
       ▼                                   └─────────────┘
┌─────────────┐                                   ▲
│ continuity  │──── teleportation detected ───────┤
│ check       │                                   │
└──────┬──────┘                                   │
       │                                          │
       ▼                                          │
┌─────────────┐                                   │
│ spoof       │──── spoof indicators ─────────────┘
│ detection   │     (mark, don't discard)
└──────┬──────┘
       │
       ▼
  VALID MESSAGE
  (with confidence score)
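
The first four checks of the tree can be sketched as a single function. This is an illustration under stated assumptions, not the pipeline's implementation: the validate function, the 6-hex-digit pattern, and the 1500-knot teleportation threshold are all hypothetical choices, and spoof detection and confidence scoring are left out.

```python
import math
import re

HEX_RE = re.compile(r"^[0-9A-F]{6}$")   # assumed: 24-bit address as 6 hex digits
ODD_CALLSIGNS = {"NULL", "TEST"}        # unusual but legitimate callsigns
MAX_SPEED_KNOTS = 1500.0                # assumed teleportation threshold


def haversine_nm(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in nautical miles."""
    r_nm = 3440.065  # mean Earth radius in nautical miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r_nm * math.asin(math.sqrt(a))


def validate(hex_ident, callsign, lat, lon, prev=None, dt_seconds=None):
    """Return (status, reason).

    status is 'valid', 'weird_but_valid', or 'quarantine'.
    prev is the previous (lat, lon) for this aircraft, if known, and
    dt_seconds is the time elapsed since that position.
    """
    if not hex_ident or not HEX_RE.match(hex_ident.upper()):
        return "quarantine", "invalid hex_ident"

    status = "valid"
    if callsign and callsign.strip().upper() in ODD_CALLSIGNS:
        status = "weird_but_valid"

    if lat is not None and lon is not None:
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            return "quarantine", "lat/lon invalid"
        if prev is not None and dt_seconds and dt_seconds > 0:
            hours = dt_seconds / 3600.0
            speed = haversine_nm(prev[0], prev[1], lat, lon) / hours
            if speed > MAX_SPEED_KNOTS:
                return "quarantine", "teleportation detected"

    return status, None
```

Messages without a position skip the position and continuity checks here, since many ADS-B messages carry no coordinates at all.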

Multi-Receiver Deduplication

When multiple receivers see the same aircraft:

┌────────┐ ┌────────┐
│Receiver│ │Receiver│
│   A    │ │   B    │
└───┬────┘ └───┬────┘
    │          │
    │ A12345   │ A12345
    │ 18:00:01 │ 18:00:02
    │          │
    └────┬─────┘
         │
         ▼
    ┌─────────┐
    │ Group by│
    │ hex+time│
    └────┬────┘
         │
         ▼
    ┌─────────┐
    │  Merge  │
    │ fields  │
    └────┬────┘
         │
         ▼
    ┌─────────────────┐
    │ MergedSighting  │
    │ - receiver_ids: │
    │   ["A", "B"]    │
    │ - source_count: │
    │   2             │
    └─────────────────┘

Merge Priority Rules

Field         Priority
------------  ----------------------------------
hex_ident     Must match (grouping key)
timestamp     Use most recent
callsign      Prefer non-empty, most recent
position      Prefer highest confidence
altitude      Prefer geometric over barometric
receiver_ids  Append all (for coverage tracking)
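
The grouping and merge rules can be sketched as follows. The Message and MergedSighting classes, the deduplicate function, and the 2-second window are assumptions made for this example. The position rule (prefer highest confidence) is omitted to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    hex_ident: str
    timestamp: float                        # seconds since epoch
    receiver_id: str
    callsign: Optional[str] = None
    geo_altitude_ft: Optional[int] = None
    baro_altitude_ft: Optional[int] = None


@dataclass
class MergedSighting:
    hex_ident: str
    timestamp: float
    callsign: Optional[str]
    altitude_ft: Optional[int]
    receiver_ids: List[str] = field(default_factory=list)

    @property
    def source_count(self) -> int:
        return len(self.receiver_ids)


def merge_group(group: List[Message]) -> MergedSighting:
    """Apply the priority rules to messages already grouped by hex + time."""
    oldest_first = sorted(group, key=lambda m: m.timestamp)
    newest_first = oldest_first[::-1]
    # callsign: prefer non-empty, most recent
    callsign = next((m.callsign for m in newest_first if m.callsign), None)
    # altitude: prefer geometric over barometric
    geo = next((m.geo_altitude_ft for m in newest_first
                if m.geo_altitude_ft is not None), None)
    baro = next((m.baro_altitude_ft for m in newest_first
                 if m.baro_altitude_ft is not None), None)
    # receiver_ids: append all, without repeats
    receiver_ids: List[str] = []
    for m in oldest_first:
        if m.receiver_id not in receiver_ids:
            receiver_ids.append(m.receiver_id)
    return MergedSighting(
        hex_ident=group[0].hex_ident,
        timestamp=newest_first[0].timestamp,    # use most recent
        callsign=callsign,
        altitude_ft=geo if geo is not None else baro,
        receiver_ids=receiver_ids,
    )


def deduplicate(messages: List[Message], window_s: float = 2.0) -> List[MergedSighting]:
    """Group by hex_ident, then by time window measured from each group's first message."""
    by_hex: Dict[str, List[Message]] = {}
    for m in sorted(messages, key=lambda m: m.timestamp):
        by_hex.setdefault(m.hex_ident, []).append(m)
    merged: List[MergedSighting] = []
    for msgs in by_hex.values():
        group = [msgs[0]]
        for m in msgs[1:]:
            if m.timestamp - group[0].timestamp <= window_s:
                group.append(m)
            else:
                merged.append(merge_group(group))
                group = [m]
        merged.append(merge_group(group))
    return merged
```

Anchoring the window at the first message of each group, rather than bucketing by fixed clock intervals, avoids splitting two reports that happen to straddle a bucket boundary.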

Receiver Identification

Use physical device serial numbers, not hostnames:

from dataclasses import dataclass

@dataclass
class ReceiverConfig:
    device_serial: str  # USB serial (e.g., "00000001")
    host: str           # Hostname or IP
    port: int           # SBS port (default 30003)
    name: str           # Friendly name
    antenna: str        # Antenna description

This allows multiple receivers per host and ensures each message is tagged with the physical device that received it.
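
As a small illustration, two receivers on one host can be kept apart by indexing configurations on the device serial. The build_registry helper and the host, port, name, and antenna values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReceiverConfig:
    device_serial: str  # USB serial (e.g., "00000001")
    host: str           # Hostname or IP
    port: int           # SBS port
    name: str           # Friendly name
    antenna: str        # Antenna description


def build_registry(configs: List[ReceiverConfig]) -> Dict[str, ReceiverConfig]:
    """Index receivers by device serial, rejecting duplicate serials."""
    registry: Dict[str, ReceiverConfig] = {}
    for cfg in configs:
        if cfg.device_serial in registry:
            raise ValueError(f"duplicate device serial: {cfg.device_serial}")
        registry[cfg.device_serial] = cfg
    return registry


# Two dongles on the same host, distinguished by serial and port (example values).
receivers = build_registry([
    ReceiverConfig("00000001", "pi-roof", 30003, "Roof north", "collinear"),
    ReceiverConfig("00000002", "pi-roof", 30004, "Roof south", "dipole"),
])
```

Keying on the hostname instead would collapse both entries into one, which is the failure mode the serial-based ID avoids.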

Conflict Resolution

When sources disagree, record the conflict rather than discarding either value:

{
  "entity_type": "sighting",
  "field_name": "callsign",
  "values": {
    "receiver_a": "UAL123",
    "receiver_b": "UAL124"
  },
  "confidences": {
    "receiver_a": 0.8,
    "receiver_b": 0.7
  },
  "resolved": false
}

Resolution Strategies

  1. HIGHEST_CONFIDENCE - Pick value with highest confidence
  2. MOST_RECENT - Pick most recently received value
  3. MAJORITY - Pick value with most sources agreeing
  4. MANUAL - Flag for human review
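
The four strategies could be sketched as one function over a conflict record. The Strategy enum, the resolve function, and the received_at mapping (per-source receive times, which the conflict record above does not carry) are assumptions made for this example.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional


class Strategy(Enum):
    HIGHEST_CONFIDENCE = "highest_confidence"
    MOST_RECENT = "most_recent"
    MAJORITY = "majority"
    MANUAL = "manual"


def resolve(
    values: Dict[str, str],
    confidences: Dict[str, float],
    received_at: Dict[str, float],
    strategy: Strategy,
) -> Optional[str]:
    """Return the chosen value, or None when the conflict needs human review."""
    if not values:
        return None
    if strategy is Strategy.HIGHEST_CONFIDENCE:
        source = max(values, key=lambda s: confidences.get(s, 0.0))
        return values[source]
    if strategy is Strategy.MOST_RECENT:
        source = max(values, key=lambda s: received_at.get(s, float("-inf")))
        return values[source]
    if strategy is Strategy.MAJORITY:
        top = Counter(values.values()).most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            return None  # tie: no majority, leave unresolved
        return top[0][0]
    return None  # MANUAL: always left for a human


values = {"receiver_a": "UAL123", "receiver_b": "UAL124"}
confidences = {"receiver_a": 0.8, "receiver_b": 0.7}
received_at = {"receiver_a": 1.0, "receiver_b": 2.0}
chosen = resolve(values, confidences, received_at, Strategy.HIGHEST_CONFIDENCE)
```

With the example record above, HIGHEST_CONFIDENCE picks receiver_a's value, while MAJORITY returns None because the two sources split evenly.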

Output Schema

-- Clean sightings (deduplicated, validated)
CREATE TABLE sightings (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    mode_s_hex TEXT NOT NULL,
    callsign TEXT,
    latitude REAL,
    longitude REAL,
    altitude_feet INTEGER,
    confidence REAL,
    is_potential_spoof INTEGER,
    receiver_ids TEXT,       -- JSON array
    source_count INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Quarantined data (for review)
CREATE TABLE quarantine (
    id INTEGER PRIMARY KEY,
    raw_data TEXT,
    reason TEXT,
    receiver_id TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Conflicts (when sources disagree)
CREATE TABLE conflicts (
    id INTEGER PRIMARY KEY,
    mode_s_hex TEXT,
    field_name TEXT,
    "values" TEXT,           -- JSON; quoted because VALUES is an SQL keyword
    resolved INTEGER DEFAULT 0,
    resolution TEXT
);
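
A minimal sketch of writing one merged sighting with Python's built-in sqlite3 module, using an in-memory database and the sightings table above. The store_sighting helper and the sample values are hypothetical. receiver_ids is serialized as a JSON array to match the column comment.

```python
import json
import sqlite3

SIGHTINGS_DDL = """
CREATE TABLE sightings (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    mode_s_hex TEXT NOT NULL,
    callsign TEXT,
    latitude REAL,
    longitude REAL,
    altitude_feet INTEGER,
    confidence REAL,
    is_potential_spoof INTEGER,
    receiver_ids TEXT,
    source_count INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def store_sighting(conn: sqlite3.Connection, s: dict) -> int:
    """Insert one sighting and return its row id. Rows are only ever added."""
    receiver_ids = s["receiver_ids"]
    cur = conn.execute(
        "INSERT INTO sightings (timestamp, mode_s_hex, callsign, latitude,"
        " longitude, altitude_feet, confidence, is_potential_spoof,"
        " receiver_ids, source_count)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            s["timestamp"], s["mode_s_hex"], s.get("callsign"),
            s.get("latitude"), s.get("longitude"), s.get("altitude_feet"),
            s.get("confidence"), int(bool(s.get("is_potential_spoof", False))),
            json.dumps(receiver_ids), len(receiver_ids),
        ),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SIGHTINGS_DDL)
row_id = store_sighting(conn, {
    "timestamp": "2025-12-29T18:00:02Z",
    "mode_s_hex": "A12345",
    "callsign": "UAL123",
    "confidence": 0.8,
    "receiver_ids": ["A", "B"],
})
```

Deriving source_count from the length of receiver_ids at write time keeps the two columns from drifting apart.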

Author: Jason Walsh

jwalsh@nexus

Last Updated: 2025-12-29 17:08:19