ADS-B Data Pipeline Design
Overview
A robust data pipeline for processing ADS-B data from multiple receivers, designed around the Falsehoods Programmers Believe About Aviation.
Pipeline Stages
┌─────────────────────────────────────────────────────────────────────────────┐
│                             ADS-B DATA PIPELINE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────┐  ┌─────────┐                                                  │
│   │Receiver │  │Receiver │   RECEIVERS (1090 MHz)                           │
│   │    A    │  │    B    │                                                  │
│   └────┬────┘  └────┬────┘                                                  │
│        │            │                                                       │
│        └─────┬──────┘                                                       │
│              ▼                                                              │
│   ┌─────────────────────┐                                                   │
│   │  1. INGEST          │    Parse SBS format                               │
│   │  - Tag receiver     │    Handle connection drops                        │
│   │  - Buffer msgs      │    Use device serial as ID                        │
│   └──────────┬──────────┘                                                   │
│              ▼                                                              │
│   ┌─────────────────────┐  ┌─────────────────────┐                          │
│   │  2. VALIDATE        │──│  QUARANTINE         │                          │
│   │  - Callsign         │  │  - Bad data         │                          │
│   │  - Position         │  │  - For review       │                          │
│   │  - Spoof detect     │  └─────────────────────┘                          │
│   └──────────┬──────────┘                                                   │
│              ▼                                                              │
│   ┌─────────────────────┐                                                   │
│   │  3. NORMALIZE       │    Consistent formats                             │
│   │  - Callsign fmt     │    Handle encoding issues                         │
│   │  - Altitude refs    │    Squawk codes                                   │
│   └──────────┬──────────┘                                                   │
│              ▼                                                              │
│   ┌─────────────────────┐  ┌─────────────────────┐                          │
│   │  4. DEDUPLICATE     │──│  CONFLICTS          │                          │
│   │  - Time window      │  │  - Record when      │                          │
│   │  - Same hex_id      │  │    sources differ   │                          │
│   │  - Merge fields     │  └─────────────────────┘                          │
│   └──────────┬──────────┘                                                   │
│              ▼                                                              │
│   ┌─────────────────────┐                                                   │
│   │  5. ENRICH          │    Add metadata                                   │
│   │  - Registration     │    Operator info                                  │
│   │  - Aircraft type    │    Flight linking                                 │
│   └──────────┬──────────┘                                                   │
│              ▼                                                              │
│   ┌─────────────────────┐                                                   │
│   │  6. STORE           │    Clean output                                   │
│   │  - SQLite/CSV       │    Audit trail                                    │
│   │  - Never delete     │    Queryable                                      │
│   └─────────────────────┘                                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
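The ingest stage can be sketched as a small parser that turns one SBS (BaseStation) line into a tagged record. This is a minimal sketch, assuming the common 22-field comma-separated `MSG` layout; the `RawMessage` fields and the `parse_sbs_line` and `_opt` helpers are hypothetical names chosen for illustration, not part of an existing implementation. Connection handling and buffering are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RawMessage:
    """One parsed SBS line, tagged with the receiver that heard it (hypothetical shape)."""
    receiver_serial: str          # device serial added at ingest
    hex_ident: str
    callsign: Optional[str]
    altitude_ft: Optional[int]
    latitude: Optional[float]
    longitude: Optional[float]
    squawk: Optional[str]
    raw: str                      # original line, kept for the audit trail


def _opt(value: str, cast: Callable):
    """Return cast(value), or None when the field is empty or malformed."""
    value = value.strip()
    if not value:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_sbs_line(line: str, receiver_serial: str) -> Optional[RawMessage]:
    """Parse a single SBS 'MSG' line; return None for anything else."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 22 or fields[0] != "MSG":
        return None
    return RawMessage(
        receiver_serial=receiver_serial,
        hex_ident=fields[4].strip().upper(),
        callsign=_opt(fields[10], str),
        altitude_ft=_opt(fields[11], int),
        latitude=_opt(fields[14], float),
        longitude=_opt(fields[15], float),
        squawk=_opt(fields[17], str),
        raw=line,
    )
```

Empty fields become `None` rather than zero, so later stages can tell "not reported" apart from a real value. The parser does not judge whether the values make sense; that is left to the validate stage.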
Validation Decision Tree
      RawMessage
           │
           ▼
    ┌─────────────┐
    │  hex_ident  │──── empty/invalid ────────────────┐
    │ validation  │                                   │
    └──────┬──────┘                                   │
           │ valid                                    │
           ▼                                          │
    ┌─────────────┐                                   │
    │  callsign   │──── "NULL", "TEST" ──► weird_but_ │
    │ validation  │                        valid      │
    └──────┬──────┘                                   │
           │                                          ▼
           ▼                                   ┌─────────────┐
    ┌─────────────┐                            │ QUARANTINE  │
    │  position   │──── lat/lon invalid ──────►│             │
    │ validation  │                            │ - reason    │
    └──────┬──────┘                            │ - timestamp │
           │                                   │ - raw_data  │
           ▼                                   └─────────────┘
    ┌─────────────┐                                   ▲
    │ continuity  │──── teleportation detected ───────┤
    │    check    │                                   │
    └──────┬──────┘                                   │
           │                                          │
           ▼                                          │
    ┌─────────────┐                                   │
    │    spoof    │──── spoof indicators ─────────────┘
    │  detection  │     (mark, don't discard)
    └──────┬──────┘
           │
           ▼
     VALID MESSAGE
(with confidence score)
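The first four checks in the tree can be sketched as a single function. This is an illustrative sketch only: the `Position` and `Verdict` types, the 1500-knot teleportation threshold, and the 24-bit hex pattern are assumptions made for the example, and spoof detection and confidence scoring are deliberately left out.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

HEX_RE = re.compile(r"[0-9A-F]{6}")      # 24-bit ICAO address as six hex digits
WEIRD_CALLSIGNS = {"NULL", "TEST"}       # odd, but seen on real aircraft
MAX_SPEED_KTS = 1500.0                   # assumed threshold for the continuity check
EARTH_RADIUS_NM = 3440.065


@dataclass
class Position:
    lat: float
    lon: float
    unix_time: float


@dataclass
class Verdict:
    status: str                          # "valid", "weird_but_valid" or "quarantine"
    reason: Optional[str] = None


def haversine_nm(a: Position, b: Position) -> float:
    """Great-circle distance between two positions in nautical miles."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(min(1.0, h)))


def validate(hex_ident: str, callsign: Optional[str],
             pos: Optional[Position], previous: Optional[Position] = None) -> Verdict:
    """Walk the checks in tree order; the first failure quarantines the message."""
    if not HEX_RE.fullmatch(hex_ident.strip().upper()):
        return Verdict("quarantine", "invalid hex_ident")
    # A strange callsign is flagged, not rejected, and the remaining checks still run.
    weird = bool(callsign) and callsign.strip().upper() in WEIRD_CALLSIGNS
    if pos is not None:
        if not (-90.0 <= pos.lat <= 90.0 and -180.0 <= pos.lon <= 180.0):
            return Verdict("quarantine", "lat/lon invalid")
        if previous is not None and pos.unix_time > previous.unix_time:
            hours = (pos.unix_time - previous.unix_time) / 3600.0
            if haversine_nm(previous, pos) / hours > MAX_SPEED_KTS:
                return Verdict("quarantine", "teleportation detected")
    return Verdict("weird_but_valid" if weird else "valid")
```

Returning a reason string with every quarantine verdict matches the `reason` field of the quarantine record, so a reviewer can see which branch of the tree rejected the message.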
Multi-Receiver Deduplication
When multiple receivers see the same aircraft:
┌────────┐ ┌────────┐
│Receiver│ │Receiver│
│   A    │ │   B    │
└───┬────┘ └───┬────┘
    │          │
    │ A12345   │ A12345
    │ 18:00:01 │ 18:00:02
    │          │
    └────┬─────┘
         │
         ▼
    ┌─────────┐
    │ Group by│
    │ hex+time│
    └────┬────┘
         │
         ▼
    ┌─────────┐
    │  Merge  │
    │ fields  │
    └────┬────┘
         │
         ▼
┌─────────────────┐
│ MergedSighting  │
│ - receiver_ids: │
│   ["A", "B"]    │
│ - source_count: │
│   2             │
└─────────────────┘
Merge Priority Rules
| Field | Priority |
|---|---|
| hex_ident | Must match (grouping key) |
| timestamp | Use most recent |
| callsign | Prefer non-empty, most recent |
| position | Prefer highest confidence |
| altitude | Prefer geometric over barometric |
| receiver_ids | Append all (for coverage tracking) |
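Grouping and merging under these priority rules might look like the sketch below. The `Observation` shape, the 2-second window, and the choice to anchor each window at the first observation of a group are assumptions made for illustration; only the field priorities come from the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

WINDOW_SECONDS = 2.0  # assumed deduplication window


@dataclass
class Observation:
    receiver_id: str
    hex_ident: str
    unix_time: float
    callsign: Optional[str] = None
    position: Optional[Tuple[float, float]] = None
    position_confidence: float = 0.0
    altitude_geometric_ft: Optional[int] = None
    altitude_barometric_ft: Optional[int] = None


@dataclass
class MergedSighting:
    hex_ident: str
    unix_time: float
    callsign: Optional[str]
    position: Optional[Tuple[float, float]]
    altitude_ft: Optional[int]
    altitude_ref: Optional[str]
    receiver_ids: List[str]
    source_count: int


def group_observations(observations: List[Observation],
                       window: float = WINDOW_SECONDS) -> List[List[Observation]]:
    """Group by hex_ident; a group stays open for `window` seconds after its first message."""
    groups: List[List[Observation]] = []
    open_group: Dict[str, List[Observation]] = {}
    for obs in sorted(observations, key=lambda o: o.unix_time):
        current = open_group.get(obs.hex_ident)
        if current and obs.unix_time - current[0].unix_time <= window:
            current.append(obs)
        else:
            current = [obs]
            open_group[obs.hex_ident] = current
            groups.append(current)
    return groups


def merge_group(group: List[Observation]) -> MergedSighting:
    """Apply the merge priority rules to one group of observations."""
    newest_first = sorted(group, key=lambda o: o.unix_time, reverse=True)
    callsign = next((o.callsign for o in newest_first if o.callsign), None)
    best = max((o for o in group if o.position is not None),
               key=lambda o: o.position_confidence, default=None)
    geo = next((o.altitude_geometric_ft for o in newest_first
                if o.altitude_geometric_ft is not None), None)
    baro = next((o.altitude_barometric_ft for o in newest_first
                 if o.altitude_barometric_ft is not None), None)
    if geo is not None:
        altitude, ref = geo, "geometric"
    elif baro is not None:
        altitude, ref = baro, "barometric"
    else:
        altitude, ref = None, None
    receiver_ids = list(dict.fromkeys(o.receiver_id for o in group))  # keep order, drop repeats
    return MergedSighting(
        hex_ident=group[0].hex_ident,
        unix_time=newest_first[0].unix_time,
        callsign=callsign,
        position=best.position if best else None,
        altitude_ft=altitude,
        altitude_ref=ref,
        receiver_ids=receiver_ids,
        source_count=len(receiver_ids),
    )
```

Recording `altitude_ref` alongside the merged altitude keeps the geometric or barometric reference visible downstream, which matters because the two can differ by hundreds of feet.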
Receiver Identification
Use physical device serial numbers, not hostnames:
from dataclasses import dataclass

@dataclass
class ReceiverConfig:
    device_serial: str  # USB serial (e.g., "00000001")
    host: str           # Hostname or IP
    port: int           # SBS port (default 30003)
    name: str           # Friendly name
    antenna: str        # Antenna description
This allows multiple receivers per host and ensures each message is tagged with the physical device that received it.
Conflict Resolution
When sources disagree, don't discard - record:
{
"entity_type": "sighting",
"field_name": "callsign",
"values": {
"receiver_a": "UAL123",
"receiver_b": "UAL124"
},
"confidences": {
"receiver_a": 0.8,
"receiver_b": 0.7
},
"resolved": false
}
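A record like the one above could be produced by a small helper that compares what each receiver reported for a single field. This is a sketch under assumptions: `detect_conflict` is a hypothetical name, and treating empty values as "no report" rather than as a disagreement is a design choice of the example.

```python
from typing import Dict, Optional


def detect_conflict(field_name: str,
                    values: Dict[str, Optional[str]],
                    confidences: Dict[str, float]) -> Optional[dict]:
    """Return a conflict record when receivers report different values, else None."""
    # Receivers that reported nothing for this field do not count as disagreeing.
    present = {rid: v for rid, v in values.items() if v not in (None, "")}
    if len(set(present.values())) <= 1:
        return None
    return {
        "entity_type": "sighting",
        "field_name": field_name,
        "values": present,
        "confidences": {rid: confidences.get(rid, 0.0) for rid in present},
        "resolved": False,
    }
```

The returned dictionary contains only JSON-compatible types, so it can be serialized with `json.dumps` before being written to storage.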
Resolution Strategies
- HIGHEST_CONFIDENCE: Pick the value with the highest confidence
- MOST_RECENT: Pick the most recently received value
- MAJORITY: Pick the value most sources agree on
- MANUAL: Flag for human review
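The four strategies can be sketched as one dispatch function. The `Strategy` enum and `resolve` signature are hypothetical; in particular, the `received_at` mapping of per-receiver timestamps is an assumption, since the conflict record shown earlier does not carry receive times. Returning `None` means the conflict stays unresolved.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional


class Strategy(Enum):
    HIGHEST_CONFIDENCE = "highest_confidence"
    MOST_RECENT = "most_recent"
    MAJORITY = "majority"
    MANUAL = "manual"


def resolve(values: Dict[str, str],
            confidences: Dict[str, float],
            received_at: Dict[str, float],
            strategy: Strategy) -> Optional[str]:
    """Pick a winning value, or return None to leave the conflict for human review."""
    if not values or strategy is Strategy.MANUAL:
        return None
    if strategy is Strategy.HIGHEST_CONFIDENCE:
        best = max(values, key=lambda rid: confidences.get(rid, 0.0))
        return values[best]
    if strategy is Strategy.MOST_RECENT:
        newest = max(values, key=lambda rid: received_at.get(rid, float("-inf")))
        return values[newest]
    # MAJORITY: a tie between the top two values is not a majority.
    counts = Counter(values.values()).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]
```

With two receivers that disagree, MAJORITY always ties and falls through to manual review, so it only becomes useful once three or more receivers cover the same aircraft.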
Output Schema
-- Clean sightings (deduplicated, validated)
CREATE TABLE sightings (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    mode_s_hex TEXT NOT NULL,
    callsign TEXT,
    latitude REAL,
    longitude REAL,
    altitude_feet INTEGER,
    confidence REAL,
    is_potential_spoof INTEGER,
    receiver_ids TEXT,  -- JSON array
    source_count INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Quarantined data (for review)
CREATE TABLE quarantine (
    id INTEGER PRIMARY KEY,
    raw_data TEXT,
    reason TEXT,
    receiver_id TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Conflicts (when sources disagree)
CREATE TABLE conflicts (
    id INTEGER PRIMARY KEY,
    mode_s_hex TEXT,
    field_name TEXT,
    "values" TEXT,  -- JSON; quoted because VALUES is a reserved word in SQLite
    resolved INTEGER DEFAULT 0,
    resolution TEXT
);
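Writing to this schema from Python could look like the following sketch, which uses the standard-library `sqlite3` module and only a subset of the columns above. The helper names (`open_store`, `store_sighting`, `quarantine_message`, `sightings_seen_by`) are hypothetical, and the in-memory database is there so the example runs without touching disk.

```python
import json
import sqlite3
from typing import List, Optional

# Subset of the schema above, enough for the example.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sightings (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    mode_s_hex TEXT NOT NULL,
    callsign TEXT,
    receiver_ids TEXT,
    source_count INTEGER
);
CREATE TABLE IF NOT EXISTS quarantine (
    id INTEGER PRIMARY KEY,
    raw_data TEXT,
    reason TEXT,
    receiver_id TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_sighting(conn: sqlite3.Connection, timestamp: str, mode_s_hex: str,
                   callsign: Optional[str], receiver_ids: List[str]) -> int:
    """Insert one merged sighting; receiver_ids is stored as a JSON array."""
    cur = conn.execute(
        "INSERT INTO sightings (timestamp, mode_s_hex, callsign, receiver_ids, source_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (timestamp, mode_s_hex, callsign, json.dumps(receiver_ids), len(receiver_ids)),
    )
    conn.commit()
    return cur.lastrowid


def quarantine_message(conn: sqlite3.Connection, raw_data: str,
                       reason: str, receiver_id: str) -> int:
    """Keep rejected input with the reason it was rejected, instead of dropping it."""
    cur = conn.execute(
        "INSERT INTO quarantine (raw_data, reason, receiver_id) VALUES (?, ?, ?)",
        (raw_data, reason, receiver_id),
    )
    conn.commit()
    return cur.lastrowid


def sightings_seen_by(conn: sqlite3.Connection, receiver_id: str) -> List[str]:
    """Return the hex codes of sightings that include the given receiver."""
    rows = conn.execute("SELECT mode_s_hex, receiver_ids FROM sightings ORDER BY id").fetchall()
    return [hex_code for hex_code, ids in rows if receiver_id in json.loads(ids)]
```

The example exposes no update or delete helpers, in keeping with the pipeline's "never delete" rule: corrections would be new rows, so the audit trail stays intact.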
