Agentic Systems Research Plan: Q1 2026
Executive Summary
This plan outlines the Q1 2026 research focus: multi-agent coordination infrastructure. Our prior work on REPLs, issue tracking, communication queues, and token economics provides a foundation for addressing gaps in existing frameworks (LangGraph, CrewAI, Microsoft Agent Framework).
Problem Statement
Current multi-agent frameworks focus on orchestration but lack:
| Gap | Industry Status | Our Approach |
|---|---|---|
| Work decomposition | Manual or ad-hoc | Git-native issue tracking with agent handoffs |
| Agent communication | Framework-specific | Universal file-based queues |
| State recovery | Lost on crash/timeout | Checkpoint/restore with git worktrees |
| Resource allocation | Unlimited or hard caps | Token economy with pooling and rate limits |
| Cost tracking | External or missing | Integrated per-agent metrics |
Prior Work Integration
Issue Tracking and Decomposition
Git-native issue tracking that travels with repositories:
- Issues as contracts between agents
- Priority-based work distribution
- Dependency tracking for blocked work
- Completion verification before handoff
Integration: Work items become first-class entities in multi-agent workflows. Agents claim, execute, and close issues with full audit trail.
Agent Communication Queues
File-based queue system for agent-to-agent communication:
- JSON request/response protocol
- Async processing with status tracking
- Works with any terminal coding agent
- No vendor lock-in
Integration: Universal adapter layer between heterogeneous agents (Claude Code, Amp, Gemini CLI, Copilot). Simpler than MCP for local multi-agent scenarios.
Session State and Exploration
REPL infrastructure with formal specifications:
- Session state persistence
- Token usage tracking
- TLA+/Alloy specifications for correctness
Integration: Foundation for checkpoint/restore and parallel exploration paths.
Time Travel and Decision Recovery
Git worktree-based exploration branching:
                    ┌─── approach-A (worktree)
                    │
main ───────────────┼─── approach-B (worktree)
                    │
                    └─── approach-C (worktree)
Each path:
- Isolated git worktree
- Own agent sessions
- Checkpoint/restore capability
- Merge findings back to main
Integration: Agents explore alternatives without losing decision provenance. Failed approaches remain accessible for future reference.
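The branching layout above could be set up with plain `git worktree add` calls. The sketch below only builds (and optionally runs) those commands; the `explore/<approach>` branch naming and the sibling-directory layout are illustrative assumptions, not conventions taken from this plan.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, approach: str, base: str = "main") -> list[str]:
    """Build the git command that creates an isolated worktree for one approach."""
    branch = f"explore/{approach}"                   # hypothetical branch naming
    path = repo.parent / f"{repo.name}-{approach}"   # hypothetical sibling directory
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktrees(repo: Path, approaches: list[str], dry_run: bool = True) -> list[list[str]]:
    """Create one worktree per approach; with dry_run=True only return the commands."""
    cmds = [worktree_add_cmd(repo, name) for name in approaches]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds
```

Calling `create_worktrees(Path("/tmp/project"), ["approach-A", "approach-B", "approach-C"])` returns three command lists without touching the repository; pass `dry_run=False` to execute them.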
Token Economy
Mock economy for coordinating agent resource usage:
| Mechanism | Purpose |
|---|---|
| Earning | Commits, issue resolution → tokens |
| Spending | LLM inference costs tokens |
| Pooling | Team operations for expensive models |
| Rate limiting | Prevent runaway spending |
Integration: Unified resource model across all agents. Cost-aware model selection (Claude Opus for complex reasoning, Haiku for simple tasks).
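To make the four mechanisms concrete, here is a minimal in-memory sketch. The `Ledger` and `Wallet` names, the integer token unit, the per-window rate limit, and the even split used for pooling are all assumptions for illustration; the actual token exchange (Guile Scheme) may model these differently.

```python
from dataclasses import dataclass, field


@dataclass
class Wallet:
    balance: int = 0
    spent_this_window: int = 0   # reset by some external scheduler (not shown)


@dataclass
class Ledger:
    rate_limit: int = 100        # assumed: max tokens one agent may spend per window
    wallets: dict[str, Wallet] = field(default_factory=dict)

    def _wallet(self, agent: str) -> Wallet:
        return self.wallets.setdefault(agent, Wallet())

    def earn(self, agent: str, amount: int) -> None:
        """Earning: commits or issue resolution credit tokens."""
        self._wallet(agent).balance += amount

    def spend(self, agent: str, amount: int) -> bool:
        """Spending: refuse if the balance or the rate limit would be exceeded."""
        w = self._wallet(agent)
        if amount > w.balance or w.spent_this_window + amount > self.rate_limit:
            return False
        w.balance -= amount
        w.spent_this_window += amount
        return True

    def pool_spend(self, agents: list[str], cost: int) -> bool:
        """Pooling: split one expensive call evenly; all-or-nothing.

        Pooled spending is not rate-limited in this sketch.
        """
        share = -(-cost // len(agents))  # ceiling division
        if any(self._wallet(a).balance < share for a in agents):
            return False
        for a in agents:
            self._wallet(a).balance -= share
        return True
```

A refused `spend` leaves the wallet untouched, so a caller can fall back to a cheaper model rather than failing the task.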
Architecture
                Q1 2026 INTEGRATION ARCHITECTURE

                     ┌──────────────────────┐
                     │    ISSUE TRACKER     │
                     │   (Decomposition)    │
                     └──────────┬───────────┘
                                │ creates work items
                     ┌──────────▼───────────┐
                     │    TOKEN EXCHANGE    │
                     │   (Resource Alloc)   │
                     └──────────┬───────────┘
                                │ funds agent work
     ┌──────────────────────────┼──────────────────────────┐
     ▼                          ▼                          ▼
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│  Claude Code  │       │      Amp      │       │  Gemini CLI   │
│   (Primary)   │       │   (Search)    │       │   (Review)    │
└───────┬───────┘       └───────┬───────┘       └───────┬───────┘
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                │ via communication queues
                        ┌───────▼───────┐
                        │ QUEUE SYSTEM  │
                        │  (Agent IPC)  │
                        └───────┬───────┘
                                │ results + state
                        ┌───────▼───────┐
                        │  CHECKPOINTS  │
                        │  (Recovery)   │
                        └───────────────┘
Q1 2026 Deliverables
P0: Critical Path
| Deliverable | Description | Dependencies |
|---|---|---|
| Issue ↔ Token bridge | Close issue → earn tokens; claim → reserve | Issue tracker, token exchange |
| Queue adapter | Universal TCA interface via file queues | Queue system |
| Queue protocol versioning | Semver for JSON schema, backwards compatibility | Queue adapter |
| Queue concurrency | File locking, atomic writes, conflict resolution | Queue adapter |
| Queue durability | Append-only logs, checksums, corruption detection | Queue adapter |
| Failure recovery protocol | Agent crash recovery, issue state rollback | Issue tracker, queues |
| Structured logging | JSON logs from all components for post-mortem | All |
| Cost metrics | Per-agent token usage logging | Token exchange |
Queue Protocol Versioning
Semantic versioning for queue message schema:
{
  "protocol_version": "1.0.0",
  "id": "req_001",
  "type": "eval",
  "content": "..."
}
| Version Bump | When |
|---|---|
| Major (2.0.0) | Breaking changes to required fields |
| Minor (1.1.0) | New optional fields, new message types |
| Patch (1.0.1) | Bug fixes, clarifications |
Receivers MUST accept any message whose major version matches their own. Unknown fields are ignored (forward compatibility).
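The receiver rule can be sketched as a small acceptance check. The function names and the `KNOWN_FIELDS` set are illustrative assumptions based on the example message above, not a published schema.

```python
from typing import Optional

RECEIVER_VERSION = "1.0.0"
# Assumed from the example message; the real schema may define more fields.
KNOWN_FIELDS = {"protocol_version", "id", "type", "content"}


def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers; raises ValueError if malformed."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def accept(message: dict, receiver_version: str = RECEIVER_VERSION) -> Optional[dict]:
    """Return the message restricted to known fields, or None if incompatible."""
    try:
        sender_major = parse_version(message["protocol_version"])[0]
    except (KeyError, ValueError, AttributeError):
        return None  # missing or malformed version
    if sender_major != parse_version(receiver_version)[0]:
        return None  # different major version: breaking change
    # Same major version: silently drop fields this receiver does not know.
    return {k: v for k, v in message.items() if k in KNOWN_FIELDS}
```

A 1.0.0 receiver therefore accepts a 1.1.0 message carrying a new optional field and simply does not see that field, while a 2.0.0 message is rejected.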
Queue Concurrency
File-based queues need explicit coordination when multiple agents operate simultaneously:
CONCURRENCY STRATEGY

WRITE OPERATIONS
────────────────
1. Atomic write via temp file + rename
   write(tmp) → fsync(tmp) → rename(tmp, target)

2. Lock file for multi-step operations
   flock(queue.lock) → read → modify → write → unlock

3. Unique message IDs (UUID v7 for time-ordering)
   Enables idempotent processing, dedup on replay

READ OPERATIONS
───────────────
1. Move-to-processing pattern
   requests/ → processing/ → responses/

2. Claim timeout (default 5min)
   processing/ files older than timeout → back to requests/

CONFLICT RESOLUTION
───────────────────
1. First-write-wins for claims (rename fails if exists)
2. Append-only for logs (no conflicts possible)
3. Last-write-wins for status updates (version field)
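A minimal sketch of the atomic write and the move-to-processing claim, assuming a POSIX filesystem with all queue directories on the same volume. The function names are hypothetical. One caveat built into the sketch: POSIX `rename` overwrites an existing target, so the losing claimant is detected here by its rename failing once the source file has already been moved.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_json(target: Path, message: dict) -> None:
    """write(tmp) -> fsync(tmp) -> rename(tmp, target), within target's directory."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(message, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())      # force bytes to disk before publishing
        os.replace(tmp_name, target)    # atomic when on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)         # do not leave partial temp files behind
        raise


def claim(queue: Path, message_id: str) -> Optional[Path]:
    """Move requests/<id>.json to processing/; return None if someone else won."""
    src = queue / "requests" / f"{message_id}.json"
    dst = queue / "processing" / f"{message_id}.json"
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        return None                     # another agent already claimed it
    return dst
```

Readers scanning `requests/` should ignore names that do not end in `.json`, since temp files live briefly in the same directory.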
Queue Durability
Protect against corruption and enable recovery:
MESSAGE FORMAT (append-only log)
────────────────────────────────
{
  "id": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "protocol_version": "1.0.0",
  "timestamp": "2026-01-15T10:30:00Z",
  "checksum": "sha256:a1b2c3...",
  "type": "request",
  "payload": { ... }
}
INTEGRITY CHECKS
────────────────
1. Per-message SHA256 checksum
2. Append-only: messages never modified, only appended
3. Sequence numbers for gap detection
4. Daily rotation with archived checksums
CORRUPTION RECOVERY
───────────────────
1. Detect: checksum mismatch or parse failure
2. Isolate: move corrupt file to quarantine/
3. Rebuild: replay from git history (queues are committed)
4. Alert: notify operator via webhook/email
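The per-message checksum and the detect step might look like the sketch below. The plan does not say which bytes the checksum covers; this sketch assumes it is the SHA-256 of the message serialized as canonical JSON (sorted keys, compact separators) with the `checksum` field itself excluded. The function names are hypothetical.

```python
import hashlib
import json


def compute_checksum(message: dict) -> str:
    """SHA-256 over canonical JSON of every field except 'checksum' (assumed rule)."""
    body = {k: v for k, v in message.items() if k != "checksum"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def seal(message: dict) -> dict:
    """Return a copy of the message with its checksum field filled in."""
    return {**message, "checksum": compute_checksum(message)}


def verify_line(line: str) -> bool:
    """Detect corruption in one log line: parse failure or checksum mismatch."""
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    return message.get("checksum") == compute_checksum(message)
```

A file containing any line for which `verify_line` returns `False` would be the candidate for the isolate step (move to quarantine/).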
File structure with durability:
queues/
├── requests/
│   └── 01ARZ3NDEKTSV4RRFFQ69G5FAV.json
├── processing/
├── responses/
├── archive/
│   └── 2026-01-14.jsonl.gz    # Daily rotation
├── quarantine/                # Corrupt files
├── queue.lock                 # flock target
└── checksums.sha256           # Integrity manifest
Failure Recovery Protocol
What happens when an agent crashes mid-issue:
AGENT FAILURE SCENARIOS

1. CRASH DURING CLAIM
   Issue:    in_progress, no work done
   Recovery: Timeout → auto-release after 30min
   Tokens:   Reserved tokens returned to agent wallet

2. CRASH DURING WORK
   Issue:    in_progress, partial commits exist
   Recovery: Checkpoint restore OR new agent picks up
   Tokens:   Partial credit based on commits/progress

3. CRASH DURING HANDOFF
   Issue:    Queue message sent, not ACKed
   Recovery: Message replay from queue (at-least-once delivery)
   Tokens:   Deferred until receiving agent ACKs

4. QUEUE CORRUPTION
   Issue:    Messages lost or malformed
   Recovery: Rebuild from git history (queues are committed)
   Tokens:   Reconcile from ledger
Issue state machine with failure transitions:
            timeout (30min)
    ┌──────────────────────────────┐
    │                              │
    ▼                              │
┌───────┐  claim   ┌─────────────┐ │ crash
│ open  │─────────▶│ in_progress │─┘
└───────┘          └──────┬──────┘
    ▲                     │
    │       abandon       │ complete
    └─────────────────────┼─────────────┐
                          ▼             │
                     ┌────────┐         │
                     │ closed │◀────────┘
                     └────────┘
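The state machine above can be written as a transition table. This is a sketch of one reading of the diagram: it assumes a crash follows the same edge as a timeout (back to open), and the event names are illustrative rather than taken from the issue tracker.

```python
from datetime import datetime, timedelta, timezone

CLAIM_TIMEOUT = timedelta(minutes=30)

# (current state, event) -> next state
TRANSITIONS = {
    ("open", "claim"): "in_progress",
    ("in_progress", "complete"): "closed",
    ("in_progress", "abandon"): "open",
    ("in_progress", "timeout"): "open",
    ("in_progress", "crash"): "open",   # assumed: crash is handled like a timeout
}


def next_state(state: str, event: str) -> str:
    """Apply one event; reject anything the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state!r}") from None


def is_stale(claimed_at: datetime, now: datetime,
             timeout: timedelta = CLAIM_TIMEOUT) -> bool:
    """True when an in_progress claim has outlived the timeout."""
    return now - claimed_at > timeout
```

Keeping the table as data makes it straightforward to compare against a formal specification of the same transitions.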
Recovery commands:
# Check for stale in_progress issues (>30min)
bd list --status=in_progress --stale

# Force release abandoned issue
bd release <issue-id> --reason="agent timeout"

# Replay failed queue messages
queue replay --failed --since="1h ago"

# Reconcile token ledger with issue history
token-exchange reconcile --dry-run
Failure Mode Catalog
Comprehensive failure scenarios with detection and mitigation:
| Failure | Detection | Impact | Mitigation | Recovery Time |
|---|---|---|---|---|
| Agent timeout | Heartbeat miss >30min | Issue orphaned | Auto-release to pool | <1min |
| Agent crash | Process exit, no response | Partial work lost | Checkpoint restore | <5min |
| Queue corruption | Checksum mismatch | Messages lost | Rebuild from git | <10min |
| Queue deadlock | Circular wait detected | All agents blocked | Forced release oldest | <1min |
| Git merge conflict | bd sync fails | State divergence | Manual resolution | Variable |
| Token bankruptcy | Balance < reserve | Agent starved | Emergency pool loan | <1min |
| Token double-spend | Ledger inconsistency | Economic instability | Reconcile + rollback | <5min |
| Network partition | Remote unreachable | Sync blocked | Local-only mode | 0 (degraded) |
| Disk full | Write fails | All ops blocked | Alert + cleanup | Variable |
| Clock skew | Timestamp anomaly | Ordering wrong | NTP sync + reorder | <1min |
Severity levels:
CRITICAL - System unusable, all agents blocked
HIGH     - Major functionality impaired, some agents affected
MEDIUM   - Degraded performance, workarounds available
LOW      - Minor issues, self-healing
| Failure | Severity | Auto-Recovery |
|---|---|---|
| Agent timeout | MEDIUM | Yes |
| Agent crash | MEDIUM | Yes (with checkpoint) |
| Queue corruption | HIGH | Yes (from git) |
| Queue deadlock | CRITICAL | Yes (forced release) |
| Git merge conflict | HIGH | No (manual) |
| Token bankruptcy | MEDIUM | Yes (pool loan) |
| Token double-spend | CRITICAL | Yes (reconcile) |
| Network partition | LOW | Yes (local mode) |
| Disk full | CRITICAL | No (manual) |
| Clock skew | LOW | Yes (NTP) |
Structured Logging
All components emit JSON logs for post-mortem analysis:
{
  "timestamp": "2026-01-15T10:30:45.123Z",
  "level": "INFO",
  "component": "queue",
  "event": "message_claimed",
  "trace_id": "tr_01ARZ3NDEK",
  "span_id": "sp_TSVRRF",
  "agent": "claude",
  "message_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "duration_ms": 45,
  "metadata": {
    "queue_depth": 3,
    "processing_count": 1
  }
}
Log schema by component:
| Component | Key Events | Trace Fields |
|---|---|---|
| Queue | claimed, completed, failed, timeout | message_id, agent, duration_ms |
| Issue tracker | created, claimed, closed, released | issue_id, agent, reason |
| Token exchange | earned, spent, pooled, rate_limited | agent, amount, balance, model |
| Checkpoint | saved, restored, pruned | checkpoint_id, size_bytes |
| Agent | started, heartbeat, stopped, crashed | agent, pid, exit_code |
Log levels:
| Level | When | Retention |
|---|---|---|
| ERROR | Failures requiring attention | 90 days |
| WARN | Anomalies, auto-recovered | 30 days |
| INFO | Normal operations | 7 days |
| DEBUG | Detailed tracing | 1 day |
Correlation:
- trace_id: Spans entire workflow (issue claim → close)
- span_id: Individual operation within trace
- parent_span_id: Links nested operations
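A small emitter shows how the three correlation fields fit together. This is a sketch: the `log_event` and `new_id` helpers and the random ID format are assumptions, and only the field names come from the log example above.

```python
import json
import secrets
import sys
from datetime import datetime, timezone


def new_id(prefix: str) -> str:
    """Hypothetical ID format: prefix plus 10 random hex characters."""
    return f"{prefix}_{secrets.token_hex(5).upper()}"


def log_event(component, event, trace_id, *, level="INFO",
              parent_span_id=None, stream=sys.stdout, **fields) -> dict:
    """Write one JSON log line and return the record (including its new span_id)."""
    now = datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": level,
        "component": component,
        "event": event,
        "trace_id": trace_id,        # shared by every event in the workflow
        "span_id": new_id("sp"),     # unique to this operation
    }
    if parent_span_id is not None:
        record["parent_span_id"] = parent_span_id  # links nested operations
    record.update(fields)
    stream.write(json.dumps(record) + "\n")
    return record
```

A workflow would create one `trace_id` at issue claim, then pass the returned `span_id` of an outer operation as `parent_span_id` when logging the operations nested inside it.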
Query examples:
# Find all events for a failed workflow
jq 'select(.trace_id == "tr_01ARZ3NDEK")' logs/*.jsonl

# Agent error count (today's log file)
jq -s '[.[] | select(.level == "ERROR" and .agent == "claude")] | length' logs/$(date +%Y-%m-%d).jsonl

# Slowest queue operations
jq -s 'sort_by(.duration_ms) | reverse | .[0:10]' logs/*.jsonl

# Token spending by model
jq -s 'group_by(.metadata.model) | map({model: .[0].metadata.model, total: map(.amount) | add})' logs/token-*.jsonl
P1: Core Features
| Deliverable | Description | Dependencies |
|---|---|---|
| Queue introspection | Debug stalled workflows, message tracing | Queue adapter |
| Checkpoint protocol | /checkpoint save/restore commands | REPL infrastructure |
| Exploration branching | Parallel worktrees per approach | Git worktrees |
| Agent handoff | Structured work transfer between agents | Issue tracker, queues |
Queue Introspection
Essential for debugging stalled workflows before full dashboard:
# Queue status overview
queue status
# Output:
#   requests/:   3 pending (oldest: 2min ago)
#   processing/: 1 active (agent: claude, claimed: 45s ago)
#   responses/:  12 today
#   errors/:     0

# Trace a specific message through the system
queue trace <message-id>
# Output:
#   01ARZ3... created   2026-01-15T10:30:00Z
#   01ARZ3... claimed   2026-01-15T10:30:02Z by claude
#   01ARZ3... completed 2026-01-15T10:30:45Z duration: 43s

# Find stuck messages
queue stuck --threshold=5m
# Output:
#   processing/01ARZ3... claimed 8m ago by amp (likely stalled)

# Watch queue activity in real-time
queue watch
# Output:
#   [10:30:01] ← claude submitted req_001
#   [10:30:02] → amp claimed req_001
#   [10:30:45] ✓ amp completed req_001 (43s)

# Dump queue state for debugging
queue dump --format=json > queue-state.json
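The logic behind a command like `queue stuck` could be as simple as an age scan over `processing/`. This sketch is not the actual CLI implementation. It assumes the claiming agent updates the file's modification time when it claims a message (a plain rename does not change mtime), so that mtime approximates claim time. The `find_stuck` name is hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Optional


def find_stuck(queue: Path, threshold_s: float = 300.0,
               now: Optional[float] = None) -> list[tuple[str, float]]:
    """Return (file name, age in seconds) for claims older than the threshold."""
    now = time.time() if now is None else now
    processing = queue / "processing"
    stuck: list[tuple[str, float]] = []
    if not processing.is_dir():
        return stuck
    for path in sorted(processing.glob("*.json")):
        # Assumption: mtime was set at claim time (e.g. via os.utime).
        age = now - path.stat().st_mtime
        if age > threshold_s:
            stuck.append((path.name, age))
    return stuck
```

The default of 300 seconds mirrors the 5-minute claim timeout; files it reports are the ones a recovery sweep would move back to `requests/`.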
P2: Integration
| Deliverable | Description | Dependencies |
|---|---|---|
| Formal specifications | TLA+/Alloy specs for queue protocol, published | Queue adapter |
| CLASSic metrics | Cost, latency, accuracy, stability, security | Cost metrics |
| Multi-agent review | Adversarial pass with different agents | Queue adapter |
| Dashboard | Real-time economy and work visualization | All above |
Formal Specifications
Publish TLA+/Alloy specs alongside implementation:
specs/
├── queue-protocol.tla    # Queue state machine, message ordering
├── token-exchange.tla    # Token invariants, no negative balances
├── agent-handoff.tla     # Issue state transitions, no orphans
├── concurrency.tla       # Lock-free operations, no deadlocks
└── README.md             # How to run TLC model checker
Key properties to verify:
| Property | Spec | Tool |
|---|---|---|
| Message ordering preserved | queue-protocol.tla | TLC |
| No token double-spend | token-exchange.tla | TLC |
| Issues never orphaned | agent-handoff.tla | TLC |
| Lock-free queue operations | concurrency.tla | TLC |
| No deadlock in handoff | agent-handoff.als | Alloy |
Benefits:
- Catches edge cases before implementation
- Executable documentation of invariants
- Confidence in concurrent operations
- Differentiator vs other frameworks
Research Questions
- What decomposition granularity optimizes agent effectiveness?
- How should token rewards align with actual value delivered?
- What checkpoint frequency balances recovery vs overhead?
- How do we measure coordination quality (not just individual performance)?
- What security model prevents malicious agent behavior in shared queues?
Success Criteria
| Metric | Target | Measurement |
|---|---|---|
| Agent handoffs | 10+ per day | Issue tracker logs |
| Cost per task | -30% vs baseline | Token exchange ledger |
| Recovery time | <5 min from checkpoint | Manual testing |
| Parallel explorations | 3+ concurrent | Worktree count |
| Framework independence | 3+ TCA types | Queue adapter compatibility |
Timeline
| Week | Focus | Deliverables |
|---|---|---|
| 1-2 | Issue ↔ Token bridge | Integration scripts, hook setup |
| 3-4 | Queue adapter | Universal TCA interface |
| 5-6 | Checkpoint protocol | Save/restore commands |
| 7-8 | Exploration branching | Worktree management |
| 9-10 | Metrics and dashboard | CLASSic integration |
| 11-12 | Documentation and refinement | Team onboarding |
Related External Research
Multi-Agent Orchestration
- LangGraph: State graphs, supervisor patterns
- CrewAI: Role-based teams (Manager/Worker/Researcher)
- Microsoft Agent Framework: AutoGen + Semantic Kernel merger
Protocol Standards
- MCP (Model Context Protocol): Anthropic's tool integration standard
- A2A (Agent-to-Agent): Google's inter-agent protocol
- Agentic AI Foundation: Linux Foundation governance
Evaluation
- AgentBench: 8 interactive environments
- GAIA: 466 real-world tasks
- CLASSic: Enterprise dimensions (ICLR 2025)
Economics
- ASI Alliance: Fetch.ai + SingularityNET + Ocean ($9.2B)
- Agent Exchange (AEX): RTB-inspired auction framework
Appendix: Component Status
| Component | Maturity | Repository |
|---|---|---|
| Issue tracker | Production | git-native JSONL |
| Token exchange | Prototype | Guile Scheme (~1100 LOC) |
| Queue system | Production | File-based JSON |
| REPL infrastructure | Prototype | ClojureScript |
| Checkpoint system | Conceptual | Git worktrees |
Contact
Questions or suggestions: j@wal.sh