Who Does What, With Agents: A CTO's Read of Team Topologies for the Agentic Platform

1. Stance

Olivier Wulveryck's "Who does what — Team Topologies for the agentic platform" (2026-06-22) asks the right question for the agentic era. His first article asked what an agentic platform must provide — the systemic capabilities (context, guardrails, tooling) needed to ship reliable applications at scale. This one asks who provides it, using Skelton & Pais's Team Topologies: four team types (stream-aligned, platform, enabling, complicated-subsystem) and three interaction modes (X-as-a-Service, facilitating, collaboration).

This note re-reads that argument from a CTO's chair and scales it down to a team of one or two people plus a stack of agents. The grid does not vanish at small scale; it collapses. Four teams become four hats. Three interaction modes become a schedule. The platform that survives the collapse is the one made of cached system boundaries. The claim is not abstract: each boundary below is one we have built and written up, cited inline as the supporting detail. The framing itself is speculative throughout.

(An aside worth keeping: Wulveryck's piece was AI-translated from French and drew "word-salad" criticism on Hacker News; he left it unedited and appended the community's open engineering questions. The frame was strong enough to spark the debate anyway — which is the case for putting a foundational idea in writing before it is polished.)

2. The real problem: anticipation burden, not headcount

Wulveryck's diagnosis is that agentic production changes the shape of cognitive load. It is no longer a quantity to distribute across people; it is a throughput to regulate over time. Agents do not ask questions — they produce. The human steering them must anticipate, inside the window of a single prompt, every question a designer, an architect, a tester, and an operator would once have raised in sequence.

At organization scale the platform absorbs that throughput: it answers the agent's routine questions on behalf of the human, so the prompt carries only the contested, judgement-bearing decisions. That is the CTO's actual lever — not "buy an agentic platform," but "decide what the platform must absorb so scarce human judgement is spent only where it is scarce."

Scale down and the throughput is unchanged while the headcount to absorb it vanishes. No platform team to industrialize systemic context, no enabling team to bridge a gap, no complicated-subsystem team to hide deep complexity. So the load is not distributed across teams — it is cached across sessions. A note carrying a reviewed verdict replaces the conversation four teams would otherwise have. The boundary becomes a property of the artifact, not of an org chart.

3. The four hats

The four team types map onto four modes of working, not four people. A solo operator wears them in sequence; a pair splits them, rarely cleanly. The hats name which load you are carrying right now — which tells you what kind of cache you should be writing to.

  • Stream-aligned hat (the author). Drives an agent toward a business outcome and holds the what: the dynamic context, the question actually being asked, the judgement that must be defended. It owns the up-front statement of "the thing the agent must not get wrong."
  • Platform hat (the build + the boundaries). Industrializes the how: the build pipeline, the conventions, the credential model. It invests effort once and amortizes across every later session.
  • Enabling hat (skills, templates, conventions). Teaches a future agent (the same human, or a literal LLM) how to do a thing without being told. In Team Topologies an enabling engagement is temporary; at this scale it is permanent, carried by the cached convention, because the producer next session may be a different agent and the convention must survive the handoff.
  • Complicated-subsystem hat (depth). Holds the technically demanding work whose expertise must not be diluted across casual sessions. It reaches the agent only through the cached note — the agent looks up the decomposition rather than re-deriving it.

4. Three interaction modes, time-multiplexed

In the original, the three modes are relationships between teams: facilitating (enabling teaches stream-aligned), X-as-a-Service (platform delivers self-service), collaboration (subsystem works with platform). At solo scale the same three modes happen between hats, inside one session, as scheduled transitions rather than negotiated relationships:

author takes the question            (load it onto the prompt)
author → platform  (X-as-a-Service)  scaffold from a template
author → subsystem (collaboration)   consult a cached decomposition / verdict
author drafts                        (prose + executable evidence)
author → enabling  (facilitating)    a skill regenerates a derived artifact
author → platform  (deploy)          ship, gated by the credential tier
author runs an integrated review     (second pass, same session)

Each transition costs only the price of picking up cached context: conventions, prior verdicts, skill descriptions, memories. That price is low precisely because the boundaries were drawn once and have been cached since.

5. The platform that survives: cached system boundaries

The CTO-relevant inversion: at organization scale you industrialize capabilities; at small scale you industrialize boundaries. Five caches do the work — and each is one we have built and documented, cited here as the detail behind the framing.

  1. Memory. Across-session memory is the solo-operator platform investment: invariants and conventions read at every session start, plus a keyword-indexed store of durable insights. Its defining property is that the agent reading the memory is not the agent that wrote it — it must survive model upgrades, account switches, and context resets. Detail: Agent Memory as Institutional Knowledge (the caches are the institutional knowledge, versioned like code) and the mechanism survey in Agent Memory Systems.
  2. A decision ledger. An append-only, hash-chained trail binding each change to its verifier, so a past decision cannot be silently rewritten — the small-team form of "decision traceability via audit trails": no SIEM, just a file that fails closed. Detail: the review methodology in Annotation Systems.
  3. Credential-tier classification. Classify every operation by which credential it touches (local-only / network-but-no-deploy-key / deploy-keys); getting it wrong is the single most expensive small-scale mistake, because no platform team is guarding it. The agent reads the tier before proposing a command, the way an org-scale agent reads an RBAC policy. Detail: the credential-provenance argument in Agent Deployment Systems (who holds the deploy credential) and secret custody as an isolation axis in Agent Sandbox Systems.
  4. Sandbox layers. "Sandbox" is one word for four isolations — compute, filesystem custody, network egress, secret custody. At small scale there is no one sandbox; there is a stack, layered by what each protects, and the cached note tells the agent which layer a task needs. Detail: the decomposition in Agent Sandbox Systems and the tested configurations in Agent Sandbox: Practical Configs.
  5. An observability seam. The one mechanism by which the platform hat can watch the author hat without interrupting it — at org scale a dashboard for the platform team; at small scale a dashboard for the same author one session later. Detail: the posture in Agent Telemetry Systems; the working instrument is the read-only crowsnest dashboard.
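To make the credential-tier cache (item 3) concrete, here is a minimal sketch of a classifier an agent could consult before proposing a command. The three tiers come from the text; everything else is an assumption for illustration: the names Tier, RULES, classify and allowed are hypothetical, and the command-to-tier table is invented, not a description of any real tool. Unknown commands are treated as the highest tier, which is one way to make the lookup fail closed.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three tiers named in the text, ordered by blast radius."""
    LOCAL_ONLY = 0
    NETWORK_NO_DEPLOY = 1
    DEPLOY_KEYS = 2


# Hypothetical rule table: command prefix -> tier it touches.
# A real table would be maintained alongside the repository's conventions.
RULES = [
    (("ls",), Tier.LOCAL_ONLY),
    (("git", "status"), Tier.LOCAL_ONLY),
    (("git",), Tier.NETWORK_NO_DEPLOY),
    (("curl",), Tier.NETWORK_NO_DEPLOY),
    (("git", "push"), Tier.DEPLOY_KEYS),
]


def classify(argv):
    """Return the tier of the longest matching prefix.

    Commands with no matching rule are classified as DEPLOY_KEYS,
    so an unclassified operation is never treated as safe.
    """
    best_len, best_tier = 0, Tier.DEPLOY_KEYS
    for prefix, tier in RULES:
        if tuple(argv[:len(prefix)]) == prefix and len(prefix) > best_len:
            best_len, best_tier = len(prefix), tier
    return best_tier


def allowed(argv, session_tier):
    """A command is allowed only if its tier does not exceed the session's."""
    return classify(argv) <= session_tier
```

With this table, allowed(["git", "push"], Tier.NETWORK_NO_DEPLOY) is False, and so is any command the table has never seen. The design choice worth keeping, whatever the real rule format, is that the default is the most restrictive tier.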

6. Integrated review: the team that never existed

The implicit guarantor in the original model is a review process — someone eventually checks the work before it ships. At small scale that review either does not happen or folds into the implementer's own loop. The move is to make review a property of the artifact, not of a team: every reviewed claim carries a verdict annotation from a controlled vocabulary (correct / corrected / disputed / needs-citation / verified), and a separate agent pass in the same session plays reviewer — reads the diff, writes the verdict, appends to the ledger. The methodology is worked out in Annotation Systems.

It is not a panacea: the reviewer agent shares model biases with the implementer, so it catches mechanical faults (date confusions, vocabulary drift, missing citations, convention violations) but not claims that are wrong in the same way the training data is wrong. The ledger does not remove that limitation; it makes it auditable later. The agent that wakes up weeks later and finds a verdict that no longer matches the heading is the one that catches the drift.

Why it works small and breaks large: at small scale implementer and reviewer share context anyway, so a second pass is cheap. At org scale, the team that wrote the application signing off on its own safety is a conflict of interest — which is exactly why the original model puts that guarantee in the platform's guardrails.
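The review pass described in this section (write a verdict from the controlled vocabulary, append it to a hash-chained ledger) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation behind the cited notes: the entry fields, the function names append_entry and verify_chain, and the use of SHA-256 over canonical JSON are all hypothetical choices. The ledger is an in-memory list here; a file-backed version would write one JSON object per line.

```python
import hashlib
import json

# Controlled vocabulary taken from the text.
VERDICTS = {"correct", "corrected", "disputed", "needs-citation", "verified"}
GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(body):
    """Hash an entry body deterministically (sorted keys, UTF-8)."""
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()


def append_entry(ledger, claim_id, verdict, verifier):
    """Append a verdict, chained to the previous entry's hash."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict {verdict!r} is outside the vocabulary")
    if not verifier:
        raise ValueError("every verdict must be attributed to a verifier")
    body = {
        "claim_id": claim_id,
        "verdict": verdict,
        "verifier": verifier,
        "prev": ledger[-1]["hash"] if ledger else GENESIS,
    }
    entry = dict(body, hash=_digest(body))
    ledger.append(entry)
    return entry


def verify_chain(ledger):
    """Return False if any entry was edited, removed, or reordered."""
    prev = GENESIS
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev") != prev or _digest(body) != entry.get("hash"):
            return False
        prev = entry["hash"]
    return True
```

Rewriting an old verdict in place changes that entry's digest, so verify_chain returns False from that point on. That is the "cannot be silently rewritten" property; it detects tampering, it does not prevent it.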

7. The CTO's scorecard

Wulveryck lists platform-maturity criteria; here they are as questions a leader can ask of any agentic setup, at any scale:

  • Guardrail coverage — does every artifact bearing a verifiable claim carry a reviewed verdict? (A coverage ratio you can trend, not a vibe.)
  • Pipeline reliability — are deploy and canonical-URL checks gates, not metrics?
  • Self-service share — what fraction of sessions complete a full produce-and-ship loop without the human debugging the pipeline?
  • Documentation completeness — is the convention vocabulary closed (no improvised values), and does every tool entry-point carry a one-line description?
  • Decision traceability — does the ledger's integrity check pass, and is every verdict attributed to a verifier?

When these hold, the platform absorbs enough anticipation that the human can prompt with the question rather than the question plus its scaffolding. That is the entire return on the platform investment, stated as something a CTO can measure.
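Two of the scorecard questions reduce to ratios that can be computed and trended. The sketch below shows one possible reading of guardrail coverage and self-service share; the Artifact record, its fields, and both function names are assumptions made for illustration, and the text does not prescribe how either number is collected.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Artifact:
    """Hypothetical record for one note or document."""
    name: str
    has_verifiable_claim: bool
    verdict: Optional[str] = None  # None means not yet reviewed


def guardrail_coverage(artifacts: Iterable[Artifact]) -> float:
    """Share of claim-bearing artifacts that carry a reviewed verdict."""
    claimed = [a for a in artifacts if a.has_verifiable_claim]
    if not claimed:
        return 1.0  # assumption: nothing to cover counts as fully covered
    reviewed = sum(1 for a in claimed if a.verdict is not None)
    return reviewed / len(claimed)


def self_service_share(sessions: Iterable[bool]) -> float:
    """Share of sessions that shipped without the human debugging the pipeline.

    Each item is True when the produce-and-ship loop completed unaided.
    """
    outcomes = list(sessions)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Either number is only as good as the records behind it; the point of the sketch is that both are ratios over countable things, so they can be plotted per week rather than asserted.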

8. Graduation: the rule of three becomes the rule of three notes

Wulveryck's "rule of three" — a guardrail repeated by three teams is a candidate for systemization — has no three teams at small scale. It becomes the rule of three notes: a pattern that shows up in three artifacts is the candidate for promotion from convention to code (a module the agent consults programmatically). Graduation is the platform hat's job even when the platform hat is the same person who wrote the original three notes.
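The rule of three notes is mechanical enough to sketch. Assuming each note has already been tagged with the patterns it uses (how that tagging happens is outside this sketch, and the function name and input shape are hypothetical), the promotion candidates are the patterns that appear in at least three distinct notes.

```python
from collections import defaultdict


def graduation_candidates(patterns_by_note, threshold=3):
    """Return patterns seen in at least `threshold` distinct notes.

    patterns_by_note maps a note name to the patterns it uses.
    Repeats of a pattern inside one note count once.
    """
    notes_per_pattern = defaultdict(set)
    for note, patterns in patterns_by_note.items():
        for pattern in patterns:
            notes_per_pattern[pattern].add(note)
    return sorted(
        pattern
        for pattern, notes in notes_per_pattern.items()
        if len(notes) >= threshold
    )
```

Counting distinct notes rather than raw occurrences matters: a pattern repeated ten times in one note is still one note's habit, not a convention ready to become code.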

9. Risks (all of Wulveryck's, intensified)

  • The PO bottleneck. The original warns that the product owner can end up carrying the platform backlog, guardrail graduation, and portfolio governance, becoming the very bottleneck the model claims to prevent. At small scale that PO is the human, with no escape valve; the only compensation is tooling that surfaces graduation candidates and stale work, so the human is reminded of decisions rather than expected to remember them.
  • Loss of judgement by absence. "A team that never sees the protection mechanisms loses the ability to judge their relevance." The agentic form: the reviewer learns to satisfy the verdict vocabulary without engaging the claim — "correct" becomes the lazy default. Mitigation: keep the verdict distribution visible; an all-green ledger is a warning, not a success.
  • Shadow infrastructure. "Without governance you end up with industrialized shadow IT." At small scale that is the loose script that should have been a module, the convention quietly drifting. Let it accumulate and the platform stops being a cache of boundaries and becomes a junk drawer.
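The mitigation for the second risk, keeping the verdict distribution visible, can be sketched as a small check over the ledger's verdicts. The thresholds below (a 95% share and a minimum sample of 20) are arbitrary assumptions chosen for illustration, and both function names are hypothetical.

```python
from collections import Counter


def verdict_distribution(verdicts):
    """Return each verdict's share of the total, as a dict."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()} if total else {}


def looks_rubber_stamped(verdicts, lazy_verdict="correct",
                         max_share=0.95, min_sample=20):
    """Flag a ledger where one verdict dominates suspiciously.

    Small samples are never flagged, since a short all-green run
    is weak evidence either way.
    """
    verdicts = list(verdicts)
    if len(verdicts) < min_sample:
        return False
    share = verdict_distribution(verdicts).get(lazy_verdict, 0.0)
    return share > max_share
```

A True result is a prompt for a human to sample a few "correct" entries and re-read the claims, not proof that the reviewer has stopped engaging.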

10. Conclusion

The original argument holds: the platform's job is to absorb the anticipation burden so the human's prompt carries only judgement. Scaled down, the same job is done by the boundaries a small team caches in its repository (memory, a decision ledger, credential tiers, sandbox layers, an observability seam) and by the four hats one person wears in sequence. The four team types do not vanish; they become four hats with the same names. The three interaction modes do not vanish; they become scheduled transitions inside a single session.

The bet is that the cache is durable enough to outlive the session, the model upgrade, and the agent switch. That bet is testable — the memory, the ledger, the tier classification, and the verdict annotations either survive the next six months unchanged or they do not.

11. References