ADR-0011: Topology — Postgres CTEs + precomputed blast-radius projection

Status

Accepted

Date

2026-06-24

Context

The proof loop ends with “recalculate service impact,” and POC 8 is the topology engine (connectivity, containment, service-dependency, trace path, blast radius, capacity, single-point-of-failure). Phase 3 targets topology traversal at 10M links.

Blast-radius (given a failed span, every affected service, site, and customer) is the killer telecom feature. It is also the query most likely to blow the p95 target, because it is a potentially deep, unbounded graph traversal. Recursive CTEs in Postgres handle bounded traversals well (trace a path, k-hop continuity) but degrade badly on deep blast-radius queries over millions of links. A dedicated graph DB is tempting, but it is another store to operate, sync, and policy-filter, and it is premature at proof scale.
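To make the "bounded traversal" case concrete, here is a minimal sketch of a k-hop query written as a recursive CTE with an explicit depth cap. It runs against an in-memory SQLite database only so that it is self-contained; Postgres accepts the same `WITH RECURSIVE` form, though placeholder syntax differs by driver and the seed parameter may need a cast. The `link` table, its columns, and the `k_hop` helper are assumptions for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical adjacency table: one row per directed link between nodes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO link (src, dst) VALUES (?, ?)",
    # The C -> A link closes a cycle, which the depth cap keeps from looping.
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "A")],
)

K_HOP_SQL = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0                      -- seed: the starting node at depth 0
    UNION                            -- UNION (not UNION ALL) drops duplicate rows
    SELECT l.dst, r.depth + 1
    FROM reach AS r
    JOIN link AS l ON l.src = r.node
    WHERE r.depth < ?                -- the bound that keeps the traversal finite
)
SELECT node, MIN(depth) AS hops
FROM reach
GROUP BY node
ORDER BY hops, node
"""


def k_hop(conn: sqlite3.Connection, start: str, max_hops: int) -> list[tuple[str, int]]:
    """Return every node reachable from `start` within `max_hops`, with its hop count."""
    return conn.execute(K_HOP_SQL, (start, max_hops)).fetchall()


print(k_hop(conn, "A", 2))
```

The depth cap is what makes this class of query predictable: the work is limited by the number of hops, not by the size of the graph. Blast-radius has no natural cap of this kind, which is why it is treated separately.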

Decision

Postgres adjacency + recursive CTEs for bounded traversals, and precompute blast-radius as a materialized projection.

  • Adjacency tables + recursive CTEs for trace/continuity/bounded queries.
  • A service-dependency projection (service → reachable spans), maintained by the Projection Engine, turns blast-radius into a lookup, not a live deep traversal. This is squarely within the Projection Engine’s existing job of emitting topology indexes.
  • Put the whole thing behind a topology-engine interface so a real graph engine can be swapped in later.
  • Defer the dedicated graph DB and the 10M-link perf proof until a real dataset demands them.
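The decision above can be sketched as a small interface plus a projection-backed implementation. This is an illustration under stated assumptions, not the actual design: `TopologyEngine`, `ProjectionTopologyEngine`, and the service and span identifiers are hypothetical names. The projection is held as service → reachable spans, as described, and inverted once so that a failed span resolves with a single lookup.

```python
from collections import defaultdict
from typing import Iterable, Protocol


class TopologyEngine(Protocol):
    """Hypothetical interface: callers depend on this, not on the store behind it."""

    def blast_radius(self, span_id: str) -> set[str]:
        """Return the services affected when `span_id` fails."""
        ...


class ProjectionTopologyEngine:
    """Serves blast-radius from a precomputed projection instead of a live traversal."""

    def __init__(self, service_spans: dict[str, Iterable[str]]) -> None:
        # The projection as described in the decision: service -> reachable spans.
        self._service_spans = {svc: set(spans) for svc, spans in service_spans.items()}
        # Inverted index (span -> services) built once from the projection.
        self._span_services: dict[str, set[str]] = defaultdict(set)
        for svc, spans in self._service_spans.items():
            for span in spans:
                self._span_services[span].add(svc)

    def blast_radius(self, span_id: str) -> set[str]:
        # A dictionary lookup; no graph traversal happens at query time.
        return set(self._span_services.get(span_id, set()))


# Illustrative data only.
engine: TopologyEngine = ProjectionTopologyEngine(
    {
        "svc-internet": ["span-1", "span-2"],
        "svc-voice": ["span-2", "span-3"],
    }
)
print(sorted(engine.blast_radius("span-2")))
```

Because callers are typed against `TopologyEngine`, a graph-engine-backed class with the same method could replace `ProjectionTopologyEngine` later without changes at the call sites.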

Alternatives Considered

Dedicated graph store now (Neo4j / heavy pgRouting / custom)

  • Rejected: best traversal ceiling, but another store to operate, sync, and policy-filter — premature at proof scale.

In-memory graph projection

Load per-Universe topology into RAM for fast traversal.

  • Rejected for now: fast, but adds a stateful service bounded by RAM. Reconsider behind the topology interface if CTEs + precompute prove insufficient.

Live CTEs only, no precompute

  • Rejected: simplest, but deep blast-radius is the single query most likely to miss p95 at real scale; precompute is the targeted fix.

Consequences

  • Blast-radius is served as a lookup against a maintained projection — fast reads, with freshness bounded by projection lag (ADR-0003).
  • The topology-engine interface preserves the option to adopt a graph engine without rewriting callers.
  • The precomputed service-dependency projection must be invalidated/updated correctly when topology links change — its correctness is part of the projection-rebuild tests.
  • Read-side authority (link/traversal visibility) must apply to topology results, including the precomputed projection.
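The last two consequences can be illustrated with a small sketch of what a projection-rebuild test might check. It assumes a full rebuild after every link change, which is the simplest correct strategy and not necessarily how the Projection Engine maintains the projection. The link shape, the function names, and the `visible_services` parameter are all hypothetical.

```python
from collections import deque

# Hypothetical link shape: (src_node, dst_node, span_id).
Link = tuple[str, str, str]


def rebuild_projection(service_entry: dict[str, str], links: list[Link]) -> dict[str, set[str]]:
    """Recompute service -> reachable spans from scratch with a breadth-first walk."""
    outgoing: dict[str, list[tuple[str, str]]] = {}
    for src, dst, span in links:
        outgoing.setdefault(src, []).append((dst, span))

    projection: dict[str, set[str]] = {}
    for service, entry in service_entry.items():
        seen_nodes, spans = {entry}, set()
        queue = deque([entry])
        while queue:
            node = queue.popleft()
            for dst, span in outgoing.get(node, []):
                spans.add(span)
                if dst not in seen_nodes:  # guard against cycles
                    seen_nodes.add(dst)
                    queue.append(dst)
        projection[service] = spans
    return projection


def blast_radius(projection: dict[str, set[str]], span_id: str, visible_services: set[str]) -> set[str]:
    """Services affected by a failed span, limited to those the caller may see."""
    return {
        svc
        for svc, spans in projection.items()
        if span_id in spans and svc in visible_services
    }


services = {"svc-a": "pop", "svc-b": "agg"}
links_before = [("pop", "agg", "span-1"), ("agg", "cpe", "span-2")]
# Topology change: the agg -> cpe hop is re-homed onto a different span.
links_after = [("pop", "agg", "span-1"), ("agg", "cpe", "span-9")]

stale = rebuild_projection(services, links_before)
fresh = rebuild_projection(services, links_after)
```

A projection that was not updated would still report `span-2` as affecting both services and would miss `span-9` entirely, which is the failure mode the rebuild tests need to catch. The `visible_services` filter shows read-side authority applied to the lookup result itself, so the precomputed projection cannot leak services the caller is not allowed to see.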