Platform Architecture

Component architecture of the Federal Frontier AI Platform — FFO knowledge graph, MCP tool servers, Compass UI, LLM reasoning layer, and deployment topology.

The Federal Frontier AI Platform is a layered system that gives operators conversational access to infrastructure through a knowledge graph, tool servers, and an LLM reasoning layer. This page covers each layer, how they connect, and where they run.

Architecture Diagram

```mermaid
graph TD
    Op[Operator] --> UI[Compass UI<br/>Next.js — compass.vitro.lan]
    UI --> API[Compass API<br/>FastAPI — Bedrock Claude Sonnet 4.6]
    API --> FFO[FFO TypeDB<br/>Knowledge Graph<br/>40 entities · 48 relations]
    API -->|tool calls| MCP[MCP Server Fleet — 13 servers · 153+ tools]
    MCP -->|queries| FFO
    MCP --> Infra[Infrastructure Layer<br/>OpenStack · Ceph · Keycloak · ArgoCD<br/>Gitea · Grafana · Harbor · Kolla · PostgreSQL]
    Alert[Grafana Alertmanager] --> DC[Dispatch Controller<br/>FastAPI]
    DC --> OPA[OPA Risk Governance<br/>Rego Policy]
    DC --> FFO
    DC --> Job[Claude Code Runner<br/>K8s Job]
    Job -->|tool calls| MCP

    style Op fill:#2b6cb0,stroke:#4299e1,color:#fff
    style UI fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style API fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style FFO fill:#2c7a7b,stroke:#38b2ac,color:#e2e8f0
    style MCP fill:#1a365d,stroke:#4299e1,color:#e2e8f0
    style Infra fill:#1a202c,stroke:#718096,color:#e2e8f0
    style Alert fill:#c53030,stroke:#fc8181,color:#fff
    style DC fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style OPA fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style Job fill:#2b6cb0,stroke:#4299e1,color:#fff
```

Layer 1: Operator Interface (Compass)

Compass is the frontend application operators interact with: a Next.js application backed by a FastAPI server. It provides:

  • Chat interface for natural language queries (“list all production clusters”, “show Ceph health”, “what NIST controls apply to geo-prod-01?”)
  • Graph visualization of entities and relationships from the FFO ontology using ReactFlow
  • Dashboard views for cluster status, findings, compliance posture
  • Table-formatted results so operators get structured output, not raw JSON

The Compass API receives operator input, determines whether the query can be answered directly from the FFO knowledge graph or needs to go through the LLM for tool selection, and routes accordingly.
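
A hypothetical sketch of that routing decision follows. The pattern list and function names are invented for illustration; the document does not show the real Compass API logic, only that simple graph lookups bypass the LLM while open-ended queries go through it.

```python
# Hypothetical sketch of the Compass API routing decision (names are
# illustrative, not the actual implementation): entity-lookup queries
# go straight to the FFO knowledge graph; anything open-ended is
# handed to the LLM for tool selection.
import re

DIRECT_PATTERNS = [
    # e.g. "list all production clusters", "show Ceph health"
    re.compile(r"^(list|show|get)\s+(all\s+)?\w+", re.IGNORECASE),
]

def route(query: str) -> str:
    """Return 'ffo' for direct graph lookups, 'llm' otherwise."""
    for pattern in DIRECT_PATTERNS:
        if pattern.match(query.strip()):
            return "ffo"
    return "llm"
```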

Tech stack: Next.js frontend, FastAPI backend, BlueprintJS components, ReactFlow graph rendering.

Access: https://compass.vitro.lan via Traefik IngressRoute.

Layer 2: LLM Reasoning Layer

The LLM provides intent classification, tool selection, query construction, and result synthesis. It does not have direct access to infrastructure — all actions flow through MCP servers using the Model Context Protocol (JSON-RPC).
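
A tool call over that protocol is a JSON-RPC 2.0 request. The sketch below builds the standard MCP `tools/call` request shape; the tool name and arguments are examples drawn from the FFO MCP server, and the exact envelope this platform uses is an assumption, not confirmed by this document.

```python
# Illustrative only: the shape of a Model Context Protocol tool call.
# MCP is JSON-RPC 2.0; "tools/call" with a tool name and arguments is
# the standard request shape.
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical lookup of a cluster entity via the FFO MCP server.
payload = mcp_tool_call(1, "ffo.entity.get",
                        {"entity_type": "cluster", "name": "geo-prod-01"})
```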

The primary inference path is Claude Sonnet 4.6 via AWS Bedrock VPC PrivateLink — a sovereign deployment pattern where all inference traffic stays within AWS’s private network fabric. No public internet traversal. No Anthropic visibility into CUI content. This is architecturally enforced, not a policy choice. The Compass API and Dispatch Controller both use Bedrock as the inference backend.

| Classification | Inference Backend | Model | Notes |
|---|---|---|---|
| IL2-IL4 CUI | AWS Bedrock VPC PrivateLink | Claude Sonnet/Opus | Sovereign — FedRAMP Moderate, no public internet |
| IL5 | Bedrock GovCloud | Claude Sonnet/Opus | FedRAMP High / DoD SRG |
| IL6 | Air-Gapped vLLM on VitroAI | Llama 3.3 70B | US-origin models only, agency RMF |
| Tactical Edge | Ollama on Ampere ARM64 | Llama 3.1 8B | Pre-certified TFO playbooks |
| Dev/Operator Workstation | Ollama (LAN) | qwen3.5 35B | Development and testing |
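
The table above amounts to a routing rule from classification level to inference backend. A minimal sketch, with shorthand backend labels invented for illustration; failing closed to the air-gapped path for unknown classifications is a design assumption, not documented platform behavior.

```python
# Classification-to-backend routing mirroring the table above.
# Backend identifiers are shorthand labels, not real endpoint names.
BACKENDS = {
    "IL2-IL4": "bedrock-privatelink",
    "IL5": "bedrock-govcloud",
    "IL6": "vllm-airgapped",
    "tactical-edge": "ollama-ampere",
    "dev": "ollama-lan",
}

def inference_backend(classification: str) -> str:
    """Fail closed: unknown classifications get the most restrictive path."""
    return BACKENDS.get(classification, "vllm-airgapped")
```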

For the full inference architecture, see Sovereign Inference.

Layer 2.5: Agent Orchestration Layer

Between the LLM and the MCP tool servers sits the agent orchestration layer — the system that makes InfrastructureAI autonomous rather than just conversational. This layer has two parts: the Agent Harness (validated and deployed) and the multi-agent orchestration (target architecture).

Agent Harness (Deployed)

The Dispatch Controller is the first production implementation of the Agent Harness pattern (ADR-005). It receives alerts from Grafana Alertmanager, evaluates risk via OPA Rego policy, fetches context from the FFO knowledge graph, and spawns a Kubernetes Job running Claude Code with access to the full MCP tool surface. Each dispatched agent is isolated, short-lived, and fully audited.

The harness enforces eight components on every autonomous dispatch: Trigger Layer, Risk Governance, Context Injection, Tool Surface, Output Parsing, Audit Trail, Escalation Routing, and World Model Write-Back. See Agent Harness Pattern (ADR-005) for the full architecture.
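
The dispatch gate at the front of that flow can be sketched as follows. This is a minimal illustration, not the production controller: the risk tiers, action classes, and threshold are hypothetical, and the real risk decision is made by an OPA Rego policy, not Python.

```python
# Minimal sketch of the trigger -> risk-governance step of the harness.
# Tier names and the autonomy threshold are illustrative assumptions.
from dataclasses import dataclass

RISK_TIERS = {"read_only": 0, "config_change": 1, "destructive": 2}

@dataclass
class Alert:
    name: str
    action_class: str  # one of RISK_TIERS

def dispatch_decision(alert: Alert, max_autonomous_tier: int = 1) -> str:
    """Return 'dispatch' when risk is within policy, else 'escalate'."""
    # Unknown action classes map to the most restrictive tier.
    tier = RISK_TIERS.get(alert.action_class, 2)
    return "dispatch" if tier <= max_autonomous_tier else "escalate"
```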

Multi-Agent Orchestration (Target)

TrailbossAI (LangGraph) manages mission lifecycle — intake, scope assessment, risk calculation, approval gates, multi-Posse coordination, failure handling, and completion verification. It does not do the work; it decides what work needs to happen, in what order, with what approvals.

Posses (CrewAI) execute domain-specific work. Each Posse contains four specialized agents: Marshal validates policy and authorization, Scout discovers current infrastructure state, Sage analyzes and plans remediation, and Wrangler executes changes and writes outcomes back to FFO. Multiple Posses can operate in parallel across different infrastructure domains (AWS, VitroAI, Platform).

FFO is the shared read/write context all agents operate through. Agents read state from the knowledge graph, act via MCP tools, and write results back. Every remediation generates an outcome record in FFO — these accumulate as institutional memory without anyone writing runbooks.

For the multi-agent architecture, see Agent Architecture — TrailbossAI, Posses, and Autonomous Remediation.

Layer 3: MCP Tool Servers

The platform runs 13 MCP servers exposing 153+ verified tools. Each server wraps a specific infrastructure API and exposes it as typed, documented MCP tools that the LLM can call.

| Server | Tools | Infrastructure Target |
|---|---|---|
| Grafana MCP | 28 | Dashboards, alerts, Prometheus, Loki |
| OpenStack MCP | 23 | Nova, Neutron, Glance, Keystone |
| Gitea MCP | 22 | Git repositories, branches, PRs |
| Atlassian MCP | 20 | Jira issues, Confluence pages |
| Keycloak MCP | 12 | Identity management (users, roles, realms) |
| Ceph MCP | 12 | Ceph cluster (pools, OSDs, monitors) |
| ArgoCD MCP | 11 | GitOps deployments (apps, sync, rollback) |
| Kolla MCP | 10 | OpenStack containers, services, logs |
| FFO MCP | 10 | TypeDB knowledge graph |
| Web MCP | | Web fetch, crawl, link extraction |
| Federal Compliance | 3 | NIST controls, compliance checks |
| Tool Hub | 1 | Aggregated tool discovery |
| Trailboss | 1 | Cluster provisioning status |

Each MCP server follows the same pattern:

  1. FastAPI application with SSE transport at /mcp/sse
  2. Health (/health) and readiness (/ready) probes for Kubernetes
  3. Prometheus metrics at /metrics
  4. Pydantic models for request/response validation
  5. Typed tool definitions with JSON Schema input schemas
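
The probe contract in step 2 can be sketched framework-agnostically. This is a minimal sketch assuming only the documented response bodies; the real servers implement these as FastAPI route handlers, and the function names here are illustrative.

```python
# Framework-agnostic sketch of the probe behavior every MCP server in
# the fleet implements. Handler names are illustrative.

def health() -> dict:
    """Liveness: the process is up; nothing else is checked."""
    return {"status": "healthy"}

def ready(check_backend) -> dict:
    """Readiness: verify connectivity to the wrapped backend
    (TypeDB, OpenStack API, etc.) before accepting traffic."""
    try:
        check_backend()
        return {"status": "ready"}
    except Exception as exc:
        return {"status": "not_ready", "reason": str(exc)}
```

Keeping liveness trivial and readiness backend-aware matches how Kubernetes uses the two probes: liveness failures restart the pod, readiness failures only pull it out of service endpoints.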

The FFO MCP server is the central tool server. It exposes the knowledge graph to the LLM via 10 tools: ffo.query, ffo.infer, ffo.entity.get, ffo.entity.create, ffo.entity.update, ffo.search, ffo.traverse, ffo.relationship.create, ffo.write, and ffo.context.for_action.
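
As an illustration of the first of those tools, an ffo.query call might carry a TypeQL string like the following. The schema names (cluster, finding, affects) and the exact TypeQL syntax are assumptions for illustration, not the actual FFO schema.

```python
# Hypothetical ffo.query argument: a TypeQL read query asking which
# clusters have findings against them. Entity and relation names are
# illustrative only.
typeql = """
match
  $c isa cluster, has name $n;
  $f isa finding;
  ($f, $c) isa affects;
""".strip()

arguments = {"query": typeql}
```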

Layer 4: FFO Knowledge Graph (TypeDB)

The Federal Frontier Ontology (FFO) is a TypeDB 3.x knowledge graph that models the entire infrastructure as entities and relationships. It is the single source of truth for “what exists and how it connects.”

FFO contains:

  • 40 entity types covering compute, storage, security, identity, and operations
  • 48 relation types linking entities across domains
  • Inference rules that derive relationships automatically (baseline inheritance, finding propagation, control applicability)

Both the Compass API and the FFO MCP server query TypeDB directly. The LLM accesses it through the FFO MCP server’s tools.

FFO answers “what is X?”; it does not handle authorization. Per ADR-001, those concerns are split across three systems: FFO (TypeDB) answers “what is X?”, Keycloak answers “who is this?”, and OPA (Rego) enforces “can user Y do Z?” at the Wanaku MCP boundary. PostgreSQL is not an authorization store in the Federal Frontier Platform.
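
OPA is typically queried over its REST Data API (POST /v1/data/&lt;policy-path&gt; with an input document). The sketch below builds such a request; the policy package path and input fields are assumptions, not the platform's actual Rego layout, and the OPA service URL is hypothetical.

```python
# Hedged sketch of an authorization check against OPA's REST Data API.
# The package path (mcp/authz/allow), input fields, and service URL
# are illustrative assumptions.
import json
import urllib.request

def build_opa_request(opa_url: str, user: str, action: str, resource: str):
    """Construct (but do not send) the OPA authorization query."""
    body = json.dumps({
        "input": {"user": user, "action": action, "resource": resource},
    }).encode()
    return urllib.request.Request(
        f"{opa_url}/v1/data/mcp/authz/allow",  # hypothetical package path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_opa_request("http://opa.f3iai.svc:8181",
                        "operator-1", "tool_call", "ffo.write")
```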

Tech stack: TypeDB 3.x (Community Edition 3.7.2), TypeQL query language.

Layer 5: Infrastructure

The bottom layer is the actual infrastructure that MCP servers interact with:

  • OpenStack (Nova, Neutron, Glance, Keystone) — VM and network management via Kolla deployment
  • Ceph — Distributed storage (pools, OSDs, monitors, health)
  • Keycloak — Identity and access management
  • ArgoCD — GitOps continuous deployment
  • Gitea — Git repository hosting (ArgoCD source of truth)
  • Grafana — Monitoring dashboards and alerting
  • Harbor — Container image registry
  • OPA — Policy enforcement (Rego) at MCP tool invocation boundary

Deployment Topology

All AI Platform components run in the f3iai Kubernetes namespace on the texas-dell-04 cluster.

GitOps Workflow

```mermaid
graph LR
    A[Developer commits code] --> B[Build container image]
    B --> C[Push to Harbor<br/>harbor.vitro.lan/ffp/*]
    C --> D[Update GitOps manifest<br/>in Gitea]
    D --> E[ArgoCD detects change<br/>syncs to cluster]

    style A fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style B fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style C fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style D fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style E fill:#2b6cb0,stroke:#4299e1,color:#fff
```

  • Git source of truth: Gitea at gitea-http.gitea.svc.cluster.local:3000
  • Container registry: Harbor at harbor.vitro.lan
  • Deployment manifests: deploy/overlays/fmc/<app-name>/ on the main branch
  • ArgoCD configuration: Auto-sync, self-heal, and prune enabled for all apps

Container Images

| Component | Image | Base |
|---|---|---|
| FFO MCP Server | harbor.vitro.lan/ffp/ffo-mcp-server:v1.0.0 | python:3.11-slim |
| TypeDB | harbor.vitro.lan/ffp/typedb:3.7.2 | TypeDB CE |
| Compass API | harbor.vitro.lan/ffp/compass-api:* | python:3.11-slim |
| Compass Frontend | harbor.vitro.lan/ffp/compass-frontend:* | Node.js |

Network

  • TypeDB runs as a ClusterIP service (no NodePort). For local development, tunnel to the pod IP: ssh -fN -L 1729:<pod-ip>:1729 ubuntu@texas-dell-04
  • MCP servers communicate over cluster-internal networking
  • Compass is exposed externally via Traefik IngressRoute at compass.vitro.lan
  • Prometheus scrapes /metrics endpoints on each MCP server

Health Checks

Every MCP server exposes two probe endpoints:

  • /health (liveness) — returns {"status": "healthy"} if the process is running
  • /ready (readiness) — returns {"status": "ready"} after verifying connectivity to its backend (TypeDB, OpenStack API, etc.)

Kubernetes uses these probes to restart unhealthy pods and remove unready pods from service endpoints.