Platform Architecture
Component architecture of the Federal Frontier AI Platform — FFO knowledge graph, MCP tool servers, Compass UI, LLM reasoning layer, and deployment topology.
The Federal Frontier AI Platform is a layered system that gives operators conversational access to infrastructure through a knowledge graph, tool servers, and an LLM reasoning layer. This page covers each layer, how they connect, and where they run.
Architecture Diagram
```mermaid
flowchart TD
    Op["Operator"] --> UI["Compass UI<br/>Next.js — compass.vitro.lan"]
    UI --> API["Compass API<br/>FastAPI — Bedrock Claude Sonnet 4.6"]
    API --> FFO["FFO TypeDB<br/>Knowledge Graph<br/>40 entities · 48 relations"]
    API -->|tool calls| MCP["MCP Server Fleet — 13 servers · 153+ tools"]
    MCP -->|queries| FFO
    MCP --> Infra["Infrastructure Layer<br/>OpenStack · Ceph · Keycloak · ArgoCD<br/>Gitea · Grafana · Harbor · Kolla · PostgreSQL"]
    Alert["Grafana Alertmanager"] --> DC["Dispatch Controller<br/>FastAPI"]
    DC --> OPA["OPA Risk Governance<br/>Rego Policy"]
    DC --> FFO
    DC --> Job["Claude Code Runner<br/>K8s Job"]
    Job -->|tool calls| MCP

    style Op fill:#2b6cb0,stroke:#4299e1,color:#fff
    style UI fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style API fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style FFO fill:#2c7a7b,stroke:#38b2ac,color:#e2e8f0
    style MCP fill:#1a365d,stroke:#4299e1,color:#e2e8f0
    style Infra fill:#1a202c,stroke:#718096,color:#e2e8f0
    style Alert fill:#c53030,stroke:#fc8181,color:#fff
    style DC fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style OPA fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style Job fill:#2b6cb0,stroke:#4299e1,color:#fff
```
Layer 1: Operator Interface (Compass)
Compass is the operator-facing frontend: a Next.js application backed by a FastAPI server. It provides:
- Chat interface for natural language queries (“list all production clusters”, “show Ceph health”, “what NIST controls apply to geo-prod-01?”)
- Graph visualization of entities and relationships from the FFO ontology using ReactFlow
- Dashboard views for cluster status, findings, compliance posture
- Table-formatted results so operators get structured output, not raw JSON
The Compass API receives operator input, determines whether the query can be answered directly from the FFO knowledge graph or needs to go through the LLM for tool selection, and routes accordingly.
Tech stack: Next.js frontend, FastAPI backend, BlueprintJS components, ReactFlow graph rendering.
Access: https://compass.vitro.lan via Traefik IngressRoute.
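The routing decision described above can be sketched as a small dispatch function. This is an illustrative sketch only — the function name and heuristics are assumptions, not the Compass API's actual code:

```python
# Hypothetical sketch of the Compass API routing decision: structured
# lookups go straight to the FFO knowledge graph, open-ended questions
# go to the LLM for tool selection.
def route_query(text: str) -> str:
    """Decide whether a query can be answered directly from FFO
    or needs the LLM reasoning layer to select MCP tools."""
    # Simple structured lookups ("list ...", "show ...") translate
    # directly into knowledge-graph queries without LLM involvement.
    direct_patterns = ("list ", "show ", "get ")
    if text.lower().startswith(direct_patterns):
        return "ffo-direct"
    # Anything open-ended is handed to the LLM for tool selection.
    return "llm-tools"

print(route_query("list all production clusters"))  # -> ffo-direct
print(route_query("why is geo-prod-01 degraded?"))  # -> llm-tools
```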
Layer 2: LLM Reasoning Layer
The LLM provides intent classification, tool selection, query construction, and result synthesis. It does not have direct access to infrastructure — all actions flow through MCP servers using the Model Context Protocol (JSON-RPC).
The primary inference path is Claude Sonnet 4.6 via AWS Bedrock VPC PrivateLink — a sovereign deployment pattern where all inference traffic stays within AWS’s private network fabric. No public internet traversal. No Anthropic visibility into CUI content. This is architecturally enforced, not a policy choice. The Compass API and Dispatch Controller both use Bedrock as the inference backend.
| Classification | Inference Backend | Model | Notes |
|---|---|---|---|
| IL2-IL4 CUI | AWS Bedrock VPC PrivateLink | Claude Sonnet/Opus | Sovereign — FedRAMP Moderate, no public internet |
| IL5 | Bedrock GovCloud | Claude Sonnet/Opus | FedRAMP High / DoD SRG |
| IL6 Air-Gapped | vLLM on VitroAI | Llama 3.3 70B | US-origin models only, agency RMF |
| Tactical Edge | Ollama on Ampere ARM64 | Llama 3.1 8B | Pre-certified TFO playbooks |
| Dev/Operator Workstation | Ollama (LAN) | qwen3.5 35B | Development and testing |
For the full inference architecture, see Sovereign Inference.
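Every action the LLM takes crosses the MCP boundary as a JSON-RPC `tools/call` request. The envelope below follows the Model Context Protocol; the `ffo.entity.get` tool name comes from this platform's FFO MCP server, while the argument values are illustrative:

```python
import json

# Shape of an MCP tool invocation as it crosses the JSON-RPC boundary.
# Envelope per the Model Context Protocol; argument values are examples.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "ffo.entity.get",
        "arguments": {"entity_type": "cluster", "entity_id": "geo-prod-01"},
    },
}
print(json.dumps(request, indent=2))
```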
Layer 2.5: Agent Orchestration Layer
Between the LLM and the MCP tool servers sits the agent orchestration layer — the system that makes InfrastructureAI autonomous rather than just conversational. This layer has two parts: the Agent Harness (validated and deployed) and the multi-agent orchestration (target architecture).
Agent Harness (Deployed)
The Dispatch Controller is the first production implementation of the Agent Harness pattern (ADR-005). It receives alerts from Grafana Alertmanager, evaluates risk via OPA Rego policy, fetches context from the FFO knowledge graph, and spawns a Kubernetes Job running Claude Code with access to the full MCP tool surface. Each dispatched agent is isolated, short-lived, and fully audited.
The harness enforces eight components on every autonomous dispatch: Trigger Layer, Risk Governance, Context Injection, Tool Surface, Output Parsing, Audit Trail, Escalation Routing, and World Model Write-Back. See Agent Harness Pattern (ADR-005) for the full architecture.
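The Risk Governance step can be sketched as a call to OPA's standard data API. The endpoint shape (`POST /v1/data/<policy-path>` with an `input` document) is OPA's documented REST interface; the policy path `ffp/dispatch/allow`, service URL, and input fields are assumptions for illustration:

```python
import json
import urllib.request

# Sketch of the Dispatch Controller's risk-governance check: before
# spawning a runner Job, ask OPA whether the remediation is allowed.
# Policy path and input fields are hypothetical.
def build_opa_request(alert: dict,
                      opa_url: str = "http://opa.f3iai.svc:8181"):
    payload = {"input": {
        "alert_name": alert["name"],
        "severity": alert["severity"],
        "target": alert["target"],
    }}
    return urllib.request.Request(
        f"{opa_url}/v1/data/ffp/dispatch/allow",  # OPA's data API
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_opa_request({"name": "CephOSDDown", "severity": "warning",
                         "target": "ceph-prod"})
print(req.full_url)  # -> http://opa.f3iai.svc:8181/v1/data/ffp/dispatch/allow
```

OPA replies with `{"result": true}` or `{"result": false}` (or an empty result if the rule is undefined), which the controller turns into dispatch, deny, or escalate.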
Multi-Agent Orchestration (Target)
TrailbossAI (LangGraph) manages mission lifecycle — intake, scope assessment, risk calculation, approval gates, multi-Posse coordination, failure handling, and completion verification. It does not do the work; it decides what work needs to happen, in what order, with what approvals.
Posses (CrewAI) execute domain-specific work. Each Posse contains four specialized agents: Marshal validates policy and authorization, Scout discovers current infrastructure state, Sage analyzes and plans remediation, and Wrangler executes changes and writes outcomes back to FFO. Multiple Posses can operate in parallel across different infrastructure domains (AWS, VitroAI, Platform).
FFO is the shared read/write context all agents operate through. Agents read state from the knowledge graph, act via MCP tools, and write results back. Every remediation generates an outcome record in FFO — these accumulate as institutional memory without anyone writing runbooks.
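An outcome record of the kind described above might look like the following. The field names here are hypothetical — the real schema is defined by the FFO ontology — but the shape shows how a one-off fix becomes queryable institutional memory:

```python
from datetime import datetime, timezone

# Illustrative shape of a remediation outcome written back to FFO
# (e.g. via the ffo.write tool). Field names are hypothetical.
outcome = {
    "entity_type": "remediation-outcome",
    "mission_id": "mission-0042",        # assigned by TrailbossAI
    "posse": "platform",                 # which Posse executed the work
    "actions": ["restart ceph-osd.7", "verify PG recovery"],
    "result": "resolved",
    "recorded_at": datetime.now(timezone.utc).isoformat(),
}
```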
For the multi-agent architecture, see Agent Architecture — TrailbossAI, Posses, and Autonomous Remediation.
Layer 3: MCP Tool Servers
The platform runs 13 MCP servers exposing 153+ verified tools. Each server wraps a specific infrastructure API and exposes it as typed, documented MCP tools that the LLM can call.
| Server | Tools | Infrastructure Target |
|---|---|---|
| Grafana MCP | 28 | Dashboards, alerts, Prometheus, Loki |
| OpenStack MCP | 23 | Nova, Neutron, Glance, Keystone |
| Gitea MCP | 22 | Git repositories, branches, PRs |
| Atlassian MCP | 20 | Jira issues, Confluence pages |
| Keycloak MCP | 12 | Identity management (users, roles, realms) |
| Ceph MCP | 12 | Ceph cluster (pools, OSDs, monitors) |
| ArgoCD MCP | 11 | GitOps deployments (apps, sync, rollback) |
| Kolla MCP | 10 | OpenStack containers, services, logs |
| FFO MCP | 10 | TypeDB knowledge graph |
| Web MCP | — | Web fetch, crawl, link extraction |
| Federal Compliance | 3 | NIST controls, compliance checks |
| Tool Hub | 1 | Aggregated tool discovery |
| Trailboss | 1 | Cluster provisioning status |
Each MCP server follows the same pattern:
- FastAPI application with SSE transport at `/mcp/sse`
- Health (`/health`) and readiness (`/ready`) probes for Kubernetes
- Prometheus metrics at `/metrics`
- Pydantic models for request/response validation
- Typed tool definitions with JSON Schema input schemas
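A typed tool definition in this pattern is a name, a description, and a JSON Schema for its inputs (the `inputSchema` field is MCP's standard shape). The tool name below is taken from the FFO MCP server's tool list; the property names are illustrative:

```python
# Minimal MCP-style tool definition: a JSON Schema describes the typed
# inputs the LLM must supply. Property names are illustrative.
ffo_entity_get = {
    "name": "ffo.entity.get",
    "description": "Fetch a single entity from the FFO knowledge graph",
    "inputSchema": {
        "type": "object",
        "properties": {
            "entity_type": {"type": "string",
                            "description": "FFO entity type, e.g. cluster"},
            "entity_id": {"type": "string",
                          "description": "Unique entity identifier"},
        },
        "required": ["entity_type", "entity_id"],
    },
}
```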
The FFO MCP server is the central tool server. It exposes the knowledge graph to the LLM via 10 tools: `ffo.query`, `ffo.infer`, `ffo.entity.get`, `ffo.entity.create`, `ffo.entity.update`, `ffo.search`, `ffo.traverse`, `ffo.relationship.create`, `ffo.write`, and `ffo.context.for_action`.
Layer 4: FFO Knowledge Graph (TypeDB)
The Federal Frontier Ontology (FFO) is a TypeDB 3.x knowledge graph that models the entire infrastructure as entities and relationships. It is the single source of truth for “what exists and how it connects.”
FFO contains:
- 40 entity types covering compute, storage, security, identity, and operations
- 48 relation types linking entities across domains
- Inference rules that derive relationships automatically (baseline inheritance, finding propagation, control applicability)
Both the Compass API and the FFO MCP server query TypeDB directly. The LLM accesses it through the FFO MCP server’s tools.
FFO answers “what is X?” — it does not handle authorization. Per ADR-001, that responsibility is split across three systems: FFO (TypeDB) describes what exists and how it connects, Keycloak answers “who is this?”, and OPA (Rego) enforces “can user Y do Z?” at the Wanaku MCP boundary. PostgreSQL is not an authorization store in the Federal Frontier Platform.
Tech stack: TypeDB 3.x (Community Edition 3.7.2), TypeQL query language.
Layer 5: Infrastructure
The bottom layer is the actual infrastructure that MCP servers interact with:
- OpenStack (Nova, Neutron, Glance, Keystone) — VM and network management via Kolla deployment
- Ceph — Distributed storage (pools, OSDs, monitors, health)
- Keycloak — Identity and access management
- ArgoCD — GitOps continuous deployment
- Gitea — Git repository hosting (ArgoCD source of truth)
- Grafana — Monitoring dashboards and alerting
- Harbor — Container image registry
- OPA — Policy enforcement (Rego) at MCP tool invocation boundary
Deployment Topology
All AI Platform components run in the `f3iai` Kubernetes namespace on the `texas-dell-04` cluster.
GitOps Workflow
```mermaid
flowchart LR
    A["Code committed<br/>to Gitea"] --> B["CI builds<br/>container image"]
    B --> C["Push image to<br/>harbor.vitro.lan/ffp/*"]
    C --> D["Update GitOps manifest<br/>in Gitea"]
    D --> E["ArgoCD detects change<br/>syncs to cluster"]

    style A fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style B fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style C fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style D fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style E fill:#2b6cb0,stroke:#4299e1,color:#fff
```
- Git source of truth: Gitea at `gitea-http.gitea.svc.cluster.local:3000`
- Container registry: Harbor at `harbor.vitro.lan`
- Deployment manifests: `deploy/overlays/fmc/<app-name>/` on the `main` branch
- ArgoCD configuration: Auto-sync, self-heal, and prune enabled for all apps
Container Images
| Component | Image | Base |
|---|---|---|
| FFO MCP Server | `harbor.vitro.lan/ffp/ffo-mcp-server:v1.0.0` | `python:3.11-slim` |
| TypeDB | `harbor.vitro.lan/ffp/typedb:3.7.2` | TypeDB CE |
| Compass API | `harbor.vitro.lan/ffp/compass-api:*` | `python:3.11-slim` |
| Compass Frontend | `harbor.vitro.lan/ffp/compass-frontend:*` | Node.js |
Network
- TypeDB runs as a ClusterIP service (no NodePort). For local development, tunnel to the pod IP: `ssh -fN -L 1729:<pod-ip>:1729 ubuntu@texas-dell-04`
- MCP servers communicate over cluster-internal networking
- Compass is exposed externally via Traefik IngressRoute at `compass.vitro.lan`
- Prometheus scrapes `/metrics` endpoints on each MCP server
Health Checks
Every MCP server exposes two probe endpoints:
- `/health` (liveness) — returns `{"status": "healthy"}` if the process is running
- `/ready` (readiness) — returns `{"status": "ready"}` after verifying connectivity to its backend (TypeDB, OpenStack API, etc.)
Kubernetes uses these probes to restart unhealthy pods and remove unready pods from service endpoints.