Platform Architecture

Component architecture of the Federal Frontier AI Platform — FFO knowledge graph, MCP tool servers, Compass UI, LLM reasoning layer, and deployment topology.

The Federal Frontier AI Platform is a layered system that gives operators conversational access to infrastructure through a knowledge graph, tool servers, and an LLM reasoning layer. This page covers each layer, how they connect, and where they run.

Architecture Diagram

```mermaid
graph TD
    Op[Operator] --> UI[Compass UI<br/>Next.js — compass.vitro.lan]
    UI --> API[Compass API<br/>FastAPI — Bedrock Claude Sonnet 4.6]
    API --> FFO[FFO TypeDB<br/>Knowledge Graph<br/>40 entities · 48 relations]
    API -->|tool calls| MCP[MCP Server Fleet — 13 servers · 153+ tools]
    MCP -->|queries| FFO
    MCP --> Infra[Infrastructure Layer<br/>OpenStack · Ceph · Keycloak · ArgoCD<br/>Gitea · Grafana · Harbor · Kolla · PostgreSQL]
    Alert[Grafana Alertmanager] --> DC[Dispatch Controller<br/>FastAPI]
    DC --> OPA[OPA Risk Governance<br/>Rego Policy]
    DC --> FFO
    DC --> Job[Claude Code Runner<br/>K8s Job]
    Job -->|tool calls| MCP

    style Op fill:#2b6cb0,stroke:#4299e1,color:#fff
    style UI fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style API fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style FFO fill:#2c7a7b,stroke:#38b2ac,color:#e2e8f0
    style MCP fill:#1a365d,stroke:#4299e1,color:#e2e8f0
    style Infra fill:#1a202c,stroke:#718096,color:#e2e8f0
    style Alert fill:#c53030,stroke:#fc8181,color:#fff
    style DC fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style OPA fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style Job fill:#2b6cb0,stroke:#4299e1,color:#fff
```

Layer 1: Operator Interface (Compass)

Compass is the frontend application operators interact with: a Next.js application backed by a FastAPI server. It provides:

  • Chat interface for natural language queries (“list all production clusters”, “show Ceph health”, “what NIST controls apply to geo-prod-01?”)
  • Graph visualization of entities and relationships from the FFO ontology using ReactFlow
  • Dashboard views for cluster status, findings, compliance posture
  • Table-formatted results so operators get structured output, not raw JSON

The Compass API receives operator input, determines whether the query can be answered directly from the FFO knowledge graph or needs to go through the LLM for tool selection, and routes accordingly.
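
A hypothetical sketch of that routing decision follows. The pattern list and function names are invented for illustration; the document does not show the real Compass API logic, only that simple graph lookups bypass the LLM while open-ended queries go through it.

```python
# Hypothetical sketch of the Compass API routing decision (names are
# illustrative, not the actual implementation): entity-lookup queries
# go straight to the FFO knowledge graph; anything open-ended is
# handed to the LLM for tool selection.
import re

DIRECT_PATTERNS = [
    # e.g. "list all production clusters", "show Ceph health"
    re.compile(r"^(list|show|get)\s+(all\s+)?\w+", re.IGNORECASE),
]

def route(query: str) -> str:
    """Return 'ffo' for direct graph lookups, 'llm' otherwise."""
    for pattern in DIRECT_PATTERNS:
        if pattern.match(query.strip()):
            return "ffo"
    return "llm"
```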

Tech stack: Next.js frontend, FastAPI backend, BlueprintJS components, ReactFlow graph rendering.

Access: https://compass.vitro.lan via Traefik IngressRoute.

Layer 2: LLM Reasoning Layer

The LLM provides intent classification, tool selection, query construction, and result synthesis. It does not have direct access to infrastructure — all actions flow through MCP servers using the Model Context Protocol (JSON-RPC).
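
A tool call over that protocol is a JSON-RPC 2.0 request. The sketch below builds the standard MCP `tools/call` request shape; the tool name and arguments are examples drawn from the FFO MCP server, and the exact envelope this platform uses is an assumption, not confirmed by this document.

```python
# Illustrative only: the shape of a Model Context Protocol tool call.
# MCP is JSON-RPC 2.0; "tools/call" with a tool name and arguments is
# the standard request shape.
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical lookup of a cluster entity via the FFO MCP server.
payload = mcp_tool_call(1, "ffo.entity.get",
                        {"entity_type": "cluster", "name": "geo-prod-01"})
```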

The primary inference path is Claude Sonnet 4.6 via AWS Bedrock VPC PrivateLink — a sovereign deployment pattern where all inference traffic stays within AWS’s private network fabric. No public internet traversal. No Anthropic visibility into CUI content. This is architecturally enforced, not a policy choice. The Compass API and Dispatch Controller both use Bedrock as the inference backend.

| Classification | Inference Backend | Model | Notes |
|---|---|---|---|
| IL2-IL4 CUI | AWS Bedrock VPC PrivateLink | Claude Sonnet/Opus | Sovereign — FedRAMP Moderate, no public internet |
| IL5 | Bedrock GovCloud | Claude Sonnet/Opus | FedRAMP High / DoD SRG |
| IL6 | Air-Gapped vLLM on VitroAI | Llama 3.3 70B | US-origin models only, agency RMF |
| Tactical Edge | Ollama on Ampere ARM64 | Llama 3.1 8B | Pre-certified TFO playbooks |
| Dev/Operator Workstation | Ollama (LAN) | qwen3.5 35B | Development and testing |
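
The table above amounts to a routing rule from classification level to inference backend. A minimal sketch, with shorthand backend labels invented for illustration; failing closed to the air-gapped path for unknown classifications is a design assumption, not documented platform behavior.

```python
# Classification-to-backend routing mirroring the table above.
# Backend identifiers are shorthand labels, not real endpoint names.
BACKENDS = {
    "IL2-IL4": "bedrock-privatelink",
    "IL5": "bedrock-govcloud",
    "IL6": "vllm-airgapped",
    "tactical-edge": "ollama-ampere",
    "dev": "ollama-lan",
}

def inference_backend(classification: str) -> str:
    """Fail closed: unknown classifications get the most restrictive path."""
    return BACKENDS.get(classification, "vllm-airgapped")
```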

For the full inference architecture, see Sovereign Inference.

Layer 2.5: Agent Orchestration Layer

Between the LLM and the MCP tool servers sits the agent orchestration layer — the system that makes InfrastructureAI autonomous rather than just conversational. This layer has two parts: the Agent Harness (validated and deployed) and the multi-agent orchestration (target architecture).

Agent Harness (Deployed)

The Dispatch Controller is the first production implementation of the Agent Harness pattern (ADR-005). It receives alerts from Grafana Alertmanager, evaluates risk via OPA Rego policy, fetches context from the FFO knowledge graph, and spawns a Kubernetes Job running Claude Code with access to the full MCP tool surface. Each dispatched agent is isolated, short-lived, and fully audited.

The harness enforces eight components on every autonomous dispatch: Trigger Layer, Risk Governance, Context Injection, Tool Surface, Output Parsing, Audit Trail, Escalation Routing, and World Model Write-Back. See Agent Harness Pattern (ADR-005) for the full architecture.
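
The dispatch gate at the front of that flow can be sketched as follows. This is a minimal illustration, not the production controller: the risk tiers, action classes, and threshold are hypothetical, and the real risk decision is made by an OPA Rego policy, not Python.

```python
# Minimal sketch of the trigger -> risk-governance step of the harness.
# Tier names and the autonomy threshold are illustrative assumptions.
from dataclasses import dataclass

RISK_TIERS = {"read_only": 0, "config_change": 1, "destructive": 2}

@dataclass
class Alert:
    name: str
    action_class: str  # one of RISK_TIERS

def dispatch_decision(alert: Alert, max_autonomous_tier: int = 1) -> str:
    """Return 'dispatch' when risk is within policy, else 'escalate'."""
    # Unknown action classes map to the most restrictive tier.
    tier = RISK_TIERS.get(alert.action_class, 2)
    return "dispatch" if tier <= max_autonomous_tier else "escalate"
```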

Multi-Agent Orchestration (Target)

TrailbossAI (LangGraph) manages mission lifecycle — intake, scope assessment, risk calculation, approval gates, multi-Posse coordination, failure handling, and completion verification. It does not do the work; it decides what work needs to happen, in what order, with what approvals.

Posses (CrewAI) execute domain-specific work. Each Posse contains four specialized agents: Marshal validates policy and authorization, Scout discovers current infrastructure state, Sage analyzes and plans remediation, and Wrangler executes changes and writes outcomes back to FFO. Multiple Posses can operate in parallel across different infrastructure domains (AWS, VitroAI, Platform).

FFO is the shared read/write context all agents operate through. Agents read state from the knowledge graph, act via MCP tools, and write results back. Every remediation generates an outcome record in FFO — these accumulate as institutional memory without anyone writing runbooks.

For the multi-agent architecture, see Agent Architecture — TrailbossAI, Posses, and Autonomous Remediation.

Layer 3: MCP Tool Servers

The platform runs 13 MCP servers exposing 153+ verified tools. Each server wraps a specific infrastructure API and exposes it as typed, documented MCP tools that the LLM can call.

| Server | Tools | Infrastructure Target |
|---|---|---|
| Grafana MCP | 28 | Dashboards, alerts, Prometheus, Loki |
| OpenStack MCP | 23 | Nova, Neutron, Glance, Keystone |
| Gitea MCP | 22 | Git repositories, branches, PRs |
| Atlassian MCP | 20 | Jira issues, Confluence pages |
| Keycloak MCP | 12 | Identity management (users, roles, realms) |
| Ceph MCP | 12 | Ceph cluster (pools, OSDs, monitors) |
| ArgoCD MCP | 11 | GitOps deployments (apps, sync, rollback) |
| Kolla MCP | 10 | OpenStack containers, services, logs |
| FFO MCP | 10 | TypeDB knowledge graph |
| Web MCP | | Web fetch, crawl, link extraction |
| Federal Compliance | 3 | NIST controls, compliance checks |
| Tool Hub | 1 | Aggregated tool discovery |
| Trailboss | 1 | Cluster provisioning status |

Each MCP server follows the same pattern:

  1. FastAPI application with SSE transport at /mcp/sse
  2. Health (/health) and readiness (/ready) probes for Kubernetes
  3. Prometheus metrics at /metrics
  4. Pydantic models for request/response validation
  5. Typed tool definitions with JSON Schema input schemas
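
The probe contract in step 2 can be sketched framework-agnostically. This is a minimal sketch assuming only the documented response bodies; the real servers implement these as FastAPI route handlers, and the function names here are illustrative.

```python
# Framework-agnostic sketch of the probe behavior every MCP server in
# the fleet implements. Handler names are illustrative.

def health() -> dict:
    """Liveness: the process is up; nothing else is checked."""
    return {"status": "healthy"}

def ready(check_backend) -> dict:
    """Readiness: verify connectivity to the wrapped backend
    (TypeDB, OpenStack API, etc.) before accepting traffic."""
    try:
        check_backend()
        return {"status": "ready"}
    except Exception as exc:
        return {"status": "not_ready", "reason": str(exc)}
```

Keeping liveness trivial and readiness backend-aware matches how Kubernetes uses the two probes: liveness failures restart the pod, readiness failures only pull it out of service endpoints.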

The FFO MCP server is the central tool server. It exposes the knowledge graph to the LLM via 10 tools: ffo.query, ffo.infer, ffo.entity.get, ffo.entity.create, ffo.entity.update, ffo.search, ffo.traverse, ffo.relationship.create, ffo.write, and ffo.context.for_action.
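
As an illustration of the first of those tools, an ffo.query call might carry a TypeQL string like the following. The schema names (cluster, finding, affects) and the exact TypeQL syntax are assumptions for illustration, not the actual FFO schema.

```python
# Hypothetical ffo.query argument: a TypeQL read query asking which
# clusters have findings against them. Entity and relation names are
# illustrative only.
typeql = """
match
  $c isa cluster, has name $n;
  $f isa finding;
  ($f, $c) isa affects;
""".strip()

arguments = {"query": typeql}
```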

Layer 4: FFO Knowledge Graph (TypeDB)

The Federal Frontier Ontology (FFO) is a TypeDB 3.x knowledge graph that models the entire infrastructure as entities and relationships. It is the single source of truth for “what exists and how it connects.”

FFO contains:

  • 40 entity types covering compute, storage, security, identity, and operations
  • 48 relation types linking entities across domains
  • Inference rules that derive relationships automatically (baseline inheritance, finding propagation, control applicability)

Both the Compass API and the FFO MCP server query TypeDB directly. The LLM accesses it through the FFO MCP server’s tools.

FFO answers “what is X?”; it does not handle authorization. Per ADR-001, those concerns are split across three systems: FFO (TypeDB) answers “what is X?”, Keycloak answers “who is this?”, and OPA (Rego) enforces “can user Y do Z?” at the Wanaku MCP boundary. PostgreSQL is not an authorization store in the Federal Frontier Platform.
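
OPA is typically queried over its REST Data API (POST /v1/data/&lt;policy-path&gt; with an input document). The sketch below builds such a request; the policy package path and input fields are assumptions, not the platform's actual Rego layout, and the OPA service URL is hypothetical.

```python
# Hedged sketch of an authorization check against OPA's REST Data API.
# The package path (mcp/authz/allow), input fields, and service URL
# are illustrative assumptions.
import json
import urllib.request

def build_opa_request(opa_url: str, user: str, action: str, resource: str):
    """Construct (but do not send) the OPA authorization query."""
    body = json.dumps({
        "input": {"user": user, "action": action, "resource": resource},
    }).encode()
    return urllib.request.Request(
        f"{opa_url}/v1/data/mcp/authz/allow",  # hypothetical package path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_opa_request("http://opa.f3iai.svc:8181",
                        "operator-1", "tool_call", "ffo.write")
```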

Tech stack: TypeDB 3.x (Community Edition 3.7.2), TypeQL query language.

Layer 5: Infrastructure

The bottom layer is the actual infrastructure that MCP servers interact with:

  • OpenStack (Nova, Neutron, Glance, Keystone) — VM and network management via Kolla deployment
  • Ceph — Distributed storage (pools, OSDs, monitors, health)
  • Keycloak — Identity and access management
  • ArgoCD — GitOps continuous deployment
  • Gitea — Git repository hosting (ArgoCD source of truth)
  • Grafana — Monitoring dashboards and alerting
  • Harbor — Container image registry
  • OPA — Policy enforcement (Rego) at MCP tool invocation boundary

Deployment Topology

All AI Platform components run in the f3iai Kubernetes namespace on the texas-dell-04 cluster.

GitOps Workflow

```mermaid
graph LR
    A[Developer commits code] --> B[Build container image]
    B --> C[Push to Harbor<br/>harbor.vitro.lan/ffp/*]
    C --> D[Update GitOps manifest<br/>in Gitea]
    D --> E[ArgoCD detects change<br/>syncs to cluster]

    style A fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style B fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style C fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style D fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style E fill:#2b6cb0,stroke:#4299e1,color:#fff
```

  • Git source of truth: Gitea at gitea-http.gitea.svc.cluster.local:3000
  • Container registry: Harbor at harbor.vitro.lan
  • Deployment manifests: deploy/overlays/fmc/<app-name>/ on the main branch
  • ArgoCD configuration: Auto-sync, self-heal, and prune enabled for all apps

Container Images

| Component | Image | Base |
|---|---|---|
| FFO MCP Server | harbor.vitro.lan/ffp/ffo-mcp-server:v1.0.0 | python:3.11-slim |
| TypeDB | harbor.vitro.lan/ffp/typedb:3.7.2 | TypeDB CE |
| Compass API | harbor.vitro.lan/ffp/compass-api:* | python:3.11-slim |
| Compass Frontend | harbor.vitro.lan/ffp/compass-frontend:* | Node.js |

Network

  • TypeDB runs as a ClusterIP service (no NodePort). For local development, tunnel to the pod IP: ssh -fN -L 1729:<pod-ip>:1729 ubuntu@texas-dell-04
  • MCP servers communicate over cluster-internal networking
  • Compass is exposed externally via Traefik IngressRoute at compass.vitro.lan
  • Prometheus scrapes /metrics endpoints on each MCP server

Health Checks

Every MCP server exposes two probe endpoints:

  • /health (liveness) — returns {"status": "healthy"} if the process is running
  • /ready (readiness) — returns {"status": "ready"} after verifying connectivity to its backend (TypeDB, OpenStack API, etc.)

Kubernetes uses these probes to restart unhealthy pods and remove unready pods from service endpoints.