Federal Frontier AI Platform
Architecture overview of the Federal Frontier AI Platform — MCP servers, Compass, FFO ontology, and LLM integration.
InfrastructureAI monitors infrastructure continuously, fires autonomous remediation workflows at warning thresholds before failure occurs, and writes the post-mortem after every incident — without a runbook library, without waking an SRE, and without sending classified data outside the customer’s authorization boundary.
Built for federal agencies and defense contractors operating classified, air-gapped, or tactically disconnected infrastructure. InfrastructureAI runs sovereign at IL2 through IL6, including the tactical edge. The agent architecture (TrailbossAI + Posses) and the Federal Frontier Ontology (FFO) living digital twin run entirely within the customer’s boundary. No external AI API touches classified content. For IL2-IL5, Claude inference stays within AWS’s private network via Bedrock VPC PrivateLink. For IL6 and tactical edge, inference runs on-premises with US-origin models.
Components
| Component | Purpose | Tech Stack |
|---|---|---|
| FFO (Federal Frontier Ontology) | Knowledge graph of all infrastructure entities and relationships | TypeDB 3.x, TypeQL |
| MCP Servers (12) | Tool servers exposing infrastructure APIs via Model Context Protocol | Python/Go, FastAPI, JSON-RPC |
| Compass | Digital twin UI with AI chat, graph visualization, and dashboards | Next.js, FastAPI, ReactFlow |
| LLM Reasoning | Autonomous agent reasoning and tool calling | Claude via Bedrock (IL2-IL5), vLLM/Ollama (IL6/Edge) |
| Agent Orchestration | TrailbossAI (missions) + Posses (Marshal, Scout, Sage, Wrangler) | LangGraph, CrewAI |
| OutpostAI | Kubernetes cluster lifecycle management (create, delete, add-ons) | Vue.js, Go (Trailboss) |
How It Works
Conversational mode — operators ask questions in natural language via Compass. The LLM selects MCP tools, executes them, and returns formatted results. This is how operators query infrastructure state today.
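Under the hood, a conversational turn becomes a JSON-RPC 2.0 request against one of the MCP servers. A minimal sketch of the wire format follows; the tool name and arguments are illustrative, not the actual Grafana server's tool catalog:

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical call the LLM might emit after an operator asks
# "which Grafana alerts are currently firing?"
request = mcp_tool_call(1, "list_alert_rules", {"state": "firing"})
print(request)
```

The LLM never speaks JSON-RPC directly; the MCP client layer translates its tool selection into messages like this and feeds the result back for formatting.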
Autonomous mode — Grafana fires an alert at a warning threshold (60% CPU, 80% disk). The alert becomes a WorkItem in FFO. TrailbossAI dispatches a Posse: Marshal validates policy, Scout discovers current state, Sage reasons and plans remediation, Wrangler executes and verifies. The SRE reads the post-mortem. No page. No 2am wakeup.
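The four-role handoff can be sketched as a linear pipeline over a WorkItem. Everything below (dataclass fields, function names, the remediation plan) is illustrative, not the actual TrailbossAI API:

```python
from dataclasses import dataclass, field

@dataclass
class WorkItem:
    """Illustrative FFO WorkItem created from a Grafana alert."""
    alert: str                     # e.g. "cpu_warning"
    target: str                    # entity ID in the FFO graph
    log: list = field(default_factory=list)

def marshal(item):   # validate policy: is autonomous action allowed here?
    item.log.append("marshal: policy check passed")
    return item

def scout(item):     # discover current state via MCP tools
    item.log.append(f"scout: collected state for {item.target}")
    return item

def sage(item):      # reason over FFO context and plan remediation
    item.log.append("sage: plan = scale out worker pool")
    return item

def wrangler(item):  # execute the plan and verify the fix
    item.log.append("wrangler: executed and verified")
    return item

def dispatch_posse(item: WorkItem) -> WorkItem:
    """TrailbossAI-style dispatch: each role acts in turn on the WorkItem."""
    for role in (marshal, scout, sage, wrangler):
        item = role(item)
    return item

result = dispatch_posse(WorkItem(alert="cpu_warning", target="vm-042"))
print("\n".join(result.log))   # this trail is the raw material for the post-mortem
```

The ordering matters: Marshal gates the mission before any discovery or mutation happens, and Wrangler's verification step is what lets the post-mortem close the loop without human review.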
Both modes use the same MCP tool ecosystem and the same FFO knowledge graph. The difference is the trigger — human question vs. automated alert.
MCP Server Fleet
The platform runs 12 MCP servers with 150+ tools total:
| Server | Tools | What It Does |
|---|---|---|
| Grafana | 28 | Dashboards, alerts, data sources |
| OpenStack | 23 | VMs, networks, images, flavors |
| Gitea | 22 | Git repos, branches, PRs, issues |
| Atlassian | 20 | Jira issues, Confluence pages |
| Keycloak | 12 | Users, roles, realms, clients |
| Ceph | 12 | Pools, OSDs, monitors, health |
| ArgoCD | 11 | Apps, sync, rollback, projects |
| Kolla | 10 | OpenStack containers, services, logs |
| FFO | 7 | Ontology queries, entity CRUD |
| Federal Compliance | 3 | NIST controls, compliance checks |
| Tool Hub | 1 | Aggregated tool discovery |
| Trailboss | 1 | Cluster provisioning status |
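The Tool Hub's aggregated discovery can be pictured as fanning a tools/list call out to each server and merging the results into one namespaced catalog. The server names and tool lists below are a tiny illustrative subset, not the real 150-tool inventory:

```python
# Hypothetical per-server tool catalogs, as a tools/list call might return them.
# Real servers expose far more (28 Grafana tools, 23 OpenStack tools, etc.).
CATALOGS = {
    "grafana":   [{"name": "list_dashboards"}, {"name": "list_alert_rules"}],
    "openstack": [{"name": "list_servers"}, {"name": "create_server"}],
    "ffo":       [{"name": "query_entities"}],
}

def aggregate_tools(catalogs: dict) -> list:
    """Merge per-server catalogs, namespacing each tool by its server."""
    merged = []
    for server, tools in catalogs.items():
        for tool in tools:
            merged.append(f"{server}.{tool['name']}")
    return sorted(merged)

print(aggregate_tools(CATALOGS))
```

Namespacing by server keeps tool names collision-free, so the LLM can address any of the 150+ tools through a single discovery endpoint.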
Deployment
All components deploy to Kubernetes (namespace f3iai) via ArgoCD with GitOps:
- GitOps repo: Gitea (ArgoCD source of truth)
- Container registry: Harbor
- ArgoCD apps: Auto-sync + self-heal + prune enabled
- Workflow: Code change → build image → push to Harbor → update GitOps manifest → ArgoCD syncs
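The auto-sync behavior above corresponds to an ArgoCD Application spec along these lines; the app name, repo URL, and path are placeholders, while the f3iai destination namespace and the automated prune/self-heal policy match the configuration described:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: compass            # placeholder app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.mil/f3i/gitops.git   # placeholder Gitea repo
    targetRevision: main
    path: apps/compass
  destination:
    server: https://kubernetes.default.svc
    namespace: f3iai
  syncPolicy:
    automated:
      prune: true          # delete resources removed from Git
      selfHeal: true       # revert drift back to the Git state
```

With this policy, pushing an updated image tag to the GitOps repo is sufficient: ArgoCD detects the change and reconciles the cluster without a manual sync.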
Next Steps
- Autonomous SRE Agent — How InfrastructureAI prevents outages before they happen
- Agent Architecture — TrailbossAI, Posses, and the mission lifecycle
- Sovereign Inference — Classified AI from IL2 through tactical edge
- Platform Architecture — Component layers and deployment topology
- MCP Servers — The 12 tool servers and 150+ tools
- FFO Ontology — The knowledge graph that ties it all together
- Compass — The digital twin UI