Federal Frontier AI Platform

Architecture overview of the Federal Frontier AI Platform — MCP servers, Compass, the Federal Frontier Ontology (FFO), and LLM integration.

InfrastructureAI monitors infrastructure continuously, fires autonomous remediation workflows at warning thresholds before failure occurs, and writes the post-mortem after every incident — without a runbook library, without waking an SRE, and without sending classified data outside the customer’s authorization boundary.

Built for federal agencies and defense contractors operating classified, air-gapped, or tactically disconnected infrastructure. InfrastructureAI runs sovereign at IL2 through IL6 including tactical edge. The agent architecture (TrailbossAI + Posses) and the Federal Frontier Ontology (FFO) living digital twin run entirely within the customer’s boundary. No external AI API touches classified content. For IL2-IL4, Claude inference stays within AWS’s private network via Bedrock VPC PrivateLink. For IL6 and tactical edge, inference runs on-premises with US-origin models.
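The deployment matrix above can be sketched as a small routing function. This is an illustrative sketch only — the level sets follow the Bedrock/on-prem split described in this document, and the function name and return values are assumptions, not the platform's actual configuration API.

```python
# Sketch: routing LLM inference by impact level (IL).
# Level assignments follow the deployment matrix in this document;
# everything else here is an illustrative assumption.

BEDROCK_LEVELS = {"IL2", "IL3", "IL4", "IL5"}   # Claude via Bedrock
ON_PREM_LEVELS = {"IL6", "EDGE"}                # vLLM/Ollama, US-origin models

def select_inference_backend(il: str) -> str:
    """Return the inference backend for a given impact level."""
    il = il.upper()
    if il in BEDROCK_LEVELS:
        # At IL2-IL4, traffic stays on AWS's private network
        # via Bedrock VPC PrivateLink.
        return "bedrock"
    if il in ON_PREM_LEVELS:
        return "on-prem"  # inference inside the customer's boundary
    raise ValueError(f"unknown impact level: {il}")
```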

Components

| Component | Purpose | Tech Stack |
| --- | --- | --- |
| FFO (Federal Frontier Ontology) | Knowledge graph of all infrastructure entities and relationships | TypeDB 3.x, TypeQL |
| MCP Servers (12) | Tool servers exposing infrastructure APIs via Model Context Protocol | Python/Go, FastAPI, JSON-RPC |
| Compass | Digital twin UI with AI chat, graph visualization, and dashboards | Next.js, FastAPI, ReactFlow |
| LLM Reasoning | Autonomous agent reasoning and tool calling | Claude via Bedrock (IL2-IL5), vLLM/Ollama (IL6/Edge) |
| Agent Orchestration | TrailbossAI (missions) + Posses (Marshal, Scout, Sage, Wrangler) | LangGraph, CrewAI |
| OutpostAI | Kubernetes cluster lifecycle management (create, delete, add-ons) | Vue.js, Go (Trailboss) |
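To make the FFO layer concrete, here is a sketch of the kind of TypeQL read query an FFO tool might issue. The entity, relation, and attribute names (`vm`, `hypervisor`, `hosted-on`, `name`) are hypothetical — the real FFO schema is not shown in this document, and the exact TypeQL dialect should be checked against the deployed TypeDB 3.x version.

```python
# Sketch: an FFO lookup as a TypeQL query string. Schema names are
# hypothetical; verify syntax against the deployed TypeDB 3.x schema.
query = """
match
  $vm isa vm, has name $vm-name;
  $host isa hypervisor, has name $host-name;
  (guest: $vm, host: $host) isa hosted-on;
"""

def ffo_query_tool() -> str:
    """Hypothetical FFO MCP tool body: returns the query it would run."""
    return query.strip()
```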

How It Works

Conversational mode — operators ask questions in natural language via Compass. The LLM selects MCP tools, executes them, and returns formatted results. This is how operators query infrastructure state today.
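Under the hood, each tool selection becomes a Model Context Protocol `tools/call` request — a JSON-RPC 2.0 envelope. The envelope shape below follows the MCP spec; the tool name `list_alerts` and its arguments are illustrative assumptions, not tools this document defines.

```python
import json

# Sketch of the JSON-RPC 2.0 envelope an MCP client sends when the LLM
# selects a tool. Tool name and arguments are illustrative (a hypothetical
# Grafana alert-listing tool).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_alerts",            # assumed tool name
        "arguments": {"state": "firing"},  # assumed argument schema
    },
}

wire_payload = json.dumps(request)  # what actually goes over the transport
```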

Autonomous mode — Grafana fires an alert at a warning threshold (60% CPU, 80% disk). The alert becomes a WorkItem in FFO. TrailbossAI dispatches a Posse: Marshal validates policy, Scout discovers current state, Sage reasons and plans remediation, Wrangler executes and verifies. The SRE reads the post-mortem. No page. No 2am wakeup.
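The Marshal → Scout → Sage → Wrangler hand-off can be sketched as a simple pipeline over a WorkItem. This is a conceptual sketch only — field names and step outputs are invented for illustration; the real Posses run on LangGraph/CrewAI with policy, discovery, and execution backed by MCP tools.

```python
from dataclasses import dataclass, field

# Conceptual sketch of a Posse pipeline. All names and payloads are
# illustrative, not the platform's actual data model.

@dataclass
class WorkItem:
    alert: str                          # e.g. "disk > 80% on node-07"
    notes: dict = field(default_factory=dict)

def marshal(item: WorkItem) -> WorkItem:
    # Policy gate: is autonomous remediation allowed for this alert?
    item.notes["policy"] = "approved"
    return item

def scout(item: WorkItem) -> WorkItem:
    # Discover current state via MCP tools (values invented here).
    item.notes["state"] = {"disk_pct": 84}
    return item

def sage(item: WorkItem) -> WorkItem:
    # Reason over FFO + discovered state, produce a remediation plan.
    item.notes["plan"] = ["expand volume", "verify disk < 70%"]
    return item

def wrangler(item: WorkItem) -> WorkItem:
    # Execute the plan, verify, and write the post-mortem.
    item.notes["postmortem"] = "volume expanded; disk now below threshold"
    return item

def run_posse(item: WorkItem) -> WorkItem:
    for step in (marshal, scout, sage, wrangler):
        item = step(item)
    return item
```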

Both modes use the same MCP tool ecosystem and the same FFO knowledge graph. The difference is the trigger — human question vs. automated alert.

MCP Server Fleet

The platform runs 12 MCP servers with 150+ tools total:

| Server | Tools | What It Does |
| --- | --- | --- |
| Grafana | 28 | Dashboards, alerts, data sources |
| OpenStack | 23 | VMs, networks, images, flavors |
| Gitea | 22 | Git repos, branches, PRs, issues |
| Atlassian | 20 | Jira issues, Confluence pages |
| Keycloak | 12 | Users, roles, realms, clients |
| Ceph | 12 | Pools, OSDs, monitors, health |
| ArgoCD | 11 | Apps, sync, rollback, projects |
| Kolla | 10 | OpenStack containers, services, logs |
| FFO | 7 | Ontology queries, entity CRUD |
| Federal Compliance | 3 | NIST controls, compliance checks |
| Tool Hub | 1 | Aggregated tool discovery |
| Trailboss | 1 | Cluster provisioning status |
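The Tool Hub's aggregated discovery can be sketched as flattening per-server `tools/list` results into one namespaced catalog. The server names below come from the fleet table; the tool payloads and the prefixing scheme are illustrative assumptions.

```python
# Sketch: aggregating tool discovery across MCP servers. Payloads and the
# "server.tool" prefixing convention are assumptions for illustration.
fleet = {
    "grafana":   [{"name": "list_alerts"}],
    "openstack": [{"name": "list_vms"}],
}

def aggregate_tools(fleet: dict) -> list:
    """Flatten per-server tool lists, prefixing names to avoid collisions."""
    return [
        {"name": f"{server}.{tool['name']}"}
        for server, tools in fleet.items()
        for tool in tools
    ]
```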

Deployment

All components deploy to Kubernetes (namespace f3iai) via ArgoCD with GitOps:

  • GitOps repo: Gitea (ArgoCD source of truth)
  • Container registry: Harbor
  • ArgoCD apps: Auto-sync + self-heal + prune enabled
  • Workflow: Code change → build image → push to Harbor → update GitOps manifest → ArgoCD syncs
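The sync policy above maps directly onto an ArgoCD Application manifest. This is a sketch: the app name, repo URL, and path are placeholders, while the `syncPolicy` block matches the auto-sync, self-heal, and prune settings listed above.

```yaml
# Sketch of one ArgoCD Application with the sync policy described above.
# metadata.name, repoURL, and path are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: compass                     # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.internal/f3iai/gitops.git  # placeholder
    targetRevision: main
    path: apps/compass              # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: f3iai
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert out-of-band changes
```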

Next Steps