Federal Frontier AI Platform
Architecture overview of the Federal Frontier AI Platform — MCP servers, Compass, FFO ontology, and LLM integration.
InfrastructureAI monitors infrastructure continuously, fires autonomous remediation workflows at warning thresholds before failure occurs, and writes the post-mortem after every incident — without a runbook library, without waking an SRE, and without sending classified data outside the customer’s authorization boundary.
Built for federal agencies and defense contractors operating classified, air-gapped, or tactically disconnected infrastructure. InfrastructureAI runs sovereign at IL2 through IL6, including the tactical edge. The agent architecture (TrailbossAI + Posses) and the Federal Frontier Ontology (FFO) living digital twin run entirely within the customer’s boundary. No external AI API touches classified content. For IL2-IL5, Claude inference stays within AWS’s private network via Bedrock VPC PrivateLink. For IL6 and tactical edge, inference runs on-premises with US-origin models.
Components
| Component | Purpose | Tech Stack |
|---|---|---|
| FFO (Federal Frontier Ontology) | Knowledge graph of all infrastructure entities and relationships | TypeDB 3.x, TypeQL |
| MCP Servers (12) | Tool servers exposing infrastructure APIs via Model Context Protocol | Python/Go, FastAPI, JSON-RPC |
| Compass | Digital twin UI with AI chat, graph visualization, and dashboards | Next.js, FastAPI, ReactFlow |
| LLM Reasoning | Autonomous agent reasoning and tool calling | Claude via Bedrock (IL2-IL5), vLLM/Ollama (IL6/Edge) |
| Agent Orchestration | TrailbossAI (missions) + Posses (Marshal, Scout, Sage, Wrangler) | LangGraph, CrewAI |
| OutpostAI | Kubernetes cluster lifecycle management (create, delete, add-ons) | Vue.js, Go (Trailboss) |
How It Works
Conversational mode — operators ask questions in natural language via Compass. The LLM selects MCP tools, executes them, and returns formatted results. This is how operators query infrastructure state today.
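Under the hood, a conversational turn becomes a JSON-RPC 2.0 request against one of the MCP servers. A minimal sketch of the wire format follows; the tool name and arguments are illustrative, not the actual Grafana server's tool catalog:

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical call the LLM might emit after an operator asks
# "which Grafana alerts are currently firing?"
request = mcp_tool_call(1, "list_alert_rules", {"state": "firing"})
print(request)
```

The LLM never speaks JSON-RPC directly; the MCP client layer translates its tool selection into messages like this and feeds the result back for formatting.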
Autonomous mode — Grafana fires an alert at a warning threshold (60% CPU, 80% disk). The alert becomes a WorkItem in FFO. TrailbossAI dispatches a Posse: Marshal validates policy, Scout discovers current state, Sage reasons and plans remediation, Wrangler executes and verifies. The SRE reads the post-mortem. No page. No 2am wakeup.
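The four-role handoff can be sketched as a linear pipeline over a WorkItem. Everything below (dataclass fields, function names, the remediation plan) is illustrative, not the actual TrailbossAI API:

```python
from dataclasses import dataclass, field

@dataclass
class WorkItem:
    """Illustrative FFO WorkItem created from a Grafana alert."""
    alert: str                     # e.g. "cpu_warning"
    target: str                    # entity ID in the FFO graph
    log: list = field(default_factory=list)

def marshal(item):   # validate policy: is autonomous action allowed here?
    item.log.append("marshal: policy check passed")
    return item

def scout(item):     # discover current state via MCP tools
    item.log.append(f"scout: collected state for {item.target}")
    return item

def sage(item):      # reason over FFO context and plan remediation
    item.log.append("sage: plan = scale out worker pool")
    return item

def wrangler(item):  # execute the plan and verify the fix
    item.log.append("wrangler: executed and verified")
    return item

def dispatch_posse(item: WorkItem) -> WorkItem:
    """TrailbossAI-style dispatch: each role acts in turn on the WorkItem."""
    for role in (marshal, scout, sage, wrangler):
        item = role(item)
    return item

result = dispatch_posse(WorkItem(alert="cpu_warning", target="vm-042"))
print("\n".join(result.log))   # this trail is the raw material for the post-mortem
```

The ordering matters: Marshal gates the mission before any discovery or mutation happens, and Wrangler's verification step is what lets the post-mortem close the loop without human review.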
Both modes use the same MCP tool ecosystem and the same FFO knowledge graph. The difference is the trigger — human question vs. automated alert.
MCP Server Fleet
The platform runs 12 MCP servers with 150+ tools total:
| Server | Tools | What It Does |
|---|---|---|
| Grafana | 28 | Dashboards, alerts, data sources |
| OpenStack | 23 | VMs, networks, images, flavors |
| Gitea | 22 | Git repos, branches, PRs, issues |
| Atlassian | 20 | Jira issues, Confluence pages |
| Keycloak | 12 | Users, roles, realms, clients |
| Ceph | 12 | Pools, OSDs, monitors, health |
| ArgoCD | 11 | Apps, sync, rollback, projects |
| Kolla | 10 | OpenStack containers, services, logs |
| FFO | 7 | Ontology queries, entity CRUD |
| Federal Compliance | 3 | NIST controls, compliance checks |
| Tool Hub | 1 | Aggregated tool discovery |
| Trailboss | 1 | Cluster provisioning status |
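The Tool Hub's aggregated discovery can be pictured as fanning a tools/list call out to each server and merging the results into one namespaced catalog. The server names and tool lists below are a tiny illustrative subset, not the real 150-tool inventory:

```python
# Hypothetical per-server tool catalogs, as a tools/list call might return them.
# Real servers expose far more (28 Grafana tools, 23 OpenStack tools, etc.).
CATALOGS = {
    "grafana":   [{"name": "list_dashboards"}, {"name": "list_alert_rules"}],
    "openstack": [{"name": "list_servers"}, {"name": "create_server"}],
    "ffo":       [{"name": "query_entities"}],
}

def aggregate_tools(catalogs: dict) -> list:
    """Merge per-server catalogs, namespacing each tool by its server."""
    merged = []
    for server, tools in catalogs.items():
        for tool in tools:
            merged.append(f"{server}.{tool['name']}")
    return sorted(merged)

print(aggregate_tools(CATALOGS))
```

Namespacing by server keeps tool names collision-free, so the LLM can address any of the 150+ tools through a single discovery endpoint.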
Deployment
All components deploy to Kubernetes (namespace f3iai) via ArgoCD with GitOps:
- GitOps repo: Gitea (ArgoCD source of truth)
- Container registry: Harbor
- ArgoCD apps: Auto-sync + self-heal + prune enabled
- Workflow: Code change → build image → push to Harbor → update GitOps manifest → ArgoCD syncs
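The auto-sync behavior above corresponds to an ArgoCD Application spec along these lines; the app name, repo URL, and path are placeholders, while the f3iai destination namespace and the automated prune/self-heal policy match the configuration described:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: compass            # placeholder app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.mil/f3i/gitops.git   # placeholder Gitea repo
    targetRevision: main
    path: apps/compass
  destination:
    server: https://kubernetes.default.svc
    namespace: f3iai
  syncPolicy:
    automated:
      prune: true          # delete resources removed from Git
      selfHeal: true       # revert drift back to the Git state
```

With this policy, pushing an updated image tag to the GitOps repo is sufficient: ArgoCD detects the change and reconciles the cluster without a manual sync.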
Next Steps
- Autonomous SRE Agent — How InfrastructureAI prevents outages before they happen
- Agent Architecture — TrailbossAI, Posses, and the mission lifecycle
- Sovereign Inference — Classified AI from IL2 through tactical edge
- Platform Architecture — Component layers and deployment topology
- MCP Servers — The 12 tool servers and 150+ tools
- FFO Ontology — The knowledge graph that ties it all together
- Compass — The digital twin UI