FFP Installation Guide
Complete installation reference for the Federal Frontier Platform — from bare-metal site preparation through VitroAI, FMC, platform services, InfrastructureAI, workload clusters, and integration layer deployment.
This guide covers the complete Federal Frontier Platform deployment from bare-metal hardware through a fully operational autonomous SRE platform. The guide is organized into seven phases:
| Phase | What | Duration | Prerequisites |
|---|---|---|---|
| Phase 0 | Site Preparation | 1-2 weeks | Hardware procurement, rack and stack |
| Phase 1 | VitroAI (OpenStack HCI) | 1-2 days | Phase 0 complete |
| Phase 2 | Fleet Management Cluster | 4-8 hours | Phase 1 or cloud provider |
| Phase 3 | Platform Services | 4-8 hours | Phase 2 complete |
| Phase 4 | InfrastructureAI | 4-8 hours | Phase 3 complete |
| Phase 5 | Workload Clusters | 1-2 hours per cluster | Phase 4 complete |
| Phase 6 | Integration Layer | 2-4 hours | Phase 4 complete |
All container images are pulled from registry.eupraxialabs.com/ffp/<component>:<version> under your RTU license.
Phase 0: Site Preparation
Phase 0 prepares the physical infrastructure. This phase is unique to on-premise deployments — AWS-only deployments skip to Phase 2.
Hardware Requirements
| Component | Minimum (3-Node HCI) | Recommended (Production) |
|---|---|---|
| CPU | 8 cores per node | 16+ cores per node (Intel Xeon Gold or AMD EPYC) |
| RAM | 32 GB per node | 64 GB+ per node (ECC required for production) |
| OS Storage | 500 GB SSD | 2x SSD in RAID1 |
| Ceph OSD Storage | 1 TB NVMe per node | 2-6x NVMe per node (no RAID — Ceph manages redundancy) |
| Network | 1 Gbps management + 1 Gbps data | 2x 10 GbE bonded (LACP 802.3ad) |
| Nodes | 3 (HCI converged) | 3+ control + 3+ compute (disaggregated) |
Important: Do not RAID or format the NVMe drives intended for Ceph OSDs. Ceph manages its own redundancy and requires raw block devices.
BIOS and Firmware Configuration
Configure the following in each server’s BIOS before OS installation:
| Setting | Value | Purpose |
|---|---|---|
| CPU Virtualization | Enable VT-x (Intel) or AMD-V/SVM (AMD) | Required for KVM hypervisor |
| IOMMU | Enable VT-d (Intel) or AMD-Vi | Required for PCI passthrough and GPU virtualization |
| NUMA | Enable | Improves memory locality for VMs |
| CPU Power Management | Performance mode (disable C-states) | Prevents latency spikes from frequency scaling |
| TPM 2.0 | Enable | Required for FIPS 140-2 compliance |
| Secure Boot | Disable if using custom KVM stack | Re-enable after OS hardening if required |
| Boot Order | PXE first (automated) or local disk (manual) | Depends on provisioning method |
Operating System Installation
Install Ubuntu 22.04 LTS Server on each node:
# Recommended partition layout
/boot — 1 GB (ext4)
/boot/efi — 512 MB (EFI System Partition)
swap — 8 GB
/ — 100 GB (ext4, LVM)
/var/lib — remainder (ext4, LVM — Ceph, containers, logs)
FIPS 140-2 enablement (required for IL4+ deployments):
sudo apt install -y ubuntu-advantage-tools
sudo ua enable fips
sudo reboot
# Verify after reboot
cat /proc/sys/crypto/fips_enabled # Should output: 1
Kernel parameters for KVM and storage performance:
# /etc/sysctl.d/99-ffp.conf
vm.swappiness = 10
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
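Load the file without a reboot and spot-check one value:
sudo sysctl --system
sysctl vm.swappiness    # Expected output: vm.swappiness = 10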
KVM Hypervisor Stack
Install KVM, QEMU, and libvirt on each compute node:
sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virtinst
sudo systemctl enable --now libvirtd
# Verify KVM modules
lsmod | grep kvm
# Should show: kvm_intel (or kvm_amd) and kvm
Network Infrastructure
FFP requires five network segments. Each segment serves a specific purpose and has distinct MTU and security requirements:
| Segment | Purpose | MTU | VLAN | Notes |
|---|---|---|---|---|
| Management | SSH, IPMI, control plane API | 1500 | Tagged | All nodes reachable |
| API / Internal | OpenStack service communication | 1500 | Tagged | Inter-service RPC |
| Tenant / Overlay | VM-to-VM traffic (Geneve tunnels) | 9000 | Tagged | Jumbo frames required |
| Storage | Ceph OSD replication, client I/O | 9000 | Tagged | Jumbo frames required |
| External / Provider | Floating IPs, internet access | 1500 | Native or Tagged | Gateway to upstream |
Switch configuration:
- Configure VLANs for each segment on all switch ports connecting to FFP nodes
- Enable LACP (802.3ad) for bonded interfaces
- Set MTU 9216 on switch ports carrying storage and overlay traffic (9216 accounts for encapsulation overhead)
- Enable spanning tree portfast on server-facing ports
Bond configuration (on each node):
# /etc/netplan/01-ffp.yaml
network:
version: 2
ethernets:
eno1: {}
eno2: {}
bonds:
bond0:
interfaces: [eno1, eno2]
parameters:
mode: 802.3ad
lacp-rate: fast
mii-monitor-interval: 100
vlans:
bond0.100:
id: 100
link: bond0
mtu: 1500
addresses: [<management-ip>/24]
bond0.200:
id: 200
link: bond0
mtu: 9000
addresses: [<storage-ip>/24]
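Apply the netplan configuration and confirm LACP negotiation and jumbo frames on the storage VLAN (the interface names and VLAN IDs above are examples; substitute your own):
sudo netplan apply
cat /proc/net/bonding/bond0 | grep -E "Bonding Mode|MII Status"   # Expect IEEE 802.3ad and MII Status: up
ip -d link show bond0.200 | grep mtu                              # Expect mtu 9000
ping -M do -s 8972 <peer-storage-ip>                              # 8972-byte payload + 28-byte headers = 9000; must not fragment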
DNS and NTP:
- Configure forward and reverse DNS zones for all node hostnames
- NTP: use chrony pointed at authorized time sources (NIST, DoD NTP, or local Stratum 1)
- All nodes must agree on time — Ceph and Kubernetes are time-sensitive
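A minimal chrony configuration for the time requirement above, assuming two authorized upstream servers reachable from the management network (replace the placeholders with your approved sources):
# /etc/chrony/chrony.conf (excerpt)
server <time-source-1> iburst
server <time-source-2> iburst
makestep 1.0 3
# Restart and verify
sudo systemctl restart chrony
chronyc sources -v
chronyc tracking    # Leap status should be Normal with a small system-time offset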
Storage Preparation
Prepare NVMe drives for Ceph OSDs:
# Verify drive health
sudo smartctl -a /dev/nvme0n1
# Label drives for Ceph OSD bootstrap (Kolla-Ansible convention)
sudo parted /dev/nvme0n1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1MiB 100%
Do NOT:
- Format the OSD drives with a filesystem
- Create RAID arrays on OSD drives
- Partition OSD drives beyond the label
Ceph manages its own data layout, replication, and recovery. Raw block devices are required.
GPU Passthrough (Optional)
For AI inference or VDI workloads requiring GPU access:
# Enable IOMMU in GRUB
# /etc/default/grub
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"
# (or amd_iommu=on for AMD systems)
sudo update-grub && sudo reboot
# Bind GPU to VFIO driver
echo "vfio-pci" | sudo tee /etc/modules-load.d/vfio.conf
echo "options vfio-pci ids=<vendor-id>:<device-id>" | sudo tee /etc/modprobe.d/vfio.conf
sudo update-initramfs -u && sudo reboot
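After the reboot, confirm the GPU is bound to vfio-pci rather than the vendor driver (find the PCI address with the first command):
lspci -nn | grep -iE "nvidia|vga"        # Note the <vendor-id>:<device-id> pairs used above
lspci -nnk -s <pci-address>              # "Kernel driver in use" should read vfio-pci
dmesg | grep -iE "DMAR|AMD-Vi"           # Confirms the IOMMU initialized at boot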
Decision: PCIe passthrough gives a single VM full GPU access. vGPU (NVIDIA GRID) shares a GPU across multiple VMs but requires NVIDIA vGPU licensing. Choose based on workload density requirements.
Air-Gap Preparation (Disconnected Environments)
For IL5/IL6 deployments without internet access:
- Offline package mirror: Create a local apt mirror using apt-mirror or aptly on a connected staging system, then transfer via approved removable media
- Container image mirror: Use the ffp-mirror.sh script to pull all FFP images from registry.eupraxialabs.com and export as tarballs for transfer to the disconnected Harbor registry (a manual single-image equivalent is sketched after this list)
- PKI certificates: Generate or obtain TLS certificates from your organizational CA — do not use Let's Encrypt in disconnected environments
- Transfer procedure: Follow your organization’s approved media transfer protocol for crossing security boundaries
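The ffp-mirror.sh script automates the bulk export; if you need to stage a single image by hand, the equivalent flow looks like this (the harbor.<your-domain> hostname and the ffp project are assumptions about your local registry layout):
# On the connected staging system
docker pull registry.eupraxialabs.com/ffp/trailboss:<version>
docker save registry.eupraxialabs.com/ffp/trailboss:<version> -o trailboss-<version>.tar
# Transfer the tarball across the boundary via approved media, then on the disconnected side:
docker load -i trailboss-<version>.tar
docker tag registry.eupraxialabs.com/ffp/trailboss:<version> harbor.<your-domain>/ffp/trailboss:<version>
docker push harbor.<your-domain>/ffp/trailboss:<version>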
Phase 1: VitroAI Installation
VitroAI is the OpenStack-based Hyper-Converged Infrastructure (HCI) layer. It provides compute, networking, and storage for the FMC and workload clusters.
See the VitroAI Deployment Guide for detailed procedures and the VitroAI Architecture for design decisions.
Deployment Summary
- Create Python virtual environment for Kolla-Ansible
- Configure inventory — list all nodes with roles (control, compute, storage)
- Configure globals.yml — network interfaces, VIP addresses, enabled services
- Generate passwords — kolla-genpwd creates all service passwords
- Bootstrap servers — kolla-ansible bootstrap-servers prepares nodes
- Pull images — kolla-ansible pull downloads container images (or loads from air-gap mirror)
- Pre-checks — kolla-ansible prechecks validates configuration
- Deploy — kolla-ansible deploy deploys the full OpenStack control plane
- Post-deploy — kolla-ansible post-deploy generates admin credentials
For air-gapped deployments, configure docker_registry in globals.yml to point to your local Harbor instance.
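Condensed into commands, the sequence above looks roughly like this (the virtual environment path and the multinode inventory name are conventions, not requirements; the docker_registry line applies only to air-gapped sites):
python3 -m venv ~/kolla-venv && source ~/kolla-venv/bin/activate
pip install kolla-ansible
# Copy and edit the inventory and globals.yml per the steps above, for example:
#   docker_registry: "harbor.<your-domain>"   # air-gapped deployments only
kolla-genpwd
kolla-ansible -i multinode bootstrap-servers
kolla-ansible -i multinode pull
kolla-ansible -i multinode prechecks
kolla-ansible -i multinode deploy
kolla-ansible -i multinode post-deploy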
Validation Checklist
- Ceph cluster reports HEALTH_OK with all OSDs active
- OpenStack services are active (openstack service list)
- Test VM launches with a floating IP and is reachable via SSH
- Cinder volume attaches to a test VM
- HAProxy VIP failover works (stop HAProxy on one node, verify API access continues)
- Skyline dashboard accessible
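A minimal smoke test that walks the checklist above, using the admin credentials written by post-deploy (image, flavor, and network names are examples from your environment):
source /etc/kolla/admin-openrc.sh
openstack service list
openstack compute service list
openstack volume service list
openstack server create --image <image> --flavor <flavor> --network <tenant-net> test-vm
openstack floating ip create <external-net>
openstack server add floating ip test-vm <floating-ip>
ssh <user>@<floating-ip>   # Confirms tenant networking, floating IP, and security groups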
Phase 2: Fleet Management Cluster
The Fleet Management Cluster (FMC) runs the FFP control plane: ArgoCD, Keycloak, platform services, CAPI controllers, and the InfrastructureAI agent stack.
Platform Decision
| Deployment Model | Kubernetes | When to Use |
|---|---|---|
| VitroAI VMs | RKE2 | On-premise, full sovereignty, IL4+ |
| Bare Metal | RKE2 | Maximum performance, dedicated hardware |
| AWS | EKS | Cloud-native, IL2-IL4 |
See FMC Setup & Installation for detailed procedures.
RKE2 Installation (On-Premise)
# First control plane node
curl -sfL https://get.rke2.io | sh -
systemctl enable --now rke2-server
# Get join token
cat /var/lib/rancher/rke2/server/node-token
# Additional control plane and worker nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
# Configure /etc/rancher/rke2/config.yaml with server URL and token
systemctl enable --now rke2-agent
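A minimal /etc/rancher/rke2/config.yaml for joining nodes; the server hostname and token are placeholders, and additional control plane nodes use the same file but install rke2-server instead of rke2-agent:
# /etc/rancher/rke2/config.yaml
server: https://<first-control-plane-hostname>:9345
token: <node-token-from-first-server>
node-label:
  - "role=worker"    # optional; add any labels your scheduling relies on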
Essential Configuration
- Create the f3iain namespace — all FFP components deploy here
- Configure persistent storage — Rook-Ceph CSI (on-premise) or EBS CSI (AWS). See Storage Architecture.
- Install Traefik (or your preferred ingress controller) for TLS termination
- Configure DNS — wildcard record for *.your-domain pointing to the FMC ingress
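The namespace, ingress, and DNS items above can be handled and checked from any workstation with kubeconfig access:
kubectl create namespace f3iain
kubectl get ingressclass                 # Traefik (or your chosen controller) should be registered
dig +short app.<your-domain>             # Any name under the wildcard should resolve to the FMC ingress address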
Validation Checklist
- All nodes show Ready in kubectl get nodes
- f3iain namespace exists
- PVC provisioning works (create a test PVC, verify it binds)
- DNS resolves from within the cluster (nslookup from a test pod)
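A throwaway PVC for the provisioning check above, assuming the default StorageClass installed by your CSI driver (adjust storageClassName if your cluster uses a non-default class):
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ffp-storage-test
  namespace: f3iain
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl -n f3iain get pvc ffp-storage-test    # Expect STATUS Bound (WaitForFirstConsumer classes bind only once a pod mounts the claim)
kubectl -n f3iain delete pvc ffp-storage-test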
Phase 3: Platform Services
Platform services provide identity, GitOps, container registry, and observability. All deployments use ArgoCD GitOps — never apply manifests directly with kubectl.
See the FMC Admin Guide for detailed service configuration.
Service Stack
| Service | Purpose | Guide |
|---|---|---|
| ArgoCD | GitOps deployment controller | FMC Admin Guide |
| Keycloak | Identity provider (OIDC, CAC/PIV) | Keycloak Auth Setup |
| Harbor | Container registry (mirrors from Eupraxia Labs registry) | FMC Admin Guide |
| Gitea | Git repository for GitOps manifests | FMC Admin Guide |
| Grafana + Prometheus | Monitoring and alerting | Monitoring & Dispatch |
| Loki | Log aggregation | Monitoring & Dispatch |
| CNI | Container networking (Calico or Canal) | CNI Guide |
| CSI | Persistent storage driver | CSI Guide |
| Ingress | TLS termination and routing | Ingress Guide |
Deployment Order
- ArgoCD (deploys everything else)
- Keycloak (identity — needed by all services)
- Harbor (container registry — needed for image pulls)
- Gitea (GitOps source repository)
- Observability stack (Grafana, Prometheus, Loki, AlertManager)
- Add-ons (CNI, CSI, Ingress, Load Balancer)
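Each item in the order above is typically registered as an ArgoCD Application pointing at the Gitea repository; a representative manifest follows (the repository URL, path, and the argocd namespace are assumptions about your installation, not a fixed FFP convention):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keycloak
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.<your-domain>/ffp/platform-services.git
    targetRevision: main
    path: keycloak
  destination:
    server: https://kubernetes.default.svc
    namespace: f3iain
  syncPolicy:
    automated:
      prune: true
      selfHeal: true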
Validation Checklist
- ArgoCD UI accessible, syncing applications
- Keycloak login works (admin console + OIDC test)
- Harbor push/pull succeeds with FFP images
- Grafana dashboards render with live data
- AlertManager routes alerts to the dispatch controller
Phase 4: InfrastructureAI + Frontier SRE Agent
The InfrastructureAI stack provides the autonomous SRE capability: a living digital twin (FFO), LLM inference, agent orchestration, MCP tool fleet, and operator interfaces.
See Platform Overview and Architecture for the full design.
Component Stack
| Component | Purpose | Guide |
|---|---|---|
| FFO (TypeDB) | Living digital twin — knowledge graph | FFO Overview, Schema |
| LLM Inference | AI reasoning engine | LLM Inference |
| TrailbossAI | Agent orchestrator | Agent Architecture |
| MCP Server Fleet | Infrastructure tool surface | MCP Servers |
| Wanaku | MCP router with OIDC auth | SoR Integration |
| OutpostAI | Operator dispatch console | OutpostAI |
| Compass | Integration design UI | Compass |
| Frontier CLI | Terminal interface | Frontier CLI |
LLM Inference Decision
| Classification | Backend | Model Requirements |
|---|---|---|
| IL2 – IL4 | AWS Bedrock via VPC PrivateLink | Claude Sonnet/Opus (sovereign, no public internet) |
| IL5 | AWS Bedrock GovCloud | FedRAMP High certified |
| IL6 Air-Gapped | vLLM on local GPU nodes | US-origin open-weight models only (Llama 3.x 70B+) |
| Development | Ollama on Apple Silicon | Any model for local testing |
Important: Chinese-origin models (Qwen, DeepSeek) are prohibited in all federal deployment contexts regardless of classification level. Minimum 30B parameters recommended for reliable MCP tool-call execution.
See Sovereign Inference for the full architecture at each classification level.
Container Images
All images are pulled from the Eupraxia Labs registry:
# Pull FFP component images
docker pull registry.eupraxialabs.com/ffp/ffo-mcp-server:<version>
docker pull registry.eupraxialabs.com/ffp/trailboss:<version>
docker pull registry.eupraxialabs.com/ffp/sage:<version>
docker pull registry.eupraxialabs.com/ffp/outpostai-dev:<version>
docker pull registry.eupraxialabs.com/ffp/ffo-compass-api:<version>
docker pull registry.eupraxialabs.com/ffp/claude-code-runner:<version>
# ... (all MCP servers, TypeDB, support services)
For air-gapped deployments, mirror these images to your local Harbor instance first.
Validation Checklist
- TypeDB is running and the FFO database is accessible
- LLM inference responds (test via OutpostAI chat or Frontier CLI frontier chat)
- OutpostAI dashboard loads with cluster data
- Dispatch webhook receives test alerts and creates Jobs
- Claude Runner Job completes with tool calls and Bedrock inference
- MCP authentication enforced (Wanaku rejects unauthenticated tool calls)
- Compass instance graph renders FFO entities
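A quick health pass over the stack above (these are generic namespace-level checks; exact deployment and Job names vary by release):
kubectl -n f3iain get pods                                          # All pods Running or Completed
kubectl -n f3iain get jobs --sort-by=.metadata.creationTimestamp    # Dispatch-created Jobs and their completion status
kubectl -n f3iain get events --field-selector type=Warning          # Surfaces crash loops and image-pull failures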
Phase 5: Workload Clusters
Workload clusters are the Kubernetes clusters where customer applications run. FFP provisions them via Cluster API (CAPI).
See Modern Provisioning for the CAPI architecture.
Supported Providers
| Provider | CAPI Provider | Distribution | Guide |
|---|---|---|---|
| VitroAI (OpenStack) | CAPO | RKE2 | CAPI Providers |
| AWS | CAPA | EKS | CAPI Providers |
Provisioning Methods
Workload clusters can be provisioned through:
- Frontier CLI — frontier create cluster capo or frontier create cluster eks. See Cluster Management.
- OutpostAI UI — cluster creation wizard with template selection. See OutpostAI Guide.
Both methods use the Cluster Template System — JSON Schema-driven templates that render CAPI manifests and push them to Gitea, from which ArgoCD syncs them to the FMC.
Pre-Baked Node Images
For on-premise CAPO clusters, build node images with Packer that include RKE2, CNI images, and system packages pre-cached. See Packer Image Build.
Validation Checklist
- CAPI controllers healthy on FMC (kubectl get providers)
- Test cluster provisions successfully (control plane + workers join)
- Nodes show Ready and workloads schedule
- Compliance Operator running (if required for your classification)
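clusterctl on the FMC gives a consolidated view of a provisioning run and retrieves the workload kubeconfig for the node and storage checks above (cluster name and namespace are examples):
kubectl get providers -A
kubectl get clusters -A
clusterctl describe cluster <cluster-name> -n <cluster-namespace>
clusterctl get kubeconfig <cluster-name> -n <cluster-namespace> > workload.kubeconfig
kubectl --kubeconfig workload.kubeconfig get nodes    # Nodes should report Ready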
Phase 6: Integration Layer
The integration layer connects customer Systems of Record (ServiceNow, vCenter, Splunk, CMDBs) to the FFO digital twin.
See SoR Integration Architecture for the full design.
Components
| Component | Purpose | Installation |
|---|---|---|
| Camel-K | Integration runtime | Helm chart from Eupraxia Labs registry |
| Wanaku | MCP router with OIDC and classification-aware routing | Deploy per classification level |
| Kaoto | Visual integration designer | VS Code Extension (Extension Pack for Apache Camel by Red Hat) |
Integration Patterns
| Pattern | Trigger | Use Case |
|---|---|---|
| Scout (Polling) | Timer/Cron | vCenter inventory sync (every 15 min) |
| Event-Driven | Webhook | ServiceNow incident notifications |
| On-Demand | MCP tool call | Splunk log search during investigation |
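A skeleton of the Scout pattern in Camel YAML DSL, run through the Camel-K operator; the timer period matches the 15-minute vCenter example above, and the route body is a stand-in for the real inventory calls rather than the shipped FFP integration:
# vcenter-scout.yaml (Camel YAML DSL)
- from:
    uri: "timer:vcenter-sync?period=900000"    # fire every 15 minutes
    steps:
      - setBody:
          constant: "vcenter-inventory-sync"   # placeholder for the vCenter REST calls
      - log: "Scout tick: ${body}"
# Deploy via the operator and confirm the Integration reaches Running:
#   kamel run vcenter-scout.yaml -n f3iain
#   kubectl -n f3iain get integrations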
Validation Checklist
- Camel-K operator running in the f3iain namespace
- Wanaku MCP tool list returns configured integrations
- Agent can invoke an integration tool and receive results
- FFO entities update after a Scout sync cycle
Post-Installation
Getting Started
- Operators: Getting Started Guide for your first dispatch
- CLI users: Frontier CLI Installation — brew install frontier-cli
- Security review: Security in FKP for compliance posture
Day-2 Operations
- Frontier CLI — Command Reference for cluster management and AI chat
- FMC Admin Guide — Administration for platform maintenance
- FFO Queries — TypeQL Examples for querying the knowledge graph
- Add-ons — Add-ons Guide for CNI, CSI, ingress, and monitoring
Upgrades
FFP upgrades follow the GitOps pattern:
- Eupraxia Labs publishes new image tags to registry.eupraxialabs.com
- For air-gapped deployments, mirror the new images to your local Harbor first
- Update the image tags in your GitOps repository
- ArgoCD detects the change and reconciles — rolling updates with zero downtime
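A representative tag bump, assuming the GitOps repository uses Kustomize overlays (repository URL, path, and image name are illustrative; Helm-based layouts edit the chart values instead):
git clone https://gitea.<your-domain>/ffp/fmc-apps.git && cd fmc-apps/trailboss
kustomize edit set image registry.eupraxialabs.com/ffp/trailboss=registry.eupraxialabs.com/ffp/trailboss:<new-version>
git commit -am "Bump trailboss to <new-version>" && git push
# ArgoCD detects the commit (poll or webhook) and performs a rolling update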
Support
| Channel | Contact |
|---|---|
| Technical Support | support@eupraxialabs.com |
| Sales | sales@eupraxialabs.com |
| Documentation | readthedocs.eupraxia.io |