FFP Installation Guide

Complete installation reference for the Federal Frontier Platform — from bare-metal site preparation through VitroAI, FMC, platform services, InfrastructureAI, workload clusters, and integration layer deployment.

This guide covers the complete Federal Frontier Platform deployment from bare-metal hardware through a fully operational autonomous SRE platform. The guide is organized into seven phases:

| Phase | What | Duration | Prerequisites |
|-------|------|----------|---------------|
| Phase 0 | Site Preparation | 1-2 weeks | Hardware procurement, rack and stack |
| Phase 1 | VitroAI (OpenStack HCI) | 1-2 days | Phase 0 complete |
| Phase 2 | Fleet Management Cluster | 4-8 hours | Phase 1 or cloud provider |
| Phase 3 | Platform Services | 4-8 hours | Phase 2 complete |
| Phase 4 | InfrastructureAI | 4-8 hours | Phase 3 complete |
| Phase 5 | Workload Clusters | 1-2 hours per cluster | Phase 4 complete |
| Phase 6 | Integration Layer | 2-4 hours | Phase 4 complete |

All container images are pulled from registry.eupraxialabs.com/ffp/<component>:<version> under your RTU license.
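
Authenticate to the registry before pulling images (a minimal sketch; the credentials are the ones issued with your RTU license):

docker login registry.eupraxialabs.com
# Enter your RTU username and access token when prompted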


Phase 0: Site Preparation

Phase 0 prepares the physical infrastructure. This phase is unique to on-premise deployments — AWS-only deployments skip to Phase 2.

Hardware Requirements

| Component | Minimum (3-Node HCI) | Recommended (Production) |
|-----------|----------------------|--------------------------|
| CPU | 8 cores per node | 16+ cores per node (Intel Xeon Gold or AMD EPYC) |
| RAM | 32 GB per node | 64 GB+ per node (ECC required for production) |
| OS Storage | 500 GB SSD | 2x SSD in RAID1 |
| Ceph OSD Storage | 1 TB NVMe per node | 2-6x NVMe per node (no RAID — Ceph manages redundancy) |
| Network | 1 Gbps management + 1 Gbps data | 2x 10 GbE bonded (LACP 802.3ad) |
| Nodes | 3 (HCI converged) | 3+ control + 3+ compute (disaggregated) |

Important: Do not RAID or format the NVMe drives intended for Ceph OSDs. Ceph manages its own redundancy and requires raw block devices.

BIOS and Firmware Configuration

Configure the following in each server’s BIOS before OS installation:

| Setting | Value | Purpose |
|---------|-------|---------|
| CPU Virtualization | Enable VT-x (Intel) or AMD-V/SVM (AMD) | Required for KVM hypervisor |
| IOMMU | Enable VT-d (Intel) or AMD-Vi | Required for PCI passthrough and GPU virtualization |
| NUMA | Enable | Improves memory locality for VMs |
| CPU Power Management | Performance mode (disable C-states) | Prevents latency spikes from frequency scaling |
| TPM 2.0 | Enable | Required for FIPS 140-2 compliance |
| Secure Boot | Disable if using custom KVM stack | Re-enable after OS hardening if required |
| Boot Order | PXE first (automated) or local disk (manual) | Depends on provisioning method |

Operating System Installation

Install Ubuntu 22.04 LTS Server on each node:

# Recommended partition layout
/boot     — 1 GB   (ext4)
/boot/efi — 512 MB (EFI System Partition)
swap      — 8 GB
/         — 100 GB (ext4, LVM)
/var/lib  — remainder (ext4, LVM — Ceph, containers, logs)

FIPS 140-2 enablement (required for IL4+ deployments):

sudo apt install -y ubuntu-advantage-tools
sudo ua enable fips
sudo reboot
# Verify after reboot
cat /proc/sys/crypto/fips_enabled  # Should output: 1

Kernel parameters for KVM and storage performance:

# /etc/sysctl.d/99-ffp.conf
vm.swappiness = 10
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
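
Apply the settings without a reboot and spot-check one value:

sudo sysctl --system
sysctl vm.swappiness  # Should output: vm.swappiness = 10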

KVM Hypervisor Stack

Install KVM, QEMU, and libvirt on each compute node:

sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virtinst
sudo systemctl enable --now libvirtd

# Verify KVM modules
lsmod | grep kvm
# Should show: kvm_intel (or kvm_amd) and kvm
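
libvirt ships a host validation tool that checks hardware virtualization, IOMMU, and cgroup support in one pass:

sudo virt-host-validate qemu
# All QEMU checks should report PASS (IOMMU items may WARN until the
# GRUB changes in the GPU Passthrough section below are applied)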

Network Infrastructure

FFP requires five network segments. Each segment serves a specific purpose and has distinct MTU and security requirements:

| Segment | Purpose | MTU | VLAN | Notes |
|---------|---------|-----|------|-------|
| Management | SSH, IPMI, control plane API | 1500 | Tagged | All nodes reachable |
| API / Internal | OpenStack service communication | 1500 | Tagged | Inter-service RPC |
| Tenant / Overlay | VM-to-VM traffic (Geneve tunnels) | 9000 | Tagged | Jumbo frames required |
| Storage | Ceph OSD replication, client I/O | 9000 | Tagged | Jumbo frames required |
| External / Provider | Floating IPs, internet access | 1500 | Native or Tagged | Gateway to upstream |

Switch configuration:

  • Configure VLANs for each segment on all switch ports connecting to FFP nodes
  • Enable LACP (802.3ad) for bonded interfaces
  • Set MTU 9216 on switch ports carrying storage and overlay traffic (9216 accounts for encapsulation overhead)
  • Enable spanning tree portfast on server-facing ports

Bond configuration (on each node):

# /etc/netplan/01-ffp.yaml
network:
  version: 2
  ethernets:
    eno1: {}
    eno2: {}
  bonds:
    bond0:
      interfaces: [eno1, eno2]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
  vlans:
    bond0.100:
      id: 100
      link: bond0
      mtu: 1500
      addresses: [<management-ip>/24]
    bond0.200:
      id: 200
      link: bond0
      mtu: 9000
      addresses: [<storage-ip>/24]
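
Apply and verify the bond (a quick check; interface and VLAN names follow the example above):

sudo netplan apply
# LACP negotiated and both members up
grep -E 'Bonding Mode|MII Status' /proc/net/bonding/bond0
# Jumbo frames active on the storage VLAN
ip -d link show bond0.200 | grep -o 'mtu [0-9]*'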

DNS and NTP:

  • Configure forward and reverse DNS zones for all node hostnames
  • NTP: use chrony pointed at authorized time sources (NIST, DoD NTP, or local Stratum 1)
  • All nodes must agree on time — Ceph and Kubernetes are time-sensitive
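
A quick time-sync check on each node, assuming chrony is in use:

chronyc tracking      # Expect "Leap status: Normal" and a small system time offset
chronyc sources -v    # At least one source marked '*' (selected and synchronized)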

Storage Preparation

Prepare NVMe drives for Ceph OSDs:

# Verify drive health
sudo smartctl -a /dev/nvme0n1

# Label drives for Ceph OSD bootstrap (Kolla-Ansible convention)
sudo parted /dev/nvme0n1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1MiB 100%

Do NOT:

  • Format the OSD drives with a filesystem
  • Create RAID arrays on OSD drives
  • Partition OSD drives beyond the label

Ceph manages its own data layout, replication, and recovery. Raw block devices are required.
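
Before deploying, confirm each OSD device carries only the bootstrap label and no filesystem signatures (read-only checks):

lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/nvme0n1
# FSTYPE should be empty and nothing should be mounted
sudo wipefs -n /dev/nvme0n1  # -n reports signatures without erasing anything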

GPU Passthrough (Optional)

For AI inference or VDI workloads requiring GPU access:

# Enable IOMMU in GRUB
# /etc/default/grub
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"
# (or amd_iommu=on for AMD systems)
sudo update-grub && sudo reboot

# Bind GPU to VFIO driver
echo "vfio-pci" | sudo tee /etc/modules-load.d/vfio.conf
echo "options vfio-pci ids=<vendor-id>:<device-id>" | sudo tee /etc/modprobe.d/vfio.conf
sudo update-initramfs -u && sudo reboot
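
After the final reboot, confirm IOMMU is active and the GPU is bound to VFIO (substitute your GPU's PCI address):

sudo dmesg | grep -i -e DMAR -e IOMMU | head
lspci -nnk -s <pci-address>
# "Kernel driver in use: vfio-pci" confirms the binding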

Decision: PCIe passthrough gives a single VM full GPU access. vGPU (NVIDIA GRID) shares a GPU across multiple VMs but requires NVIDIA vGPU licensing. Choose based on workload density requirements.

Air-Gap Preparation (Disconnected Environments)

For IL5/IL6 deployments without internet access:

  1. Offline package mirror: Create a local apt mirror using apt-mirror or aptly on a connected staging system, then transfer via approved removable media
  2. Container image mirror: Use the ffp-mirror.sh script to pull all FFP images from registry.eupraxialabs.com and export as tarballs for transfer to the disconnected Harbor registry (the underlying pattern is sketched after this list)
  3. PKI certificates: Generate or obtain TLS certificates from your organizational CA — do not use Let’s Encrypt in disconnected environments
  4. Transfer procedure: Follow your organization’s approved media transfer protocol for crossing security boundaries
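
The image-mirror pattern that ffp-mirror.sh automates looks roughly like this (a sketch only; trailboss is one example image from Phase 4, and harbor.internal stands in for your disconnected registry):

# On the connected staging host
docker pull registry.eupraxialabs.com/ffp/trailboss:<version>
docker save registry.eupraxialabs.com/ffp/trailboss:<version> -o trailboss.tar
# ... transfer tarballs across the boundary via approved media ...

# On the disconnected side
docker load -i trailboss.tar
docker tag registry.eupraxialabs.com/ffp/trailboss:<version> harbor.internal/ffp/trailboss:<version>
docker push harbor.internal/ffp/trailboss:<version>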

Phase 1: VitroAI Installation

VitroAI is the OpenStack-based Hyper-Converged Infrastructure (HCI) layer. It provides compute, networking, and storage for the FMC and workload clusters.

See the VitroAI Deployment Guide for detailed procedures and the VitroAI Architecture for design decisions.

Deployment Summary

  1. Create Python virtual environment for Kolla-Ansible
  2. Configure inventory — list all nodes with roles (control, compute, storage)
  3. Configure globals.yml — network interfaces, VIP addresses, enabled services
  4. Generate passwords — kolla-genpwd creates all service passwords
  5. Bootstrap servers — kolla-ansible bootstrap-servers prepares nodes
  6. Pull images — kolla-ansible pull downloads container images (or loads from air-gap mirror)
  7. Pre-checks — kolla-ansible prechecks validates configuration
  8. Deploy — kolla-ansible deploy deploys the full OpenStack control plane
  9. Post-deploy — kolla-ansible post-deploy generates admin credentials

For air-gapped deployments, configure docker_registry in globals.yml to point to your local Harbor instance.
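
For example (a sketch; harbor.internal is a placeholder, and docker_registry is the standard Kolla-Ansible setting):

# /etc/kolla/globals.yml (excerpt)
docker_registry: "harbor.internal:443"
docker_registry_username: "ffp-mirror"
# The matching docker_registry_password lives in /etc/kolla/passwords.yml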

Validation Checklist

  • Ceph cluster reports HEALTH_OK with all OSDs active
  • OpenStack services are active (openstack service list)
  • Test VM launches with a floating IP and is reachable via SSH
  • Cinder volume attaches to a test VM
  • HAProxy VIP failover works (stop HAProxy on one node, verify API access continues)
  • Skyline dashboard accessible
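
Several of these checks from the command line, assuming the admin credentials generated by kolla-ansible post-deploy are sourced:

source /etc/kolla/admin-openrc.sh
openstack service list           # All services registered
openstack compute service list   # nova-compute up on every compute node
openstack network agent list     # Neutron agents alive
openstack volume service list    # cinder-volume up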

Phase 2: Fleet Management Cluster

The Fleet Management Cluster (FMC) runs the FFP control plane: ArgoCD, Keycloak, platform services, CAPI controllers, and the InfrastructureAI agent stack.

Platform Decision

| Deployment Model | Kubernetes | When to Use |
|------------------|------------|-------------|
| VitroAI VMs | RKE2 | On-premise, full sovereignty, IL4+ |
| Bare Metal | RKE2 | Maximum performance, dedicated hardware |
| AWS | EKS | Cloud-native, IL2-IL4 |

See FMC Setup & Installation for detailed procedures.

RKE2 Installation (On-Premise)

# First control plane node
curl -sfL https://get.rke2.io | sh -
systemctl enable --now rke2-server

# Get join token
cat /var/lib/rancher/rke2/server/node-token

# Additional control plane and worker nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
# Configure /etc/rancher/rke2/config.yaml with server URL and token
systemctl enable --now rke2-agent
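
A sketch of the agent configuration referenced above (9345 is RKE2's supervisor port; substitute your first server's address and the token from node-token):

# /etc/rancher/rke2/config.yaml
server: https://<first-server-ip>:9345
token: <token-from-node-token>

On server nodes, kubectl and the kubeconfig live at fixed paths:

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes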

Essential Configuration

  1. Create the f3iai namespace — all FFP components deploy here
  2. Configure persistent storage — Rook-Ceph CSI (on-premise) or EBS CSI (AWS). See Storage Architecture.
  3. Install Traefik (or your preferred ingress controller) for TLS termination
  4. Configure DNS — wildcard record for *.your-domain pointing to the FMC ingress

Validation Checklist

  • All nodes show Ready in kubectl get nodes
  • f3iai namespace exists
  • PVC provisioning works (create a test PVC as shown below, verify it binds)
  • DNS resolves from within the cluster (nslookup from a test pod)
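
A minimal test PVC for the storage check (assumes a default StorageClass; add storageClassName explicitly otherwise):

kubectl apply -n f3iai -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl -n f3iai get pvc test-pvc   # STATUS should reach Bound
# Note: WaitForFirstConsumer storage classes bind only after a pod mounts the PVC
kubectl -n f3iai delete pvc test-pvc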

Phase 3: Platform Services

Platform services provide identity, GitOps, container registry, and observability. All deployments use ArgoCD GitOps — never apply manifests directly with kubectl.

See the FMC Admin Guide for detailed service configuration.

Service Stack

| Service | Purpose | Guide |
|---------|---------|-------|
| ArgoCD | GitOps deployment controller | FMC Admin Guide |
| Keycloak | Identity provider (OIDC, CAC/PIV) | Keycloak Auth Setup |
| Harbor | Container registry (mirrors from Eupraxia Labs registry) | FMC Admin Guide |
| Gitea | Git repository for GitOps manifests | FMC Admin Guide |
| Grafana + Prometheus | Monitoring and alerting | Monitoring & Dispatch |
| Loki | Log aggregation | Monitoring & Dispatch |
| CNI | Container networking (Calico or Canal) | CNI Guide |
| CSI | Persistent storage driver | CSI Guide |
| Ingress | TLS termination and routing | Ingress Guide |

Deployment Order

  1. ArgoCD (deploys everything else)
  2. Keycloak (identity — needed by all services)
  3. Harbor (container registry — needed for image pulls)
  4. Gitea (GitOps source repository)
  5. Observability stack (Grafana, Prometheus, Loki, AlertManager)
  6. Add-ons (CNI, CSI, Ingress, Load Balancer)
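
Each service in this order is expressed as an ArgoCD Application tracking the Gitea repo. A sketch of the pattern (the name, repo URL, and path are placeholders):

# keycloak-app.yaml (hypothetical), committed to the GitOps repo
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keycloak
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.<your-domain>/ffp/platform-services.git
    targetRevision: main
    path: keycloak
  destination:
    server: https://kubernetes.default.svc
    namespace: f3iai
  syncPolicy:
    automated:
      prune: true
      selfHeal: true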

Validation Checklist

  • ArgoCD UI accessible, syncing applications
  • Keycloak login works (admin console + OIDC test)
  • Harbor push/pull succeeds with FFP images
  • Grafana dashboards render with live data
  • AlertManager routes fire to the dispatch controller

Phase 4: InfrastructureAI + Frontier SRE Agent

The InfrastructureAI stack provides the autonomous SRE capability: a living digital twin (FFO), LLM inference, agent orchestration, MCP tool fleet, and operator interfaces.

See Platform Overview and Architecture for the full design.

Component Stack

| Component | Purpose | Guide |
|-----------|---------|-------|
| FFO (TypeDB) | Living digital twin — knowledge graph | FFO Overview, Schema |
| LLM Inference | AI reasoning engine | LLM Inference |
| TrailbossAI | Agent orchestrator | Agent Architecture |
| MCP Server Fleet | Infrastructure tool surface | MCP Servers |
| Wanaku | MCP router with OIDC auth | SoR Integration |
| OutpostAI | Operator dispatch console | OutpostAI |
| Compass | Integration design UI | Compass |
| Frontier CLI | Terminal interface | Frontier CLI |

LLM Inference Decision

| Classification | Backend | Model Requirements |
|----------------|---------|--------------------|
| IL2 – IL4 | AWS Bedrock via VPC PrivateLink | Claude Sonnet/Opus (sovereign, no public internet) |
| IL5 | AWS Bedrock GovCloud | FedRAMP High certified |
| IL6 Air-Gapped | vLLM on local GPU nodes | US-origin open-weight models only (Llama 3.x 70B+) |
| Development | Ollama on Apple Silicon | Any model for local testing |

Important: Chinese-origin models (Qwen, DeepSeek) are prohibited in all federal deployment contexts regardless of classification level. Minimum 30B parameters recommended for reliable MCP tool-call execution.

See Sovereign Inference for the full architecture at each classification level.
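
For the IL6 air-gapped row, a vLLM launch looks roughly like this (a sketch; the model and parallelism depend on your approved model list and GPU topology):

# Serve an open-weight model with an OpenAI-compatible API across 4 GPUs
# Weights must be mirrored into the enclave beforehand; no runtime downloads
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 4 \
  --port 8000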

Container Images

All images are pulled from the Eupraxia Labs registry:

# Pull FFP component images
docker pull registry.eupraxialabs.com/ffp/ffo-mcp-server:<version>
docker pull registry.eupraxialabs.com/ffp/trailboss:<version>
docker pull registry.eupraxialabs.com/ffp/sage:<version>
docker pull registry.eupraxialabs.com/ffp/outpostai-dev:<version>
docker pull registry.eupraxialabs.com/ffp/ffo-compass-api:<version>
docker pull registry.eupraxialabs.com/ffp/claude-code-runner:<version>
# ... (all MCP servers, TypeDB, support services)

For air-gapped deployments, mirror these images to your local Harbor instance first.

Validation Checklist

  • TypeDB is running and the FFO database is accessible
  • LLM inference responds (test via OutpostAI chat or the Frontier CLI's frontier chat command)
  • OutpostAI dashboard loads with cluster data
  • Dispatch webhook receives test alerts and creates Jobs
  • Claude Runner Job completes with tool calls and Bedrock inference
  • MCP authentication enforced (Wanaku rejects unauthenticated tool calls)
  • Compass instance graph renders FFO entities

Phase 5: Workload Clusters

Workload clusters are the Kubernetes clusters where customer applications run. FFP provisions them via Cluster API (CAPI).

See Modern Provisioning for the CAPI architecture.

Supported Providers

| Provider | CAPI Provider | Distribution | Guide |
|----------|---------------|--------------|-------|
| VitroAI (OpenStack) | CAPO | RKE2 | CAPI Providers |
| AWS | CAPA | EKS | CAPI Providers |

Provisioning Methods

Workload clusters can be provisioned through:

  • Frontier CLI — frontier create cluster capo or frontier create cluster eks. See Cluster Management.
  • OutpostAI UI — cluster creation wizard with template selection. See OutpostAI Guide.

Both methods use the Cluster Template System — JSON Schema-driven templates that render CAPI manifests and push them to Gitea, where ArgoCD syncs them to the FMC.

Pre-Baked Node Images

For on-premise CAPO clusters, build node images with Packer that include RKE2, CNI images, and system packages pre-cached. See Packer Image Build.

Validation Checklist

  • CAPI controllers healthy on FMC (kubectl get providers)
  • Test cluster provisions successfully (control plane + workers join)
  • Nodes show Ready and workloads schedule
  • Persistent storage works (PVC binds, pod mounts volume)
  • Compliance Operator running (if required for your classification)
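
To watch a test cluster provision while running these checks, query the CAPI resources from the FMC (test-cluster and its namespace are hypothetical):

kubectl get clusters,machinedeployments,machines -A
clusterctl describe cluster test-cluster -n <cluster-namespace>
# All Machine resources should reach the Running phase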

Phase 6: Integration Layer

The integration layer connects customer Systems of Record (ServiceNow, vCenter, Splunk, CMDBs) to the FFO digital twin.

See SoR Integration Architecture for the full design.

Components

| Component | Purpose | Installation |
|-----------|---------|--------------|
| Camel-K | Integration runtime | Helm chart from Eupraxia Labs registry |
| Wanaku | MCP router with OIDC and classification-aware routing | Deploy per classification level |
| Kaoto | Visual integration designer | VS Code Extension (Extension Pack for Apache Camel by Red Hat) |

Integration Patterns

| Pattern | Trigger | Use Case |
|---------|---------|----------|
| Scout (Polling) | Timer/Cron | vCenter inventory sync (every 15 min) |
| Event-Driven | Webhook | ServiceNow incident notifications |
| On-Demand | MCP tool call | Splunk log search during investigation |
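
A minimal Scout-pattern Integration, as a sketch (the name, period, and log step are illustrative; real routes are designed in Kaoto and delivered via GitOps):

# vcenter-scout.yaml (hypothetical): a timer-driven Camel-K Integration
apiVersion: camel.apache.org/v1
kind: Integration
metadata:
  name: vcenter-scout
  namespace: f3iai
spec:
  flows:
    - from:
        uri: timer:scout?period=900000   # fire every 15 minutes
        steps:
          - log: "polling vCenter inventory"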

Validation Checklist

  • Camel-K operator running in f3iai namespace
  • Wanaku MCP tool list returns configured integrations
  • Agent can invoke an integration tool and receive results
  • FFO entities update after a Scout sync cycle

Post-Installation

Upgrades

FFP upgrades follow the GitOps pattern:

  1. Eupraxia Labs publishes new image tags to registry.eupraxialabs.com
  2. For air-gapped deployments, mirror the new images to your local Harbor first
  3. Update the image tags in your GitOps repository
  4. ArgoCD detects the change and reconciles — rolling updates with zero downtime
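
Step 3 in shell form, as a sketch (the repo layout, manifest path, and tag are placeholders; yq is one common way to edit YAML in place):

cd ffp-gitops
yq -i '.spec.template.spec.containers[0].image = "harbor.internal/ffp/trailboss:<new-version>"' \
  apps/trailboss/deployment.yaml
git add -A && git commit -m "chore: bump trailboss to <new-version>"
git push   # ArgoCD detects the commit and performs the rolling update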

Support

| Channel | Contact |
|---------|---------|
| Technical Support | support@eupraxialabs.com |
| Sales | sales@eupraxialabs.com |
| Documentation | readthedocs.eupraxia.io |