FFP Installation Guide
Complete installation reference for the Federal Frontier Platform — from bare-metal site preparation through VitroAI, FMC, platform services, InfrastructureAI, workload clusters, and integration layer deployment.
This guide covers the complete Federal Frontier Platform deployment from bare-metal hardware through a fully operational autonomous SRE platform. The guide is organized into seven phases:
| Phase | What | Duration | Prerequisites |
|---|---|---|---|
| Phase 0 | Site Preparation | 1-2 weeks | Hardware procurement, rack and stack |
| Phase 1 | VitroAI (OpenStack HCI) | 1-2 days | Phase 0 complete |
| Phase 2 | Fleet Management Cluster | 4-8 hours | Phase 1 or cloud provider |
| Phase 3 | Platform Services | 4-8 hours | Phase 2 complete |
| Phase 4 | InfrastructureAI | 4-8 hours | Phase 3 complete |
| Phase 5 | Workload Clusters | 1-2 hours per cluster | Phase 4 complete |
| Phase 6 | Integration Layer | 2-4 hours | Phase 4 complete |
All container images are pulled from registry.eupraxialabs.com/ffp/<component>:<version> under your RTU license.
Phase 0: Site Preparation
Phase 0 prepares the physical infrastructure. This phase is unique to on-premise deployments — AWS-only deployments skip to Phase 2.
Hardware Requirements
| Component | Minimum (3-Node HCI) | Recommended (Production) |
|---|---|---|
| CPU | 8 cores per node | 16+ cores per node (Intel Xeon Gold or AMD EPYC) |
| RAM | 32 GB per node | 64 GB+ per node (ECC required for production) |
| OS Storage | 500 GB SSD | 2x SSD in RAID1 |
| Ceph OSD Storage | 1 TB NVMe per node | 2-6x NVMe per node (no RAID — Ceph manages redundancy) |
| Network | 1 Gbps management + 1 Gbps data | 2x 10 GbE bonded (LACP 802.3ad) |
| Nodes | 3 (HCI converged) | 3+ control + 3+ compute (disaggregated) |
Important: Do not RAID or format the NVMe drives intended for Ceph OSDs. Ceph manages its own redundancy and requires raw block devices.
BIOS and Firmware Configuration
Configure the following in each server’s BIOS before OS installation:
| Setting | Value | Purpose |
|---|---|---|
| CPU Virtualization | Enable VT-x (Intel) or AMD-V/SVM (AMD) | Required for KVM hypervisor |
| IOMMU | Enable VT-d (Intel) or AMD-Vi | Required for PCI passthrough and GPU virtualization |
| NUMA | Enable | Improves memory locality for VMs |
| CPU Power Management | Performance mode (disable C-states) | Prevents latency spikes from frequency scaling |
| TPM 2.0 | Enable | Required for FIPS 140-2 compliance |
| Secure Boot | Disable if using custom KVM stack | Re-enable after OS hardening if required |
| Boot Order | PXE first (automated) or local disk (manual) | Depends on provisioning method |
Operating System Installation
Install Ubuntu 22.04 LTS Server on each node:
# Recommended partition layout
/boot — 1 GB (ext4)
/boot/efi — 512 MB (EFI System Partition)
swap — 8 GB
/ — 100 GB (ext4, LVM)
/var/lib — remainder (ext4, LVM — Ceph, containers, logs)
FIPS 140-2 enablement (required for IL4+ deployments):
sudo apt install -y ubuntu-advantage-tools
sudo ua enable fips
sudo reboot
# Verify after reboot
cat /proc/sys/crypto/fips_enabled # Should output: 1
Kernel parameters for KVM and storage performance:
# /etc/sysctl.d/99-ffp.conf
vm.swappiness = 10
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
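Load the file without a reboot and spot-check one value:
sudo sysctl --system
sysctl vm.swappiness    # Expected output: vm.swappiness = 10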
KVM Hypervisor Stack
Install KVM, QEMU, and libvirt on each compute node:
sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virtinst
sudo systemctl enable --now libvirtd
# Verify KVM modules
lsmod | grep kvm
# Should show: kvm_intel (or kvm_amd) and kvm
Network Infrastructure
FFP requires five network segments. Each segment serves a specific purpose and has distinct MTU and security requirements:
| Segment | Purpose | MTU | VLAN | Notes |
|---|---|---|---|---|
| Management | SSH, IPMI, control plane API | 1500 | Tagged | All nodes reachable |
| API / Internal | OpenStack service communication | 1500 | Tagged | Inter-service RPC |
| Tenant / Overlay | VM-to-VM traffic (Geneve tunnels) | 9000 | Tagged | Jumbo frames required |
| Storage | Ceph OSD replication, client I/O | 9000 | Tagged | Jumbo frames required |
| External / Provider | Floating IPs, internet access | 1500 | Native or Tagged | Gateway to upstream |
Switch configuration:
- Configure VLANs for each segment on all switch ports connecting to FFP nodes
- Enable LACP (802.3ad) for bonded interfaces
- Set MTU 9216 on switch ports carrying storage and overlay traffic (9216 accounts for encapsulation overhead)
- Enable spanning tree portfast on server-facing ports
Bond configuration (on each node):
# /etc/netplan/01-ffp.yaml
network:
version: 2
ethernets:
eno1: {}
eno2: {}
bonds:
bond0:
interfaces: [eno1, eno2]
parameters:
mode: 802.3ad
lacp-rate: fast
mii-monitor-interval: 100
vlans:
bond0.100:
id: 100
link: bond0
mtu: 1500
addresses: [<management-ip>/24]
bond0.200:
id: 200
link: bond0
mtu: 9000
addresses: [<storage-ip>/24]
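Apply the netplan configuration and confirm LACP negotiation and jumbo frames on the storage VLAN (the interface names and VLAN IDs above are examples; substitute your own):
sudo netplan apply
cat /proc/net/bonding/bond0 | grep -E "Bonding Mode|MII Status"   # Expect IEEE 802.3ad and MII Status: up
ip -d link show bond0.200 | grep mtu                              # Expect mtu 9000
ping -M do -s 8972 <peer-storage-ip>                              # 8972-byte payload + 28-byte headers = 9000; must not fragment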
DNS and NTP:
- Configure forward and reverse DNS zones for all node hostnames
- NTP: use chrony pointed at authorized time sources (NIST, DoD NTP, or local Stratum 1)
- All nodes must agree on time — Ceph and Kubernetes are time-sensitive
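A minimal chrony configuration for the time requirement above, assuming two authorized upstream servers reachable from the management network (replace the placeholders with your approved sources):
# /etc/chrony/chrony.conf (excerpt)
server <time-source-1> iburst
server <time-source-2> iburst
makestep 1.0 3
# Restart and verify
sudo systemctl restart chrony
chronyc sources -v
chronyc tracking    # Leap status should be Normal with a small system-time offset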
Storage Preparation
Prepare NVMe drives for Ceph OSDs:
# Verify drive health
sudo smartctl -a /dev/nvme0n1
# Label drives for Ceph OSD bootstrap (Kolla-Ansible convention)
sudo parted /dev/nvme0n1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1MiB 100%
Do NOT:
- Format the OSD drives with a filesystem
- Create RAID arrays on OSD drives
- Partition OSD drives beyond the label
Ceph manages its own data layout, replication, and recovery. Raw block devices are required.
GPU Passthrough (Optional)
For AI inference or VDI workloads requiring GPU access:
# Enable IOMMU in GRUB
# /etc/default/grub
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"
# (or amd_iommu=on for AMD systems)
sudo update-grub && sudo reboot
# Bind GPU to VFIO driver
echo "vfio-pci" | sudo tee /etc/modules-load.d/vfio.conf
echo "options vfio-pci ids=<vendor-id>:<device-id>" | sudo tee /etc/modprobe.d/vfio.conf
sudo update-initramfs -u && sudo reboot
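After the reboot, confirm the GPU is bound to vfio-pci rather than the vendor driver (find the PCI address with the first command):
lspci -nn | grep -iE "nvidia|vga"        # Note the <vendor-id>:<device-id> pairs used above
lspci -nnk -s <pci-address>              # "Kernel driver in use" should read vfio-pci
dmesg | grep -iE "DMAR|AMD-Vi"           # Confirms the IOMMU initialized at boot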
Decision: PCIe passthrough gives a single VM full GPU access. vGPU (NVIDIA GRID) shares a GPU across multiple VMs but requires NVIDIA vGPU licensing. Choose based on workload density requirements.
Air-Gap Preparation (Disconnected Environments)
For IL5/IL6 deployments without internet access:
- Offline package mirror: Create a local apt mirror using apt-mirror or aptly on a connected staging system, then transfer via approved removable media
- Container image mirror: Use the ffp-mirror.sh script to pull all FFP images from registry.eupraxialabs.com and export as tarballs for transfer to the disconnected Harbor registry (a manual single-image equivalent is sketched after this list)
- PKI certificates: Generate or obtain TLS certificates from your organizational CA — do not use Let's Encrypt in disconnected environments
- Transfer procedure: Follow your organization’s approved media transfer protocol for crossing security boundaries
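The ffp-mirror.sh script automates the bulk export; if you need to stage a single image by hand, the equivalent flow looks like this (the harbor.<your-domain> hostname and the ffp project are assumptions about your local registry layout):
# On the connected staging system
docker pull registry.eupraxialabs.com/ffp/trailboss:<version>
docker save registry.eupraxialabs.com/ffp/trailboss:<version> -o trailboss-<version>.tar
# Transfer the tarball across the boundary via approved media, then on the disconnected side:
docker load -i trailboss-<version>.tar
docker tag registry.eupraxialabs.com/ffp/trailboss:<version> harbor.<your-domain>/ffp/trailboss:<version>
docker push harbor.<your-domain>/ffp/trailboss:<version>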
Phase 1: VitroAI Installation
VitroAI is the OpenStack-based Hyper-Converged Infrastructure (HCI) layer. It provides compute, networking, and storage for the FMC and workload clusters.
See the VitroAI Deployment Guide for detailed procedures and the VitroAI Architecture for design decisions.
Deployment Summary
- Create Python virtual environment for Kolla-Ansible
- Configure inventory — list all nodes with roles (control, compute, storage)
- Configure globals.yml — network interfaces, VIP addresses, enabled services
- Generate passwords — kolla-genpwd creates all service passwords
- Bootstrap servers — kolla-ansible bootstrap-servers prepares nodes
- Pull images — kolla-ansible pull downloads container images (or loads from air-gap mirror)
- Pre-checks — kolla-ansible prechecks validates configuration
- Deploy — kolla-ansible deploy deploys the full OpenStack control plane
- Post-deploy — kolla-ansible post-deploy generates admin credentials
For air-gapped deployments, configure docker_registry in globals.yml to point to your local Harbor instance.
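Condensed into commands, the sequence above looks roughly like this (the virtual environment path and the multinode inventory name are conventions, not requirements; the docker_registry line applies only to air-gapped sites):
python3 -m venv ~/kolla-venv && source ~/kolla-venv/bin/activate
pip install kolla-ansible
# Copy and edit the inventory and globals.yml per the steps above, for example:
#   docker_registry: "harbor.<your-domain>"   # air-gapped deployments only
kolla-genpwd
kolla-ansible -i multinode bootstrap-servers
kolla-ansible -i multinode pull
kolla-ansible -i multinode prechecks
kolla-ansible -i multinode deploy
kolla-ansible -i multinode post-deploy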
Validation Checklist
- Ceph cluster reports HEALTH_OK with all OSDs active
- OpenStack services are active (openstack service list)
- Test VM launches with a floating IP and is reachable via SSH
- Cinder volume attaches to a test VM
- HAProxy VIP failover works (stop HAProxy on one node, verify API access continues)
- Skyline dashboard accessible
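A minimal smoke test that walks the checklist above, using the admin credentials written by post-deploy (image, flavor, and network names are examples from your environment):
source /etc/kolla/admin-openrc.sh
openstack service list
openstack compute service list
openstack volume service list
openstack server create --image <image> --flavor <flavor> --network <tenant-net> test-vm
openstack floating ip create <external-net>
openstack server add floating ip test-vm <floating-ip>
ssh <user>@<floating-ip>   # Confirms tenant networking, floating IP, and security groups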
Phase 2: Fleet Management Cluster
The Fleet Management Cluster (FMC) runs the FFP control plane: ArgoCD, Keycloak, platform services, CAPI controllers, and the InfrastructureAI agent stack.
Platform Decision
| Deployment Model | Kubernetes | When to Use |
|---|---|---|
| VitroAI VMs | RKE2 | On-premise, full sovereignty, IL4+ |
| Bare Metal | RKE2 | Maximum performance, dedicated hardware |
| AWS | EKS | Cloud-native, IL2-IL4 |
See FMC Setup & Installation for detailed procedures.
RKE2 Installation (On-Premise)
# First control plane node
curl -sfL https://get.rke2.io | sh -
systemctl enable --now rke2-server
# Get join token
cat /var/lib/rancher/rke2/server/node-token
# Additional control plane and worker nodes
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
# Configure /etc/rancher/rke2/config.yaml with server URL and token
systemctl enable --now rke2-agent
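A minimal /etc/rancher/rke2/config.yaml for joining nodes; the server hostname and token are placeholders, and additional control plane nodes use the same file but install rke2-server instead of rke2-agent:
# /etc/rancher/rke2/config.yaml
server: https://<first-control-plane-hostname>:9345
token: <node-token-from-first-server>
node-label:
  - "role=worker"    # optional; add any labels your scheduling relies on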
Essential Configuration
- Create the f3iain namespace — all FFP components deploy here
- Configure persistent storage — Rook-Ceph CSI (on-premise) or EBS CSI (AWS). See Storage Architecture.
- Install Traefik (or your preferred ingress controller) for TLS termination
- Configure DNS — wildcard record for *.your-domain pointing to the FMC ingress
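The namespace, ingress, and DNS items above can be handled and checked from any workstation with kubeconfig access:
kubectl create namespace f3iain
kubectl get ingressclass                 # Traefik (or your chosen controller) should be registered
dig +short app.<your-domain>             # Any name under the wildcard should resolve to the FMC ingress address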
Validation Checklist
- All nodes show Ready in kubectl get nodes
- f3iain namespace exists
- PVC provisioning works (create a test PVC, verify it binds)
- DNS resolves from within the cluster (nslookup from a test pod)
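A throwaway PVC for the provisioning check above, assuming the default StorageClass installed by your CSI driver (adjust storageClassName if your cluster uses a non-default class):
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ffp-storage-test
  namespace: f3iain
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl -n f3iain get pvc ffp-storage-test    # Expect STATUS Bound (WaitForFirstConsumer classes bind only once a pod mounts the claim)
kubectl -n f3iain delete pvc ffp-storage-test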
Phase 3: Platform Services
Platform services provide identity, GitOps, container registry, and observability. All deployments use ArgoCD GitOps — never apply manifests directly with kubectl.
See the FMC Admin Guide for detailed service configuration.
Service Stack
| Service | Purpose | Guide |
|---|---|---|
| ArgoCD | GitOps deployment controller | FMC Admin Guide |
| Keycloak | Identity provider (OIDC, CAC/PIV) | Keycloak Auth Setup |
| Harbor | Container registry (mirrors from Eupraxia Labs registry) | FMC Admin Guide |
| Gitea | Git repository for GitOps manifests | FMC Admin Guide |
| Grafana + Prometheus | Monitoring and alerting | Monitoring & Dispatch |
| Loki | Log aggregation | Monitoring & Dispatch |
| CNI | Container networking (Calico or Canal) | CNI Guide |
| CSI | Persistent storage driver | CSI Guide |
| Ingress | TLS termination and routing | Ingress Guide |
Deployment Order
- ArgoCD (deploys everything else)
- Keycloak (identity — needed by all services)
- Harbor (container registry — needed for image pulls)
- Gitea (GitOps source repository)
- Observability stack (Grafana, Prometheus, Loki, AlertManager)
- Add-ons (CNI, CSI, Ingress, Load Balancer)
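Each item in the order above is typically registered as an ArgoCD Application pointing at the Gitea repository; a representative manifest follows (the repository URL, path, and the argocd namespace are assumptions about your installation, not a fixed FFP convention):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keycloak
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.<your-domain>/ffp/platform-services.git
    targetRevision: main
    path: keycloak
  destination:
    server: https://kubernetes.default.svc
    namespace: f3iain
  syncPolicy:
    automated:
      prune: true
      selfHeal: true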
Validation Checklist
- ArgoCD UI accessible, syncing applications
- Keycloak login works (admin console + OIDC test)
- Harbor push/pull succeeds with FFP images
- Grafana dashboards render with live data
- AlertManager routes alerts to the dispatch controller
Phase 4: InfrastructureAI + Frontier SRE Agent
The InfrastructureAI stack provides the autonomous SRE capability: a living digital twin (FFO), LLM inference, agent orchestration, MCP tool fleet, and operator interfaces.
See Platform Overview and Architecture for the full design.
Component Stack
| Component | Purpose | Guide |
|---|---|---|
| FFO (TypeDB) | Living digital twin — knowledge graph | FFO Overview, Schema |
| LLM Inference | AI reasoning engine | LLM Inference |
| TrailbossAI | Agent orchestrator | Agent Architecture |
| MCP Server Fleet | Infrastructure tool surface | MCP Servers |
| Wanaku | MCP router with OIDC auth | SoR Integration |
| OutpostAI | Operator dispatch console | OutpostAI |
| Compass | Integration design UI | Compass |
| Frontier CLI | Terminal interface | Frontier CLI |
LLM Inference Decision
| Classification | Backend | Model Requirements |
|---|---|---|
| IL2 – IL4 | AWS Bedrock via VPC PrivateLink | Claude Sonnet/Opus (sovereign, no public internet) |
| IL5 | AWS Bedrock GovCloud | FedRAMP High certified |
| IL6 Air-Gapped | vLLM on local GPU nodes | US-origin open-weight models only (Llama 3.x 70B+) |
| Development | Ollama on Apple Silicon | Any model for local testing |
Important: Chinese-origin models (Qwen, DeepSeek) are prohibited in all federal deployment contexts regardless of classification level. Minimum 30B parameters recommended for reliable MCP tool-call execution.
See Sovereign Inference for the full architecture at each classification level.
Container Images
All images are pulled from the Eupraxia Labs registry:
# Pull FFP component images
docker pull registry.eupraxialabs.com/ffp/ffo-mcp-server:<version>
docker pull registry.eupraxialabs.com/ffp/trailboss:<version>
docker pull registry.eupraxialabs.com/ffp/sage:<version>
docker pull registry.eupraxialabs.com/ffp/outpostai-dev:<version>
docker pull registry.eupraxialabs.com/ffp/ffo-compass-api:<version>
docker pull registry.eupraxialabs.com/ffp/claude-code-runner:<version>
# ... (all MCP servers, TypeDB, support services)
For air-gapped deployments, mirror these images to your local Harbor instance first.
Validation Checklist
- TypeDB is running and the FFO database is accessible
- LLM inference responds (test via OutpostAI chat or Frontier CLI frontier chat)
- OutpostAI dashboard loads with cluster data
- Dispatch webhook receives test alerts and creates Jobs
- Claude Runner Job completes with tool calls and Bedrock inference
- MCP authentication enforced (Wanaku rejects unauthenticated tool calls)
- Compass instance graph renders FFO entities
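A quick health pass over the stack above (these are generic namespace-level checks; exact deployment and Job names vary by release):
kubectl -n f3iain get pods                                          # All pods Running or Completed
kubectl -n f3iain get jobs --sort-by=.metadata.creationTimestamp    # Dispatch-created Jobs and their completion status
kubectl -n f3iain get events --field-selector type=Warning          # Surfaces crash loops and image-pull failures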
Phase 5: Workload Clusters
Workload clusters are the Kubernetes clusters where customer applications run. FFP provisions them via Cluster API (CAPI).
See Modern Provisioning for the CAPI architecture.
Supported Providers
| Provider | CAPI Provider | Distribution | Guide |
|---|---|---|---|
| VitroAI (OpenStack) | CAPO | RKE2 | CAPI Providers |
| AWS | CAPA | EKS | CAPI Providers |
Provisioning Methods
Workload clusters can be provisioned through:
- Frontier CLI — frontier create cluster capo or frontier create cluster eks. See Cluster Management.
- OutpostAI UI — cluster creation wizard with template selection. See OutpostAI Guide.
Both methods use the Cluster Template System — JSON Schema-driven templates that render CAPI manifests and push them to Gitea, from which ArgoCD syncs them to the FMC.
Pre-Baked Node Images
For on-premise CAPO clusters, build node images with Packer that include RKE2, CNI images, and system packages pre-cached. See Packer Image Build.
Validation Checklist
- CAPI controllers healthy on FMC (kubectl get providers)
- Test cluster provisions successfully (control plane + workers join)
- Nodes show Ready and workloads schedule
- Compliance Operator running (if required for your classification)
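clusterctl on the FMC gives a consolidated view of a provisioning run and retrieves the workload kubeconfig for the node and storage checks above (cluster name and namespace are examples):
kubectl get providers -A
kubectl get clusters -A
clusterctl describe cluster <cluster-name> -n <cluster-namespace>
clusterctl get kubeconfig <cluster-name> -n <cluster-namespace> > workload.kubeconfig
kubectl --kubeconfig workload.kubeconfig get nodes    # Nodes should report Ready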
Phase 6: Integration Layer
The integration layer connects customer Systems of Record (ServiceNow, vCenter, Splunk, CMDBs) to the FFO digital twin.
See SoR Integration Architecture for the full design.
Components
| Component | Purpose | Installation |
|---|---|---|
| Camel-K | Integration runtime | Helm chart from Eupraxia Labs registry |
| Wanaku | MCP router with OIDC and classification-aware routing | Deploy per classification level |
| Kaoto | Visual integration designer | VS Code Extension (Extension Pack for Apache Camel by Red Hat) |
Integration Patterns
| Pattern | Trigger | Use Case |
|---|---|---|
| Scout (Polling) | Timer/Cron | vCenter inventory sync (every 15 min) |
| Event-Driven | Webhook | ServiceNow incident notifications |
| On-Demand | MCP tool call | Splunk log search during investigation |
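A skeleton of the Scout pattern in Camel YAML DSL, run through the Camel-K operator; the timer period matches the 15-minute vCenter example above, and the route body is a stand-in for the real inventory calls rather than the shipped FFP integration:
# vcenter-scout.yaml (Camel YAML DSL)
- from:
    uri: "timer:vcenter-sync?period=900000"    # fire every 15 minutes
    steps:
      - setBody:
          constant: "vcenter-inventory-sync"   # placeholder for the vCenter REST calls
      - log: "Scout tick: ${body}"
# Deploy via the operator and confirm the Integration reaches Running:
#   kamel run vcenter-scout.yaml -n f3iain
#   kubectl -n f3iain get integrations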
Validation Checklist
- Camel-K operator running in the f3iain namespace
- Wanaku MCP tool list returns configured integrations
- Agent can invoke an integration tool and receive results
- FFO entities update after a Scout sync cycle
Post-Installation
Getting Started
- Operators: Getting Started Guide for your first dispatch
- CLI users: Frontier CLI Installation — brew install frontier-cli
- Security review: Security in FKP for compliance posture
Day-2 Operations
- Frontier CLI — Command Reference for cluster management and AI chat
- FMC Admin Guide — Administration for platform maintenance
- FFO Queries — TypeQL Examples for querying the knowledge graph
- Add-ons — Add-ons Guide for CNI, CSI, ingress, and monitoring
Upgrades
FFP upgrades follow the GitOps pattern:
- Eupraxia Labs publishes new image tags to registry.eupraxialabs.com
- For air-gapped deployments, mirror the new images to your local Harbor first
- Update the image tags in your GitOps repository
- ArgoCD detects the change and reconciles — rolling updates with zero downtime
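A representative tag bump, assuming the GitOps repository uses Kustomize overlays (repository URL, path, and image name are illustrative; Helm-based layouts edit the chart values instead):
git clone https://gitea.<your-domain>/ffp/fmc-apps.git && cd fmc-apps/trailboss
kustomize edit set image registry.eupraxialabs.com/ffp/trailboss=registry.eupraxialabs.com/ffp/trailboss:<new-version>
git commit -am "Bump trailboss to <new-version>" && git push
# ArgoCD detects the commit (poll or webhook) and performs a rolling update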
Support
| Channel | Contact |
|---|---|
| Technical Support | support@eupraxialabs.com |
| Sales | sales@eupraxialabs.com |
| Documentation | readthedocs.eupraxia.io |