Cluster Template System — Multi-Hyperscaler CAPI Provisioning

Postgres-backed Jinja2 template engine for provisioning Kubernetes workload clusters across OpenStack/Vitro (CAPO), AWS EKS (CAPA), Azure AKS (CAPZ), and Oracle Cloud OKE (CAPOCI) — adding a new hyperscaler is a Postgres insert, not a code change.

The Cluster Template System is the engine that turns “operator clicks Create Cluster” into a real Kubernetes workload cluster running on a hyperscaler. It is the implementation of ADR-008 and the foundation for multi-hyperscaler cluster provisioning across the Federal Frontier Platform.

The short version: cluster manifests are stored as Jinja2 templates with JSON Schema validation, rendered with operator-supplied values, written to Gitea, and reconciled into real cloud clusters by the appropriate Cluster API (CAPI) provider running on the Fleet Management Cluster (FMC).

Supported Hyperscalers

Phase 1 of the Cluster Template System ships with four production-ready templates covering the major federal cloud targets:

Provider Template Name CAPI Provider Kubernetes Distribution K8s Version Range
OpenStack / Vitro HCI capo-rke2-default CAPO (cluster-api-provider-openstack) RKE2 on Nova VMs v1.30.5+rke2r1 → v1.34.0+rke2r1
AWS capa-eks-default CAPA (cluster-api-provider-aws) EKS managed control plane v1.30 → v1.34
Azure capz-aks-default CAPZ (cluster-api-provider-azure) AKS managed control plane v1.30.0 → v1.34.0
Oracle Cloud capoci-oke-default CAPOCI (cluster-api-provider-oci) OKE managed control plane v1.30.1 → v1.34.0

All four templates are seeded automatically on first startup of the TrailbossAI backend. Operators do not need to import them manually.

Provider coverage is expandable. Adding vSphere (CAPV), Google Cloud (CAPG), IBM Cloud (CAPIBM), or Equinix Metal is a single Postgres row plus a Jinja2 file: no code changes, no new image build, no redeploy.

Architecture Overview

graph TD
    Op["Operator clicks Create Cluster in OutpostAI wizard"] --> Wiz["Wizard fetches templates per provider, renders schema-driven form fields"]
    Wiz --> API["POST /cluster-templates/render with values"]
    API --> Validate["JSON Schema validation, fail-fast on bad input"]
    Validate --> Render["Jinja2 render with StrictUndefined"]
    Render --> Audit[("Postgres cluster_renders table, full audit trail")]
    Render --> Gitea[("Gitea federal-frontier-platform.git clusters/name/yaml")]
    Gitea --> ArgoCD["ArgoCD on FMC auto-sync"]
    ArgoCD --> CAPIProvider["CAPI provider on FMC: capo / capa / capz / capoci"]
    CAPIProvider --> Cloud["Hyperscaler API: OpenStack · AWS · Azure · Oracle"]
    Cloud --> Cluster["New Kubernetes workload cluster"]
    style Validate fill:#553c9a,stroke:#805ad5,color:#fff
    style Render fill:#2b6cb0,stroke:#4299e1,color:#fff
    style Audit fill:#553c9a,stroke:#805ad5,color:#fff
    style Gitea fill:#2c7a7b,stroke:#38b2ac,color:#fff
    style ArgoCD fill:#2c7a7b,stroke:#38b2ac,color:#fff
    style Cluster fill:#2f855a,stroke:#48bb78,color:#fff

Every layer has a clear responsibility:

Layer Responsibility
OutpostAI Wizard Operator UX. Fetches available templates per provider, renders form fields dynamically from each template’s JSON Schema.
TrailbossAI API Six new endpoints (list / get / schema / preview / render / audit).
Render Engine JSON Schema validation, Jinja2 render with StrictUndefined, multi-doc YAML splitting via # FILE: markers, audit row insertion.
gitops_writer Pythonic Gitea REST client. Pushes the rendered file map to the GitOps repo with idempotent POST → PUT fallback.
Postgres Source of truth for templates, template versions, and per-cluster render audit.
Gitea The GitOps source of truth that ArgoCD watches.
ArgoCD on FMC Reconciles the rendered manifests onto the Fleet Management Cluster.
CAPI Providers The actual reconcilers that talk to each hyperscaler API and provision real clusters.

Postgres Schema

Three tables in the f3iai / frontier-db Postgres database, created automatically on first connect via CREATE TABLE IF NOT EXISTS:

cluster_templates

The active template registry.

Column Type Purpose
id SERIAL PK Stable template identifier
name VARCHAR(64) UNIQUE Human-readable template name (e.g. capo-rke2-default)
provider VARCHAR(32) CAPI provider — capo, capa, capz, capoci
k8s_distro VARCHAR(32) rke2, eks, aks, oke
description TEXT Operator-facing description
template_yaml TEXT The Jinja2 source
values_schema JSONB JSON Schema for input validation
version INT Current version number
parent_id INT FK For template inheritance (customer overrides)
is_active BOOLEAN Soft-delete flag
created_at, created_by TIMESTAMPTZ, VARCHAR Provenance
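In sketch form, the registry DDL looks roughly like the following. It is transliterated to SQLite so the snippet runs anywhere (production uses Postgres SERIAL, JSONB, and TIMESTAMPTZ; the authoritative DDL lives in the backend), and it ends by demonstrating the "adding a hyperscaler is a Postgres insert" claim:

```python
import sqlite3

# Hedged sketch of the cluster_templates registry, column-for-column from
# the table above, adapted to SQLite types for a runnable illustration.
DDL = """
CREATE TABLE IF NOT EXISTS cluster_templates (
    id            INTEGER PRIMARY KEY,
    name          VARCHAR(64) UNIQUE NOT NULL,
    provider      VARCHAR(32) NOT NULL,
    k8s_distro    VARCHAR(32) NOT NULL,
    description   TEXT,
    template_yaml TEXT NOT NULL,
    values_schema TEXT,              -- JSONB in Postgres
    version       INTEGER DEFAULT 1,
    parent_id     INTEGER REFERENCES cluster_templates(id),
    is_active     BOOLEAN DEFAULT 1,
    created_at    TEXT,              -- TIMESTAMPTZ in Postgres
    created_by    VARCHAR(64)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(DDL)  # IF NOT EXISTS: repeated startup is a no-op

# "Adding a new hyperscaler is a Postgres insert" (hypothetical capv row):
conn.execute(
    "INSERT INTO cluster_templates (name, provider, k8s_distro, template_yaml)"
    " VALUES ('capv-kubeadm-default', 'capv', 'kubeadm', '...')"
)
```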

cluster_template_versions

Immutable version history. Every edit to a template inserts a new row here; templates are never updated in place. In-flight cluster renders pin to a specific template_version_id so a template change doesn’t silently break clusters mid-provisioning.

cluster_renders

The audit trail. Every render — preview or production — is recorded here.

Column Type Purpose
id SERIAL PK Render identifier
cluster_name, namespace VARCHAR(63) The cluster being rendered
template_id, template_version INT Pinned template + version
input_values JSONB The exact payload the operator supplied
rendered_files JSONB The path → content map written to Gitea
git_commit_sha VARCHAR(64) Resulting Gitea commit
rendered_at, rendered_by TIMESTAMPTZ, VARCHAR Provenance

This table answers the question “what manifest was generated for this cluster, from which template, with which inputs?” for the lifetime of every cluster.

Template Anatomy

A cluster template is three files under common/tools/cluster_templates_seed/:

capo-rke2-default.j2          # Jinja2 source
capo-rke2-default.schema.json # JSON Schema for inputs
capo-rke2-default.meta.json   # provider, k8s_distro, description

Multi-document output via # FILE: markers

A single template renders into multiple manifest files, one per Kubernetes object. The render engine splits the output on any line beginning with the # FILE: marker:

# FILE: clusters/CLUSTER_NAME/cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: CLUSTER_NAME
  ...
---
# FILE: clusters/CLUSTER_NAME/control-plane.yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
...

(In an actual template, CLUSTER_NAME is the Jinja2 expression {{ cluster_name }} — replaced here with a literal so the syntax highlighter renders cleanly in dark mode.)

The CAPO template produces 7 files; CAPA, CAPZ, and CAPOCI each produce 5. Every rendered file is recorded in the cluster_renders.rendered_files audit JSONB.
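A minimal splitter along these lines reproduces the behavior described above. This is a sketch: the production implementation lives in common/tools/cluster_templates.py and may handle edge cases differently, and dropping bare --- document separators is an assumption here:

```python
import re

def split_rendered_output(rendered: str) -> dict:
    """Split rendered template output into a path -> content map on
    '# FILE:' markers (illustrative sketch, not the production splitter)."""
    files = {}
    path = None
    buf = []

    def flush():
        if path is not None:
            files[path] = "\n".join(buf).strip() + "\n"

    for line in rendered.splitlines():
        match = re.match(r"^# FILE:\s*(\S+)", line)
        if match:
            flush()                      # close out the previous file
            path, buf = match.group(1), []
        elif line.strip() == "---":
            continue                     # assumption: bare doc separators are dropped
        else:
            buf.append(line)
    flush()
    return files
```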

JSON Schema validation

The accompanying .schema.json declares every input the template accepts, with types, defaults, enums, ranges, and human-readable titles. The render engine validates every input against this schema before rendering. Bad input fails with HTTP 400 and a structured error pointing at the offending field — not a half-rendered YAML blob written to Gitea.
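As an illustration of that fail-fast contract, here is a toy validator covering the schema features named above (required, types, enums, ranges). The real engine uses a full JSON Schema validator; the example field names mirror the CAPA inputs mentioned elsewhere in this doc:

```python
def validate_values(values: dict, schema: dict) -> list:
    """Toy fail-fast validator (illustrative only). Returns a list of
    per-field error strings; an empty list means the input is acceptable."""
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in values:
            errors.append(f"{field}: required field is missing")
    type_map = {"string": str, "integer": int, "boolean": bool, "array": list}
    for field, value in values.items():
        spec = props.get(field)
        if not spec:
            continue
        expected = type_map.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: must be one of {spec['enum']}")
        if isinstance(value, int) and "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{field}: below minimum of {spec['minimum']}")
    return errors
```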

The schema is also what the OutpostAI wizard reads to render its dynamic Step 2 form. The wizard’s form renderer maps schema constructs to Blueprint UI controls:

JSON Schema construct Blueprint UI control
enum (string array) HTMLSelect dropdown
type: integer InputGroup numeric with min/max
type: boolean Switch
type: array comma-separated text input
type: string InputGroup text input
default pre-filled value
title field label
description helper text
listed in required "(required)" suffix on the field label

Adding a new hyperscaler requires no UI code changes. The wizard inspects the schema at runtime and renders whatever fields the new template declares.
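The mapping table translates directly into a small dispatch function. This sketch returns descriptive labels rather than actual Blueprint components, purely to show the shape of the wizard's form-renderer logic:

```python
def control_for(prop: dict, required: bool = False) -> str:
    """Approximate the wizard's schema-to-control mapping from the table
    above (return values are descriptive labels, not Blueprint imports)."""
    if "enum" in prop:
        control = "HTMLSelect dropdown"
    elif prop.get("type") == "integer":
        control = "InputGroup numeric with min/max"
    elif prop.get("type") == "boolean":
        control = "Switch"
    elif prop.get("type") == "array":
        control = "comma-separated text input"
    else:
        control = "InputGroup text input"  # type: string and fallback
    return control + (" (required)" if required else "")
```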

API Endpoints

Six endpoints on the TrailbossAI API (harbor.vitro.lan/ffp/trailboss:v5.7.3+):

GET /api/v1/cluster-templates[?provider=capo]

Returns the list of active templates, optionally filtered by provider.

{
  "templates": [
    {"id": 2, "name": "capo-rke2-default",  "provider": "capo",   "k8s_distro": "rke2", "version": 1},
    {"id": 1, "name": "capa-eks-default",   "provider": "capa",   "k8s_distro": "eks",  "version": 1},
    {"id": 4, "name": "capz-aks-default",   "provider": "capz",   "k8s_distro": "aks",  "version": 1},
    {"id": 3, "name": "capoci-oke-default", "provider": "capoci", "k8s_distro": "oke",  "version": 1}
  ],
  "count": 4
}

GET /api/v1/cluster-templates/{id}

Returns a single template including its full Jinja2 source and JSON Schema.

GET /api/v1/cluster-templates/{id}/schema

Returns only the JSON Schema. This is what the OutpostAI wizard polls when an operator selects a template — it uses the schema to render the form fields.

POST /api/v1/cluster-templates/{id}/preview

Renders a template against operator-supplied values without writing to Gitea or recording an audit row. Used by the wizard’s Review step to show operators exactly what will be committed before they click Create Cluster.

curl -X POST http://localhost:8080/api/v1/cluster-templates/3/preview \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "oracle-test-1",
    "namespace": "workloads",
    "values": {
      "oci_compartment_id": "ocid1.compartment.oc1..aaaa",
      "oci_region": "us-gov-ashburn-1",
      "oci_image_ocid": "ocid1.image.oc1.iad.aaaa",
      "worker_count": 5,
      "k8s_version": "v1.34.0"
    }
  }'

Returns:

{
  "template_id": 3,
  "cluster_name": "oracle-test-1",
  "namespace": "workloads",
  "files": {
    "clusters/oracle-test-1/cluster.yaml": "...",
    "clusters/oracle-test-1/oci-managed-cluster.yaml": "...",
    "clusters/oracle-test-1/oci-managed-control-plane.yaml": "...",
    "clusters/oracle-test-1/managed-machine-pool.yaml": "...",
    "clusters/oracle-test-1/oci-managed-machine-pool.yaml": "..."
  },
  "file_count": 5
}

POST /api/v1/cluster-templates/{id}/render

Renders and pushes to Gitea. Records an audit row. This is the endpoint OutpostAI’s wizard hits when the operator clicks Create Cluster.
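A client-side sketch of that call using only the Python standard library. The payload field names follow the preview example above; treat the exact render request body as an assumption until checked against trailboss_api.py:

```python
import json
import urllib.request

def build_render_request(base_url: str, template_id: int, cluster_name: str,
                         namespace: str, values: dict) -> urllib.request.Request:
    """Construct the POST the wizard sends when the operator clicks
    Create Cluster (payload shape assumed from the preview example)."""
    payload = {
        "cluster_name": cluster_name,
        "namespace": namespace,
        "values": values,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/cluster-templates/{template_id}/render",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` (or any HTTP client) against a live TrailbossAI endpoint.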

GET /api/v1/clusters/{name}/render?namespace=<ns>

Returns the most recent render audit row for a cluster. Answers “what manifest was generated for this cluster, from which template, with which inputs?” The legacy Trailboss did not have this — debugging a misbehaving cluster meant reverse-engineering Git history. With the audit table, the answer is one query away.

OutpostAI Wizard — Schema-Driven Step 2

The cluster creation wizard in OutpostAI now supports all four hyperscalers. The Provider dropdown in Step 1 lists:

  • OpenStack (VitroAI) — CAPO
  • AWS EKS — CAPA
  • Azure AKS — CAPZ
  • Oracle Cloud OKE — CAPOCI

When the operator changes provider, Step 2 (Infrastructure) dynamically refetches the template list for that provider and renders form fields from the chosen template’s JSON Schema. Each form field is generated from the schema’s properties object — title becomes the label, description becomes helper text, default becomes the pre-filled value, enum becomes a dropdown.

Step 3 (Add-ons) is unchanged from the existing add-on selection model with the Monaco values.yaml editor.

Step 4 (Review) shows the chosen template name + version and a JSON dump of the input values that will be sent to the render engine.

When the operator clicks Create Cluster, the wizard calls POST /api/v1/cluster-templates/{id}/render, the engine validates → renders → pushes to Gitea, and ArgoCD picks it up within seconds.

Adding a new hyperscaler does not require any frontend code changes. The wizard auto-discovers templates and renders the right fields from the JSON Schema.

TrailbossAI Chatbot Integration

The chatbot in OutpostAI and Compass exposes two new built-in tools that wrap the cluster template system:

list_cluster_templates(provider?)

Lets the LLM discover what templates are available before attempting to create a cluster. The LLM uses this to learn the template names and which provider they belong to.

create_kubernetes_cluster(cluster_name, template_name, values, namespace?)

Renders and pushes a cluster from natural language. The LLM:

  1. Parses the operator’s request (e.g. “spin up an EKS cluster in us-gov-east-1 with 3 workers”)
  2. Calls list_cluster_templates(provider="capa") to find capa-eks-default
  3. Constructs the values dict (aws_region: "us-gov-east-1", worker_count: 3)
  4. Calls create_kubernetes_cluster(cluster_name="my-eks-1", template_name="capa-eks-default", values=...)

The system prompt instructs the LLM:

When asked to create a Kubernetes cluster: list templates first, pick the one matching the user’s intent, supply provider-specific values. CAPO needs image_name + external_network_id; CAPA needs aws_region + worker_instance_type; CAPZ needs azure_region + resource_group + subscription_id; CAPOCI needs oci_compartment_id + oci_region + oci_image_ocid. The values dict is validated against the template’s JSON Schema before render — if validation fails, the error message tells you which field is wrong.

This is the same workflow the wizard uses, but driven by natural language.
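For concreteness, create_kubernetes_cluster could be declared to the LLM in the common OpenAI-tools shape. The parameter names come from this doc; the exact wire format used by common/api/unified_chat.py is an assumption:

```python
# Hedged sketch of a function-calling declaration for the chatbot tool.
CREATE_CLUSTER_TOOL = {
    "type": "function",
    "function": {
        "name": "create_kubernetes_cluster",
        "description": (
            "Render a cluster template with the given values and push the "
            "resulting manifests to Gitea for ArgoCD and CAPI to reconcile."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "cluster_name": {"type": "string"},
                "template_name": {"type": "string",
                                  "description": "e.g. capa-eks-default"},
                "values": {"type": "object",
                           "description": "Validated against the template's JSON Schema"},
                "namespace": {"type": "string", "default": "workloads"},
            },
            "required": ["cluster_name", "template_name", "values"],
        },
    },
}
```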

CAPI Provider Installation on FMC

For the cluster template engine to actually provision real clusters, the corresponding Cluster API providers must be running on the Fleet Management Cluster. The current FMC has all four:

Provider Namespace Version Status
CAPI Core capi-system v1.11.0 Running
CAPO (OpenStack) capo-system (existing) Running
CAPA (AWS) capa-system v2.10.2 Running
CAPZ (Azure) capz-system v1.23.0 Running
CAPOCI (Oracle) cluster-api-provider-oci-system v0.24.0 Installed (scaled to 0 pending real OCI credentials)

CAPOCI’s controller manager fails to start without a real capoci-auth-config Secret containing valid OCI tenancy/user/fingerprint/key data. Once an operator populates that Secret (via Sealed Secrets, External Secrets Operator, or kubectl patch), scaling the deployment back to 1 brings it online.

CAPA and CAPZ tolerate empty credentials at startup — they only need real credentials when actually reconciling a cluster. Their controllers run idle until a workload cluster manifest references them.

Per-Cluster Credentials

Each provider expects per-cluster identity references:

Provider Identity Resource Required Secret
CAPO cloud-config Secret OpenStack Application Credentials in INI format
CAPA AWSClusterStaticIdentity AWS access key + secret
CAPZ AzureClusterIdentity Service Principal client secret
CAPOCI OCIClusterIdentity Tenancy OCID + user OCID + API key fingerprint + PEM key

Stub Secrets and Identity templates ship in deploy/overlays/fmc/capi-providers/ in the federal-frontier-platform Gitea repo. Operators populate them out-of-band — credentials should never be committed to git in plaintext. Sealed Secrets, External Secrets Operator pulling from Vault, or kubectl patch are all valid options.

Versioning and Lifecycle

Resource Lifecycle
Templates Versioned in cluster_template_versions. Edits create a new version; old versions are never modified.
In-flight clusters Pin to a specific template_version_id at render time. Template edits do not break running provisioning.
Active version Each template row has a current version number. New cluster creates use the active version.
Render audit Retained for 365 days minimum. Older rows can be archived to object storage if needed. ~50 KB per render.

Operator Workflow Examples

Provision a Vitro RKE2 cluster (default OpenStack template)

  1. Open OutpostAI → Clusters → Create Cluster
  2. Step 1: Name satellite-1, namespace workloads, Provider OpenStack (VitroAI) — CAPO
  3. Step 2: Wizard fetches capo-rke2-default, renders form fields:
    • Control Plane Nodes: 1 (default)
    • Worker Nodes: 3
    • Glance Image Name: rke2-node-v1.31-20260407
    • External Network ID: <your-vitro-network-uuid>
    • Kubernetes Version: v1.31.4+rke2r1 (default)
  4. Step 3: Add-ons — defaults to Canal CNI + Cinder CSI
  5. Step 4: Review the rendered manifest preview, click Create Cluster
  6. ArgoCD picks up the manifests within seconds, CAPO provisions the cluster

Provision an AWS EKS cluster via the chatbot

Operator: spin up an EKS cluster in us-gov-east-1 with 3 t3.large workers
LLM:      [calls list_cluster_templates(provider="capa")]
LLM:      [calls create_kubernetes_cluster(
            cluster_name="govcloud-eks-1",
            template_name="capa-eks-default",
            values={
              "aws_region": "us-gov-east-1",
              "worker_count": 3,
              "worker_instance_type": "t3.large",
              "k8s_version": "v1.31"
            }
          )]
LLM:      Cluster 'govcloud-eks-1' provisioning initiated via template
          'capa-eks-default'. 5 manifests pushed to Gitea. ArgoCD will
          sync them to the FMC; CAPI provider 'capa' will provision the cluster.

Inspect what was generated for an existing cluster

curl "http://trailboss.f3iai.svc.cluster.local:8080/api/v1/clusters/satellite-1/render?namespace=workloads"

Returns the audit row with the exact template version, input values, rendered files, and git commit SHA.

Adding a New Hyperscaler

No code changes required. Three steps:

  1. Author a Jinja2 template at common/tools/cluster_templates_seed/<provider>-<distro>-default.j2. Use # FILE: markers to split into multi-doc output. Reference upstream CAPI provider documentation for the manifest shapes.

  2. Author the JSON Schema at <name>.schema.json. Declare every input the template accepts, with types, defaults, enums, ranges, titles, and descriptions.

  3. Author the metadata at <name>.meta.json:

    {
      "provider": "capv",
      "k8s_distro": "kubeadm",
      "description": "Default vSphere CAPV template"
    }
    

On the next TrailbossAI startup, the seed loader inserts the new template into Postgres. The OutpostAI wizard’s Provider dropdown automatically picks it up. The chatbot’s list_cluster_templates tool also picks it up. Zero code changes, zero rebuild, zero redeploy.
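The seed loader's job can be sketched as assembling one record from the three files (illustrative; the real loader runs inside TrailbossAI startup and also handles versioning and upserts). The demo below fabricates a throwaway capv seed directory standing in for common/tools/cluster_templates_seed/:

```python
import json
import tempfile
from pathlib import Path

def load_seed(seed_dir: Path, name: str) -> dict:
    """Assemble one template's seed record from its .j2, .schema.json,
    and .meta.json files (sketch of the startup seed loader)."""
    meta = json.loads((seed_dir / f"{name}.meta.json").read_text())
    return {
        "name": name,
        "provider": meta["provider"],
        "k8s_distro": meta["k8s_distro"],
        "description": meta.get("description", ""),
        "template_yaml": (seed_dir / f"{name}.j2").read_text(),
        "values_schema": json.loads((seed_dir / f"{name}.schema.json").read_text()),
    }

# Demo with placeholder file contents for a hypothetical capv template.
with tempfile.TemporaryDirectory() as d:
    seed = Path(d)
    (seed / "capv-kubeadm-default.j2").write_text(
        "# FILE: clusters/{{ cluster_name }}/cluster.yaml\n")
    (seed / "capv-kubeadm-default.schema.json").write_text(
        '{"required": ["cluster_name"]}')
    (seed / "capv-kubeadm-default.meta.json").write_text(
        '{"provider": "capv", "k8s_distro": "kubeadm", '
        '"description": "Default vSphere CAPV template"}')
    record = load_seed(seed, "capv-kubeadm-default")
```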

The corresponding CAPI provider controller does need to be installed on the FMC. The deploy/overlays/fmc/capi-providers/ overlay in the federal-frontier-platform repo handles CAPA, CAPZ, and CAPOCI; new providers follow the same pattern.

Where Things Live

Item Location
Render engine common/tools/cluster_templates.py (federal-frontier-f3iai)
Gitea writer common/tools/gitops_writer.py
Seed templates common/tools/cluster_templates_seed/
API endpoints common/api/trailboss_api.py
Chatbot tools common/api/unified_chat.py
Wizard frontend/components/ClusterCreateDialog.tsx (outpostai-dev)
API client frontend/lib/api.ts
CAPI provider install deploy/overlays/fmc/capi-providers/ (federal-frontier-platform)
ADR docs/adr/ADR-008-cluster-template-system.md (federal-frontier-f3iai)
ADR canonical Confluence FED ADR-008