Cluster Template System — Multi-Hyperscaler CAPI Provisioning
Postgres-backed Jinja2 template engine for provisioning Kubernetes workload clusters across OpenStack/Vitro (CAPO), AWS EKS (CAPA), Azure AKS (CAPZ), and Oracle Cloud OKE (CAPOCI) — adding a new hyperscaler is a Postgres insert, not a code change.
The Cluster Template System is the engine that turns “operator clicks Create Cluster” into a real Kubernetes workload cluster running on a hyperscaler. It is the implementation of ADR-008 and the foundation for multi-hyperscaler cluster provisioning across the Federal Frontier Platform.
The short version: cluster manifests are stored as Jinja2 templates with JSON Schema validation, rendered with operator-supplied values, written to Gitea, and reconciled into real cloud clusters by the appropriate Cluster API (CAPI) provider running on the Fleet Management Cluster (FMC). Adding a new hyperscaler is a Postgres insert, not a code change.
Supported Hyperscalers
Phase 1 of the Cluster Template System ships with four production-ready templates covering every major federal cloud target:
| Provider | Template Name | CAPI Provider | Kubernetes Distribution | K8s Version Range |
|---|---|---|---|---|
| OpenStack / Vitro HCI | capo-rke2-default | CAPO (cluster-api-provider-openstack) | RKE2 on Nova VMs | v1.30.5+rke2r1 → v1.34.0+rke2r1 |
| AWS | capa-eks-default | CAPA (cluster-api-provider-aws) | EKS managed control plane | v1.30 → v1.34 |
| Azure | capz-aks-default | CAPZ (cluster-api-provider-azure) | AKS managed control plane | v1.30.0 → v1.34.0 |
| Oracle Cloud | capoci-oke-default | CAPOCI (cluster-api-provider-oci) | OKE managed control plane | v1.30.1 → v1.34.0 |
All four templates are seeded automatically on first startup of the TrailbossAI backend. Operators do not need to import them manually.
Provider coverage is expandable. Adding vSphere (CAPV), Google Cloud (CAPG), IBM Cloud (CAPIBM), or Equinix Metal is a single Postgres row plus a Jinja2 file — no Trailboss code changes, no new image build, no Trailboss redeploy.
Architecture Overview
```mermaid
flowchart TD
    Op["Operator clicks Create Cluster
    in OutpostAI wizard"] --> Wiz["Wizard fetches templates
    per provider, renders
    schema-driven form fields"]
    Wiz --> API["POST /cluster-templates/render
    with values"]
    API --> Validate["JSON Schema validation
    fail-fast on bad input"]
    Validate --> Render["Jinja2 render with
    StrictUndefined"]
    Render --> Audit[("Postgres
    cluster_renders table
    full audit trail")]
    Render --> Gitea[("Gitea
    federal-frontier-platform.git
    clusters/name/yaml")]
    Gitea --> ArgoCD["ArgoCD on FMC
    auto-sync"]
    ArgoCD --> CAPIProvider["CAPI provider on FMC
    capo / capa / capz / capoci"]
    CAPIProvider --> Cloud["Hyperscaler API
    OpenStack · AWS · Azure · Oracle"]
    Cloud --> Cluster["New Kubernetes
    workload cluster"]

    style Validate fill:#553c9a,stroke:#805ad5,color:#fff
    style Render fill:#2b6cb0,stroke:#4299e1,color:#fff
    style Audit fill:#553c9a,stroke:#805ad5,color:#fff
    style Gitea fill:#2c7a7b,stroke:#38b2ac,color:#fff
    style ArgoCD fill:#2c7a7b,stroke:#38b2ac,color:#fff
    style Cluster fill:#2f855a,stroke:#48bb78,color:#fff
```
Every layer has a clear responsibility:
| Layer | Responsibility |
|---|---|
| OutpostAI Wizard | Operator UX. Fetches available templates per provider, renders form fields dynamically from each template’s JSON Schema. |
| TrailbossAI API | Six new endpoints (list / get / schema / preview / render / audit). |
| Render Engine | JSON Schema validation, Jinja2 render with StrictUndefined, multi-doc YAML splitting via # FILE: markers, audit row insertion. |
| gitops_writer | Pythonic Gitea REST client. Pushes the rendered file map to the GitOps repo with idempotent POST → PUT fallback. |
| Postgres | Source of truth for templates, template versions, and per-cluster render audit. |
| Gitea | The GitOps source of truth that ArgoCD watches. |
| ArgoCD on FMC | Reconciles the rendered manifests onto the Fleet Management Cluster. |
| CAPI Providers | The actual reconcilers that talk to each hyperscaler API and provision real clusters. |
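The gitops_writer layer's "idempotent POST → PUT fallback" can be sketched in a few lines. This is a hedged illustration, not the real `gitops_writer.py`: the endpoint paths follow the Gitea contents API, but the exact status codes Gitea returns for an existing file (409/422) and the class/method names here are assumptions.

```python
import base64


class GiteaContentsWriter:
    """Sketch of an idempotent file push against the Gitea contents API:
    try to create the file (POST); if it already exists, fetch its blob
    sha and update it instead (PUT). `session` is any requests-like client."""

    def __init__(self, session, base_url, owner, repo, branch="main"):
        self.session = session
        self.base_url = base_url.rstrip("/")
        self.owner, self.repo, self.branch = owner, repo, branch

    def _contents_url(self, path):
        return f"{self.base_url}/api/v1/repos/{self.owner}/{self.repo}/contents/{path}"

    def push_file(self, path, content, message):
        body = {
            "content": base64.b64encode(content.encode()).decode(),
            "message": message,
            "branch": self.branch,
        }
        resp = self.session.post(self._contents_url(path), json=body)
        if resp.status_code in (409, 422):  # file already exists -> fall back to update
            current = self.session.get(self._contents_url(path), params={"ref": self.branch})
            body["sha"] = current.json()["sha"]  # Gitea requires the old blob sha on update
            resp = self.session.put(self._contents_url(path), json=body)
        resp.raise_for_status()
        return resp.json()["commit"]["sha"]
```

Because both outcomes converge on the same file content, re-running a render against an existing cluster directory is safe: the second push simply becomes an update commit.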
Postgres Schema
Three tables in the f3iai / frontier-db Postgres database, created automatically on first connect via CREATE TABLE IF NOT EXISTS:
cluster_templates
The active template registry.
| Column | Type | Purpose |
|---|---|---|
| id | SERIAL PK | Stable template identifier |
| name | VARCHAR(64) UNIQUE | Human-readable template name (e.g. capo-rke2-default) |
| provider | VARCHAR(32) | CAPI provider — capo, capa, capz, capoci |
| k8s_distro | VARCHAR(32) | rke2, eks, aks, oke |
| description | TEXT | Operator-facing description |
| template_yaml | TEXT | The Jinja2 source |
| values_schema | JSONB | JSON Schema for input validation |
| version | INT | Current version number |
| parent_id | INT FK | For template inheritance (customer overrides) |
| is_active | BOOLEAN | Soft-delete flag |
| created_at, created_by | TIMESTAMPTZ, VARCHAR | Provenance |
cluster_template_versions
Immutable version history. Every edit to a template inserts a new row here; templates are never updated in place. In-flight cluster renders pin to a specific template_version_id so a template change doesn’t silently break clusters mid-provisioning.
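The copy-on-write rule can be made concrete in a few lines. This is a sketch, not the real backend: it is parameterized over a DB-API connection (sqlite3 works for trying it out; production is Postgres with `%s` placeholders), and the assumption that an edit also refreshes the active row's `template_yaml` is mine.

```python
def edit_template(db, template_id, new_yaml):
    """Append-only template edit: never UPDATE history. Insert a new
    cluster_template_versions row, then bump the active version pointer
    on cluster_templates. (Minimal columns only; sqlite-style placeholders.)"""
    (current,) = db.execute(
        "SELECT version FROM cluster_templates WHERE id = ?", (template_id,)
    ).fetchone()
    new_version = current + 1
    db.execute(
        "INSERT INTO cluster_template_versions (template_id, version, template_yaml) "
        "VALUES (?, ?, ?)",
        (template_id, new_version, new_yaml),
    )
    db.execute(
        "UPDATE cluster_templates SET version = ?, template_yaml = ? WHERE id = ?",
        (new_version, new_yaml, template_id),
    )
    return new_version
```

A render that pinned `template_version_id` before the edit keeps reading its original row in `cluster_template_versions`, which is exactly why in-flight provisioning cannot be broken by a template change.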
cluster_renders
The audit trail. Every render — preview or production — is recorded here.
| Column | Type | Purpose |
|---|---|---|
| id | SERIAL PK | Render identifier |
| cluster_name, namespace | VARCHAR(63) | The cluster being rendered |
| template_id, template_version | INT | Pinned template + version |
| input_values | JSONB | The exact payload the operator supplied |
| rendered_files | JSONB | The path → content map written to Gitea |
| git_commit_sha | VARCHAR(64) | Resulting Gitea commit |
| rendered_at, rendered_by | TIMESTAMPTZ, VARCHAR | Provenance |
This table answers the question “what manifest was generated for this cluster, from which template, with which inputs?” for the lifetime of every cluster.
Template Anatomy
A cluster template is three files under common/tools/cluster_templates_seed/:
```
capo-rke2-default.j2           # Jinja2 source
capo-rke2-default.schema.json  # JSON Schema for inputs
capo-rke2-default.meta.json    # provider, k8s_distro, description
```
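The `.j2` sources are rendered with Jinja2's StrictUndefined (the render-engine behavior noted earlier), so a value the operator forgot to supply aborts the render instead of silently producing empty YAML. A minimal sketch with a hypothetical two-line template:

```python
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

env = Environment(undefined=StrictUndefined)
template = env.from_string(
    "# FILE: clusters/{{ cluster_name }}/cluster.yaml\n"
    "kind: Cluster\n"
    "metadata:\n"
    "  name: {{ cluster_name }}\n"
)

# All variables supplied: renders normally.
print(template.render(cluster_name="satellite-1"))

# Missing variable: StrictUndefined raises instead of emitting "name: ".
try:
    template.render()
except UndefinedError as exc:
    print(f"render refused: {exc}")
```

With the default (non-strict) undefined, the second render would have succeeded with a blank `name:`, and the bad manifest would only fail later, inside the CAPI controller.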
Multi-document output via # FILE: markers
A single template renders into multiple manifest files, one per Kubernetes object. The render engine splits the output on lines beginning with `# FILE:`:

```yaml
# FILE: clusters/CLUSTER_NAME/cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: CLUSTER_NAME
...
---
# FILE: clusters/CLUSTER_NAME/control-plane.yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
...
```
(In an actual template, CLUSTER_NAME is the Jinja2 expression {{ cluster_name }} — replaced here with a literal so the syntax highlighter renders cleanly in dark mode.)
The CAPO template produces 7 files; CAPA, CAPZ, and CAPOCI each produce 5. Every rendered file is recorded in the cluster_renders.rendered_files audit JSONB.
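The splitting step can be sketched as a small function. This is an illustration of the marker convention described above, not the engine's actual code; in particular, dropping the `---` separator between files is an assumption of this sketch.

```python
def _finalize(buf: list) -> str:
    # Drop trailing blank lines and the '---' document separator
    # left over from the multi-doc source.
    while buf and buf[-1].strip() in ("", "---"):
        buf.pop()
    return "\n".join(buf) + "\n"


def split_rendered_output(rendered: str) -> dict:
    """Split one rendered template output into a path -> content map.
    Lines beginning with '# FILE:' start a new output file."""
    files = {}
    path, buf = None, []
    for line in rendered.splitlines():
        if line.startswith("# FILE:"):
            if path is not None:
                files[path] = _finalize(buf)
            path, buf = line[len("# FILE:"):].strip(), []
        elif path is not None:
            buf.append(line)
    if path is not None:
        files[path] = _finalize(buf)
    return files
```

The resulting dict is the same path → content shape that gets stored in `cluster_renders.rendered_files` and pushed to Gitea.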
JSON Schema validation
The accompanying .schema.json declares every input the template accepts, with types, defaults, enums, ranges, and human-readable titles. The render engine validates every input against this schema before rendering. Bad input fails with HTTP 400 and a structured error pointing at the offending field — not a half-rendered YAML blob written to Gitea.
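To make the fail-fast behavior concrete, here is a minimal hand-rolled check in the spirit of that validation step. The real engine presumably uses a full JSON Schema validator; this sketch covers only required fields, enums, and integer ranges, and the error-message wording is illustrative.

```python
def validate_values(values: dict, schema: dict) -> list:
    """Return a list of human-readable violations, one per offending field.
    Empty list: input may proceed to the Jinja2 render.
    Non-empty list: maps to an HTTP 400 response body."""
    errors = []
    for field in schema.get("required", []):
        if field not in values:
            errors.append(f"{field}: required field is missing")
    for field, spec in schema.get("properties", {}).items():
        if field not in values:
            continue
        value = values[field]
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: {value!r} is not one of {spec['enum']}")
        if spec.get("type") == "integer":
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{field}: expected an integer")
            else:
                if "minimum" in spec and value < spec["minimum"]:
                    errors.append(f"{field}: {value} is below minimum {spec['minimum']}")
                if "maximum" in spec and value > spec["maximum"]:
                    errors.append(f"{field}: {value} is above maximum {spec['maximum']}")
    return errors
```

The key property is that every violation names its field, which is what lets the API return a structured 400 instead of a half-rendered YAML blob.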
The schema is also what the OutpostAI wizard reads to render its dynamic Step 2 form. The wizard’s form renderer maps schema constructs to Blueprint UI controls:
| JSON Schema construct | Blueprint UI control |
|---|---|
| enum (string array) | HTMLSelect dropdown |
| type: integer | InputGroup numeric with min/max |
| type: boolean | Switch |
| type: array | comma-separated text input |
| type: string | InputGroup text input |
| default | pre-filled value |
| title | field label |
| description | helper text |
| listed in required | (required) label |
Adding a new hyperscaler requires no UI code changes. The wizard inspects the schema at runtime and renders whatever fields the new template declares.
API Endpoints
Six endpoints on the TrailbossAI API (harbor.vitro.lan/ffp/trailboss:v5.7.3+):
GET /api/v1/cluster-templates[?provider=capo]
Returns the list of active templates, optionally filtered by provider.
```json
{
  "templates": [
    {"id": 2, "name": "capo-rke2-default",  "provider": "capo",   "k8s_distro": "rke2", "version": 1},
    {"id": 1, "name": "capa-eks-default",   "provider": "capa",   "k8s_distro": "eks",  "version": 1},
    {"id": 4, "name": "capz-aks-default",   "provider": "capz",   "k8s_distro": "aks",  "version": 1},
    {"id": 3, "name": "capoci-oke-default", "provider": "capoci", "k8s_distro": "oke",  "version": 1}
  ],
  "count": 4
}
```
GET /api/v1/cluster-templates/{id}
Returns a single template including its full Jinja2 source and JSON Schema.
GET /api/v1/cluster-templates/{id}/schema
Returns only the JSON Schema. This is what the OutpostAI wizard polls when an operator selects a template — it uses the schema to render the form fields.
POST /api/v1/cluster-templates/{id}/preview
Renders a template against operator-supplied values without writing to Gitea or recording an audit row. Used by the wizard’s Review step to show operators exactly what will be committed before they click Create Cluster.
```bash
curl -X POST http://localhost:8080/api/v1/cluster-templates/3/preview \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "oracle-test-1",
    "namespace": "workloads",
    "values": {
      "oci_compartment_id": "ocid1.compartment.oc1..aaaa",
      "oci_region": "us-gov-ashburn-1",
      "oci_image_ocid": "ocid1.image.oc1.iad.aaaa",
      "worker_count": 5,
      "k8s_version": "v1.34.0"
    }
  }'
```
Returns:
```json
{
  "template_id": 3,
  "cluster_name": "oracle-test-1",
  "namespace": "workloads",
  "files": {
    "clusters/oracle-test-1/cluster.yaml": "...",
    "clusters/oracle-test-1/oci-managed-cluster.yaml": "...",
    "clusters/oracle-test-1/oci-managed-control-plane.yaml": "...",
    "clusters/oracle-test-1/managed-machine-pool.yaml": "...",
    "clusters/oracle-test-1/oci-managed-machine-pool.yaml": "..."
  },
  "file_count": 5
}
```
POST /api/v1/cluster-templates/{id}/render
Renders and pushes to Gitea. Records an audit row. This is the endpoint OutpostAI’s wizard hits when the operator clicks Create Cluster.
GET /api/v1/clusters/{name}/render?namespace=<ns>
Returns the most recent render audit row for a cluster. Answers “what manifest was generated for this cluster, from which template, with which inputs?” The legacy Trailboss did not have this — debugging a misbehaving cluster meant reverse-engineering Git history. With the audit table, the answer is one query away.
OutpostAI Wizard — Schema-Driven Step 2
The cluster creation wizard in OutpostAI now supports all four hyperscalers. The Provider dropdown in Step 1 lists:
- OpenStack (VitroAI) — CAPO
- AWS EKS — CAPA
- Azure AKS — CAPZ
- Oracle Cloud OKE — CAPOCI
When the operator changes provider, Step 2 (Infrastructure) dynamically refetches the template list for that provider and renders form fields from the chosen template’s JSON Schema. Each form field is generated from the schema’s properties object — title becomes the label, description becomes helper text, default becomes the pre-filled value, enum becomes a dropdown.
Step 3 (Add-ons) is unchanged from the existing add-on selection model with the Monaco values.yaml editor.
Step 4 (Review) shows the chosen template name + version and a JSON dump of the input values that will be sent to the render engine.
When the operator clicks Create Cluster, the wizard calls POST /api/v1/cluster-templates/{id}/render, the engine validates → renders → pushes to Gitea, and ArgoCD picks it up within seconds.
Adding a new hyperscaler does not require any frontend code changes. The wizard auto-discovers templates and renders the right fields from the JSON Schema.
TrailbossAI Chatbot Integration
The chatbot in OutpostAI and Compass exposes two new built-in tools that wrap the cluster template system:
list_cluster_templates(provider?)
Lets the LLM discover what templates are available before attempting to create a cluster. The LLM uses this to learn the template names and which provider they belong to.
create_kubernetes_cluster(cluster_name, template_name, values, namespace?)
Renders and pushes a cluster from natural language. The LLM:
- Parses the operator’s request (e.g. “spin up an EKS cluster in us-gov-east-1 with 3 workers”)
- Calls `list_cluster_templates(provider="capa")` to find `capa-eks-default`
- Constructs the `values` dict (`aws_region: "us-gov-east-1"`, `worker_count: 3`)
- Calls `create_kubernetes_cluster(cluster_name="my-eks-1", template_name="capa-eks-default", values=...)`
The system prompt instructs the LLM:
When asked to create a Kubernetes cluster: list templates first, pick the one matching the user’s intent, supply provider-specific values. CAPO needs image_name + external_network_id; CAPA needs aws_region + worker_instance_type; CAPZ needs azure_region + resource_group + subscription_id; CAPOCI needs oci_compartment_id + oci_region + oci_image_ocid. The values dict is validated against the template’s JSON Schema before render — if validation fails, the error message tells you which field is wrong.
This is the same workflow the wizard uses, but driven by natural language.
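The two tools can be pictured as function specs handed to the LLM. This is an illustrative sketch, not the actual `unified_chat.py` definitions; the field names, the enum, and the `workloads` default are assumptions consistent with the examples in this document.

```python
# Hypothetical tool-spec dicts in an OpenAI-style function-calling shape.
CLUSTER_TOOLS = [
    {
        "name": "list_cluster_templates",
        "description": "List active cluster templates, optionally filtered by CAPI provider.",
        "parameters": {
            "type": "object",
            "properties": {
                "provider": {"type": "string", "enum": ["capo", "capa", "capz", "capoci"]},
            },
            "required": [],
        },
    },
    {
        "name": "create_kubernetes_cluster",
        "description": "Render a cluster template and push the manifests to Gitea.",
        "parameters": {
            "type": "object",
            "properties": {
                "cluster_name": {"type": "string", "maxLength": 63},
                "template_name": {"type": "string"},
                "values": {"type": "object"},
                "namespace": {"type": "string", "default": "workloads"},
            },
            "required": ["cluster_name", "template_name", "values"],
        },
    },
]
```

Note that `values` is deliberately an open object here: its real schema is the template's own JSON Schema, enforced server-side at render time rather than in the tool spec.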
CAPI Provider Installation on FMC
For the cluster template engine to actually provision real clusters, the corresponding Cluster API providers must be running on the Fleet Management Cluster. The current FMC has all four:
| Provider | Namespace | Version | Status |
|---|---|---|---|
| CAPI Core | capi-system | v1.11.0 | Running |
| CAPO (OpenStack) | capo-system | (existing) | Running |
| CAPA (AWS) | capa-system | v2.10.2 | Running |
| CAPZ (Azure) | capz-system | v1.23.0 | Running |
| CAPOCI (Oracle) | cluster-api-provider-oci-system | v0.24.0 | Installed (scaled to 0 pending real OCI credentials) |
CAPOCI’s controller manager fails to start without a real capoci-auth-config Secret containing valid OCI tenancy/user/fingerprint/key data. Once an operator populates that Secret (via Sealed Secrets, External Secrets Operator, or kubectl patch), scaling the deployment back to 1 brings it online.
CAPA and CAPZ tolerate empty credentials at startup — they only need real credentials when actually reconciling a cluster. Their controllers run idle until a workload cluster manifest references them.
Per-Cluster Credentials
Each provider expects per-cluster identity references:
| Provider | Identity Resource | Required Secret |
|---|---|---|
| CAPO | cloud-config Secret | OpenStack Application Credentials in INI format |
| CAPA | AWSClusterStaticIdentity | AWS access key + secret |
| CAPZ | AzureClusterIdentity | Service Principal client secret |
| CAPOCI | OCIClusterIdentity | Tenancy OCID + user OCID + API key fingerprint + PEM key |
Stub Secrets and Identity templates ship in deploy/overlays/fmc/capi-providers/ in the federal-frontier-platform Gitea repo. Operators populate them out-of-band — credentials should never be committed to git in plaintext. Sealed Secrets, External Secrets Operator pulling from Vault, or kubectl patch are all valid options.
Versioning and Lifecycle
| Resource | Lifecycle |
|---|---|
| Templates | Versioned in cluster_template_versions. Edits create a new version; old versions are never modified. |
| In-flight clusters | Pin to a specific template_version_id at render time. Template edits do not break running provisioning. |
| Active version | Each template row has a current version number. New cluster creates use the active version. |
| Render audit | Retained for 365 days minimum. Older rows can be archived to object storage if needed. ~50 KB per render. |
Operator Workflow Examples
Provision a Vitro RKE2 cluster (default OpenStack template)
- Open OutpostAI → Clusters → Create Cluster
- Step 1: Name `satellite-1`, namespace `workloads`, Provider OpenStack (VitroAI) — CAPO
- Step 2: Wizard fetches `capo-rke2-default`, renders form fields:
  - Control Plane Nodes: 1 (default)
  - Worker Nodes: 3
  - Glance Image Name: `rke2-node-v1.31-20260407`
  - External Network ID: `<your-vitro-network-uuid>`
  - Kubernetes Version: `v1.31.4+rke2r1` (default)
- Step 3: Add-ons — defaults to Canal CNI + Cinder CSI
- Step 4: Review the rendered manifest preview, click Create Cluster
- ArgoCD picks up the manifests within seconds, CAPO provisions the cluster
Provision an AWS EKS cluster via the chatbot
```
Operator: spin up an EKS cluster in us-gov-east-1 with 3 t3.large workers

LLM: [calls list_cluster_templates(provider="capa")]
LLM: [calls create_kubernetes_cluster(
         cluster_name="govcloud-eks-1",
         template_name="capa-eks-default",
         values={
           "aws_region": "us-gov-east-1",
           "worker_count": 3,
           "worker_instance_type": "t3.large",
           "k8s_version": "v1.31"
         }
     )]
LLM: Cluster 'govcloud-eks-1' provisioning initiated via template
     'capa-eks-default'. 5 manifests pushed to Gitea. ArgoCD will
     sync them to the FMC; CAPI provider 'capa' will provision the cluster.
```
Inspect what was generated for an existing cluster
```bash
curl "http://trailboss.f3iai.svc.cluster.local:8080/api/v1/clusters/satellite-1/render?namespace=workloads"
```
Returns the audit row with the exact template version, input values, rendered files, and git commit SHA.
Adding a New Hyperscaler
No code changes required. Three steps:
1. Author a Jinja2 template at `common/tools/cluster_templates_seed/<provider>-<distro>-default.j2`. Use `# FILE:` markers to split into multi-doc output. Reference upstream CAPI provider documentation for the manifest shapes.
2. Author the JSON Schema at `<name>.schema.json`. Declare every input the template accepts, with types, defaults, enums, ranges, titles, and descriptions.
3. Author the metadata at `<name>.meta.json`: `{ "provider": "capv", "k8s_distro": "kubeadm", "description": "Default vSphere CAPV template" }`
On the next TrailbossAI startup, the seed loader inserts the new template into Postgres. The OutpostAI wizard’s Provider dropdown automatically picks it up. The chatbot’s list_cluster_templates tool also picks it up. Zero code changes, zero rebuild, zero redeploy.
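The seed loader's discovery step can be sketched as pairing each `.j2` with its sibling `.schema.json` and `.meta.json`. This is a hedged illustration of the file-triplet convention above; the function name and the exact row shape handed to the Postgres upsert are assumptions.

```python
import json
import pathlib


def discover_seed_templates(seed_dir):
    """Yield one insert-ready row per <name>.j2 in the seed directory,
    joining it with <name>.schema.json and <name>.meta.json.
    (Sketch: the real loader's column handling may differ.)"""
    seed = pathlib.Path(seed_dir)
    for j2 in sorted(seed.glob("*.j2")):
        name = j2.stem
        meta = json.loads((seed / f"{name}.meta.json").read_text())
        yield {
            "name": name,
            "provider": meta["provider"],
            "k8s_distro": meta["k8s_distro"],
            "description": meta.get("description", ""),
            "template_yaml": j2.read_text(),
            "values_schema": json.loads((seed / f"{name}.schema.json").read_text()),
        }
```

Dropping a `capv-kubeadm-default.*` triplet into the directory is then all it takes for the next startup to register the template.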
The corresponding CAPI provider controller does need to be installed on the FMC. The deploy/overlays/fmc/capi-providers/ overlay in the federal-frontier-platform repo handles CAPA, CAPZ, and CAPOCI; new providers follow the same pattern.
Where Things Live
| Item | Location |
|---|---|
| Render engine | common/tools/cluster_templates.py (federal-frontier-f3iai) |
| Gitea writer | common/tools/gitops_writer.py |
| Seed templates | common/tools/cluster_templates_seed/ |
| API endpoints | common/api/trailboss_api.py |
| Chatbot tools | common/api/unified_chat.py |
| Wizard | frontend/components/ClusterCreateDialog.tsx (outpostai-dev) |
| API client | frontend/lib/api.ts |
| CAPI provider install | deploy/overlays/fmc/capi-providers/ (federal-frontier-platform) |
| ADR | docs/adr/ADR-008-cluster-template-system.md (federal-frontier-f3iai) |
| ADR canonical | Confluence FED ADR-008 |
Related Documentation
- Storage Architecture (Cinder + Ceph) — How Cinder CSI is the default for Vitro clusters
- Cluster Bootstrap Flow — End-to-end flow from Create Cluster click to first PVC mounting
- Packer Image Build Process — How the RKE2 node images consumed by CAPO are built
- OutpostAI Mission Control — The operator UI that wraps the cluster template system
- Unified Chat — The chatbot module that exposes the cluster template tools