Cluster Template System — Multi-Hyperscaler CAPI Provisioning
Postgres-backed Jinja2 template engine for provisioning Kubernetes workload clusters across OpenStack/Vitro (CAPO), AWS EKS (CAPA), Azure AKS (CAPZ), and Oracle Cloud OKE (CAPOCI) — adding a new hyperscaler is a Postgres insert, not a code change.
The Cluster Template System is the engine that turns “operator clicks Create Cluster” into a real Kubernetes workload cluster running on a hyperscaler. It is the implementation of ADR-008 and the foundation for multi-hyperscaler cluster provisioning across the Federal Frontier Platform.
The short version: cluster manifests are stored as Jinja2 templates with JSON Schema validation, rendered with operator-supplied values, written to Gitea, and reconciled into real cloud clusters by the appropriate Cluster API (CAPI) provider running on the Fleet Management Cluster (FMC). Adding a new hyperscaler is a Postgres insert, not a code change.
Supported Hyperscalers
Phase 1 of the Cluster Template System ships with four production-ready templates covering every major federal cloud target:
| Provider | Template Name | CAPI Provider | Kubernetes Distribution | K8s Version Range |
|---|---|---|---|---|
| OpenStack / Vitro HCI | capo-rke2-default | CAPO (cluster-api-provider-openstack) | RKE2 on Nova VMs | v1.30.5+rke2r1 → v1.34.0+rke2r1 |
| AWS | capa-eks-default | CAPA (cluster-api-provider-aws) | EKS managed control plane | v1.30 → v1.34 |
| Azure | capz-aks-default | CAPZ (cluster-api-provider-azure) | AKS managed control plane | v1.30.0 → v1.34.0 |
| Oracle Cloud | capoci-oke-default | CAPOCI (cluster-api-provider-oci) | OKE managed control plane | v1.30.1 → v1.34.0 |
All four templates are seeded automatically on first startup of the TrailbossAI backend. Operators do not need to import them manually.
Provider coverage is expandable. Adding vSphere (CAPV), Google Cloud (CAPG), IBM Cloud (CAPIBM), or Equinix Metal is a single Postgres row plus a Jinja2 file — no Trailboss code changes, no new image build, no Trailboss redeploy.
Architecture Overview
```mermaid
flowchart TD
    Op["Operator clicks Create Cluster
    in OutpostAI wizard"] --> Wiz["Wizard fetches templates
    per provider, renders
    schema-driven form fields"]
    Wiz --> API["POST /cluster-templates/render
    with values"]
    API --> Validate["JSON Schema validation
    fail-fast on bad input"]
    Validate --> Render["Jinja2 render with
    StrictUndefined"]
    Render --> Audit[("Postgres
    cluster_renders table
    full audit trail")]
    Render --> Gitea[("Gitea
    federal-frontier-platform.git
    clusters/name/yaml")]
    Gitea --> ArgoCD["ArgoCD on FMC
    auto-sync"]
    ArgoCD --> CAPIProvider["CAPI provider on FMC
    capo / capa / capz / capoci"]
    CAPIProvider --> Cloud["Hyperscaler API
    OpenStack · AWS · Azure · Oracle"]
    Cloud --> Cluster["New Kubernetes
    workload cluster"]

    style Validate fill:#553c9a,stroke:#805ad5,color:#fff
    style Render fill:#2b6cb0,stroke:#4299e1,color:#fff
    style Audit fill:#553c9a,stroke:#805ad5,color:#fff
    style Gitea fill:#2c7a7b,stroke:#38b2ac,color:#fff
    style ArgoCD fill:#2c7a7b,stroke:#38b2ac,color:#fff
    style Cluster fill:#2f855a,stroke:#48bb78,color:#fff
```
Every layer has a clear responsibility:
| Layer | Responsibility |
|---|---|
| OutpostAI Wizard | Operator UX. Fetches available templates per provider, renders form fields dynamically from each template’s JSON Schema. |
| TrailbossAI API | Six new endpoints (list / get / schema / preview / render / audit). |
| Render Engine | JSON Schema validation, Jinja2 render with StrictUndefined, multi-doc YAML splitting via # FILE: markers, audit row insertion. |
| gitops_writer | Pythonic Gitea REST client. Pushes the rendered file map to the GitOps repo with idempotent POST → PUT fallback. |
| Postgres | Source of truth for templates, template versions, and per-cluster render audit. |
| Gitea | The GitOps source of truth that ArgoCD watches. |
| ArgoCD on FMC | Reconciles the rendered manifests onto the Fleet Management Cluster. |
| CAPI Providers | The actual reconcilers that talk to each hyperscaler API and provision real clusters. |
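The gitops_writer layer's "idempotent POST → PUT fallback" can be sketched in a few lines. This is a hedged illustration, not the real `gitops_writer.py`: the endpoint paths follow the Gitea contents API, but the exact status codes Gitea returns for an existing file (409/422) and the class/method names here are assumptions.

```python
import base64


class GiteaContentsWriter:
    """Sketch of an idempotent file push against the Gitea contents API:
    try to create the file (POST); if it already exists, fetch its blob
    sha and update it instead (PUT). `session` is any requests-like client."""

    def __init__(self, session, base_url, owner, repo, branch="main"):
        self.session = session
        self.base_url = base_url.rstrip("/")
        self.owner, self.repo, self.branch = owner, repo, branch

    def _contents_url(self, path):
        return f"{self.base_url}/api/v1/repos/{self.owner}/{self.repo}/contents/{path}"

    def push_file(self, path, content, message):
        body = {
            "content": base64.b64encode(content.encode()).decode(),
            "message": message,
            "branch": self.branch,
        }
        resp = self.session.post(self._contents_url(path), json=body)
        if resp.status_code in (409, 422):  # file already exists -> fall back to update
            current = self.session.get(self._contents_url(path), params={"ref": self.branch})
            body["sha"] = current.json()["sha"]  # Gitea requires the old blob sha on update
            resp = self.session.put(self._contents_url(path), json=body)
        resp.raise_for_status()
        return resp.json()["commit"]["sha"]
```

Because both outcomes converge on the same file content, re-running a render against an existing cluster directory is safe: the second push simply becomes an update commit.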
Postgres Schema
Three tables in the f3iai / frontier-db Postgres database, created automatically on first connect via CREATE TABLE IF NOT EXISTS:
cluster_templates
The active template registry.
| Column | Type | Purpose |
|---|---|---|
| id | SERIAL PK | Stable template identifier |
| name | VARCHAR(64) UNIQUE | Human-readable template name (e.g. capo-rke2-default) |
| provider | VARCHAR(32) | CAPI provider — capo, capa, capz, capoci |
| k8s_distro | VARCHAR(32) | rke2, eks, aks, oke |
| description | TEXT | Operator-facing description |
| template_yaml | TEXT | The Jinja2 source |
| values_schema | JSONB | JSON Schema for input validation |
| version | INT | Current version number |
| parent_id | INT FK | For template inheritance (customer overrides) |
| is_active | BOOLEAN | Soft-delete flag |
| created_at, created_by | TIMESTAMPTZ, VARCHAR | Provenance |
cluster_template_versions
Immutable version history. Every edit to a template inserts a new row here; templates are never updated in place. In-flight cluster renders pin to a specific template_version_id so a template change doesn’t silently break clusters mid-provisioning.
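The copy-on-write rule can be made concrete in a few lines. This is a sketch, not the real backend: it is parameterized over a DB-API connection (sqlite3 works for trying it out; production is Postgres with `%s` placeholders), and the assumption that an edit also refreshes the active row's `template_yaml` is mine.

```python
def edit_template(db, template_id, new_yaml):
    """Append-only template edit: never UPDATE history. Insert a new
    cluster_template_versions row, then bump the active version pointer
    on cluster_templates. (Minimal columns only; sqlite-style placeholders.)"""
    (current,) = db.execute(
        "SELECT version FROM cluster_templates WHERE id = ?", (template_id,)
    ).fetchone()
    new_version = current + 1
    db.execute(
        "INSERT INTO cluster_template_versions (template_id, version, template_yaml) "
        "VALUES (?, ?, ?)",
        (template_id, new_version, new_yaml),
    )
    db.execute(
        "UPDATE cluster_templates SET version = ?, template_yaml = ? WHERE id = ?",
        (new_version, new_yaml, template_id),
    )
    return new_version
```

A render that pinned `template_version_id` before the edit keeps reading its original row in `cluster_template_versions`, which is exactly why in-flight provisioning cannot be broken by a template change.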
cluster_renders
The audit trail. Every render — preview or production — is recorded here.
| Column | Type | Purpose |
|---|---|---|
| id | SERIAL PK | Render identifier |
| cluster_name, namespace | VARCHAR(63) | The cluster being rendered |
| template_id, template_version | INT | Pinned template + version |
| input_values | JSONB | The exact payload the operator supplied |
| rendered_files | JSONB | The path → content map written to Gitea |
| git_commit_sha | VARCHAR(64) | Resulting Gitea commit |
| rendered_at, rendered_by | TIMESTAMPTZ, VARCHAR | Provenance |
This table answers the question “what manifest was generated for this cluster, from which template, with which inputs?” for the lifetime of every cluster.
Template Anatomy
A cluster template is three files under common/tools/cluster_templates_seed/:
```
capo-rke2-default.j2           # Jinja2 source
capo-rke2-default.schema.json  # JSON Schema for inputs
capo-rke2-default.meta.json    # provider, k8s_distro, description
```
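The `.j2` sources are rendered with Jinja2's StrictUndefined (the render-engine behavior noted earlier), so a value the operator forgot to supply aborts the render instead of silently producing empty YAML. A minimal sketch with a hypothetical two-line template:

```python
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

env = Environment(undefined=StrictUndefined)
template = env.from_string(
    "# FILE: clusters/{{ cluster_name }}/cluster.yaml\n"
    "kind: Cluster\n"
    "metadata:\n"
    "  name: {{ cluster_name }}\n"
)

# All variables supplied: renders normally.
print(template.render(cluster_name="satellite-1"))

# Missing variable: StrictUndefined raises instead of emitting "name: ".
try:
    template.render()
except UndefinedError as exc:
    print(f"render refused: {exc}")
```

With the default (non-strict) undefined, the second render would have succeeded with a blank `name:`, and the bad manifest would only fail later, inside the CAPI controller.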
Multi-document output via # FILE: markers
A single template renders into multiple manifest files, one per Kubernetes object. The render engine splits the output on lines beginning with `# FILE:`:

```yaml
# FILE: clusters/CLUSTER_NAME/cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: CLUSTER_NAME
...
---
# FILE: clusters/CLUSTER_NAME/control-plane.yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
...
```
(In an actual template, CLUSTER_NAME is the Jinja2 expression {{ cluster_name }} — replaced here with a literal so the syntax highlighter renders cleanly in dark mode.)
The CAPO template produces 7 files; CAPA, CAPZ, and CAPOCI each produce 5. Every rendered file is recorded in the cluster_renders.rendered_files audit JSONB.
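The splitting step can be sketched as a small function. This is an illustration of the marker convention described above, not the engine's actual code; in particular, dropping the `---` separator between files is an assumption of this sketch.

```python
def _finalize(buf: list) -> str:
    # Drop trailing blank lines and the '---' document separator
    # left over from the multi-doc source.
    while buf and buf[-1].strip() in ("", "---"):
        buf.pop()
    return "\n".join(buf) + "\n"


def split_rendered_output(rendered: str) -> dict:
    """Split one rendered template output into a path -> content map.
    Lines beginning with '# FILE:' start a new output file."""
    files = {}
    path, buf = None, []
    for line in rendered.splitlines():
        if line.startswith("# FILE:"):
            if path is not None:
                files[path] = _finalize(buf)
            path, buf = line[len("# FILE:"):].strip(), []
        elif path is not None:
            buf.append(line)
    if path is not None:
        files[path] = _finalize(buf)
    return files
```

The resulting dict is the same path → content shape that gets stored in `cluster_renders.rendered_files` and pushed to Gitea.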
JSON Schema validation
The accompanying .schema.json declares every input the template accepts, with types, defaults, enums, ranges, and human-readable titles. The render engine validates every input against this schema before rendering. Bad input fails with HTTP 400 and a structured error pointing at the offending field — not a half-rendered YAML blob written to Gitea.
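To make the fail-fast behavior concrete, here is a minimal hand-rolled check in the spirit of that validation step. The real engine presumably uses a full JSON Schema validator; this sketch covers only required fields, enums, and integer ranges, and the error-message wording is illustrative.

```python
def validate_values(values: dict, schema: dict) -> list:
    """Return a list of human-readable violations, one per offending field.
    Empty list: input may proceed to the Jinja2 render.
    Non-empty list: maps to an HTTP 400 response body."""
    errors = []
    for field in schema.get("required", []):
        if field not in values:
            errors.append(f"{field}: required field is missing")
    for field, spec in schema.get("properties", {}).items():
        if field not in values:
            continue
        value = values[field]
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: {value!r} is not one of {spec['enum']}")
        if spec.get("type") == "integer":
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{field}: expected an integer")
            else:
                if "minimum" in spec and value < spec["minimum"]:
                    errors.append(f"{field}: {value} is below minimum {spec['minimum']}")
                if "maximum" in spec and value > spec["maximum"]:
                    errors.append(f"{field}: {value} is above maximum {spec['maximum']}")
    return errors
```

The key property is that every violation names its field, which is what lets the API return a structured 400 instead of a half-rendered YAML blob.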
The schema is also what the OutpostAI wizard reads to render its dynamic Step 2 form. The wizard’s form renderer maps schema constructs to Blueprint UI controls:
| JSON Schema construct | Blueprint UI control |
|---|---|
| enum (string array) | HTMLSelect dropdown |
| type: integer | InputGroup numeric with min/max |
| type: boolean | Switch |
| type: array | comma-separated text input |
| type: string | InputGroup text input |
| default | pre-filled value |
| title | field label |
| description | helper text |
| listed in required | (required) label |
Adding a new hyperscaler requires no UI code changes. The wizard inspects the schema at runtime and renders whatever fields the new template declares.
API Endpoints
Six endpoints on the TrailbossAI API (harbor.vitro.lan/ffp/trailboss:v5.7.3+):
GET /api/v1/cluster-templates[?provider=capo]
Returns the list of active templates, optionally filtered by provider.
```json
{
  "templates": [
    {"id": 2, "name": "capo-rke2-default",  "provider": "capo",   "k8s_distro": "rke2", "version": 1},
    {"id": 1, "name": "capa-eks-default",   "provider": "capa",   "k8s_distro": "eks",  "version": 1},
    {"id": 4, "name": "capz-aks-default",   "provider": "capz",   "k8s_distro": "aks",  "version": 1},
    {"id": 3, "name": "capoci-oke-default", "provider": "capoci", "k8s_distro": "oke",  "version": 1}
  ],
  "count": 4
}
```
GET /api/v1/cluster-templates/{id}
Returns a single template including its full Jinja2 source and JSON Schema.
GET /api/v1/cluster-templates/{id}/schema
Returns only the JSON Schema. This is what the OutpostAI wizard polls when an operator selects a template — it uses the schema to render the form fields.
POST /api/v1/cluster-templates/{id}/preview
Renders a template against operator-supplied values without writing to Gitea or recording an audit row. Used by the wizard’s Review step to show operators exactly what will be committed before they click Create Cluster.
```bash
curl -X POST http://localhost:8080/api/v1/cluster-templates/3/preview \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "oracle-test-1",
    "namespace": "workloads",
    "values": {
      "oci_compartment_id": "ocid1.compartment.oc1..aaaa",
      "oci_region": "us-gov-ashburn-1",
      "oci_image_ocid": "ocid1.image.oc1.iad.aaaa",
      "worker_count": 5,
      "k8s_version": "v1.34.0"
    }
  }'
```
Returns:
```json
{
  "template_id": 3,
  "cluster_name": "oracle-test-1",
  "namespace": "workloads",
  "files": {
    "clusters/oracle-test-1/cluster.yaml": "...",
    "clusters/oracle-test-1/oci-managed-cluster.yaml": "...",
    "clusters/oracle-test-1/oci-managed-control-plane.yaml": "...",
    "clusters/oracle-test-1/managed-machine-pool.yaml": "...",
    "clusters/oracle-test-1/oci-managed-machine-pool.yaml": "..."
  },
  "file_count": 5
}
```
POST /api/v1/cluster-templates/{id}/render
Renders and pushes to Gitea. Records an audit row. This is the endpoint OutpostAI’s wizard hits when the operator clicks Create Cluster.
GET /api/v1/clusters/{name}/render?namespace=<ns>
Returns the most recent render audit row for a cluster. Answers “what manifest was generated for this cluster, from which template, with which inputs?” The legacy Trailboss did not have this — debugging a misbehaving cluster meant reverse-engineering Git history. With the audit table, the answer is one query away.
OutpostAI Wizard — Schema-Driven Step 2
The cluster creation wizard in OutpostAI now supports all four hyperscalers. The Provider dropdown in Step 1 lists:
- OpenStack (VitroAI) — CAPO
- AWS EKS — CAPA
- Azure AKS — CAPZ
- Oracle Cloud OKE — CAPOCI
When the operator changes provider, Step 2 (Infrastructure) dynamically refetches the template list for that provider and renders form fields from the chosen template’s JSON Schema. Each form field is generated from the schema’s properties object — title becomes the label, description becomes helper text, default becomes the pre-filled value, enum becomes a dropdown.
Step 3 (Add-ons) is unchanged from the existing add-on selection model with the Monaco values.yaml editor.
Step 4 (Review) shows the chosen template name + version and a JSON dump of the input values that will be sent to the render engine.
When the operator clicks Create Cluster, the wizard calls POST /api/v1/cluster-templates/{id}/render, the engine validates → renders → pushes to Gitea, and ArgoCD picks it up within seconds.
Adding a new hyperscaler does not require any frontend code changes. The wizard auto-discovers templates and renders the right fields from the JSON Schema.
TrailbossAI Chatbot Integration
The chatbot in OutpostAI and Compass exposes two new built-in tools that wrap the cluster template system:
list_cluster_templates(provider?)
Lets the LLM discover what templates are available before attempting to create a cluster. The LLM uses this to learn the template names and which provider they belong to.
create_kubernetes_cluster(cluster_name, template_name, values, namespace?)
Renders and pushes a cluster from natural language. The LLM:
- Parses the operator’s request (e.g. “spin up an EKS cluster in us-gov-east-1 with 3 workers”)
- Calls `list_cluster_templates(provider="capa")` to find `capa-eks-default`
- Constructs the `values` dict (`aws_region: "us-gov-east-1"`, `worker_count: 3`)
- Calls `create_kubernetes_cluster(cluster_name="my-eks-1", template_name="capa-eks-default", values=...)`
The system prompt instructs the LLM:
When asked to create a Kubernetes cluster: list templates first, pick the one matching the user’s intent, supply provider-specific values. CAPO needs image_name + external_network_id; CAPA needs aws_region + worker_instance_type; CAPZ needs azure_region + resource_group + subscription_id; CAPOCI needs oci_compartment_id + oci_region + oci_image_ocid. The values dict is validated against the template’s JSON Schema before render — if validation fails, the error message tells you which field is wrong.
This is the same workflow the wizard uses, but driven by natural language.
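The two tools can be pictured as function specs handed to the LLM. This is an illustrative sketch, not the actual `unified_chat.py` definitions; the field names, the enum, and the `workloads` default are assumptions consistent with the examples in this document.

```python
# Hypothetical tool-spec dicts in an OpenAI-style function-calling shape.
CLUSTER_TOOLS = [
    {
        "name": "list_cluster_templates",
        "description": "List active cluster templates, optionally filtered by CAPI provider.",
        "parameters": {
            "type": "object",
            "properties": {
                "provider": {"type": "string", "enum": ["capo", "capa", "capz", "capoci"]},
            },
            "required": [],
        },
    },
    {
        "name": "create_kubernetes_cluster",
        "description": "Render a cluster template and push the manifests to Gitea.",
        "parameters": {
            "type": "object",
            "properties": {
                "cluster_name": {"type": "string", "maxLength": 63},
                "template_name": {"type": "string"},
                "values": {"type": "object"},
                "namespace": {"type": "string", "default": "workloads"},
            },
            "required": ["cluster_name", "template_name", "values"],
        },
    },
]
```

Note that `values` is deliberately an open object here: its real schema is the template's own JSON Schema, enforced server-side at render time rather than in the tool spec.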
CAPI Provider Installation on FMC
For the cluster template engine to actually provision real clusters, the corresponding Cluster API providers must be running on the Fleet Management Cluster. The current FMC has all four:
| Provider | Namespace | Version | Status |
|---|---|---|---|
| CAPI Core | capi-system | v1.11.0 | Running |
| CAPO (OpenStack) | capo-system | (existing) | Running |
| CAPA (AWS) | capa-system | v2.10.2 | Running |
| CAPZ (Azure) | capz-system | v1.23.0 | Running |
| CAPOCI (Oracle) | cluster-api-provider-oci-system | v0.24.0 | Installed (scaled to 0 pending real OCI credentials) |
CAPOCI’s controller manager fails to start without a real capoci-auth-config Secret containing valid OCI tenancy/user/fingerprint/key data. Once an operator populates that Secret (via Sealed Secrets, External Secrets Operator, or kubectl patch), scaling the deployment back to 1 brings it online.
CAPA and CAPZ tolerate empty credentials at startup — they only need real credentials when actually reconciling a cluster. Their controllers run idle until a workload cluster manifest references them.
Per-Cluster Credentials
Each provider expects per-cluster identity references:
| Provider | Identity Resource | Required Secret |
|---|---|---|
| CAPO | cloud-config Secret | OpenStack Application Credentials in INI format |
| CAPA | AWSClusterStaticIdentity | AWS access key + secret |
| CAPZ | AzureClusterIdentity | Service Principal client secret |
| CAPOCI | OCIClusterIdentity | Tenancy OCID + user OCID + API key fingerprint + PEM key |
Stub Secrets and Identity templates ship in deploy/overlays/fmc/capi-providers/ in the federal-frontier-platform Gitea repo. Operators populate them out-of-band — credentials should never be committed to git in plaintext. Sealed Secrets, External Secrets Operator pulling from Vault, or kubectl patch are all valid options.
Versioning and Lifecycle
| Resource | Lifecycle |
|---|---|
| Templates | Versioned in cluster_template_versions. Edits create a new version; old versions are never modified. |
| In-flight clusters | Pin to a specific template_version_id at render time. Template edits do not break running provisioning. |
| Active version | Each template row has a current version number. New cluster creates use the active version. |
| Render audit | Retained for 365 days minimum. Older rows can be archived to object storage if needed. ~50 KB per render. |
Operator Workflow Examples
Provision a Vitro RKE2 cluster (default OpenStack template)
- Open OutpostAI → Clusters → Create Cluster
- Step 1: Name `satellite-1`, namespace `workloads`, Provider OpenStack (VitroAI) — CAPO
- Step 2: Wizard fetches `capo-rke2-default`, renders form fields:
  - Control Plane Nodes: 1 (default)
  - Worker Nodes: 3
  - Glance Image Name: `rke2-node-v1.31-20260407`
  - External Network ID: `<your-vitro-network-uuid>`
  - Kubernetes Version: `v1.31.4+rke2r1` (default)
- Step 3: Add-ons — defaults to Canal CNI + Cinder CSI
- Step 4: Review the rendered manifest preview, click Create Cluster
- ArgoCD picks up the manifests within seconds, CAPO provisions the cluster
Provision an AWS EKS cluster via the chatbot
```
Operator: spin up an EKS cluster in us-gov-east-1 with 3 t3.large workers

LLM: [calls list_cluster_templates(provider="capa")]
LLM: [calls create_kubernetes_cluster(
         cluster_name="govcloud-eks-1",
         template_name="capa-eks-default",
         values={
           "aws_region": "us-gov-east-1",
           "worker_count": 3,
           "worker_instance_type": "t3.large",
           "k8s_version": "v1.31"
         }
     )]
LLM: Cluster 'govcloud-eks-1' provisioning initiated via template
     'capa-eks-default'. 5 manifests pushed to Gitea. ArgoCD will
     sync them to the FMC; CAPI provider 'capa' will provision the cluster.
```
Inspect what was generated for an existing cluster
```bash
curl "http://trailboss.f3iai.svc.cluster.local:8080/api/v1/clusters/satellite-1/render?namespace=workloads"
```
Returns the audit row with the exact template version, input values, rendered files, and git commit SHA.
Adding a New Hyperscaler
No code changes required. Three steps:
1. Author a Jinja2 template at `common/tools/cluster_templates_seed/<provider>-<distro>-default.j2`. Use `# FILE:` markers to split into multi-doc output. Reference upstream CAPI provider documentation for the manifest shapes.
2. Author the JSON Schema at `<name>.schema.json`. Declare every input the template accepts, with types, defaults, enums, ranges, titles, and descriptions.
3. Author the metadata at `<name>.meta.json`: `{ "provider": "capv", "k8s_distro": "kubeadm", "description": "Default vSphere CAPV template" }`
On the next TrailbossAI startup, the seed loader inserts the new template into Postgres. The OutpostAI wizard’s Provider dropdown automatically picks it up. The chatbot’s list_cluster_templates tool also picks it up. Zero code changes, zero rebuild, zero redeploy.
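The seed loader's discovery step can be sketched as pairing each `.j2` with its sibling `.schema.json` and `.meta.json`. This is a hedged illustration of the file-triplet convention above; the function name and the exact row shape handed to the Postgres upsert are assumptions.

```python
import json
import pathlib


def discover_seed_templates(seed_dir):
    """Yield one insert-ready row per <name>.j2 in the seed directory,
    joining it with <name>.schema.json and <name>.meta.json.
    (Sketch: the real loader's column handling may differ.)"""
    seed = pathlib.Path(seed_dir)
    for j2 in sorted(seed.glob("*.j2")):
        name = j2.stem
        meta = json.loads((seed / f"{name}.meta.json").read_text())
        yield {
            "name": name,
            "provider": meta["provider"],
            "k8s_distro": meta["k8s_distro"],
            "description": meta.get("description", ""),
            "template_yaml": j2.read_text(),
            "values_schema": json.loads((seed / f"{name}.schema.json").read_text()),
        }
```

Dropping a `capv-kubeadm-default.*` triplet into the directory is then all it takes for the next startup to register the template.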
The corresponding CAPI provider controller does need to be installed on the FMC. The deploy/overlays/fmc/capi-providers/ overlay in the federal-frontier-platform repo handles CAPA, CAPZ, and CAPOCI; new providers follow the same pattern.
Where Things Live
| Item | Location |
|---|---|
| Render engine | common/tools/cluster_templates.py (federal-frontier-f3iai) |
| Gitea writer | common/tools/gitops_writer.py |
| Seed templates | common/tools/cluster_templates_seed/ |
| API endpoints | common/api/trailboss_api.py |
| Chatbot tools | common/api/unified_chat.py |
| Wizard | frontend/components/ClusterCreateDialog.tsx (outpostai-dev) |
| API client | frontend/lib/api.ts |
| CAPI provider install | deploy/overlays/fmc/capi-providers/ (federal-frontier-platform) |
| ADR | docs/adr/ADR-008-cluster-template-system.md (federal-frontier-f3iai) |
| ADR canonical | Confluence FED ADR-008 |
Related Documentation
- Storage Architecture (Cinder + Ceph) — How Cinder CSI is the default for Vitro clusters
- Cluster Bootstrap Flow — End-to-end flow from Create Cluster click to first PVC mounting
- Packer Image Build Process — How the RKE2 node images consumed by CAPO are built
- OutpostAI Mission Control — The operator UI that wraps the cluster template system
- Unified Chat — The chatbot module that exposes the cluster template tools