Storage Architecture — Cinder, Ceph, and Workload Cluster PVCs

How CAPO-provisioned Kubernetes workload clusters get persistent storage in the Vitro HCI environment, why Cinder is the preferred storage control plane, and when to use direct Ceph CSI.

This page explains how persistent storage works for Kubernetes workload clusters running on top of Vitro’s Kolla OpenStack HCI deployment, and why Cinder CSI is the preferred storage path rather than direct Ceph CSI.

TL;DR

When OutpostAI provisions a new Kubernetes cluster via CAPO, the default CSI driver is cinder.csi.openstack.org (Cinder CSI), not ceph-csi. This is intentional. Cinder is the storage controller, Ceph RBD is the storage backend, and going around Cinder breaks tenant isolation, quota enforcement, and the OpenStack control plane.

Architecture: Cinder is the Controller, Ceph is the Backend

In a Kolla HCI deployment, Ceph is not acting as a central storage controller. Cinder is. The relationship is:

Tenant / Workload Cluster
        │
        │  OpenStack API (Keystone-authenticated)
        ▼
   ┌──────────┐
   │  Cinder  │   ← Storage controller: API, quotas, lifecycle, snapshots
   │   API    │
   └────┬─────┘
        │  rbd driver
        ▼
   ┌──────────┐
   │   Ceph   │   ← Block backend: replicated storage, no policy
   │   RBD    │
   └──────────┘

Cinder owns:

  • Authentication and authorization — every volume operation passes through Keystone with project-scoped credentials
  • Quotas — per-project volume count, total size, snapshot count
  • Lifecycle — create, attach, detach, snapshot, restore, clone, delete
  • Attach orchestration — talks to Nova to bind RBD volumes to compute instances
  • Multi-backend abstraction — could swap RBD for LVM/NetApp/Pure without workload changes

Ceph RBD owns:

  • Replicated block storage — fast, durable, thin-provisioned blocks
  • Nothing else — no per-project quotas, no users (other than the Cinder service account), no policy

This separation is what makes the HCI architecture clean. Cinder is the gatekeeper. Ceph is the dumb, fast block layer behind it.

How a Workload Cluster Mounts a PVC

When a pod in a CAPO-provisioned workload cluster requests a PVC, the call chain looks like this:

  1. A workload creates a PersistentVolumeClaim referencing the cinder-rbd StorageClass, and a pod mounts it
  2. cinder-csi-controller (running in the workload cluster’s kube-system) sees the new PVC
  3. CSI controller calls Cinder API at https://cinder.vitro.lan:8776/v3/<project_id>/volumes using credentials from the cloud-config Secret (Application Credentials, scoped to the cluster’s project)
  4. Cinder authorizes the request against Keystone, checks the project quota, and calls its rbd driver
  5. The rbd driver creates an image in the backing Ceph pool (e.g., volumes)
  6. Cinder returns the volume ID to the CSI controller
  7. The CSI controller requests attachment of the volume to the worker VM where the pod is scheduled
  8. The attach request goes to the Nova compute API; Nova and Cinder coordinate to bind the RBD volume to the instance
  9. The RBD device appears inside the worker VM (as /dev/vdb or similar)
  10. The CSI node plugin on that worker formats and mounts the device, and kubelet bind-mounts it into the pod’s filesystem

The pod sees a mounted volume. Everything above kubelet was orchestrated by Cinder, not the workload cluster talking directly to Ceph.
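
For concreteness, step 1 is an ordinary PersistentVolumeClaim. The manifest below is a minimal sketch with a hypothetical name and size; it assumes the default cinder-rbd StorageClass described later on this page.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data                  # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                # a Cinder volume attaches to a single worker VM at a time
  storageClassName: cinder-rbd     # default class created by the Cinder CSI chart
  resources:
    requests:
      storage: 10Gi                # counts against the project's Cinder quota

Applying this claim kicks off steps 2–6 above; deleting it releases the Cinder volume, and with the default delete reclaim policy the backing RBD image is removed as well.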

Why Not Direct Ceph CSI?

The ceph-csi-rbd driver lets a Kubernetes cluster talk RBD directly to Ceph monitors, skipping Cinder entirely. In a bare-metal Kubernetes cluster running outside OpenStack, this is the correct pattern. In an HCI environment where OpenStack is the tenancy model, it’s wrong. Direct Ceph CSI gives you:

  • No tenant isolation. Every workload cluster gets the same client.kubernetes CephX credentials, all writing to the same shared pool. There is no per-project boundary at the storage layer.
  • No Keystone authentication. A compromised cluster can issue arbitrary RBD operations to the entire Ceph cluster. There is no audit trail tied to OpenStack identity.
  • No quotas. A runaway workload can fill the Ceph cluster — there’s no Cinder quota in the path to stop it.
  • Network reachability problems. Workload cluster VMs live on tenant networks (e.g., k8s-network). They have no automatic route to the Ceph monitor IPs (typically 192.168.1.241:6789 etc. on the storage VLAN). Making this work requires either putting every tenant on the storage VLAN (a security boundary collapse) or routing every tenant out through an external gateway with a static route to storage (operationally fragile).
  • No volume lifecycle integration with Nova. Direct Ceph CSI doesn’t go through Cinder, so VolumeSnapshots, clones, and backups bypass the OpenStack control plane entirely.
  • Two CSI drivers fighting over the same Ceph pool. Cinder and ceph-csi both want to manage RBD images. This leads to orphaned images, naming collisions, and broken accounting.

The only legitimate use case for direct ceph-csi in this environment is a bare-metal edge cluster that is not running on OpenStack at all — for example, a tactical edge node sitting next to a Ceph cluster in a small-footprint deployment. In that case there is no Cinder, so direct Ceph CSI is the only option.

Cluster Bootstrap: Application Credentials and the cloud-config Secret

For Cinder CSI to work, the workload cluster needs OpenStack credentials. CAPO handles this automatically when the operator provisions a cluster through OutpostAI:

  1. OutpostAI receives the cluster create request and forwards it to Trailboss
  2. Trailboss calls the CAPO tools, which create an OpenStack project for the cluster (or use a designated project)
  3. CAPO creates an Application Credential in Keystone scoped to that project — Application Credentials are revocable, password-less, and tied to a single project
  4. CAPO writes a cloud-config Secret into the workload cluster’s kube-system namespace containing:

    [Global]
    auth-url = https://keystone.vitro.lan/v3
    application-credential-id = <id>
    application-credential-secret = <secret>
    region = RegionOne
    
    [BlockStorage]
    bs-version = v3
    ignore-volume-az = true
    
  5. The Helm chart for openstack-cinder-csi is deployed via ArgoCD with secret.create: false and secret.name: cloud-config, so the CSI driver mounts the Secret created by CAPO (see the values sketch below)
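
For reference, the fragment of the generated Helm values that step 5 refers to looks roughly like this. It is a sketch of just the two keys named above; the exact surrounding layout follows the upstream openstack-cinder-csi chart and the OutpostAI template.

secret:
  create: false        # do not template a new Secret; CAPO has already written cloud-config
  name: cloud-config   # mount the CAPO-created Secret into the CSI controller and node pods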

When the cluster is deleted, CAPO revokes the Application Credential. No long-lived passwords, no shared service accounts.

Default StorageClass

The Cinder CSI Helm values that OutpostAI generates set a default StorageClass:

storageClass:
  enabled: true
  delete:
    isDefault: true
    allowVolumeExpansion: true
    name: cinder-rbd

Any PVC in the workload cluster that doesn’t explicitly specify a storageClassName lands on cinder-rbd automatically. Volume expansion is enabled by default.
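
Both defaults are visible from an ordinary claim. The sketch below (hypothetical name and size) omits storageClassName, so it lands on cinder-rbd; because allowVolumeExpansion is enabled, the volume can later be grown in place by raising the request and re-applying.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-logs        # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi     # no storageClassName set, so cinder-rbd is used; raise this value later to expand in place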

When You’d Override Defaults

The OutpostAI cluster creation wizard exposes the templated values.yaml for every selected add-on via the Customize values.yaml gear button next to each add-on in the drop zone. Common reasons to edit Cinder CSI values:

  • Multiple StorageClasses — add a second StorageClass for a different volume type (e.g., a slower archive pool); see the sketch after this list
  • Custom topology — restrict volumes to specific availability zones
  • Different region — point at a non-default Keystone region
  • Increased controller replicas — for high-availability multi-node control planes

The wizard pre-fills the editor with the templated defaults (project-scoped, three controller replicas if you have three or more workers, default StorageClass enabled). Edit only what you need to change.
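
For the first bullet above, a second class is just another StorageClass object pointing at the same provisioner with a different Cinder volume type. A minimal sketch, assuming a Cinder volume type named archive exists and the availability zone is the default nova (both are assumptions, not part of the generated defaults); it can be applied directly or folded into the chart values, depending on how the chart exposes extra classes.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-archive                # hypothetical name for the second class
provisioner: cinder.csi.openstack.org
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  type: archive                       # hypothetical Cinder volume type backed by the slower pool
  availability: nova                  # assumed availability zone name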

Add-on Catalog Summary

The OutpostAI add-on catalog now reflects the architectural choice:

  • Cinder CSI (OpenStack): PVCs via Cinder, backed by Ceph RBD. When to use: default for all OpenStack/HCI clusters; selected automatically.
  • Ceph CSI (direct): direct RBD to Ceph monitors, bypassing Cinder. When to use: bare-metal/edge clusters with no OpenStack; advanced use only.
  • Longhorn: distributed block storage on local node disks. When to use: air-gapped or non-Ceph deployments where Longhorn is the storage layer.

If the wizard provisions an OpenStack cluster and you don’t change the defaults, you get Canal CNI + Cinder CSI, which is the correct baseline for the Vitro HCI environment.