Remediation Policy

How the SRE dispatch agent selects remediation methods — gitops, direct, itsm, or recommend-only — based on resource type, namespace, and customer operational model.

The sre-remediation-policy ConfigMap governs how the SRE dispatch agent generates remediation recommendations. The agent reads this policy before generating any recommendation and selects the appropriate method based on resource type, namespace, and customer environment. This is an installation-time configuration — the agent code does not change between customers, only this ConfigMap changes.

Remediation Methods

| Method | Description |
|---|---|
| gitops | Generate a Kustomize patch targeting the Gitea GitOps repo; ArgoCD syncs the change. Use for all ArgoCD-managed resources. |
| direct | Generate a kubectl command to run directly against the cluster. Use for runtime objects not managed by GitOps, such as PVCs and Pods. |
| itsm | Generate a change request description for the configured ITSM provider (ServiceNow, BMC Helix). No cluster action is taken until the change request is approved. |
| recommend | Generate human-readable instructions only, with no executable commands. Use for Secrets, Nodes, system namespaces, and operator-managed resources. |

Method Selection Priority

The agent selects a method in this order, stopping at the first match:

  1. Resource-specific policy (resource_policies) — matches on Kubernetes resource kind
  2. Namespace policy (namespace_policies) — matches on the resource’s namespace
  3. ArgoCD ownership check — if argocd_aware is true and the resource is in an ArgoCD Application’s managed resources, use gitops
  4. Global default — default_method (typically gitops)
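The first-match-wins order can be sketched in a few lines of Python. This is a minimal sketch, assuming policy.yaml has already been parsed into a dict whose keys follow the ConfigMap reference. The `select_method` function and its `argocd_managed` flag are hypothetical (working out whether a resource appears in an ArgoCD Application's managed resources is outside the sketch), and checking a namespace's `exceptions` list inside step 2 is an assumption about how the f3iai PVC exception is applied.

```python
from typing import Any

# Hypothetical subset of policy.yaml, already parsed into a dict.
POLICY: dict[str, Any] = {
    "default_method": "gitops",
    "argocd_aware": True,
    "resource_policies": [
        {"kind": "PersistentVolumeClaim", "method": "direct"},
        {"kind": "Deployment", "method": "gitops"},
        {"kind": "Secret", "method": "recommend"},
        {"kind": "Pod", "method": "direct"},
    ],
    "namespace_policies": [
        {"namespace": "f3iai", "default_method": "gitops",
         "exceptions": [{"kind": "PersistentVolumeClaim", "method": "direct"}]},
        {"namespace": "monitoring", "default_method": "recommend"},
    ],
}


def select_method(policy: dict[str, Any], kind: str, namespace: str,
                  argocd_managed: bool) -> str:
    """Return the remediation method; the first matching rule wins.

    argocd_managed is an assumed input: True when the resource is listed in
    an ArgoCD Application's managed resources.
    """
    # 1. Resource-specific policy, matched on Kubernetes kind.
    for rule in policy.get("resource_policies", []):
        if rule["kind"] == kind:
            return rule["method"]
    # 2. Namespace policy; per-kind exceptions (assumed) beat the namespace default.
    for ns in policy.get("namespace_policies", []):
        if ns["namespace"] == namespace:
            for exc in ns.get("exceptions", []):
                if exc["kind"] == kind:
                    return exc["method"]
            return ns["default_method"]
    # 3. ArgoCD ownership check, only when argocd_aware is enabled.
    if policy.get("argocd_aware", False) and argocd_managed:
        return "gitops"
    # 4. Global default.
    return policy["default_method"]
```

Under this ordering a Deployment in monitoring resolves to gitops, because its resource policy matches before the namespace policy is consulted, while a kind with no resource policy (for example a Service) in monitoring resolves to recommend.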

Resource Policies

| Resource Kind | Method | Notes |
|---|---|---|
| PersistentVolumeClaim | direct | Ceph RBD supports online resize. No pod restart required. |
| Deployment | gitops | All Deployments are ArgoCD-managed. Never kubectl apply directly. |
| StatefulSet | gitops | spec.selector and spec.volumeClaimTemplates are immutable. Expand PVCs directly instead of modifying volumeClaimTemplates. |
| ConfigMap | gitops | Most ConfigMaps are ArgoCD-managed. |
| Secret | recommend | Never modify Secrets via automated remediation. |
| Pod | direct | Restarting (deleting) a Pod is safe when an owning controller, such as a ReplicaSet or StatefulSet, recreates it. |
| Node | recommend | Node-level remediation requires operator judgment. |
| CronJob | gitops | ArgoCD-managed. |
| DaemonSet | gitops | |
| Service | gitops | |
| IngressRoute | gitops | |
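For the direct method on a PersistentVolumeClaim, the policy notes call for kubectl patch on spec.resources.requests.storage. The following minimal Python sketch shows one way such a command string could be assembled. `pvc_expand_command` and the sample PVC name are hypothetical, and the sketch assumes the PVC's StorageClass sets allowVolumeExpansion: true, which Kubernetes requires before a claim can grow.

```python
import json
import shlex


def pvc_expand_command(name: str, namespace: str, new_size: str) -> str:
    """Build a kubectl patch command that raises a PVC's storage request.

    Hypothetical helper; assumes the StorageClass allows volume expansion.
    """
    patch = {"spec": {"resources": {"requests": {"storage": new_size}}}}
    # Compact JSON, then shell-quote every variable part so the command
    # survives a POSIX shell unchanged.
    body = json.dumps(patch, separators=(",", ":"))
    return " ".join([
        "kubectl", "patch", "pvc", shlex.quote(name),
        "-n", shlex.quote(namespace),
        "--type", "merge",
        "-p", shlex.quote(body),
    ])


print(pvc_expand_command("data-postgres-0", "f3iai", "50Gi"))
```

Kubernetes does not allow a PVC's storage request to be reduced below its current capacity, so the rollback field for this kind of change cannot simply re-apply the old size.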

Namespace Policies

| Namespace | Default Method | Notes |
|---|---|---|
| f3iai | gitops | Exception: PVCs use direct. |
| monitoring | recommend | Helm-managed. Recommend value changes only. |
| kube-system | recommend | RKE2-managed. Do not modify. |
| argocd | recommend | Self-managing. Recommend only. |
| keycloak | gitops | Target the Keycloak CR in GitOps, not its operator-managed children. |
| rook-ceph | recommend | Rook operator-managed. Do not modify CRs directly. |
| camel-k | recommend | Helm-managed. Recommend value changes only. |
| cert-manager | recommend | |
| linkerd | recommend | |

Approval Matrix

| Risk Level | Auto-Execute | Requires Approval | Notes |
|---|---|---|---|
| low | Yes | No | Agent executes immediately and writes the outcome to FFO. |
| medium | No | Yes | Recommendation surfaced in OutpostAI Pending Approval. |
| high | No | Yes | Recommendation surfaced in OutpostAI Pending Approval. |
| critical | No | Yes (elevated) | Operator notification fired on dispatch. |
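The matrix can be read as a lookup from risk level to disposition. The following is a minimal Python sketch that mirrors the approval_matrix block of policy.yaml. `ApprovalRule`, `route`, and the disposition strings are hypothetical. How the matrix interacts with the per-resource requires_approval flags is not specified here, so the sketch consults the matrix alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalRule:
    auto_execute: bool
    requires_approval: bool
    approval_type: Optional[str] = None  # "standard" or "elevated"
    notify_on_dispatch: bool = False
    log_to_ffo: bool = True


# Mirrors approval_matrix in policy.yaml; every level logs to FFO.
APPROVAL_MATRIX = {
    "low": ApprovalRule(auto_execute=True, requires_approval=False),
    "medium": ApprovalRule(auto_execute=False, requires_approval=True,
                           approval_type="standard"),
    "high": ApprovalRule(auto_execute=False, requires_approval=True,
                         approval_type="standard"),
    "critical": ApprovalRule(auto_execute=False, requires_approval=True,
                             approval_type="elevated", notify_on_dispatch=True),
}


def route(risk_level: str) -> tuple[str, bool]:
    """Return (disposition, notify_operator) for a risk level (hypothetical helper)."""
    rule = APPROVAL_MATRIX[risk_level]  # raises KeyError for an unknown level
    if rule.auto_execute and not rule.requires_approval:
        disposition = "auto-execute"
    elif rule.approval_type == "elevated":
        disposition = "pending-approval-elevated"
    else:
        disposition = "pending-approval"
    return disposition, rule.notify_on_dispatch
```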

Recommendation Output Format

Every recommendation the agent generates must include:

| Field | Description |
|---|---|
| action | Human-readable summary of what to do |
| method | gitops, direct, itsm, or recommend |
| commands | Exact kubectl command, Kustomize patch content, or ITSM change description |
| rollback | How to undo the change if it causes problems |
| impact | What resources and services will be affected |
| risk_of_fix | low, medium, or high |
| requires_restart | true or false |
| estimated_downtime | none, seconds, or minutes |
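Since every recommendation must carry all eight fields, a consumer can reject incomplete or out-of-range output before it reaches an approval queue. The following is a minimal Python sketch under that assumption. `Recommendation` and `recommendation_from_dict` are hypothetical names, and the allowed values are taken directly from the Recommendation Output Format table.

```python
from dataclasses import dataclass, fields
from typing import Any

# Allowed values from the Recommendation Output Format table.
METHODS = {"gitops", "direct", "itsm", "recommend"}
RISKS = {"low", "medium", "high"}
DOWNTIME = {"none", "seconds", "minutes"}


@dataclass(frozen=True)
class Recommendation:
    action: str
    method: str
    commands: str
    rollback: str
    impact: str
    risk_of_fix: str
    requires_restart: bool
    estimated_downtime: str

    def __post_init__(self) -> None:
        # Enforce the enumerated values; free-text fields are not inspected.
        if self.method not in METHODS:
            raise ValueError(f"invalid method: {self.method!r}")
        if self.risk_of_fix not in RISKS:
            raise ValueError(f"invalid risk_of_fix: {self.risk_of_fix!r}")
        if self.estimated_downtime not in DOWNTIME:
            raise ValueError(f"invalid estimated_downtime: {self.estimated_downtime!r}")
        if not isinstance(self.requires_restart, bool):
            raise TypeError("requires_restart must be a boolean")


def recommendation_from_dict(data: dict[str, Any]) -> Recommendation:
    """Build a Recommendation, rejecting output that omits any required field."""
    names = [f.name for f in fields(Recommendation)]
    missing = [n for n in names if n not in data]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return Recommendation(**{n: data[n] for n in names})
```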

Customer Adaptation

At installation time, the integrator configures the policy for the customer’s operational model:

GitOps-first customer: default_method: gitops

ITSM-gated customer: default_method: gitops, with itsm.enabled: true, provider: servicenow, endpoint set to the customer's ServiceNow URL, and change_required_for listing Deployment, StatefulSet, and DaemonSet.

Audit-only customer: default_method: recommend

The agent code does not change. The ConfigMap is the only variable.
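One way to picture the three operational models is as small overrides layered onto a shared base policy. The following is a minimal Python sketch of that idea. Nesting provider, endpoint, and change_required_for under the itsm key is an assumption drawn from the itsm.enabled path. `deep_merge`, the profile names, and the placeholder endpoint URL are hypothetical.

```python
import copy
from typing import Any

BASE_POLICY: dict[str, Any] = {"default_method": "gitops", "itsm": {"enabled": False}}

# Hypothetical per-customer overrides. Placing provider, endpoint, and
# change_required_for under "itsm" is an assumption.
PROFILES: dict[str, dict[str, Any]] = {
    "gitops-first": {"default_method": "gitops"},
    "itsm-gated": {
        "default_method": "gitops",
        "itsm": {
            "enabled": True,
            "provider": "servicenow",
            "endpoint": "https://servicenow.example.com",  # placeholder URL
            "change_required_for": ["Deployment", "StatefulSet", "DaemonSet"],
        },
    },
    "audit-only": {"default_method": "recommend"},
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with override applied recursively; inputs are not mutated."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

Only the merged dict would differ per customer; the code consuming it stays the same, which matches the installation-time model described here.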

ITSM Integration

When itsm.enabled is true, the agent generates a change request description instead of a kubectl command or GitOps patch. The description includes all recommendation fields formatted for the configured ITSM provider. The agent does not create the change request automatically — it generates the content for the operator to submit, preserving the customer’s existing change management process.
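Because the agent only drafts the change request, the output is plain text that an operator pastes into the ITSM tool. The following is a minimal Python sketch of such a renderer. `render_change_request` and its layout are hypothetical; the sketch does not follow any ServiceNow or BMC Helix field schema and does not submit anything.

```python
from typing import Any

# Field order follows recommendation_format.required_fields in policy.yaml.
REQUIRED_FIELDS = ("action", "method", "commands", "rollback", "impact",
                   "risk_of_fix", "requires_restart", "estimated_downtime")


def render_change_request(rec: dict[str, Any], provider: str) -> str:
    """Render a plain-text change request draft for an operator to submit.

    Hypothetical layout; it follows no ITSM provider schema and submits nothing.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in rec]
    if missing:
        raise ValueError(f"recommendation missing fields: {', '.join(missing)}")
    lines = [f"Change request draft ({provider})", f"Summary: {rec['action']}", ""]
    for name in REQUIRED_FIELDS[1:]:
        value = rec[name]
        if isinstance(value, bool):
            # Render booleans the way the policy documents them (true/false).
            value = "true" if value else "false"
        label = name.replace("_", " ").capitalize()
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```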

Full ConfigMap Reference

The following is the complete sre-remediation-policy ConfigMap as deployed on the FMC. It is mounted into the sre-dispatch pod at /etc/sre/remediation-policy/policy.yaml and injected into the agent prompt context for every session.

apiVersion: v1
kind: ConfigMap
metadata:
  name: sre-remediation-policy
  namespace: f3iai
  labels:
    app.kubernetes.io/part-of: federal-frontier-platform
    app.kubernetes.io/component: sre-governance
data:
  policy.yaml: |
    version: "1.0"

    default_method: gitops
    default_requires_approval: true
    argocd_aware: true

    gitops:
      provider: gitea
      base_url: "http://gitea.vitro.lan:30300"
      default_repo: "admin/federal-frontier-platform"
      default_branch: "main"
      commit_message_template: "sre-remediation: {session_id} — {title}"

    itsm:
      enabled: false

    resource_policies:

      - kind: PersistentVolumeClaim
        method: direct
        requires_approval: true
        notes: >
          Ceph RBD supports online expansion. Use kubectl patch to increase
          spec.resources.requests.storage. No pod restart required.

      - kind: Deployment
        method: gitops
        requires_approval: true
        notes: >
          All Deployments are managed by ArgoCD. Never kubectl apply directly.

      - kind: StatefulSet
        method: gitops
        requires_approval: true

      - kind: ConfigMap
        method: gitops
        requires_approval: true

      - kind: Secret
        method: recommend
        requires_approval: true

      - kind: CronJob
        method: gitops
        requires_approval: true

      - kind: Pod
        method: direct
        requires_approval: true

      - kind: Node
        method: recommend
        requires_approval: true

      - kind: DaemonSet
        method: gitops
        requires_approval: true

      - kind: Service
        method: gitops
        requires_approval: true

      - kind: IngressRoute
        method: gitops
        requires_approval: true

    namespace_policies:

      - namespace: f3iai
        default_method: gitops
        exceptions:
          - kind: PersistentVolumeClaim
            method: direct

      - namespace: monitoring
        default_method: recommend

      - namespace: kube-system
        default_method: recommend

      - namespace: argocd
        default_method: recommend

      - namespace: keycloak
        default_method: gitops

      - namespace: rook-ceph
        default_method: recommend

      - namespace: camel-k
        default_method: recommend

      - namespace: cert-manager
        default_method: recommend

      - namespace: linkerd
        default_method: recommend

    approval_matrix:
      low:
        auto_execute: true
        requires_approval: false
        log_to_ffo: true
      medium:
        auto_execute: false
        requires_approval: true
        approval_type: standard
        log_to_ffo: true
      high:
        auto_execute: false
        requires_approval: true
        approval_type: standard
        log_to_ffo: true
      critical:
        auto_execute: false
        requires_approval: true
        approval_type: elevated
        notify_on_dispatch: true
        log_to_ffo: true

    recommendation_format:
      required_fields:
        - action
        - method
        - commands
        - rollback
        - impact
        - risk_of_fix
        - requires_restart
        - estimated_downtime
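Because the ConfigMap is the only thing that changes between customers, a malformed policy is the main installation-time risk. The following is a minimal Python pre-install check, assuming policy.yaml has already been loaded into a dict with a YAML parser. `check_policy`, `commit_message`, and the specific checks are hypothetical and not part of the shipped agent. The commit helper assumes commit_message_template is filled with Python str.format-style substitution, which its {session_id} and {title} placeholders suggest but the policy does not state.

```python
from typing import Any

ALLOWED_METHODS = {"gitops", "direct", "itsm", "recommend"}
RISK_LEVELS = ("low", "medium", "high", "critical")


def check_policy(policy: dict[str, Any]) -> list[str]:
    """Return human-readable problems in a parsed policy.yaml; empty means it passed."""
    problems: list[str] = []

    def check(where: str, method: Any) -> None:
        if method not in ALLOWED_METHODS:
            problems.append(f"{where}: unknown method {method!r}")

    check("default_method", policy.get("default_method"))
    for rule in policy.get("resource_policies", []):
        check(f"resource_policies[{rule.get('kind')}]", rule.get("method"))

    seen: set = set()
    for ns in policy.get("namespace_policies", []):
        name = ns.get("namespace")
        if name in seen:
            problems.append(f"namespace_policies: duplicate entry for {name}")
        seen.add(name)
        check(f"namespace_policies[{name}]", ns.get("default_method"))
        for exc in ns.get("exceptions", []):
            check(f"namespace_policies[{name}].exceptions[{exc.get('kind')}]",
                  exc.get("method"))

    # Every risk level in the approval matrix must be defined.
    matrix = policy.get("approval_matrix", {})
    for level in RISK_LEVELS:
        if level not in matrix:
            problems.append(f"approval_matrix: missing level {level}")
    return problems


def commit_message(policy: dict[str, Any], session_id: str, title: str) -> str:
    """Fill gitops.commit_message_template, assuming str.format-style placeholders."""
    template = policy["gitops"]["commit_message_template"]
    return template.format(session_id=session_id, title=title)
```

An empty list from check_policy means every method name is recognized, no namespace appears twice, and all four risk levels are present in the approval matrix.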