Remediation Policy
How the SRE dispatch agent selects a remediation method (gitops, direct, itsm, or recommend) based on resource type, namespace, and customer operational model.
The sre-remediation-policy ConfigMap governs how the SRE dispatch agent generates remediation recommendations. The agent reads this policy before generating any recommendation and selects the appropriate method based on resource type, namespace, and customer environment. The policy is set at installation time: the agent code is identical across customers, and only this ConfigMap differs.
Remediation Methods
| Method | Description |
|---|---|
| gitops | Generate a Kustomize patch targeting the Gitea GitOps repo. ArgoCD syncs the change. Use for all ArgoCD-managed resources. |
| direct | Generate a kubectl command to execute directly against the cluster. Use for runtime objects not managed by GitOps, such as PVCs and Pods. |
| itsm | Generate a change request description for the configured ITSM provider (ServiceNow, BMC Helix). No cluster action until the CR is approved. |
| recommend | Generate human-readable instructions only. No executable commands. Use for secrets, nodes, system namespaces, and operator-managed resources. |
Method Selection Priority
The agent selects a method in this order, stopping at the first match:
1. Resource-specific policy (resource_policies): matches on the Kubernetes resource kind
2. Namespace policy (namespace_policies): matches on the resource’s namespace
3. ArgoCD ownership check: if argocd_aware is true and the resource appears in an ArgoCD Application’s managed resources, use gitops
4. Global default: default_method (typically gitops)
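The priority order above can be sketched as a first-match lookup. This is a minimal illustration, not the agent's actual code: the function name `select_method`, the `argocd_managed` flag (standing in for the ArgoCD ownership check), and the way namespace `exceptions` are consulted are all assumptions. The dictionary keys mirror those in policy.yaml.

```python
from typing import Any, Dict

# Trimmed-down policy for illustration; keys mirror policy.yaml.
EXAMPLE_POLICY: Dict[str, Any] = {
    "default_method": "gitops",
    "argocd_aware": True,
    "resource_policies": [
        {"kind": "PersistentVolumeClaim", "method": "direct"},
        {"kind": "Secret", "method": "recommend"},
    ],
    "namespace_policies": [
        {"namespace": "monitoring", "default_method": "recommend"},
    ],
}


def select_method(policy: Dict[str, Any], kind: str, namespace: str,
                  argocd_managed: bool = False) -> str:
    """Return the remediation method for a resource; the first match wins."""
    # 1. Resource-specific policy: match on the Kubernetes kind.
    for rp in policy.get("resource_policies", []):
        if rp["kind"] == kind:
            return rp["method"]
    # 2. Namespace policy: match on the resource's namespace.
    for ns in policy.get("namespace_policies", []):
        if ns["namespace"] == namespace:
            # Assumption: per-namespace exceptions override the namespace default.
            for exc in ns.get("exceptions", []):
                if exc["kind"] == kind:
                    return exc["method"]
            return ns["default_method"]
    # 3. ArgoCD ownership check (the caller supplies the lookup result).
    if policy.get("argocd_aware") and argocd_managed:
        return "gitops"
    # 4. Global default.
    return policy.get("default_method", "gitops")


print(select_method(EXAMPLE_POLICY, "PersistentVolumeClaim", "f3iai"))        # direct
print(select_method(EXAMPLE_POLICY, "HorizontalPodAutoscaler", "monitoring")) # recommend
```

Because the resource-kind match comes first, a kind listed in resource_policies resolves the same way in every namespace; namespace policies only apply to kinds that have no resource-specific entry.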
Resource Policies
| Resource Kind | Method | Notes |
|---|---|---|
| PersistentVolumeClaim | direct | Ceph RBD supports online resize. No pod restart required. |
| Deployment | gitops | All Deployments are ArgoCD-managed. Never kubectl apply directly. |
| StatefulSet | gitops | spec.selector and spec.volumeClaimTemplates are immutable. Expand PVCs directly instead of editing volumeClaimTemplates. |
| ConfigMap | gitops | Most ConfigMaps are ArgoCD-managed. |
| Secret | recommend | Never modify Secrets via automated remediation. |
| Pod | direct | Restarting a controller-owned Pod is safe: the owning controller recreates it. |
| Node | recommend | Node-level remediation requires operator judgment. |
| CronJob | gitops | ArgoCD-managed. |
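As an illustration of the direct method for a PersistentVolumeClaim, the sketch below builds a `kubectl patch` command that raises spec.resources.requests.storage. The helper name `pvc_resize_command` and the PVC name `data-postgres-0` are hypothetical. Online expansion also assumes the PVC's StorageClass has allowVolumeExpansion set to true.

```python
import json
import shlex
from typing import List


def pvc_resize_command(name: str, namespace: str, new_size: str) -> List[str]:
    """Build a kubectl patch command that requests a larger PVC size."""
    patch = {"spec": {"resources": {"requests": {"storage": new_size}}}}
    return [
        "kubectl", "patch", "pvc", name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


# Hypothetical PVC name; the namespace is taken from the policy above.
cmd = pvc_resize_command("data-postgres-0", "f3iai", "20Gi")
print(shlex.join(cmd))  # shell-quoted form, suitable for the commands field
```

Returning an argument list rather than a single string keeps the JSON patch intact if the command is later executed without a shell.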
Namespace Policies
| Namespace | Default Method | Notes |
|---|---|---|
| f3iai | gitops | Exception: PVCs use direct |
| monitoring | recommend | Helm-managed. Recommend value changes only. |
| kube-system | recommend | RKE2-managed. Do not modify. |
| argocd | recommend | Self-managing. Recommend only. |
| keycloak | gitops | Target Keycloak CR in GitOps, not operator-managed children. |
| rook-ceph | recommend | Rook operator-managed. Do not modify CRs directly. |
| camel-k | recommend | Helm-managed. Recommend value changes only. |
Approval Matrix
| Risk Level | Auto-Execute | Requires Approval | Notes |
|---|---|---|---|
| low | Yes | No | Agent executes immediately and writes outcome to FFO |
| medium | No | Yes | Recommendation surfaced in OutpostAI Pending Approval |
| high | No | Yes | Recommendation surfaced in OutpostAI Pending Approval |
| critical | No | Yes (elevated) | Operator notification fired on dispatch |
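The matrix above can be read as a lookup from risk level to dispatch behavior. The following is a rough sketch under stated assumptions: `dispatch_decision` and its return shape are hypothetical, and rejecting unknown risk levels is a choice made here, not documented behavior. The matrix values are copied from the approval_matrix section of policy.yaml. How the per-resource requires_approval flags combine with the matrix is not specified in this document, so the sketch consults the matrix only.

```python
from typing import Any, Dict

# Values copied from the approval_matrix section of policy.yaml.
APPROVAL_MATRIX: Dict[str, Dict[str, Any]] = {
    "low": {"auto_execute": True, "requires_approval": False, "log_to_ffo": True},
    "medium": {"auto_execute": False, "requires_approval": True,
               "approval_type": "standard", "log_to_ffo": True},
    "high": {"auto_execute": False, "requires_approval": True,
             "approval_type": "standard", "log_to_ffo": True},
    "critical": {"auto_execute": False, "requires_approval": True,
                 "approval_type": "elevated", "notify_on_dispatch": True,
                 "log_to_ffo": True},
}


def dispatch_decision(risk_level: str,
                      matrix: Dict[str, Dict[str, Any]] = APPROVAL_MATRIX) -> Dict[str, Any]:
    """Translate a risk level into what happens at dispatch time."""
    if risk_level not in matrix:
        # Assumption: an unknown level is rejected rather than auto-executed.
        raise ValueError(f"unknown risk level: {risk_level!r}")
    entry = matrix[risk_level]
    execute_now = bool(entry.get("auto_execute")) and not entry.get("requires_approval", True)
    return {
        "execute_now": execute_now,
        "approval_type": None if execute_now else entry.get("approval_type", "standard"),
        "notify_operator": bool(entry.get("notify_on_dispatch", False)),
        "log_to_ffo": bool(entry.get("log_to_ffo", False)),
    }


print(dispatch_decision("low"))
print(dispatch_decision("critical"))
```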
Recommendation Output Format
Every recommendation the agent generates must include:
| Field | Description |
|---|---|
| action | Human-readable summary of what to do |
| method | gitops, direct, itsm, or recommend |
| commands | Exact kubectl command, Kustomize patch content, or ITSM change description |
| rollback | How to undo the change if it causes problems |
| impact | What resources and services will be affected |
| risk_of_fix | low, medium, or high |
| requires_restart | true or false |
| estimated_downtime | none, seconds, or minutes |
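A consumer of the agent's output could check a recommendation against the field list above before surfacing it. The sketch below is one possible validator; the function name `validate_recommendation` and the error-message wording are assumptions, while the field names and allowed values come from the table.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = (
    "action", "method", "commands", "rollback",
    "impact", "risk_of_fix", "requires_restart", "estimated_downtime",
)

# Allowed values as listed in the table above.
ALLOWED_VALUES = {
    "method": {"gitops", "direct", "itsm", "recommend"},
    "risk_of_fix": {"low", "medium", "high"},
    "estimated_downtime": {"none", "seconds", "minutes"},
}


def validate_recommendation(rec: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record is well-formed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rec]
    for name, allowed in ALLOWED_VALUES.items():
        if name in rec and rec[name] not in allowed:
            errors.append(f"invalid {name}: {rec[name]!r}")
    if "requires_restart" in rec and not isinstance(rec["requires_restart"], bool):
        errors.append("requires_restart must be true or false")
    return errors


# Hypothetical recommendation used only to exercise the validator.
example = {
    "action": "Expand PVC data-postgres-0 to 20Gi",
    "method": "direct",
    "commands": "kubectl patch pvc data-postgres-0 -n f3iai --type merge -p ...",
    "rollback": "PVC size cannot be reduced; restore from backup if required",
    "impact": "PVC data-postgres-0 in namespace f3iai",
    "risk_of_fix": "low",
    "requires_restart": False,
    "estimated_downtime": "none",
}
print(validate_recommendation(example))  # []
```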
Customer Adaptation
At installation time, the integrator configures the policy for the customer’s operational model:
- GitOps-first customer: default_method: gitops
- ITSM-gated customer: default_method: gitops, plus itsm.enabled: true, provider: servicenow, endpoint set to the customer’s ServiceNow URL, and change_required_for listing Deployment, StatefulSet, and DaemonSet
- Audit-only customer: default_method: recommend
The agent code does not change. The ConfigMap is the only variable.
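The three operational models can be pictured as small overlays merged onto a base policy. The sketch below is only an illustration: the `PROFILES` table, the `apply_profile` helper, the placement of provider, endpoint, and change_required_for under the itsm key, and the placeholder endpoint URL are all assumptions, not the installer's actual mechanism.

```python
import copy
from typing import Any, Dict

BASE_POLICY: Dict[str, Any] = {
    "default_method": "gitops",
    "itsm": {"enabled": False},
}

# Hypothetical overlays, one per operational model described above.
PROFILES: Dict[str, Dict[str, Any]] = {
    "gitops-first": {"default_method": "gitops"},
    "itsm-gated": {
        "default_method": "gitops",
        "itsm": {
            "enabled": True,
            "provider": "servicenow",
            "endpoint": "https://servicenow.example.invalid",  # placeholder URL
            "change_required_for": ["Deployment", "StatefulSet", "DaemonSet"],
        },
    },
    "audit-only": {"default_method": "recommend"},
}


def merge(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge overlay into a copy of base; overlay values win."""
    out = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def apply_profile(profile: str, base: Dict[str, Any] = BASE_POLICY) -> Dict[str, Any]:
    """Return the policy for a named customer profile without mutating base."""
    return merge(base, PROFILES[profile])


print(apply_profile("audit-only")["default_method"])    # recommend
print(apply_profile("itsm-gated")["itsm"]["provider"])  # servicenow
```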
ITSM Integration
When itsm.enabled is true, the agent generates a change request description instead of a kubectl command or GitOps patch. The description includes all recommendation fields, formatted for the configured ITSM provider. The agent does not create the change request itself; it generates the content for the operator to submit, which preserves the customer’s existing change management process.
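One way to picture the change request description is as a plain-text rendering of the recommendation fields. The sketch below is an assumption-heavy illustration: `render_change_request` and the section labels are hypothetical and do not correspond to any ServiceNow or BMC Helix schema. It produces text only and submits nothing.

```python
from typing import Any, Dict


def render_change_request(rec: Dict[str, Any], provider: str = "servicenow") -> str:
    """Format recommendation fields as a change request description (text only)."""
    commands = rec["commands"]
    if isinstance(commands, (list, tuple)):
        commands = "\n".join(commands)
    sections = [
        ("Provider", provider),
        ("Summary", rec["action"]),
        ("Implementation plan", commands),
        ("Rollback plan", rec["rollback"]),
        ("Impact", rec["impact"]),
        ("Risk", rec["risk_of_fix"]),
        ("Restart required", "yes" if rec["requires_restart"] else "no"),
        ("Estimated downtime", rec["estimated_downtime"]),
    ]
    return "\n".join(f"{label}: {value}" for label, value in sections)


# Hypothetical recommendation used only to exercise the renderer.
sample = {
    "action": "Increase memory limit for the api Deployment",
    "method": "itsm",
    "commands": ["Raise resources.limits.memory from 512Mi to 1Gi"],
    "rollback": "Revert the limit to 512Mi",
    "impact": "Rolling restart of the api Deployment",
    "risk_of_fix": "medium",
    "requires_restart": True,
    "estimated_downtime": "seconds",
}
print(render_change_request(sample))
```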
Full ConfigMap Reference
The complete sre-remediation-policy ConfigMap, as deployed on the FMC, follows. It is mounted into the sre-dispatch pod at /etc/sre/remediation-policy/policy.yaml and injected into the agent prompt context for every session.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sre-remediation-policy
  namespace: f3iai
  labels:
    app.kubernetes.io/part-of: federal-frontier-platform
    app.kubernetes.io/component: sre-governance
data:
  policy.yaml: |
    version: "1.0"
    default_method: gitops
    default_requires_approval: true
    argocd_aware: true
    gitops:
      provider: gitea
      base_url: "http://gitea.vitro.lan:30300"
      default_repo: "admin/federal-frontier-platform"
      default_branch: "main"
      commit_message_template: "sre-remediation: {session_id} — {title}"
    itsm:
      enabled: false
    resource_policies:
      - kind: PersistentVolumeClaim
        method: direct
        requires_approval: true
        notes: >
          Ceph RBD supports online expansion. Use kubectl patch to increase
          spec.resources.requests.storage. No pod restart required.
      - kind: Deployment
        method: gitops
        requires_approval: true
        notes: >
          All Deployments are managed by ArgoCD. Never kubectl apply directly.
      - kind: StatefulSet
        method: gitops
        requires_approval: true
      - kind: ConfigMap
        method: gitops
        requires_approval: true
      - kind: Secret
        method: recommend
        requires_approval: true
      - kind: CronJob
        method: gitops
        requires_approval: true
      - kind: Pod
        method: direct
        requires_approval: true
      - kind: Node
        method: recommend
        requires_approval: true
      - kind: DaemonSet
        method: gitops
        requires_approval: true
      - kind: Service
        method: gitops
        requires_approval: true
      - kind: IngressRoute
        method: gitops
        requires_approval: true
    namespace_policies:
      - namespace: f3iai
        default_method: gitops
        exceptions:
          - kind: PersistentVolumeClaim
            method: direct
      - namespace: monitoring
        default_method: recommend
      - namespace: kube-system
        default_method: recommend
      - namespace: argocd
        default_method: recommend
      - namespace: keycloak
        default_method: gitops
      - namespace: rook-ceph
        default_method: recommend
      - namespace: camel-k
        default_method: recommend
      - namespace: cert-manager
        default_method: recommend
      - namespace: linkerd
        default_method: recommend
    approval_matrix:
      low:
        auto_execute: true
        requires_approval: false
        log_to_ffo: true
      medium:
        auto_execute: false
        requires_approval: true
        approval_type: standard
        log_to_ffo: true
      high:
        auto_execute: false
        requires_approval: true
        approval_type: standard
        log_to_ffo: true
      critical:
        auto_execute: false
        requires_approval: true
        approval_type: elevated
        notify_on_dispatch: true
        log_to_ffo: true
    recommendation_format:
      required_fields:
        - action
        - method
        - commands
        - rollback
        - impact
        - risk_of_fix
        - requires_restart
        - estimated_downtime