Unified Chat — Bedrock + MCP Tools + Claude Code Dispatch
The unified chat module powers the AI Assistant in OutpostAI and Compass. It uses AWS Bedrock (Claude) for inference, 13 MCP servers (150+ tools) for agentic tool calling, and Claude Code dispatch for autonomous code generation and deployment.
The unified chat module (unified_chat.py) is the backend that powers the Federal Frontier AI Assistant — the conversational interface available in both OutpostAI and Compass. It replaced two divergent chat implementations (one vLLM-only, one Bedrock-only) with a single agentic backend that provides:
- AWS Bedrock inference (Claude Sonnet 4.6 / Opus 4.6) with Ollama fallback for edge
- 150+ MCP tools from 13 servers, loaded via JSON-RPC
- Agentic loop — up to 10 iterations of LLM reasoning and tool calling per request
- Claude Code dispatch — spawn Kubernetes Jobs that write code, create PRs, and deploy via GitOps
- Runtime model switching — operators select Sonnet, Opus, or Ollama from the UI dropdown
Why It Exists
Before April 2026, OutpostAI and Compass each had their own chat backend:
| | OutpostAI (Trailboss API) | Compass API |
|---|---|---|
| LLM | vLLM (Qwen 7B) — hardcoded | AWS Bedrock (Claude Sonnet/Opus) |
| Tools | 45 LangGraph tools (hardcoded Python functions) | 150+ MCP tools via JSON-RPC |
| Routing | Keyword matching + InfrastructureAgent | 3-layer hybrid (intent → posse → agentic) |
| Model switching | Not supported (UI dropdown was ignored) | Supported |
| Code generation | Not supported | Not supported |
The OutpostAI chatbot was broken — vLLM was unavailable, causing every request to return “Connection error.” The model selector dropdown showed “AWS Bedrock — Claude Opus 4.6” but the backend ignored it. The two implementations were diverging with no shared code.
The unified chat module replaces both with a single implementation that both frontends use.
Architecture
```mermaid
graph TD
    User --> FE[ChatBot.vue]
    FE -->|POST /api/v1/chat| API[Trailboss API<br/>FastAPI]
    API --> UC[Unified Chat Module<br/>unified_chat.py]
    UC --> LLM{LLM Provider}
    LLM -->|bedrock-sonnet| BR[AWS Bedrock<br/>Claude Sonnet 4.6]
    LLM -->|bedrock-opus| BRO[AWS Bedrock<br/>Claude Opus 4.6]
    LLM -->|dev| OL[Ollama<br/>Qwen 35B]
    UC -->|tool calls| MCP[13 MCP Servers<br/>150+ tools via JSON-RPC]
    UC -->|claude_code_dispatch| Job[Claude Code<br/>K8s Job]
    Job -->|code changes| Gitea[Gitea MR]
    Gitea -->|merge| Argo[ArgoCD Sync]

    style User fill:#2b6cb0,stroke:#4299e1,color:#fff
    style FE fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style API fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style UC fill:#2c7a7b,stroke:#38b2ac,color:#e2e8f0
    style LLM fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style BR fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style BRO fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style OL fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style MCP fill:#1a365d,stroke:#4299e1,color:#e2e8f0
    style Job fill:#2b6cb0,stroke:#4299e1,color:#fff
    style Gitea fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style Argo fill:#2d3748,stroke:#4299e1,color:#e2e8f0
```
Request Flow
- Operator types a message in the `ChatBot.vue` component (shared by OutpostAI and Compass)
- Frontend sends `POST /api/v1/chat` with `{message, image, context, conversation_history}`
- Trailboss API receives the request and delegates to `unified_chat.chat()`
- Unified chat builds the conversation with a system prompt, loads MCP tools, and enters the agentic loop
- The LLM (Bedrock or Ollama) reasons about which tools to call
- Tool calls are executed against MCP servers via JSON-RPC
- Results are fed back to the LLM for the next iteration
- After up to 10 iterations, the final response is returned to the frontend
Agentic Loop
The core of the unified chat is a multi-iteration agentic loop. The LLM does not simply answer questions — it reasons about what tools to call, executes them, interprets results, and decides whether to call more tools or respond.
```text
Iteration 1: User asks "What's the Ceph health?"
  → LLM decides to call ceph_get_cluster_health
  → Tool returns: {health: "HEALTH_WARN", checks: [...]}
  → LLM decides more data needed, calls ceph_get_storage_summary

Iteration 2:
  → Tool returns: {total: "24TB", used: "18TB", pools: [...]}
  → LLM has enough data, generates final response with formatted table

Total: 2 iterations, 2 tool calls, 1 response
```
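The loop above can be sketched in plain Python. This is an illustrative skeleton, not the actual `unified_chat.py` implementation — the function names (`agentic_loop`, `call_llm`, `execute_tool`) and message shapes are assumptions:

```python
# Illustrative sketch of a multi-iteration agentic loop. The callables and
# message dict shapes are hypothetical, not the unified_chat.py API.
MAX_ITERATIONS = 10  # Bedrock default per the Configuration table below

def agentic_loop(call_llm, execute_tool, messages, max_iterations=MAX_ITERATIONS):
    """Alternate LLM reasoning and tool execution until the model answers."""
    for _ in range(max_iterations):
        reply = call_llm(messages)  # assumed: {"text": ..., "tool_calls": [...]}
        if not reply.get("tool_calls"):
            return reply["text"]    # no more tools requested — final answer
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            # Execute each requested tool and feed the result back to the LLM
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped after reaching the iteration limit."
```

In the Ceph example, the first iteration returns a tool call, the second returns the final text, so the loop exits after two passes.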
Configuration
| Parameter | Bedrock | Ollama |
|---|---|---|
| Max iterations | 10 | 5 |
| Max tools per iteration | 10 | 5 |
| Temperature | 0.1 | 0.1 |
| Max tokens | 4096 | 2048 |
Deduplication
The agentic loop tracks every tool call by (tool_name, arguments) hash. If the LLM attempts to call the same tool with the same arguments twice, the cached result is returned instead of re-executing. This prevents infinite loops and redundant API calls.
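A minimal sketch of this deduplication, assuming a canonical-JSON hash over the tool name and arguments (the class and method names here are illustrative, not the module's internals):

```python
# Sketch of tool-call deduplication keyed by a (tool_name, arguments) hash.
import hashlib
import json

class ToolCallCache:
    def __init__(self):
        self._results = {}

    def _key(self, tool_name, arguments):
        # sort_keys makes the hash stable regardless of dict ordering
        blob = json.dumps([tool_name, arguments], sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def call(self, tool_name, arguments, executor):
        key = self._key(tool_name, arguments)
        if key not in self._results:
            # First time this exact call is seen: execute and cache
            self._results[key] = executor(tool_name, arguments)
        # Repeat calls return the cached result without re-executing
        return self._results[key]
```

The same tool with different arguments hashes to a different key, so only exact repeats are short-circuited.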
MCP Tool Loading
On the first chat request, the unified chat module loads tool definitions from all 13 MCP servers via JSON-RPC tools/list calls. Results are cached for subsequent requests.
| Server | Endpoint (JSON-RPC) | Tools |
|---|---|---|
| ffo-mcp-server | `http://ffo-mcp-server.f3iai:50060/jsonrpc` | 7 |
| keycloak-mcp-server | `http://keycloak-mcp-server.f3iai:50057/jsonrpc` | 12 |
| openstack-mcp-server | `http://openstack-mcp-server.f3iai:8080/jsonrpc` | 31 |
| ceph-mcp-server | `http://ceph-mcp-server.f3iai:50054/jsonrpc` | 12 |
| atlassian-mcp-server | `http://atlassian-mcp-server.f3iai:50052/jsonrpc` | 23 |
| grafana-mcp-server | `http://grafana-mcp-server.f3iai:50056/jsonrpc` | 28 |
| argocd-mcp-server | `http://argocd-mcp-server.f3iai:50051/jsonrpc` | 11 |
| gitea-mcp-server | `http://gitea-mcp-server.f3iai:50055/jsonrpc` | 22 |
| kolla-mcp-server | `http://kolla-mcp-server.f3iai:50061/jsonrpc` | 10 |
| k8s-mcp-server | `http://k8s-mcp-server.f3iai:50062/jsonrpc` | 20 |
| web-mcp-server | `http://web-mcp-server.f3iai:50063/jsonrpc` | 4 |
| federal-compliance-mcp | `http://federal-compliance-mcp-server.f3iai:50064/jsonrpc` | 3 |
| trailboss-mcp-server | `http://trailboss-mcp-server.f3iai:50059/jsonrpc` | 1 |
Tool names are prefixed with the server name to avoid collisions (e.g., `keycloak_list_users`, `openstack_list_vms`). The MCP tool format is converted to OpenAI function-calling format, then to Bedrock `toolSpec` format for the `converse` API.
Each MCP server URL is configurable via environment variables (e.g., `KEYCLOAK_MCP_URL`, `CEPH_MCP_URL`) to support both in-cluster service DNS and local development with port-forwarding.
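The discovery, prefixing, and URL-resolution steps can be sketched as below. The `tools/list` method is standard MCP over JSON-RPC 2.0; the helper names and the env-var derivation rule are assumptions for illustration:

```python
# Sketch of MCP tool discovery plumbing (helper names are illustrative).
import os

def tools_list_payload(request_id=1):
    """JSON-RPC 2.0 request body for the MCP tools/list method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}

def prefix_tools(server_name, tools):
    """Prefix each tool name with its server to avoid cross-server collisions."""
    prefix = server_name.replace("-mcp-server", "").replace("-", "_")
    return [{**t, "name": f"{prefix}_{t['name']}"} for t in tools]

def mcp_url(server_name, default):
    """Resolve a server URL from env (e.g. KEYCLOAK_MCP_URL), falling back
    to the in-cluster service DNS default."""
    env_var = server_name.replace("-mcp-server", "").replace("-", "_").upper() + "_MCP_URL"
    return os.environ.get(env_var, default)
```

In production the defaults point at `*.f3iai` service DNS; a developer can port-forward a server locally and override the matching `*_MCP_URL` variable.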
Tool Categories
Infrastructure Query (Read)
| Category | Example Tools | What They Do |
|---|---|---|
| Compute | `openstack_list_vms`, `openstack_get_vm` | List and inspect VMs, flavors, images |
| Kubernetes | `k8s_list_nodes`, `k8s_list_pods`, `k8s_list_clusters` | Nodes, pods, CAPI clusters and machines |
| Storage | `ceph_get_cluster_health`, `ceph_list_pools` | Ceph health, OSDs, pools, capacity |
| Networking | `openstack_list_networks`, `openstack_list_security_groups` | Networks, subnets, routers, floating IPs |
| Identity | `keycloak_list_users`, `keycloak_get_user` | Users, roles, groups, sessions, realms |
| Monitoring | `grafana_query_prometheus`, `grafana_get_firing_alerts` | PromQL, LogQL, dashboards, alert rules |
| GitOps | `argocd_list_applications`, `gitea_list_repos` | ArgoCD apps, Gitea repos, branches, commits |
| Project Mgmt | `atlassian_jira_search`, `atlassian_confluence_get_page` | Jira issues, Confluence pages |
| Containers | `kolla_list_containers`, `kolla_container_health` | Kolla OpenStack containers across hypervisors |
| Knowledge Graph | `ffo_query`, `ffo_entity_get` | TypeQL queries against the FFO digital twin |
Infrastructure Action (Write)
| Category | Example Tools | What They Do |
|---|---|---|
| Compute | `openstack_server_create`, `openstack_server_reboot` | Create, delete, stop, start, rebuild VMs |
| Kubernetes | `k8s_cordon_node`, `k8s_drain_node`, `k8s_restart_deployment` | Node maintenance, pod deletion, rollouts |
| CAPI | `k8s_scale_machine_deployment`, `k8s_delete_machine` | Scale worker pools, replace machines |
| Monitoring | `grafana_create_alert_rule`, `grafana_create_dashboard` | Create alerts, dashboards, annotations |
| GitOps | `argocd_sync_application`, `argocd_rollback_application` | Trigger syncs, rollback deployments |
| Project Mgmt | `atlassian_jira_create_issue`, `atlassian_confluence_create_page` | Create issues, update docs |
| Containers | `kolla_restart_container`, `kolla_exec` | Restart services, run diagnostics |
| Identity | `keycloak_set_user_enabled` | Enable/disable users |
Claude Code Dispatch (Code + Deploy)
The claude_code_dispatch tool is unique — it spawns a Claude Code agent as a Kubernetes Job that can write code, create merge requests, and deploy changes through the GitOps pipeline.
Claude Code Dispatch
When an operator asks the chatbot to write code, fix a bug, or deploy a change, the LLM calls claude_code_dispatch with:
| Parameter | Required | Description |
|---|---|---|
| `task` | Yes | What the agent should do — be specific about repo, files, expected outcome |
| `repo` | No | Gitea repository in `owner/repo` format (e.g., `admin/federal-frontier-platform`) |
| `severity` | No | Risk level: `low` (auto-merge), `medium` (MR for review), `high` (investigate only) |
| `branch` | No | Target branch (defaults to `main`) |
Severity Levels
| Level | Agent Behavior | Human Review |
|---|---|---|
| LOW | Makes changes, commits, pushes directly | Post-mortem only |
| MEDIUM | Creates a new branch, makes changes, opens a Gitea merge request | Operator reviews and merges the MR |
| HIGH | Investigates only, documents findings | Operator reads findings, decides next steps |
How It Works
- The unified chat module receives `claude_code_dispatch` from the LLM’s tool call
- A unique session ID is generated (`cc-<12-hex-chars>`)
- A Kubernetes Job manifest is built with the task prompt, repo, severity, and branch
- The Job is created in the `f3iai` namespace via the K8s MCP server
- The Claude Code agent runs with access to the full MCP tool surface
- For MEDIUM severity: agent creates a branch, commits changes, opens a Gitea MR
- The MR flows through the normal GitOps pipeline — human reviews, merges, ArgoCD syncs
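The Job-building step can be sketched as a plain dict manifest. This is a hedged illustration: the container image name, env-var wiring, and exact spec fields are assumptions, only the naming scheme (`claude-code-cc-<12-hex>`) and the `f3iai` namespace come from the description above:

```python
# Illustrative Job manifest builder for claude_code_dispatch. The image tag
# and env-var contract are hypothetical; namespace and naming are from the doc.
import secrets

def build_dispatch_job(task, repo=None, severity="medium", branch="main"):
    session_id = "cc-" + secrets.token_hex(6)  # cc-<12-hex-chars>
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"claude-code-{session_id}", "namespace": "f3iai"},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "claude-code",
                        "image": "harbor.vitro.lan/ffp/claude-code:latest",  # hypothetical
                        "env": [
                            {"name": "TASK", "value": task},
                            {"name": "REPO", "value": repo or ""},
                            {"name": "SEVERITY", "value": severity},
                            {"name": "BRANCH", "value": branch},
                        ],
                    }],
                }
            }
        },
    }
```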
Example Interaction
```text
Operator: "Fix the CORS header bug in the openstack-mcp-server and open a PR"

LLM calls: claude_code_dispatch({
  task: "Fix CORS header handling in the OpenStack MCP server. The server
         returns 'Access-Control-Allow-Origin: *' for preflight requests but
         not for actual requests. Fix the middleware in src/main.py.",
  repo: "admin/federal-frontier-platform",
  severity: "medium"
})

Response: "Claude Code agent dispatched as K8s Job 'claude-code-cc-a3f8b1c2d4e5'.
           The agent will:
           1. Clone the repo
           2. Fix the CORS middleware in openstack-mcp-server/src/main.py
           3. Create a Gitea merge request for your review
           Track progress in the OutpostAI Dispatches panel."
```
LLM Configuration
Runtime Model Switching
The chatbot UI includes a model selector dropdown. When the operator changes it, the frontend calls:
```text
PUT /api/v1/llm/config?preset=bedrock-opus
```
This switches the active LLM at runtime — no restart required. Available presets:
| Preset | Provider | Model | Use Case |
|---|---|---|---|
| `bedrock-sonnet` | AWS Bedrock | `us.anthropic.claude-sonnet-4-6` | Default — fast, capable, cost-effective |
| `bedrock-opus` | AWS Bedrock | `us.anthropic.claude-opus-4-6-v1` | Complex reasoning, large tool sets |
| `dev` | Ollama | `qwen3.5:35b-a3b-q4_K_M` | Edge / air-gapped / development |
Bedrock Integration
The unified chat uses the AWS Bedrock converse API (not invoke_model). This provides:
- Native tool calling — Bedrock handles tool use/result cycling natively
- Cross-region inference profiles — model IDs prefixed with `us.` for cross-region routing
- Automatic message formatting — OpenAI-format messages are converted to Bedrock’s alternating user/assistant format with `toolUse` and `toolResult` blocks
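The tool-definition side of that conversion can be sketched as follows. The target shape matches Bedrock's `converse` `toolConfig` (`toolSpec` with a JSON `inputSchema`); the helper name is illustrative:

```python
# Sketch: convert an OpenAI function-calling tool definition into the
# toolSpec shape expected by Bedrock's converse API toolConfig.
def to_bedrock_toolspec(openai_tool):
    fn = openai_tool["function"]
    return {
        "toolSpec": {
            "name": fn["name"],
            "description": fn.get("description", ""),
            # Bedrock wraps the JSON Schema under an explicit "json" key
            "inputSchema": {"json": fn["parameters"]},
        }
    }

# The resulting list would be passed as, roughly:
#   client.converse(modelId=..., messages=..., toolConfig={"tools": tools})
```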
Credentials are provided via the `aws-bedrock-credentials` Kubernetes secret:
```yaml
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-bedrock-credentials
        key: aws-access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-bedrock-credentials
        key: aws-secret-access-key
```
Sovereign Inference
For IL6 and tactical edge deployments where AWS Bedrock is unavailable, the dev preset routes to a local Ollama instance running Qwen 35B. The agentic loop works the same way — only the LLM endpoint changes. See Sovereign Inference for details on IL2-IL6 inference backends.
System Prompt
The unified chat includes a system prompt that instructs the LLM to:
- Always use tools — never suggest CLI commands or code blocks for the user to run
- Select the right tool — a categorized guide maps query types to specific tools
- Use Claude Code dispatch for code/deploy tasks — don’t try to generate code inline
- Confirm destructive operations — delete, restart, drain require operator confirmation
- Use real data — never fabricate infrastructure state
The system prompt is ~30 lines and focuses on tool selection guidance rather than personality.
Deployment
Components
| Component | Image | Deployment | Namespace |
|---|---|---|---|
| Trailboss API (includes unified chat) | `harbor.vitro.lan/ffp/trailboss:v5.5.0` | `frontier-cluster-api` | `f3iai` |
| OutpostAI Frontend | `harbor.vitro.lan/ffp/outpostai-frontend:v5.1.12` | `outpostai-frontend` | `f3iai` |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `LLM_PRESET` | `bedrock-sonnet` | Active LLM preset |
| `AWS_REGION` | `us-east-1` | AWS region for Bedrock |
| `AWS_ACCESS_KEY_ID` | (from secret) | Bedrock credentials |
| `AWS_SECRET_ACCESS_KEY` | (from secret) | Bedrock credentials |
| `FFO_MCP_URL` | `http://ffo-mcp-server.f3iai:50060` | FFO MCP server URL |
| `KEYCLOAK_MCP_URL` | `http://keycloak-mcp-server.f3iai:50057` | Keycloak MCP URL |
| `OPENSTACK_MCP_URL` | `http://openstack-mcp-server.f3iai:8080` | OpenStack MCP URL |
| … | … | (one `*_MCP_URL` variable per MCP server) |
Build and Deploy
```bash
# On build server (texas-dell-04):
cd ~/federal-frontier-platform
git pull
docker build -t harbor.vitro.lan/ffp/trailboss:v5.5.0 .
docker push harbor.vitro.lan/ffp/trailboss:v5.5.0

# Update deployment:
kubectl -n f3iai set image deployment/frontier-cluster-api \
  frontier-cluster-api=harbor.vitro.lan/ffp/trailboss:v5.5.0
```
File Structure
```text
common/api/
  unified_chat.py    # Bedrock client, MCP tools, agentic loop, Claude Code dispatch
  trailboss_api.py   # FastAPI app — /api/v1/chat delegates to unified_chat.chat()
frontend/src/components/
  ChatBot.vue        # Shared chat UI component (model selector, message history, send)
```
The `unified_chat.py` module is ~600 lines and has no dependencies beyond `httpx`, `boto3`, and `pydantic` — all included in the Trailboss image.
Validated April 3, 2026
The unified chat was validated with real queries against live infrastructure:
- “What MCP tools are available?” — returned formatted tables of all 150+ tools across 11 categories
- “Hello, what can you do?” — returned capability overview with tool-backed examples
- Model switching — runtime switch from Sonnet to Opus and back via UI dropdown
- MCP tool calling — agentic loop successfully queried Ceph, OpenStack, Keycloak, and Kubernetes
- Claude Code dispatch — tool definition present and callable by the LLM
- End-to-end — OutpostAI at `outpostai.vitro.lan` serving Bedrock-powered responses through `frontier-cluster-api` → `unified_chat.chat()`