Unified Chat — Bedrock + MCP Tools + Claude Code Dispatch
The unified chat module powers the AI Assistant in OutpostAI and Compass. It uses AWS Bedrock (Claude) for inference, 13 MCP servers (150+ tools) for agentic tool calling, and Claude Code dispatch for autonomous code generation and deployment.
The unified chat module (unified_chat.py) is the backend that powers the Federal Frontier AI Assistant — the conversational interface available in both OutpostAI and Compass. It replaced two divergent chat implementations (one vLLM-only, one Bedrock-only) with a single agentic backend that provides:
- AWS Bedrock inference (Claude Sonnet 4.6 / Opus 4.6) with Ollama fallback for edge
- 150+ MCP tools from 13 servers, loaded via JSON-RPC
- Agentic loop — up to 10 iterations of LLM reasoning and tool calling per request
- Claude Code dispatch — spawn Kubernetes Jobs that write code, create PRs, and deploy via GitOps
- Runtime model switching — operators select Sonnet, Opus, or Ollama from the UI dropdown
Why It Exists
Before April 2026, OutpostAI and Compass each had their own chat backend:
| | OutpostAI (Trailboss API) | Compass API |
|---|---|---|
| LLM | vLLM (Qwen 7B) — hardcoded | AWS Bedrock (Claude Sonnet/Opus) |
| Tools | 45 LangGraph tools (hardcoded Python functions) | 150+ MCP tools via JSON-RPC |
| Routing | Keyword matching + InfrastructureAgent | 3-layer hybrid (intent → posse → agentic) |
| Model switching | Not supported (UI dropdown was ignored) | Supported |
| Code generation | Not supported | Not supported |
The OutpostAI chatbot was broken — vLLM was unavailable, causing every request to return “Connection error.” The model selector dropdown showed “AWS Bedrock — Claude Opus 4.6” but the backend ignored it. The two implementations were diverging with no shared code.
The unified chat module replaces both with a single implementation that both frontends use.
Architecture
```mermaid
graph TD
    User --> FE[ChatBot.vue]
    FE -->|POST /api/v1/chat| API[Trailboss API<br/>FastAPI]
    API --> UC[Unified Chat Module<br/>unified_chat.py]
    UC --> LLM{LLM Provider}
    LLM -->|bedrock-sonnet| BR[AWS Bedrock<br/>Claude Sonnet 4.6]
    LLM -->|bedrock-opus| BRO[AWS Bedrock<br/>Claude Opus 4.6]
    LLM -->|dev| OL[Ollama<br/>Qwen 35B]
    UC -->|tool calls| MCP[13 MCP Servers<br/>150+ tools via JSON-RPC]
    UC -->|claude_code_dispatch| Job[Claude Code<br/>K8s Job]
    Job -->|code changes| Gitea[Gitea MR]
    Gitea -->|merge| Argo[ArgoCD Sync]

    style User fill:#2b6cb0,stroke:#4299e1,color:#fff
    style FE fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style API fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style UC fill:#2c7a7b,stroke:#38b2ac,color:#e2e8f0
    style LLM fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style BR fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style BRO fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style OL fill:#553c9a,stroke:#805ad5,color:#e2e8f0
    style MCP fill:#1a365d,stroke:#4299e1,color:#e2e8f0
    style Job fill:#2b6cb0,stroke:#4299e1,color:#fff
    style Gitea fill:#2d3748,stroke:#4299e1,color:#e2e8f0
    style Argo fill:#2d3748,stroke:#4299e1,color:#e2e8f0
```
Request Flow
- Operator types a message in the `ChatBot.vue` component (shared by OutpostAI and Compass)
- Frontend sends `POST /api/v1/chat` with `{message, image, context, conversation_history}`
- Trailboss API receives the request and delegates to `unified_chat.chat()`
- Unified chat builds the conversation with a system prompt, loads MCP tools, and enters the agentic loop
- The LLM (Bedrock or Ollama) reasons about which tools to call
- Tool calls are executed against MCP servers via JSON-RPC
- Results are fed back to the LLM for the next iteration
- After up to 10 iterations, the final response is returned to the frontend
Agentic Loop
The core of the unified chat is a multi-iteration agentic loop. The LLM does not simply answer questions — it reasons about what tools to call, executes them, interprets results, and decides whether to call more tools or respond.
```text
Iteration 1: User asks "What's the Ceph health?"
  → LLM decides to call ceph_get_cluster_health
  → Tool returns: {health: "HEALTH_WARN", checks: [...]}
  → LLM decides more data needed, calls ceph_get_storage_summary

Iteration 2:
  → Tool returns: {total: "24TB", used: "18TB", pools: [...]}
  → LLM has enough data, generates final response with formatted table

Total: 2 iterations, 2 tool calls, 1 response
```
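The loop above can be sketched in plain Python. This is an illustrative skeleton, not the actual `unified_chat.py` implementation — the function names (`agentic_loop`, `call_llm`, `execute_tool`) and message shapes are assumptions:

```python
# Illustrative sketch of a multi-iteration agentic loop. The callables and
# message dict shapes are hypothetical, not the unified_chat.py API.
MAX_ITERATIONS = 10  # Bedrock default per the Configuration table below

def agentic_loop(call_llm, execute_tool, messages, max_iterations=MAX_ITERATIONS):
    """Alternate LLM reasoning and tool execution until the model answers."""
    for _ in range(max_iterations):
        reply = call_llm(messages)  # assumed: {"text": ..., "tool_calls": [...]}
        if not reply.get("tool_calls"):
            return reply["text"]    # no more tools requested — final answer
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            # Execute each requested tool and feed the result back to the LLM
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped after reaching the iteration limit."
```

In the Ceph example, the first iteration returns a tool call, the second returns the final text, so the loop exits after two passes.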
Configuration
| Parameter | Bedrock | Ollama |
|---|---|---|
| Max iterations | 10 | 5 |
| Max tools per iteration | 10 | 5 |
| Temperature | 0.1 | 0.1 |
| Max tokens | 4096 | 2048 |
Deduplication
The agentic loop tracks every tool call by (tool_name, arguments) hash. If the LLM attempts to call the same tool with the same arguments twice, the cached result is returned instead of re-executing. This prevents infinite loops and redundant API calls.
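A minimal sketch of this deduplication, assuming a canonical-JSON hash over the tool name and arguments (the class and method names here are illustrative, not the module's internals):

```python
# Sketch of tool-call deduplication keyed by a (tool_name, arguments) hash.
import hashlib
import json

class ToolCallCache:
    def __init__(self):
        self._results = {}

    def _key(self, tool_name, arguments):
        # sort_keys makes the hash stable regardless of dict ordering
        blob = json.dumps([tool_name, arguments], sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def call(self, tool_name, arguments, executor):
        key = self._key(tool_name, arguments)
        if key not in self._results:
            # First time this exact call is seen: execute and cache
            self._results[key] = executor(tool_name, arguments)
        # Repeat calls return the cached result without re-executing
        return self._results[key]
```

The same tool with different arguments hashes to a different key, so only exact repeats are short-circuited.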
MCP Tool Loading
On the first chat request, the unified chat module loads tool definitions from all 13 MCP servers via JSON-RPC tools/list calls. Results are cached for subsequent requests.
| Server | Endpoint (JSON-RPC) | Tools |
|---|---|---|
| ffo-mcp-server | `http://ffo-mcp-server.f3iai:50060/jsonrpc` | 7 |
| keycloak-mcp-server | `http://keycloak-mcp-server.f3iai:50057/jsonrpc` | 12 |
| openstack-mcp-server | `http://openstack-mcp-server.f3iai:8080/jsonrpc` | 31 |
| ceph-mcp-server | `http://ceph-mcp-server.f3iai:50054/jsonrpc` | 12 |
| atlassian-mcp-server | `http://atlassian-mcp-server.f3iai:50052/jsonrpc` | 23 |
| grafana-mcp-server | `http://grafana-mcp-server.f3iai:50056/jsonrpc` | 28 |
| argocd-mcp-server | `http://argocd-mcp-server.f3iai:50051/jsonrpc` | 11 |
| gitea-mcp-server | `http://gitea-mcp-server.f3iai:50055/jsonrpc` | 22 |
| kolla-mcp-server | `http://kolla-mcp-server.f3iai:50061/jsonrpc` | 10 |
| k8s-mcp-server | `http://k8s-mcp-server.f3iai:50062/jsonrpc` | 20 |
| web-mcp-server | `http://web-mcp-server.f3iai:50063/jsonrpc` | 4 |
| federal-compliance-mcp | `http://federal-compliance-mcp-server.f3iai:50064/jsonrpc` | 3 |
| trailboss-mcp-server | `http://trailboss-mcp-server.f3iai:50059/jsonrpc` | 1 |
Tool names are prefixed with the server name to avoid collisions (e.g., `keycloak_list_users`, `openstack_list_vms`). The MCP tool format is converted to OpenAI function-calling format, then to Bedrock `toolSpec` format for the `converse` API.
Each MCP server URL is configurable via environment variables (e.g., `KEYCLOAK_MCP_URL`, `CEPH_MCP_URL`) to support both in-cluster service DNS and local development with port-forwarding.
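The discovery, prefixing, and URL-resolution steps can be sketched as below. The `tools/list` method is standard MCP over JSON-RPC 2.0; the helper names and the env-var derivation rule are assumptions for illustration:

```python
# Sketch of MCP tool discovery plumbing (helper names are illustrative).
import os

def tools_list_payload(request_id=1):
    """JSON-RPC 2.0 request body for the MCP tools/list method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}

def prefix_tools(server_name, tools):
    """Prefix each tool name with its server to avoid cross-server collisions."""
    prefix = server_name.replace("-mcp-server", "").replace("-", "_")
    return [{**t, "name": f"{prefix}_{t['name']}"} for t in tools]

def mcp_url(server_name, default):
    """Resolve a server URL from env (e.g. KEYCLOAK_MCP_URL), falling back
    to the in-cluster service DNS default."""
    env_var = server_name.replace("-mcp-server", "").replace("-", "_").upper() + "_MCP_URL"
    return os.environ.get(env_var, default)
```

In production the defaults point at `*.f3iai` service DNS; a developer can port-forward a server locally and override the matching `*_MCP_URL` variable.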
Tool Categories
Infrastructure Query (Read)
| Category | Example Tools | What They Do |
|---|---|---|
| Compute | `openstack_list_vms`, `openstack_get_vm` | List and inspect VMs, flavors, images |
| Kubernetes | `k8s_list_nodes`, `k8s_list_pods`, `k8s_list_clusters` | Nodes, pods, CAPI clusters and machines |
| Storage | `ceph_get_cluster_health`, `ceph_list_pools` | Ceph health, OSDs, pools, capacity |
| Networking | `openstack_list_networks`, `openstack_list_security_groups` | Networks, subnets, routers, floating IPs |
| Identity | `keycloak_list_users`, `keycloak_get_user` | Users, roles, groups, sessions, realms |
| Monitoring | `grafana_query_prometheus`, `grafana_get_firing_alerts` | PromQL, LogQL, dashboards, alert rules |
| GitOps | `argocd_list_applications`, `gitea_list_repos` | ArgoCD apps, Gitea repos, branches, commits |
| Project Mgmt | `atlassian_jira_search`, `atlassian_confluence_get_page` | Jira issues, Confluence pages |
| Containers | `kolla_list_containers`, `kolla_container_health` | Kolla OpenStack containers across hypervisors |
| Knowledge Graph | `ffo_query`, `ffo_entity_get` | TypeQL queries against the FFO digital twin |
Infrastructure Action (Write)
| Category | Example Tools | What They Do |
|---|---|---|
| Compute | `openstack_server_create`, `openstack_server_reboot` | Create, delete, stop, start, rebuild VMs |
| Kubernetes | `k8s_cordon_node`, `k8s_drain_node`, `k8s_restart_deployment` | Node maintenance, pod deletion, rollouts |
| CAPI | `k8s_scale_machine_deployment`, `k8s_delete_machine` | Scale worker pools, replace machines |
| Monitoring | `grafana_create_alert_rule`, `grafana_create_dashboard` | Create alerts, dashboards, annotations |
| GitOps | `argocd_sync_application`, `argocd_rollback_application` | Trigger syncs, rollback deployments |
| Project Mgmt | `atlassian_jira_create_issue`, `atlassian_confluence_create_page` | Create issues, update docs |
| Containers | `kolla_restart_container`, `kolla_exec` | Restart services, run diagnostics |
| Identity | `keycloak_set_user_enabled` | Enable/disable users |
Claude Code Dispatch (Code + Deploy)
The claude_code_dispatch tool is unique — it spawns a Claude Code agent as a Kubernetes Job that can write code, create merge requests, and deploy changes through the GitOps pipeline.
Claude Code Dispatch
When an operator asks the chatbot to write code, fix a bug, or deploy a change, the LLM calls claude_code_dispatch with:
| Parameter | Required | Description |
|---|---|---|
| `task` | Yes | What the agent should do — be specific about repo, files, expected outcome |
| `repo` | No | Gitea repository in `owner/repo` format (e.g., `admin/federal-frontier-platform`) |
| `severity` | No | Risk level: `low` (auto-merge), `medium` (MR for review), `high` (investigate only) |
| `branch` | No | Target branch (defaults to `main`) |
Severity Levels
| Level | Agent Behavior | Human Review |
|---|---|---|
| LOW | Makes changes, commits, pushes directly | Post-mortem only |
| MEDIUM | Creates a new branch, makes changes, opens a Gitea merge request | Operator reviews and merges the MR |
| HIGH | Investigates only, documents findings | Operator reads findings, decides next steps |
How It Works
- The unified chat module receives `claude_code_dispatch` from the LLM’s tool call
- A unique session ID is generated (`cc-<12-hex-chars>`)
- A Kubernetes Job manifest is built with the task prompt, repo, severity, and branch
- The Job is created in the `f3iai` namespace via the K8s MCP server
- The Claude Code agent runs with access to the full MCP tool surface
- For MEDIUM severity: agent creates a branch, commits changes, opens a Gitea MR
- The MR flows through the normal GitOps pipeline — human reviews, merges, ArgoCD syncs
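The Job-building step can be sketched as a plain dict manifest. This is a hedged illustration: the container image name, env-var wiring, and exact spec fields are assumptions, only the naming scheme (`claude-code-cc-<12-hex>`) and the `f3iai` namespace come from the description above:

```python
# Illustrative Job manifest builder for claude_code_dispatch. The image tag
# and env-var contract are hypothetical; namespace and naming are from the doc.
import secrets

def build_dispatch_job(task, repo=None, severity="medium", branch="main"):
    session_id = "cc-" + secrets.token_hex(6)  # cc-<12-hex-chars>
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"claude-code-{session_id}", "namespace": "f3iai"},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "claude-code",
                        "image": "harbor.vitro.lan/ffp/claude-code:latest",  # hypothetical
                        "env": [
                            {"name": "TASK", "value": task},
                            {"name": "REPO", "value": repo or ""},
                            {"name": "SEVERITY", "value": severity},
                            {"name": "BRANCH", "value": branch},
                        ],
                    }],
                }
            }
        },
    }
```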
Example Interaction
```text
Operator: "Fix the CORS header bug in the openstack-mcp-server and open a PR"

LLM calls: claude_code_dispatch({
  task: "Fix CORS header handling in the OpenStack MCP server. The server
         returns 'Access-Control-Allow-Origin: *' for preflight requests but
         not for actual requests. Fix the middleware in src/main.py.",
  repo: "admin/federal-frontier-platform",
  severity: "medium"
})

Response: "Claude Code agent dispatched as K8s Job 'claude-code-cc-a3f8b1c2d4e5'.
           The agent will:
           1. Clone the repo
           2. Fix the CORS middleware in openstack-mcp-server/src/main.py
           3. Create a Gitea merge request for your review
           Track progress in the OutpostAI Dispatches panel."
```
LLM Configuration
Runtime Model Switching
The chatbot UI includes a model selector dropdown. When the operator changes it, the frontend calls:
```text
PUT /api/v1/llm/config?preset=bedrock-opus
```
This switches the active LLM at runtime — no restart required. Available presets:
| Preset | Provider | Model | Use Case |
|---|---|---|---|
| `bedrock-sonnet` | AWS Bedrock | `us.anthropic.claude-sonnet-4-6` | Default — fast, capable, cost-effective |
| `bedrock-opus` | AWS Bedrock | `us.anthropic.claude-opus-4-6-v1` | Complex reasoning, large tool sets |
| `dev` | Ollama | `qwen3.5:35b-a3b-q4_K_M` | Edge / air-gapped / development |
Bedrock Integration
The unified chat uses the AWS Bedrock converse API (not invoke_model). This provides:
- Native tool calling — Bedrock handles tool use/result cycling natively
- Cross-region inference profiles — model IDs prefixed with `us.` for cross-region routing
- Automatic message formatting — OpenAI-format messages are converted to Bedrock’s alternating user/assistant format with `toolUse` and `toolResult` blocks
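The tool-definition side of that conversion can be sketched as follows. The target shape matches Bedrock's `converse` `toolConfig` (`toolSpec` with a JSON `inputSchema`); the helper name is illustrative:

```python
# Sketch: convert an OpenAI function-calling tool definition into the
# toolSpec shape expected by Bedrock's converse API toolConfig.
def to_bedrock_toolspec(openai_tool):
    fn = openai_tool["function"]
    return {
        "toolSpec": {
            "name": fn["name"],
            "description": fn.get("description", ""),
            # Bedrock wraps the JSON Schema under an explicit "json" key
            "inputSchema": {"json": fn["parameters"]},
        }
    }

# The resulting list would be passed as, roughly:
#   client.converse(modelId=..., messages=..., toolConfig={"tools": tools})
```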
Credentials are provided via the `aws-bedrock-credentials` Kubernetes secret:
```yaml
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-bedrock-credentials
        key: aws-access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-bedrock-credentials
        key: aws-secret-access-key
```
Sovereign Inference
For IL6 and tactical edge deployments where AWS Bedrock is unavailable, the dev preset routes to a local Ollama instance running Qwen 35B. The agentic loop works the same way — only the LLM endpoint changes. See Sovereign Inference for details on IL2-IL6 inference backends.
System Prompt
The unified chat includes a system prompt that instructs the LLM to:
- Always use tools — never suggest CLI commands or code blocks for the user to run
- Select the right tool — a categorized guide maps query types to specific tools
- Use Claude Code dispatch for code/deploy tasks — don’t try to generate code inline
- Confirm destructive operations — delete, restart, drain require operator confirmation
- Use real data — never fabricate infrastructure state
The system prompt is ~30 lines and focuses on tool selection guidance rather than personality.
Deployment
Components
| Component | Image | Deployment | Namespace |
|---|---|---|---|
| Trailboss API (includes unified chat) | `harbor.vitro.lan/ffp/trailboss:v5.5.0` | `frontier-cluster-api` | `f3iai` |
| OutpostAI Frontend | `harbor.vitro.lan/ffp/outpostai-frontend:v5.1.12` | `outpostai-frontend` | `f3iai` |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `LLM_PRESET` | `bedrock-sonnet` | Active LLM preset |
| `AWS_REGION` | `us-east-1` | AWS region for Bedrock |
| `AWS_ACCESS_KEY_ID` | (from secret) | Bedrock credentials |
| `AWS_SECRET_ACCESS_KEY` | (from secret) | Bedrock credentials |
| `FFO_MCP_URL` | `http://ffo-mcp-server.f3iai:50060` | FFO MCP server URL |
| `KEYCLOAK_MCP_URL` | `http://keycloak-mcp-server.f3iai:50057` | Keycloak MCP URL |
| `OPENSTACK_MCP_URL` | `http://openstack-mcp-server.f3iai:8080` | OpenStack MCP URL |
| … | … | (one `*_MCP_URL` variable per MCP server) |
Build and Deploy
```bash
# On build server (texas-dell-04):
cd ~/federal-frontier-platform
git pull
docker build -t harbor.vitro.lan/ffp/trailboss:v5.5.0 .
docker push harbor.vitro.lan/ffp/trailboss:v5.5.0

# Update deployment:
kubectl -n f3iai set image deployment/frontier-cluster-api \
  frontier-cluster-api=harbor.vitro.lan/ffp/trailboss:v5.5.0
```
File Structure
```text
common/api/
  unified_chat.py    # Bedrock client, MCP tools, agentic loop, Claude Code dispatch
  trailboss_api.py   # FastAPI app — /api/v1/chat delegates to unified_chat.chat()
frontend/src/components/
  ChatBot.vue        # Shared chat UI component (model selector, message history, send)
```
The `unified_chat.py` module is ~600 lines and has no dependencies beyond `httpx`, `boto3`, and `pydantic` — all included in the Trailboss image.
Validated April 3, 2026
The unified chat was validated with real queries against live infrastructure:
- “What MCP tools are available?” — returned formatted tables of all 150+ tools across 11 categories
- “Hello, what can you do?” — returned capability overview with tool-backed examples
- Model switching — runtime switch from Sonnet to Opus and back via UI dropdown
- MCP tool calling — agentic loop successfully queried Ceph, OpenStack, Keycloak, and Kubernetes
- Claude Code dispatch — tool definition present and callable by the LLM
- End-to-end — OutpostAI at `outpostai.vitro.lan` serving Bedrock-powered responses through `frontier-cluster-api` → `unified_chat.chat()`