Sovereign Inference — Classified AI Without the Cloud
InfrastructureAI runs sovereign at every classification level — IL2 through IL6 and tactical edge. Inference never traverses the public internet for classified workloads.
InfrastructureAI inference is sovereign by architecture, not by policy. At IL2-IL4, inference traffic stays within AWS private network fabric via VPC PrivateLink. At IL5, it stays within GovCloud. At IL6, it never leaves the agency’s air-gapped enclave. At the tactical edge, it runs on hardware the operator carries. There is no deployment topology where classified data reaches a commercial AI API over the public internet.
The Principle
For classified workloads, inference never traverses the public internet. No commercial AI API sees CUI content. This is architecturally enforced via VPC PrivateLink and air-gapped deployment, not a policy choice that can be violated.
A policy-based approach — “we promise not to send CUI to external APIs” — is a control that can be misconfigured, bypassed, or violated by a software defect. An architecture-based approach — the network path physically does not exist — cannot be violated without rebuilding the infrastructure. InfrastructureAI takes the architectural approach at every classification level.
IL2-IL4: Claude via Bedrock VPC PrivateLink
For IL2 through IL4 CUI workloads, InfrastructureAI uses Claude (Sonnet and Opus) via AWS Bedrock, accessed through a VPC PrivateLink endpoint.
What VPC PrivateLink means: The VPC endpoint creates a private connection between the customer’s VPC and the Bedrock service. Inference traffic travels over AWS’s private network fabric — not over the internet. There is no internet gateway in the path. There is no NAT gateway. The Bedrock endpoint resolves to a private IP address within the customer’s VPC CIDR range.
The practical consequence: a packet capture on any network segment between the application and the inference endpoint will show traffic between two RFC 1918 addresses. There is no public IP anywhere in the path.
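That property is mechanically checkable. A minimal sketch, using only the Python standard library (the addresses are illustrative, not from any real deployment):

```python
import ipaddress

def is_rfc1918(addr: str) -> bool:
    """True if addr falls in one of the RFC 1918 private ranges --
    the only address space that should appear between the application
    and a PrivateLink endpoint."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in (
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
    ))

# A PrivateLink endpoint ENI resolves inside the VPC CIDR:
assert is_rfc1918("10.0.12.34")
# A public endpoint address would fail the check:
assert not is_rfc1918("52.94.133.131")
```

A check like this can run in CI against the resolved address of the configured inference endpoint, turning "no public IP in the path" into a test rather than an assertion.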
Authorization: FedRAMP Moderate. Bedrock is an AWS service operating within the AWS FedRAMP boundary. The customer’s data is protected by AWS’s FedRAMP controls, the VPC’s network isolation, and TLS encryption in transit over the PrivateLink connection.
Anthropic’s visibility: None. Bedrock is a managed inference service. Anthropic provides the model weights to AWS. AWS runs the inference. Anthropic does not see the inference payload, does not log prompts or completions, and has no access to the customer’s VPC. This is the same isolation model as any other AWS managed service — the service provider (Anthropic) provides the software, and AWS operates it within its security boundary.
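The deployment target reduces to configuration. A hedged sketch of what that configuration might look like (the endpoint DNS name and model identifier below are placeholders, though the `vpce-` naming pattern matches how AWS names interface endpoint hostnames):

```python
# Illustrative deployment-time config for the IL2-IL4 target.
BEDROCK_IL4 = {
    "endpoint_url": "https://vpce-0abc123-example.bedrock-runtime.us-east-1.vpce.amazonaws.com",
    "model_id": "claude-sonnet-placeholder",
    "region": "us-east-1",
}

def transport_is_private(cfg: dict) -> bool:
    """Reject any config whose endpoint host is not a VPC interface
    endpoint -- i.e. the public bedrock-runtime hostname is never valid."""
    host = cfg["endpoint_url"].split("://", 1)[1].split("/", 1)[0]
    return host.startswith("vpce-") or ".vpce." in host

assert transport_is_private(BEDROCK_IL4)
assert not transport_is_private(
    {"endpoint_url": "https://bedrock-runtime.us-east-1.amazonaws.com"}
)
```

A guard of this shape makes the sovereignty claim fail closed: a misconfigured public endpoint is rejected before any inference request is built.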
IL5: Bedrock GovCloud
For IL5 workloads, InfrastructureAI uses the same Bedrock VPC PrivateLink pattern within the AWS GovCloud partition.
GovCloud is physically and logically isolated from commercial AWS. It operates in US-only data centers, staffed exclusively by US persons, under a separate FedRAMP High authorization boundary. The GovCloud partition has its own IAM, its own resource namespace, and no cross-partition network connectivity to commercial AWS.
The VPC PrivateLink pattern is identical — a private endpoint within the customer’s GovCloud VPC connecting to the Bedrock service within the GovCloud partition. The inference traffic never leaves GovCloud’s private network fabric.
Authorization: FedRAMP High / DoD SRG IL5.
IL6 Air-Gapped: vLLM on VitroAI
For IL6 workloads, no external connectivity of any kind exists. InfrastructureAI runs inference on vLLM deployed on the customer’s on-premises VitroAI infrastructure within an air-gapped enclave.
Model selection: US-origin models only. The primary model is Llama 3.3 70B, which provides full autonomous reasoning capability — root cause analysis, remediation planning, cross-domain correlation — comparable to the cloud-hosted Claude models for infrastructure operational tasks.
Model provenance: Chinese-origin models (Qwen, DeepSeek, and derivatives) are excluded from IL6 deployments per DoD procurement model provenance requirements. Model provenance is verified at deployment time and enforced by the VitroAI model registry.
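The shape of that registry gate can be sketched as a default-deny allowlist. This is a hypothetical illustration; the actual VitroAI registry would enforce provenance from signed model metadata, not family-name strings:

```python
ALLOWED_FAMILIES = {"llama"}             # US-origin families cleared for IL6
DENIED_FAMILIES = {"qwen", "deepseek"}   # excluded per provenance policy

def admit_to_il6(model_family: str) -> bool:
    """Default-deny provenance check: only explicitly cleared
    families are admitted; unknown families are rejected."""
    family = model_family.lower()
    if family in DENIED_FAMILIES:
        return False
    return family in ALLOWED_FAMILIES

assert admit_to_il6("llama")
assert not admit_to_il6("Qwen")
assert not admit_to_il6("some-new-model")   # unknown -> rejected
```

The default-deny posture matters: a new model family is rejected until its provenance is affirmatively verified, rather than admitted until someone notices.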
Inference backend: vLLM provides production-grade inference with continuous batching, PagedAttention for memory efficiency, and tensor parallelism across multiple GPUs. A single 70B parameter model runs efficiently on two A100 80GB GPUs or equivalent.
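The two-GPU sizing follows from weight memory alone. A back-of-envelope check (a sketch that ignores KV-cache and activation overhead, which vLLM’s PagedAttention manages within the remaining headroom):

```python
params = 70e9                 # 70B parameters
bytes_per_param = 2           # FP16/BF16 weights

weight_gb = params * bytes_per_param / 1e9   # 140 GB of weights
gpus, gpu_gb = 2, 80                         # two A100 80GB, tensor-parallel

per_gpu_gb = weight_gb / gpus                # 70 GB of weights per GPU
assert weight_gb <= gpus * gpu_gb            # 140 GB fits in 160 GB aggregate
assert per_gpu_gb < gpu_gb                   # ~10 GB/GPU left for KV cache
```

The margin is tight at full FP16 precision; deployments that need longer context windows commonly quantize the weights to widen the KV-cache budget.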
Authorization: Agency RMF (Risk Management Framework). The authorization boundary is the air-gapped enclave itself. There is no external network path to audit because there is no external network path.
Tactical Edge: Ollama on Ampere ARM64
At the tactical edge — disconnected, austere, or denied environments — InfrastructureAI runs on Ollama deployed on Ampere ARM64 hardware. The parameter ceiling is 7-8B, constrained by the available compute and memory on edge hardware.
The primary model is Llama 3.1 8B. At this parameter count, the model does not perform the same novel reasoning as the 70B or cloud-hosted models. Instead, the edge deployment uses pre-certified TFO (Tactical Frontier Operations) playbooks — remediations that were reasoned and validated by larger models in connected environments, then packaged as structured playbooks that the edge model executes rather than re-derives. The edge model handles playbook selection, parameter binding to the local environment, and execution verification. Novel reasoning for edge-specific problems escalates to a connected environment when connectivity is restored (ADR-004).
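The division of labor — larger models reason, the edge model selects and binds — can be sketched as follows. The playbook catalog, step names, and escalation signal here are hypothetical illustrations, not the actual TFO schema:

```python
# Pre-certified playbooks: reasoned and validated by larger models in
# connected environments, executed (not re-derived) at the edge.
PLAYBOOKS = {
    "disk-pressure": {
        "steps": ["identify-largest-logs", "rotate-logs", "verify-free-space"],
        "params": ["mount_point", "min_free_gb"],
    },
}

def bind(playbook_id: str, env: dict) -> dict:
    """Bind a playbook's parameters to the local environment.
    Unbound parameters mean novel reasoning is needed, which the
    edge model cannot do -- escalate when connectivity returns."""
    pb = PLAYBOOKS[playbook_id]
    missing = [p for p in pb["params"] if p not in env]
    if missing:
        raise LookupError(f"escalate: unbound params {missing}")
    return {"steps": pb["steps"], "args": {p: env[p] for p in pb["params"]}}

plan = bind("disk-pressure", {"mount_point": "/var", "min_free_gb": 5})
assert plan["args"]["mount_point"] == "/var"
```

The key property is that the 8B model’s failure mode is explicit escalation, not an improvised remediation it is not capable of validating.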
Model Routing Table
InfrastructureAI selects the inference backend based on the workload’s classification level. The routing is determined at deployment time by the environment configuration, not at runtime by the application.
| Classification | Environment | Inference Backend | Model | Transport | Reasoning Capability |
|---|---|---|---|---|---|
| IL2-IL4 CUI | AWS Commercial | Bedrock VPC PrivateLink | Claude Sonnet/Opus | Private VPC | Full autonomous reasoning |
| IL5 | AWS GovCloud | Bedrock GovCloud | Claude Sonnet/Opus | Private VPC | Full autonomous reasoning |
| IL6 Air-Gapped | On-premises VitroAI | vLLM | Llama 3.3 70B | Cluster-internal | Full autonomous reasoning |
| Tactical Edge | Disconnected | Ollama | Llama 3.1 8B | Local | Pre-certified playbook execution |
| Dev/Operator Workstation | LAN | Ollama | qwen3.5 35B | LAN | Development and testing |
The application code is identical across all rows. The MCP tool servers, the agent orchestration, the FFO knowledge graph, and the mission lifecycle are the same. Only the inference endpoint URL and model identifier change between deployment targets. This is why a remediation workflow validated in a development environment against Ollama works identically in production against Bedrock — the tools and orchestration are the same, and only the model providing the reasoning changes.
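The routing table above reduces to a small, static mapping selected once at deployment. A minimal sketch (backend names and model identifiers are illustrative shorthand for the rows above):

```python
# One table, keyed by classification; resolved from environment
# config at deployment time, never by the application at runtime.
ROUTES = {
    "IL2-IL4": {"backend": "bedrock-privatelink", "model": "claude-sonnet"},
    "IL5":     {"backend": "bedrock-govcloud",    "model": "claude-sonnet"},
    "IL6":     {"backend": "vllm",                "model": "llama-3.3-70b"},
    "EDGE":    {"backend": "ollama",              "model": "llama-3.1-8b"},
}

def inference_config(classification: str) -> dict:
    """Everything above this lookup -- tools, orchestration, mission
    lifecycle -- is identical across every classification level."""
    return ROUTES[classification]

assert inference_config("IL6")["backend"] == "vllm"
```

Because only this lookup changes between targets, a workflow validated against one row exercises the same tool and orchestration code as every other row.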