Netris Fabric Automation — Overview
Technical reference implementation for Netris physical fabric automation in the Federal Frontier Platform — spine/leaf BGP/EVPN underlay, VPC lifecycle, and DPU-enforced multi-tenancy for disaggregated and AI Factory deployments.
Technical Reference Implementation. This section documents the target architecture and design for integrating Netris physical-fabric automation into the Federal Frontier Platform (FFP), as defined by the platform’s disaggregated-HCI architecture decision (ADR-007). It is a reference design — not a description of a deployed system. Per-artifact implementation status is given in the Implementation status table below.
Federal Frontier’s standard infrastructure runs OpenStack with OVN/OVS (managed by Neutron) for virtual networking. That model suits general-purpose compute, where the physical switches stay intentionally simple and all network intelligence lives in the software overlay.
GPU AI infrastructure has fundamentally different network requirements that a software overlay alone cannot serve:
- Lossless east-west fabric for GPU-to-GPU collective communication (RoCEv2 / InfiniBand); a single dropped packet can stall an entire training collective while the transport recovers.
- A dedicated storage fabric from compute to disaggregated NVMe/TCP storage.
- Hardware-enforced tenant isolation at the switch and DPU level for multi-tenant classified workloads.
- Automated fabric lifecycle — when a node is provisioned, its top-of-rack switch port must be configured automatically with the correct VLANs, BGP peer, and QoS.
Netris is the physical fabric-automation layer that meets those requirements. Operators declare intent (VPCs, BGP peers, subnets, ACLs); Netris translates that intent into precise switch configurations across the entire spine/leaf fabric, manages the BGP/EVPN underlay, and — with NVIDIA BlueField DPUs — enforces hardware tenant isolation via DOCA Host-Based Networking.
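The intent-to-configuration idea can be sketched in a few lines. The example below is a minimal illustration only: `VpcIntent`, `PortBinding`, `render_port_config`, and every field name are hypothetical and do not represent the Netris API or data model. It shows one declared VPC being expanded into a per-port record carrying the VLAN, BGP peer, and QoS settings that the requirements above call for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortBinding:
    """One node-facing switch port (all names are hypothetical)."""
    switch: str      # leaf switch hostname
    port: str        # switch port name
    peer_asn: int    # ASN of the node-side BGP peer


@dataclass(frozen=True)
class VpcIntent:
    """Operator-declared intent for one VPC (hypothetical schema)."""
    name: str
    vlan_id: int
    subnet: str
    lossless: bool                     # True for RoCEv2-style GPU traffic
    bindings: tuple[PortBinding, ...]


def render_port_config(intent: VpcIntent) -> list[dict]:
    """Expand one declared VPC into one config record per switch port."""
    if not 1 <= intent.vlan_id <= 4094:
        raise ValueError(f"VLAN {intent.vlan_id} is outside 1-4094")
    qos = "lossless" if intent.lossless else "best-effort"
    return [
        {
            "switch": b.switch,
            "port": b.port,
            "vpc": intent.name,
            "subnet": intent.subnet,
            "vlan": intent.vlan_id,
            "bgp_peer_asn": b.peer_asn,
            "qos_profile": qos,
        }
        for b in intent.bindings
    ]


gpu_vpc = VpcIntent(
    name="tenant-a-gpu",
    vlan_id=120,
    subnet="10.20.0.0/24",
    lossless=True,
    bindings=(
        PortBinding("leaf-01", "swp1", 65101),
        PortBinding("leaf-02", "swp1", 65102),
    ),
)
configs = render_port_config(gpu_vpc)
```

The operator edits only the `VpcIntent`; the per-port records are derived, which is the property that lets a fabric controller reconfigure a top-of-rack port automatically when a node is provisioned.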
Where Netris fits: the three deployment tiers
| Tier | Networking | Storage | Netris role |
|---|---|---|---|
| Standard HCI | OVN/OVS overlay (Neutron) | Ceph (co-located) | None |
| Disaggregated HCI | OVN/OVS overlay + dedicated storage fabric | NVMe/TCP (disaggregated) | Optional — fabric automation where physical switches are manageable |
| AI Factory | RoCEv2/InfiniBand GPU fabric + storage fabric, Netris-managed underlay | NVMe/TCP, dedicated 100GbE fabric | Full — spine/leaf BGP underlay, VPC lifecycle, DPU-enforced isolation |
Netris is additive. Standard HCI with Ceph and OVN is retained unchanged; Netris is activated only when a deployment includes GPU nodes that need a dedicated, automated physical fabric.
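The tier table and the additive rule can be read as a small decision function. The sketch below is one possible encoding, not platform code: `Deployment`, `Tier`, and `classify` are hypothetical names, and treating "Optional" as "none" when the switches are not manageable is an interpretation of the table rather than something the design states explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    STANDARD_HCI = "Standard HCI"
    DISAGGREGATED_HCI = "Disaggregated HCI"
    AI_FACTORY = "AI Factory"


@dataclass(frozen=True)
class Deployment:
    """Hypothetical summary of a deployment's relevant properties."""
    gpu_nodes: int
    disaggregated_storage: bool
    switches_manageable: bool = False


def classify(d: Deployment) -> tuple[Tier, str]:
    """Return the deployment tier and the Netris role for it."""
    # GPU nodes are the trigger for a fully Netris-managed underlay.
    if d.gpu_nodes > 0:
        return Tier.AI_FACTORY, "full"
    # Disaggregated storage alone makes Netris optional (assumed to mean
    # "none" when the physical switches cannot be managed).
    if d.disaggregated_storage:
        role = "optional" if d.switches_manageable else "none"
        return Tier.DISAGGREGATED_HCI, role
    # Standard HCI keeps Ceph and OVN unchanged.
    return Tier.STANDARD_HCI, "none"
```

For example, `classify(Deployment(gpu_nodes=0, disaggregated_storage=False))` yields the Standard HCI tier with no Netris role, while any deployment with `gpu_nodes > 0` lands in the AI Factory tier.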
What Netris owns — and what it does not
Netris and OVN occupy two distinct layers and never overlap in responsibility:
- OVN/OVS (overlay) — virtual, per-tenant networking that runs in software on the compute nodes: logical switches and routers, security groups, floating IPs.
- Netris (underlay) — the physical fabric: spine/leaf switches, BGP/EVPN, switch-port programming, VPC lifecycle, DPU isolation.
They meet at exactly one point — the OVN-BGP agent — which advertises OpenStack tenant prefixes and floating IPs into the Netris-managed BGP underlay. See Architecture for the full picture and diagram.
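To make the hand-off concrete, the sketch below computes the set of routes that would cross from the overlay into the underlay. It is an illustration only and is not the OVN-BGP agent's code: `routes_to_advertise` is a hypothetical helper, and advertising each floating IP as a host route (/32 for IPv4, /128 for IPv6) is an assumption made for the example.

```python
import ipaddress


def routes_to_advertise(tenant_prefixes, floating_ips):
    """Return the prefixes the overlay would announce into the underlay.

    tenant_prefixes: iterable of CIDR strings, e.g. "10.0.1.0/24"
    floating_ips:    iterable of address strings, e.g. "203.0.113.10"
    """
    routes = {ipaddress.ip_network(p) for p in tenant_prefixes}
    for ip in floating_ips:
        addr = ipaddress.ip_address(ip)
        # Assumption: a floating IP is announced as a single host route.
        routes.add(ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}"))
    # Sort by IP version first so IPv4 and IPv6 are never compared directly.
    ordered = sorted(
        routes, key=lambda n: (n.version, int(n.network_address), n.prefixlen)
    )
    return [str(n) for n in ordered]


advertised = routes_to_advertise(
    tenant_prefixes=["10.0.1.0/24"],
    floating_ips=["203.0.113.10"],
)
```

Everything else about the overlay (logical switches, security groups) stays invisible to the fabric; only this list of prefixes is shared.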
Implementation status
| Artifact | Status | Notes |
|---|---|---|
| FFO network: ontology (fabric entities & relations) | Defined & TypeQL 3.x-validated | Entity and relation types for switches, ports, BGP sessions, VPCs, and DPU policy. See Agents & FFO Integration. |
| Netris Controller integration | Reference design | Off-box controller as fabric system of record. |
| Netris Scout (fabric → FFO ingestion) | Reference design | Polls controller, writes structural state to FFO. |
| Netris MCP Server (agent tool surface) | Reference design | Read/live-query + policy-gated write tools. |
| End-to-end autonomous-ops demonstration | Reference design | Detect → diagnose → remediate → verify loop over a simulated fabric. |
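The detect → diagnose → remediate → verify loop in the last row, combined with the policy-gated writes of the MCP tool surface, can be sketched against a toy fabric. Everything below is hypothetical: `SimulatedFabric`, the session names, the `reset_session` action, and the diagnosis rule are stand-ins chosen for illustration, not the reference design's actual components.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedFabric:
    """Toy fabric: BGP session name -> state (hypothetical stand-in)."""
    sessions: dict[str, str] = field(default_factory=dict)

    def reset_session(self, name: str) -> None:
        # In this simulation a reset always brings the session back up.
        self.sessions[name] = "Established"


def detect(fabric):
    """Find sessions that are not established."""
    return [n for n, s in fabric.sessions.items() if s != "Established"]


def diagnose(fabric, name):
    """Pick an action; only 'Idle' sessions are treated as safely resettable."""
    return "reset_session" if fabric.sessions[name] == "Idle" else "escalate"


def remediate(fabric, name, action, allowed_actions):
    """Apply a write only if policy allows it (the policy gate)."""
    if action != "reset_session" or action not in allowed_actions:
        return False
    fabric.reset_session(name)
    return True


def verify(fabric, name):
    """Confirm the session recovered after remediation."""
    return fabric.sessions[name] == "Established"


def run_loop(fabric, allowed_actions):
    """Run one pass of detect -> diagnose -> remediate -> verify."""
    report = {}
    for name in detect(fabric):
        action = diagnose(fabric, name)
        applied = remediate(fabric, name, action, allowed_actions)
        report[name] = "fixed" if applied and verify(fabric, name) else "escalated"
    return report


fabric = SimulatedFabric(
    sessions={
        "leaf-01:spine-01": "Established",
        "leaf-02:spine-01": "Idle",
        "leaf-03:spine-02": "Active",
    }
)
report = run_loop(fabric, allowed_actions={"reset_session"})
```

Passing an empty `allowed_actions` set turns the same loop into a read-only diagnosis pass, which is one way to keep write tools behind an explicit policy decision.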
In this section
- Architecture — the OVN overlay vs. Netris underlay separation, the integration layers, and the data-flow discipline.
- Agents & FFO Integration — the FFO fabric ontology, the Scout ingestion pattern, the Netris MCP tool surface, and the autonomous detect→diagnose→remediate→verify loop.