Netris Fabric Automation — Overview

Technical reference implementation for Netris physical fabric automation in the Federal Frontier Platform — spine/leaf BGP/EVPN underlay, VPC lifecycle, and DPU-enforced multi-tenancy for disaggregated and AI Factory deployments.

Technical Reference Implementation. This section documents the target architecture and design for integrating Netris physical-fabric automation into the Federal Frontier Platform (FFP), as defined by the platform’s disaggregated-HCI architecture decision (ADR-007). It is a reference design — not a description of a deployed system. Per-artifact implementation status is given in the Implementation status table below.

Federal Frontier’s standard infrastructure runs OpenStack with OVN/OVS (managed by Neutron) for virtual networking. That model suits general-purpose compute, where the physical switches stay intentionally simple and all intelligence lives in the software overlay.

GPU AI infrastructure has fundamentally different network requirements that a software overlay alone cannot serve:

  • Lossless east-west fabric for GPU-to-GPU collective communication (RoCEv2 / InfiniBand): a single dropped packet can stall an entire training collective until it is retransmitted.
  • A dedicated storage fabric from compute to disaggregated NVMe/TCP storage.
  • Hardware-enforced tenant isolation at the switch and DPU level for multi-tenant classified workloads.
  • Automated fabric lifecycle — when a node is provisioned, its top-of-rack switch port must be configured automatically with the correct VLANs, BGP peer, and QoS.

Netris is the physical fabric-automation layer that meets those requirements. Operators declare intent (VPCs, BGP peers, subnets, ACLs); Netris translates that intent into precise switch configurations across the entire spine/leaf fabric, manages the BGP/EVPN underlay, and — with NVIDIA BlueField DPUs — enforces hardware tenant isolation via DOCA Host-Based Networking.
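The intent-to-configuration step described above can be sketched as follows. This is a minimal illustration, not the Netris API: the `NodeIntent` record, the `render_port_config` function, the output dictionary shape, and the example names (`leaf-01`, `swp12`, `lossless-roce`) are all hypothetical. It only shows the idea that an operator declares what a node's port needs (VLANs, BGP peer, QoS) and a translator emits a concrete per-port configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeIntent:
    """Hypothetical operator intent for one newly provisioned node."""
    node: str
    switch: str
    port: str
    vlans: tuple[int, ...]
    peer_asn: int
    qos_profile: str  # illustrative profile name, not a real Netris value


def render_port_config(intent: NodeIntent) -> dict:
    """Translate one intent record into a per-port configuration document.

    The output shape is invented for illustration; it is not a Netris schema.
    """
    for vlan in intent.vlans:
        # 802.1Q VLAN IDs 0 and 4095 are reserved, so usable IDs are 1-4094.
        if not 1 <= vlan <= 4094:
            raise ValueError(f"VLAN {vlan} outside 1-4094")
    return {
        "switch": intent.switch,
        "port": intent.port,
        "description": f"node:{intent.node}",
        "vlans": sorted(set(intent.vlans)),  # de-duplicate and order
        "bgp_peer": {"remote_asn": intent.peer_asn},
        "qos": intent.qos_profile,
    }


intent = NodeIntent("gpu-node-01", "leaf-01", "swp12", (120, 110), 65101, "lossless-roce")
config = render_port_config(intent)
print(config["vlans"])  # [110, 120]
```

Keeping the intent record immutable and the renderer a pure function means the same declaration always yields the same configuration, which is the property a declarative fabric layer relies on.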

Where Netris fits: the three deployment tiers

| Tier | Networking | Storage | Netris role |
|---|---|---|---|
| Standard HCI | OVN/OVS overlay (Neutron) | Ceph (co-located) | None |
| Disaggregated HCI | OVN/OVS overlay + dedicated storage fabric | NVMe/TCP (disaggregated) | Optional: fabric automation where physical switches are manageable |
| AI Factory | RoCEv2/InfiniBand GPU fabric + storage fabric, Netris-managed underlay | NVMe/TCP, dedicated 100GbE fabric | Full: spine/leaf BGP underlay, VPC lifecycle, DPU-enforced isolation |

Netris is additive. Standard HCI with Ceph and OVN is retained unchanged; Netris is activated only when a deployment includes GPU nodes that need a dedicated, automated physical fabric.
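As a rough illustration of the tier-to-role mapping, the sketch below encodes the three tiers as a lookup. The `Tier` enum, the `NETRIS_ROLE` table, and the `netris_enabled` helper are hypothetical names, and treating the "optional" role as switched on by a `switches_manageable` flag is an assumption made for the example rather than a documented platform rule.

```python
from enum import Enum


class Tier(Enum):
    STANDARD_HCI = "Standard HCI"
    DISAGGREGATED_HCI = "Disaggregated HCI"
    AI_FACTORY = "AI Factory"


# Netris role per deployment tier, as listed in the tier table.
NETRIS_ROLE = {
    Tier.STANDARD_HCI: "none",
    Tier.DISAGGREGATED_HCI: "optional",
    Tier.AI_FACTORY: "full",
}


def netris_enabled(tier: Tier, switches_manageable: bool = False) -> bool:
    """Decide whether Netris is activated for a deployment.

    Assumption: in the optional tier, Netris is used only when the
    physical switches can be managed.
    """
    role = NETRIS_ROLE[tier]
    if role == "full":
        return True
    if role == "optional":
        return switches_manageable
    return False


print(netris_enabled(Tier.STANDARD_HCI))  # False
print(netris_enabled(Tier.AI_FACTORY))    # True
```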

What Netris owns — and what it does not

Netris and OVN occupy two distinct layers and never overlap in responsibility:

  • OVN/OVS (overlay) — virtual, per-tenant networking that runs in software on the compute nodes: logical switches and routers, security groups, floating IPs.
  • Netris (underlay) — the physical fabric: spine/leaf switches, BGP/EVPN, switch-port programming, VPC lifecycle, DPU isolation.

They meet at exactly one point — the OVN-BGP agent — which advertises OpenStack tenant prefixes and floating IPs into the Netris-managed BGP underlay. See Architecture for the full picture and diagram.
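To make the hand-off concrete, the sketch below computes the set of routes that would be announced into the underlay from a list of tenant prefixes and floating IPs. It does not call the OVN-BGP agent; `routes_to_advertise` is a hypothetical helper, and announcing each floating IP as a host route (/32 for IPv4, /128 for IPv6) is an assumption made for the example.

```python
import ipaddress


def routes_to_advertise(tenant_prefixes, floating_ips):
    """Return the sorted, de-duplicated routes to announce into the underlay.

    Assumption: floating IPs are announced as host routes (/32 or /128).
    """
    # Tenant prefixes are taken as-is; ip_network rejects prefixes with host bits set.
    routes = {ipaddress.ip_network(p) for p in tenant_prefixes}
    for fip in floating_ips:
        addr = ipaddress.ip_address(fip)
        routes.add(ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}"))
    # Sort IPv4 before IPv6, then by network within each family.
    return sorted(routes, key=lambda n: (n.version, n))


routes = routes_to_advertise(
    ["10.20.0.0/24"],
    ["203.0.113.10", "203.0.113.10"],  # duplicate on purpose
)
print([str(r) for r in routes])  # ['10.20.0.0/24', '203.0.113.10/32']
```

Only the resulting prefixes cross the boundary: the overlay decides what is reachable, and the underlay only carries the routes.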

Implementation status

| Artifact | Status | Notes |
|---|---|---|
| FFO network: ontology (fabric entities & relations) | Defined & TypeQL 3.x-validated | Entity and relation types for switches, ports, BGP sessions, VPCs, and DPU policy. See Agents & FFO Integration. |
| Netris Controller integration | Reference design | Off-box controller as fabric system of record. |
| Netris Scout (fabric → FFO ingestion) | Reference design | Polls controller, writes structural state to FFO. |
| Netris MCP Server (agent tool surface) | Reference design | Read/live-query + policy-gated write tools. |
| End-to-end autonomous-ops demonstration | Reference design | Detect → diagnose → remediate → verify loop over a simulated fabric. |
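
The detect, diagnose, remediate, verify loop named in the last row can be pictured with a toy simulation. Everything here is hypothetical: `SimulatedFabric`, the session names, the state strings, and the `allowed_actions` policy set are invented for illustration and do not reflect the Netris MCP Server tool surface. The sketch only shows the control flow, including a policy gate in front of the one write action.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedFabric:
    """In-memory stand-in for a fabric: BGP session name -> state."""
    sessions: dict = field(default_factory=dict)

    def reset_session(self, name: str) -> None:
        # In this simulation a reset always brings the session back up.
        self.sessions[name] = "established"


def detect(fabric):
    """Return the names of sessions that are not established, in sorted order."""
    return [n for n, s in sorted(fabric.sessions.items()) if s != "established"]


def diagnose(fabric, name):
    """Pick an action: idle sessions get a reset, anything else is escalated."""
    return "reset_session" if fabric.sessions[name] == "idle" else "escalate"


def run_loop(fabric, allowed_actions):
    """Run one detect -> diagnose -> remediate -> verify pass and report outcomes."""
    report = {}
    for name in detect(fabric):
        action = diagnose(fabric, name)
        if action != "reset_session":
            report[name] = "escalated"
        elif action not in allowed_actions:
            report[name] = "blocked_by_policy"  # write is policy-gated
        else:
            fabric.reset_session(name)
            # Verify by re-running detection after the remediation.
            report[name] = "verified" if name not in detect(fabric) else "failed"
    return report


fabric = SimulatedFabric({
    "leaf-01:spine-01": "established",
    "leaf-02:spine-01": "idle",
    "leaf-03:spine-02": "admin_down",
})
report = run_loop(fabric, allowed_actions={"reset_session"})
print(report)  # {'leaf-02:spine-01': 'verified', 'leaf-03:spine-02': 'escalated'}
```

Verification reuses the detection step rather than trusting the remediation call, so a remediation that silently fails is reported as such.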

In this section

  • Architecture — the OVN overlay vs. Netris underlay separation, the integration layers, and the data-flow discipline.
  • Agents & FFO Integration — the FFO fabric ontology, the Scout ingestion pattern, the Netris MCP tool surface, and the autonomous detect→diagnose→remediate→verify loop.