Research Note: The Fundamentals and Feasibility of Secure Network Taps for Verifying AI Datacenter Use
Last updated: April 2026.
Network taps are likely viable for inclusion in a retrofittable system for verifying that AI data centers are used as expected. This post explains why.
Reading time for the summary: 3 mins.
Reading time for the main body: 20-30 mins.
Reading time for appendices: 20-30 mins.
Thanks to Anjay Friedman and Mauricio Baker for substantial contributions to this post.
Summary
Secure network taps for AI compute use verification are most viable between AI data centers and the outside world (“north-south”) due to low bandwidths, costing < 0.01% of the monitored data center upfront. Tapping traffic within the data center (“east-west”) likely requires phased ramp-up and random sampling rather than full capture due to high bandwidth and low latency requirements, with such optical taps costing an estimated ~0.2-1.5% of the total data center cost upfront. The underlying technology exists in commodity hardware, and we encourage rapid R&D on some of the remaining uncertainties around optical link budgets, security and system integration.
In this post, we assess the viability of network taps for verifying how AI data centers are used, under mutual distrust: the datacenter operator (the prover) desires to protect their confidential data, while the verifier desires assurance against evasion of the verification setup.
Network taps are devices that mirror network traffic without disrupting normal operation. In contrast to on-chip governance mechanisms, network taps can be flexibly adapted to any AI data center and retrofitted at different network levels. Network taps can capture traffic between a data center and the outside world, or traffic within the data center, and in doing so provide a “ground truth” for downstream verification. Any captured traffic (or fingerprints thereof) could be retroactively checked for self-consistency or compliance.
To make international AI agreements enforceable where political trust is low, adversarially robust verification is essential to provide mutual assurances (e.g. about compliance to global AI safety standards or restraint on frontier AI development). In the absence of credible verification mechanisms, such international agreements are unlikely to succeed.
Tap placement options
Regarding tap placement, the observed traffic types and expected cost, we provide an overview below.
(Over)simplified schema for the different network fabric types and potential tap points. Out-of-band management and switch hierarchies (used for all-to-all connections) were omitted for simplicity.
While any North-South (NS) traffic traverses all levels of its network hierarchy[1], many East-West (EW) network traffic patterns (traversing storage and compute fabric) can be self-contained at lower levels. This means that an EW tap can capture any communication on its level or traffic going to levels above, but can miss traffic at levels below (e.g. tapping all links at tap 4’s inter-rack level in the illustration above captures all compute traffic except the intra-rack traffic below).
Legend:
KV = Key-Value cache transfer
MMF and SMF: multi-mode-fiber and single-mode-fiber
MP: Model Parallelism, i.e. spreading weights across devices
TX: Transmission right out of the transceiver, as opposed to reception (RX)
For derivations of our cost estimates, and detailed discussions of the technical challenges, as well as our proposed workarounds, see appendices A and EW. SMF transceiver retrofits are not needed everywhere and industry trends are already moving towards SMF as the default for all non-copper levels of the compute network.
Security
To mitigate security risks from the network taps themselves, measures could include keeping functionalities minimal and having the Prover monitor all tap outputs with their own passive taps: The logic of network taps can be simple enough to be physically inspectable by focusing only on collecting evidence, while outsourcing the more complex compliance checks to a dedicated verification cluster. In addition, passive optical splitters on tap egress links can let the Prover observe every bit sent to the Verifier, enabling independent re-computation to confirm taps behave as specified. This would also make it possible to use processors that are only unilaterally trusted by the Verifier, which has practical benefits compared to setting up internationally trusted supply chains for chips.
There are potential covert communication channels even through a tapped link. These include timing and analog side-channels, as well as steganography in unverifiable free fields. Active network taps erasing such covert channels are often called active wardens, and we expect such devices to be less disruptive at NS links than at the more bandwidth- and latency-critical EW links.
Precedents/available technology
There are established technologies that can be adopted and repurposed: commercially available hardware for passively splitting optical fiber links exists at volume, while network taps based on Field-Programmable Gate Arrays (FPGAs) are deployed in automated stock trading networks (for example at the German stock exchange[4]) and can handle protocol parsing and timestamping at SOTA line rates (800G and higher). Modern hash algorithms (SHA-3 Keccak) also enable hashing of traffic at these line rates on medium-range FPGAs.
Areas for further work
Key uncertainties remain across several areas:
A more rigorous threat model is needed to characterize what covert workloads remain possible under various tap placements and covert bandwidths (including side-channel risks through and around East-West links).
Optical splitter feasibility for tapping compute fabric hinges on unresolved questions around link budgets (especially whether amplification is needed post-split), fiber striping across multiple lanes, and more scalable photonic sampling alternatives to optical circuit switches.
Manufacturing scalability of secure tap hardware—particularly tap hardware capable of handling sufficient traffic bandwidth, and optical circuit switches—must be evaluated against realistic deployment timelines.
Demos and pilots, starting with an inference-only retrofitting demonstration, would help validate the approach and build confidence among policy and industry.
We urgently need more talented individuals, especially with networking, hardware, security or entrepreneurial backgrounds, to help derisk and build out network taps for AI compute verification. If you are interested in contributing to this work, reach out to us.
Table of contents
Summary
1. Introduction
2. Verification protocol, threat model and requirements
3. Network tap options
4. Precedents and available hardware
5. Feasibility of taps on NS traffic
6. Feasibility of taps on EW/backend traffic
7. Conclusion and areas for further work
Appendices
Appendix A: NS traffic throughput calculation
Appendix B: Passive taps and power budgets
Appendix C: Transceiver Cost Estimates
Appendix EW: More Details on EW Network Taps
Details on Non-SMF Tap Approaches
Cost Estimate for Inter-SU EW Taps
Appendix D: The challenge of encrypted traffic
Appendix E: Attack Success Conditions and Defenses
1. Introduction
As AI systems grow more capable and consequential, it is increasingly plausible that governments may seek (international) agreements to manage the risks associated with developing and deploying frontier AI systems. If such agreements emerge, their credibility will hinge on the ability to verify compliance well enough that parties trust that large-scale violations would likely be detected.
Verification of AI agreements can focus on large-scale, concentrated compute, which is difficult to conceal. A standard approach could involve a Prover making declarations about how they are using large amounts of compute and a Verifier checking the correctness, compliance, and completeness of those declarations. A combination of independent components, as outlined in our previous post on a technical verification MVP, can make covert large-scale violations increasingly difficult.
One important component in our proposed MVP is the use of network taps—retrofittable devices that mirror network traffic without disrupting normal operation. In principle, if a Verifier could observe all network traffic into and out of AI compute nodes, they would gain a clear, high-fidelity account of the workloads running on those nodes. Network visibility at this level could dramatically improve the ability to detect large-scale violations of an agreement by ensuring that all observed network traffic is accounted for within the Prover’s declared use of compute.
However, the feasibility of network taps for use in verification is not obvious. Frontier AI workloads rely on extremely high-bandwidth links and performance-sensitive network architectures, leaving little room for additional latency or error. It is not clear whether network taps can meet these constraints while remaining trusted by both parties and imposing only modest cost overhead.
2. Verification protocol, threat model and requirements
2.1 Verification protocol (high level)
This section outlines how network taps could be used within a broader verification protocol focused on checking the completeness of Prover declarations.
Declarations: The Prover declares how large-scale compute is being used (e.g. code, data, models).
Traffic observation: Verifier-controlled network taps are placed to observe network traffic into and out of AI compute nodes. Taps may be passive (optical splitting) or active (inline forwarding), depending on link type.
Evidence generation: Tap hardware processes observed traffic by timestamping, signing, and committing only headers and hashes of the message contents to a trusted log of traffic. The Prover retains message contents (or sufficient information to reconstruct them if challenged).
Verification: At a later stage, the Verifier checks whether the committed log of observed traffic is consistent with the Prover’s declarations of compute usage.
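The evidence-generation step above can be sketched in a few lines. This is a minimal illustration rather than a proposed design: the names commit, verify_tag and DEVICE_KEY are hypothetical, and an HMAC stands in for the device signature only so that the sketch stays within the Python standard library (a real tap would more plausibly use an asymmetric signature, so the Verifier can check entries without being able to forge them).

```python
import hashlib
import hmac
import json
import time

# Hypothetical device key, for illustration only. A deployed tap would hold an
# asymmetric signing key in protected hardware instead of a shared secret.
DEVICE_KEY = b"demo-key-not-for-production"


def commit(header: bytes, payload: bytes, timestamp_ns: int) -> dict:
    """Build one log entry: header, payload digest, timestamp and an auth tag.

    The payload itself never enters the entry; only its SHA3-256 digest does.
    """
    entry = {
        "ts_ns": timestamp_ns,
        "header": header.hex(),
        "payload_sha3": hashlib.sha3_256(payload).hexdigest(),
    }
    message = json.dumps(entry, sort_keys=True).encode()
    entry["tag"] = hmac.new(DEVICE_KEY, message, hashlib.sha256).hexdigest()
    return entry


def verify_tag(entry: dict) -> bool:
    """Recompute the tag over every field except the tag itself."""
    body = {k: v for k, v in entry.items() if k != "tag"}
    message = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(DEVICE_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["tag"])


entry = commit(b"\x01\x02", b"secret prompt", time.time_ns())
print(verify_tag(entry))
```

The entry contains the header and a digest but not the message contents, which matches the division of labour described above: the Prover keeps the contents, the Verifier keeps the commitments.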
2.2 Threat model
We adopt the same broad threat model used in our previous post on a Verification MVP overview where the Verifier and the Prover are OC5-level cooperative but covert adversaries participating in the verification protocol. The Prover may attempt to violate an agreement while still appearing compliant, and both sides must maintain acceptable confidentiality and integrity guarantees.
For network taps specifically, the most relevant threats are:
Bypassing observation: The Prover attempts to route traffic around taps or use alternative paths invisible to the Verifier.
Manipulating tap outputs: Forging, dropping, delaying, or modifying tap commitments; interfering with hash/signature generation; or manipulating delivery to the Verifier.
Compromising tap hardware or software: Supply-chain tampering, malicious firmware, or physical interference with device behaviour.
Steganographic signalling or covert channels: Encoding hidden information in timing, packet structure, or other traffic properties.
Confidentiality risks: Leakage of model weights, data, or usage patterns through tap outputs (within the few-bit leakage budget allowed in the MVP).
To mitigate these threats, we assume the same hardware security measures outlined in the MVP [link], where possible:
For the Verifier: Formally verified FPGAs, supply chain vetting, tamper-evident enclosures and cameras, data authentication through standard cryptography and more.
For the Prover: Egress limits, vetted, open-sourced designs, hardware teardowns, physical separation of Verifier hardware and more.
A highly useful security property of passive observability (through fiberoptic coupling) is that it can be applied to any device’s ingress and egress, including the network taps themselves, provided they use fiber optics compatible with passive splitting (see section 3.5.1).
An analysis of Prover and Verifier attack success conditions, including residual risks and potential countermeasures, is provided in Appendix E.
2.3 Functional requirements
To be viable under the threat model above, network taps must meet several concrete requirements:
No bypass: All node-level ingress/egress traffic must pass through at least one tap; traffic must be fully observed (or sampled) as specified.
Correct capture and commitments: Taps must accurately mirror traffic and correctly timestamp, hash, and sign headers and content digests, with no omission, duplication, or forgery.
Authenticated delivery: Signed digests must reliably reach the Verifier, with protections against dropping, reordering, or tampering.
Simple, verifiable logic: Tap functions (L1 capture, timestamping, hashing, signing, forwarding) should be minimal, deterministic, and simple enough to plausibly support formal verification of tap hardware or at least inspection of critical components.
Secure key handling: Device-held signing keys must be protected via supply-chain controls and on-site physical security consistent with MVP assumptions.
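One common way to make dropping or reordering of digests detectable is to chain the log entries together. The sketch below is an assumption about how the "authenticated delivery" requirement could be met, not something specified in this post; GENESIS, chain_append, build_log and check_log are illustrative names.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical fixed starting value for the chain


def chain_append(prev_link: bytes, seq: int, digest: bytes) -> bytes:
    """Each link commits to the previous link, a sequence number and a digest."""
    return hashlib.sha3_256(prev_link + seq.to_bytes(8, "big") + digest).digest()


def build_log(digests):
    """Tap side: produce (seq, digest, link) tuples in capture order."""
    log, link = [], GENESIS
    for seq, digest in enumerate(digests):
        link = chain_append(link, seq, digest)
        log.append((seq, digest, link))
    return log


def check_log(log) -> bool:
    """Verifier side: recompute the chain and reject gaps or reordering."""
    link = GENESIS
    for expected_seq, (seq, digest, stored_link) in enumerate(log):
        if seq != expected_seq:
            return False
        link = chain_append(link, seq, digest)
        if link != stored_link:
            return False
    return True


digests = [hashlib.sha3_256(bytes([i])).digest() for i in range(4)]
log = build_log(digests)
print(check_log(log))
```

A chain like this does not by itself reveal truncation of the most recent entries; detecting that would need something extra, such as periodic signed heartbeats carrying the latest link.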
2.4 Practical requirements
In addition, taps must satisfy practical constraints to be usable in frontier AI agreements:
Performance: Taps must not significantly harm the latency, bandwidth or reliability of networks.
Cost: Hardware and deployment costs should be modest (<5%, ideally <1%) relative to monitored compute.
Scalable manufacturing and integration: Devices must be manufacturable and deployable at scale.
3. Network tap options
There is a diversity of options for what parts of a datacenter network to tap into, and the traffic types to record/monitor. To provide clarity for unfamiliar readers, we briefly describe the main network types commonly found in AI datacenters.
3.1 North-South (NS) vs. East-West (EW)
Traditionally, one distinguishes between two types of datacenter traffic:
North-South (NS) (or “frontend”): This is traffic between the datacenter and the outside world. It includes user API service (inference), diagnostics and administrative access (e.g. SSH).
East-West (EW) (or “backend”): Traffic between the devices within the datacenter. In AI clusters, this is dominated by the (massive volume of high-speed) communication between accelerators (GPUs/TPUs) during distributed inference and training, as well as traffic between compute units and datacenter storage.
For both traffic types (NS and EW), there exist hierarchies from individual processors, to servers, to racks[5], to pods/SUs, to networking switches and ultimately, the datacenter edge.
Multi-tier networking within a “Scalable unit” (SU) as described above. (Source: Supermicro datasheet SRS-48UGPU-AI-LCSU)
Importantly, the different network types are not always entirely physically separate. While for NVIDIA’s “scalable units” (SUs) the compute fabric is its own network with dedicated switches, in-band Ethernet and storage fabric are isolated in some network configurations but connected by shared switches in others.
3.2 Protocol layers
Here we briefly discuss what is sent through network links at a fundamental level and, in turn, what a tap would observe directly. Ethernet traffic (the most common and relevant case) consists of nested layers. For our purposes, the key distinction is between transport layers and the application layer[6]. This distinction between payload and protocol headers/metadata matters for taps in two ways:
First, where traffic is captured as hashes rather than plaintext data (likely preferable for security and confidentiality), it is important that the original data (the preimage) can be recovered. Protocol headers are not commonly retained long term, so a tap hashing a raw bitstream is likely insufficient: any “lost” nonce in a preimage makes the hash non-reproducible.
Second, transport headers contain degrees of freedom that can carry information independent of the payload. If unmitigated, they can be a side-channel for covert communication (see Security Properties below).
This means that, at the very least, taps need logic able to parse protocol layers, which in and of itself is an unproblematic requirement.[7] However, for hashes to be verifiable in retrospect, one needs to either hash only the payloads, or store the protocol headers. In the latter case, the storage requirement for the verifier roughly triples [Appendix EW].
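To illustrate why hashing the raw bitstream is fragile, the following sketch builds two Ethernet/IPv4/UDP frames that carry the same payload but differ in header fields, then compares whole-frame hashes with payload-only hashes. The helper names build_frame and udp_payload are hypothetical, checksums are left at zero, and VLAN tags and non-UDP transports are not handled; it is a toy parser under those assumptions.

```python
import hashlib
import struct


def build_frame(payload: bytes, ip_id: int, ttl: int) -> bytes:
    """Assemble a minimal Ethernet + IPv4 + UDP frame (checksums left at zero)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", 0x0800)  # dst MAC, src MAC, IPv4
    total_len = 20 + 8 + len(payload)
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len, ip_id, 0, ttl, 17, 0,  # version/IHL ... protocol 17 = UDP
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
    udp = struct.pack("!HHHH", 4000, 4001, 8 + len(payload), 0)
    return eth + ip + udp + payload


def udp_payload(frame: bytes) -> bytes:
    """Strip Ethernet, IPv4 and UDP headers and return the application payload."""
    if struct.unpack("!H", frame[12:14])[0] != 0x0800:
        raise ValueError("not an IPv4 frame")
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[14 + 9] != 17:
        raise ValueError("not a UDP datagram")
    udp_start = 14 + ihl
    udp_len = struct.unpack("!H", frame[udp_start + 4:udp_start + 6])[0]
    return frame[udp_start + 8:udp_start + udp_len]


frame_a = build_frame(b"tensor-bytes", ip_id=1, ttl=64)
frame_b = build_frame(b"tensor-bytes", ip_id=2, ttl=63)

raw_equal = hashlib.sha3_256(frame_a).digest() == hashlib.sha3_256(frame_b).digest()
payload_equal = (
    hashlib.sha3_256(udp_payload(frame_a)).digest()
    == hashlib.sha3_256(udp_payload(frame_b)).digest()
)
print(raw_equal, payload_equal)
```

The whole-frame digests differ even though the application data is identical, so a Prover who kept only the payload could not reproduce them; the payload-only digests match.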
3.3 Active vs. passive taps
An active tap can be useful when interaction with the traffic (re-timing, header scrubbing, TLS termination) is a design goal, e.g. with active wardens against steganography or re-timing against the timing side-channel. Active tapping is a more attractive choice for NS traffic, which is a natural communication chokepoint between the prover and their devices. NS traffic is substantially lower volume and less latency-sensitive than the backend/EW. In the security properties section below, we go into more detail on how active taps can close potential covert communication channels at the north-south links.
A passive tap observes the traffic without interfering with the “live link” (apart from reducing signal amplitude). Passive taps are likely to be the preferred option where latency and bandwidth are key, in order to not interfere with the most demanding traffic (EW storage and compute fabric). One does, however, need to respect the link’s amplitude budget, a key detail explained further in appendix B.
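As a rough illustration of the amplitude cost, the loss an ideal splitter imposes on each arm follows directly from the power fraction it passes: loss in dB equals -10 * log10(fraction). The sketch below applies that formula to a few split ratios. The ratios are chosen for illustration only, and excess loss, connector loss and the receiver sensitivity of any particular transceiver are deliberately ignored.

```python
import math


def split_loss_db(fraction: float) -> float:
    """Ideal insertion loss in dB for an arm receiving `fraction` of the power."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -10.0 * math.log10(fraction)


# (live-link fraction, tap fraction); illustrative ratios, not a recommendation
for live, tap in [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]:
    print(f"{int(live * 100)}/{int(tap * 100)} split: "
          f"live link loses {split_loss_db(live):.2f} dB, "
          f"tap arm loses {split_loss_db(tap):.2f} dB")
```

An even split costs about 3 dB on each arm, while an uneven split spares the live link at the expense of a weaker tap signal. Which trade-off is acceptable depends on the margin left in the link's budget.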
3.4 Copper vs. optical
Datacenters transmit data via electrical signals over copper wires, or by light over optical fiber. Copper is cost-effective for high-speed communication at short distances within racks (<3–5m), while optics are essential for longer links. Optical fiber can, under certain circumstances[8], enable straightforward passive tapping using simple optical splitters; tapping copper is significantly more complex, requiring active hardware to electronically capture and re-transmit the signal, all while keeping latency and bandwidth impacts tolerable. Unlike fiber, copper cannot be passively split: any tap must digitize the signal, which at 224G PAM4 requires full DSP equalization compensating for up to 45 dB of channel loss. We consider two approaches:
Switch-level tap (“Tapswitch”): Modify the switching ASIC (e.g. NVSwitch) to copy sampled traffic from its internal digital fabric to a monitoring interface. Switches are among the few components where digital bitstreams, rather than PAM4, exist, but exposing lanes for inspectable traffic cloning requires custom silicon. This adds minimal die area (comparable in complexity to standard Design-for-Test infrastructure) and introduces zero latency on the production path, but requires the switch vendor’s participation. No independent path to modify proprietary ASICs exists.
Bump-in-wire retimer tap: Insert a signal retimer into the copper link and extract a digital copy from its internal equalized bitstream. 224G retimers are entering production (e.g. Credo Blue Heron, Q3 2026). This shifts the cooperation requirement to retimer vendors, but introduces significant per-hop latency (current DSP-based PAM4 retimers run ~60ns per hop, likely higher at 224G where full MLSD equalization is mandatory). Standard retimers do not expose internal bitstreams externally; the vendor would need to add an interface, as with the “tapswitches”.
At best, we expect tapping copper with either approach to be costly and complex, with at least 1 year of lead time before manufacturing.
3.5 Placement
Depending on the use case and the threat models, one can choose to tap into:
Server, rack, pod/SU, datacenter network: Tapping into different levels of the network hierarchy offers tradeoffs between granularity of signal and cost/complexity. This is primarily a concern for EW traffic; NS traffic is strictly ingress/egress and must eventually traverse the datacenter edge, making it visible at higher, more concentrated aggregation points of the network hierarchy.
In-band Ethernet (NS): The standard management and API network. Tapping here captures the “intent” (workload schedulers, training scripts, user prompts) and the “result” (inference outputs). It is generally lower volume but contains the highest semantic signal for verifying what task is being performed.
Storage fabric (EW): The network connecting compute nodes to file systems (often separate from the compute fabric). Tapping here observes data provenance: which datasets are read for training and which model weights are loaded or saved as checkpoints.
Compute fabric (EW): The high-bandwidth, low-latency interconnects (e.g., InfiniBand, NVLink, RoCE) linking accelerators so that they can work like unified processors. This traffic is raw, low-level mathematics, such as model activations and gradients. Capturing this traffic can effectively rule out secret workloads that require the monitored link for effective, distributed computation (e.g. gradient accumulation in training or model parallelism in inference).
Disclaimer: There is no single “standard” AI datacenter network. Some deployments physically separate compute, storage, and in-band/NS; others converge parts of them (for example in-band and storage fabric). The verifier needs to make sure every communication path they care about has at least one tap in the way. Multiple taps along the path help detect unexpected routing. Physical separation makes this easier: Fewer paths to cover, and wire location gives more robust assurance than the metadata of layer 3/4 protocols. Also, different traffic types have different bandwidth needs by orders of magnitude, which means that observability of specific traffic types would be simplified further by physical separation.
Finally, all true ingress/egress must traverse some edge, and chokepoint monitoring there gives comprehensive coverage. The strongest example for this is the datacenter edge: This covers all NS traffic, provided the facility is otherwise air-gapped. While the downstream network infrastructure that delivers traffic between a datacenter and end users can be enormously complex[9], none of this complexity is relevant to the tap point. The tap sits at the physical facility boundary, on the few fiber uplinks leaving the building, upstream of all of it.
(Over)simplified schema for the different network fabric types and potential tap points. Out-of-band management and switch hierarchies were omitted for simplicity.
3.5.1 Security properties of NS taps
In theory, if NS traffic could be captured and screened perfectly (every bit of communication between the Prover and their infrastructure with insignificant covert channel leakage), no other taps would be necessary. The reason being that in principle, all interaction between the Prover and their AI accelerators must traverse the datacenter edge, including all semantic workload information (instructions, inputs and outputs).
However, in the previous protocol layer section, we mentioned how metadata can be a covert channel if not mitigated. For NS links, metadata makes up a significant fraction of total data volume, which warrants mitigation. While solutions for this specific challenge (active wardens) have been built and tested, prior work assumes single-party trust in the warden device. Extending this to mutual distrust is non-trivial: an active warden must control header generation to eliminate the prover’s covert channel, but if implemented naively, this gives the verifier or the prover their own channel for colluding with the warden. We refer to a related piece for a proposed design built from commodity hardware, addressing these security concerns via a “watching the watchers” principle using passive optical splitters and redundant processors.
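The idea of a warden controlling header generation can be made concrete with a toy example. In the sketch below, headers are plain dictionaries, the set of fields treated as "free" (ttl, dscp, ip_flags, ip_id) is an assumption made for illustration, and scrub is a hypothetical function name. The rule is fully deterministic, so anyone who sees the input and a shared packet counter can recompute the output, which is one plausible way to leave the warden itself no free choices.

```python
# Fields a sender could vary without changing the payload; the values here are
# illustrative assumptions, not a vetted list for any real protocol stack.
CANONICAL = {"ttl": 64, "dscp": 0, "ip_flags": 0}

# Fields the warden must preserve for the packet to remain routable.
PRESERVED = ("src", "dst", "proto", "length")


def scrub(header: dict, counter: int) -> dict:
    """Rebuild a header from preserved fields plus deterministic canonical values."""
    out = {key: header[key] for key in PRESERVED}
    out.update(CANONICAL)
    out["ip_id"] = counter & 0xFFFF  # replaces any sender-chosen identifier
    return out


h1 = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6, "length": 1500,
      "ttl": 61, "dscp": 12, "ip_flags": 2, "ip_id": 0xBEEF}
h2 = dict(h1, ttl=64, dscp=0, ip_flags=0, ip_id=1)

print(scrub(h1, 7) == scrub(h2, 7))
```

Two headers that differ only in sender-controlled fields come out identical, so those fields can no longer carry hidden bits. Channels such as packet length and timing are untouched by this step and would need separate treatment.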
3.5.2 Granularity-dependent security properties of EW taps
If side-channels and steganographic degrees of freedom cannot be sufficiently mitigated, additional taps at the storage and compute fabrics can offer defense in depth. Loads from storage and intermediate tensors (activations, KV cache, gradients etc.) are highly effective unique identifiers of individual computations[10], but obtaining them gets increasingly challenging the further down the network hierarchy one aims to tap into. In return, finer-grained signals become directly accessible (KV transfer and model parallelism in particular). For most deployments, we expect spine-to-leaf to be the lowest practical level for tapping intra-SU/pod compute fabric. Going lower increases tap count and runs into constraints: tight link budgets requiring retrofits[11], and many frontier architectures (NVL72, TPU v7 cubes) are copper-connected at this level.
Note: Google TPU pods can exceed 9,000 accelerators before inter-pod scale-out. At the scale-out tap level only, an entire frontier training run could complete without crossing a tap. Without intra-pod compute fabric taps, NS and storage fabric capture would need to suffice.
3.6 Replacement vs. Retrofit
Two options for taps are to 1) replace existing networking hardware with modified hardware that provides the original functionality and additionally performs the tap functions, or 2) use network taps that are just a “bump in the wire”. While 1) offers the potential benefit of verifiable separation of traffic types against re-routing[12], we expect 2) to be the more practical choice in the near-term, as it is less disruptive. That being said, “bump in the wire” passive tapping likely requires transceiver retrofits where multimode rather than singlemode optics are installed. See appendix B.
3.7 Analysis: real-time vs. asynchronous
To automate data capture while meeting the combined security, scalability and versatility needs, one can separate the workflow into two distinct phases. This ensures that the hardware sitting directly on the wire is simple enough to be verifiable, while the complex analysis happens elsewhere.
Evidence Collection (Real-time): The tap’s primary role is to capture the raw signal, extract metadata, and cryptographically commit to the data (hashing + timestamping). Because these tasks are mathematically simple but require extreme speed (line-rate), they are ideal for Field-Programmable Gate Arrays (FPGAs) or minimalist ASICs. These devices offer high throughput with a minimal, cheaply defensible attack surface.
Analysis & Verification (Asynchronous): The replay verification, semantic analysis and compliance checking do not need to happen in real-time at the tap. Since the data is already cryptographically committed to, this work can be offloaded to separate, more complex hardware and leverage statistical sampling.
3.8 Trust: mutually trusted silicon vs. observation and re-execution
To provide mutual assurance to both the prover and the verifier that the network taps work only as specified, one can either establish trust in processing hardware itself, or provide passive observability of its inputs and outputs. These two options are not mutually exclusive and we expect defense-in-depth to be a useful principle here.
Mutually trusted chips: The required logic of passive taps is simple enough to be handled by FPGAs: Header parsing and hashing. The circuits of these devices may be simple enough to be inspectable both before, and potentially after installation. However, we note that setting up globally trusted supply chains for processors is a monumental logistical challenge.
Tapping the taps: The network taps themselves can be monitored. Passive splitters at their ingress and egress links can provide assurance to the prover where needed, even if the processor itself is only unilaterally trusted by the verifier. For example, a passive tap sending hashes to a verifier can itself be passively tapped by the prover, giving them visibility of, but not influence over, every bit sent out. This would allow the prover to re-compute these hashes on their own devices, based on the plaintext data they have already, confirming the verifier’s tap works as specified.
In the near term, we expect the second option to be the key enabler for fast deployment, absent semiconductor supply chains that mutually distrusting nations can mutually trust.
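The "tapping the taps" check can be sketched from the Prover's side. The sketch assumes a hypothetical fixed-length egress record (an 8-byte timestamp followed by a 32-byte SHA3-256 digest); that format, and the names expected_record and audit_egress, are illustrative and not specified in this post. The point is that a rigid, deterministic output format lets the Prover account for the egress stream byte by byte.

```python
import hashlib

RECORD_LEN = 8 + 32  # assumed format: 8-byte timestamp + 32-byte SHA3-256 digest


def expected_record(timestamp_ns: int, payload: bytes) -> bytes:
    """What a correctly behaving tap would emit for one payload (assumed format)."""
    return timestamp_ns.to_bytes(8, "big") + hashlib.sha3_256(payload).digest()


def audit_egress(egress: bytes, known_payloads: list) -> bool:
    """Prover-side check of bytes observed on the tap's egress link.

    Passes only if the stream is exactly one record per known payload, in
    order, with no extra bytes and every digest matching the plaintext.
    """
    if len(egress) != RECORD_LEN * len(known_payloads):
        return False
    for i, payload in enumerate(known_payloads):
        record = egress[i * RECORD_LEN:(i + 1) * RECORD_LEN]
        if record[8:] != hashlib.sha3_256(payload).digest():
            return False
    return True


payloads = [b"request-1", b"response-1"]
egress = b"".join(expected_record(1000 + i, p) for i, p in enumerate(payloads))
print(audit_egress(egress, payloads))
```

The timestamp field is the one part the Prover cannot recompute exactly, so in this toy format it remains a small degree of freedom; bounding or quantizing timestamps would be one way to narrow it.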
3.9 Delivery mechanism
A key question is how the raw data captured by the taps is transferred to the device/facility that stores or screens the data.
What moves where. There is no need to send confidential data to the Verifier. Instead, hashes can uniquely identify this data as well. This does however require storage of the original data (and potentially encryption keys if the captured traffic was not plaintext). As long as the hashes are secure, storage can be the responsibility of the Prover without this compromising evidence integrity in any way. In such a setup, the verifier can challenge hashes (which may be timestamped) retroactively: The verifier enters the hashes –and the Prover the preimage data– into a secure verification cluster performing compliance checks and potentially re-execution.
This setup avoids the need for large storage capacity or complex logic on the taps, but it may not work universally: While storing all NS traffic is likely feasible, storing all EW traffic is infeasible due to sheer volume. Re-generation faces some non-determinism challenges, and even an almost perfect regeneration cannot satisfy a cryptographic hash. Two solution approaches: A) Since EW capture would likely be sparse samples regardless, the taps may store the sampled tensors themselves. B) With sufficient, stored knowledge about their own software stack and inference conditions (such as the momentary batch size concurrent with the captured hash’s timestamp), the prover may indeed be able to re-execute bitwise, though this is more speculative.
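The retroactive challenge described above, and the reason a near-perfect regeneration fails it, can be shown in a few lines. This is a minimal sketch: the Prover's store is modelled as an in-memory dictionary keyed by digest, and answer_challenge and check_answer are hypothetical names.

```python
import hashlib
import hmac


def answer_challenge(store: dict, digest_hex: str):
    """Prover side: look up the retained preimage for a challenged digest."""
    return store.get(digest_hex)


def check_answer(digest_hex: str, preimage) -> bool:
    """Verification side: accept only a preimage that hashes to the commitment."""
    if preimage is None:
        return False
    candidate = hashlib.sha3_256(preimage).hexdigest()
    return hmac.compare_digest(candidate, digest_hex)


original = b"\x00\x01\x02\x03 model output bytes"
commitment = hashlib.sha3_256(original).hexdigest()
store = {commitment: original}

# A regeneration that differs in a single bit, e.g. from numerical non-determinism
almost = bytes([original[0] ^ 1]) + original[1:]

print(check_answer(commitment, answer_challenge(store, commitment)))
print(check_answer(commitment, almost))
```

The retained original passes while the one-bit-off regeneration does not, which is why the text above treats bitwise re-execution as the more speculative of the two approaches.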
Online streaming or batched delivery. For batched delivery, one needs a trusted storage medium, and physical double-observation of the retrieval. For online streaming, the communication channel (one wire would be enough for hashes and timestamps) would need to be secured likewise. One additional possibility would be a passive tap in an optical connection which allows the Prover to see the hashes sent to the Verifier without being able to interfere, allowing the Prover to confirm their legitimacy by re-computing them on their own hardware. This last option elegantly removes the need for trusted storage media.
Retention period. We expect that commercial and company-internal inference traffic, training data, and (some) checkpoints will be technically feasible for the prover to store indefinitely. The legal situation of data retention will vary on a case-by-case basis.
Sampling rate. Full plaintext retention of the NS traffic and full hashing of the storage traffic seem feasible based on our estimates [Appendix EW].[13] In section 6 we argue that compute traffic will likely require random sampling. The specific rate is an engineering/security choice.
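To give a feel for how the sampling rate trades off against detection, the sketch below uses a deliberately simple model: each violating transfer is sampled independently with probability p, and any sampled violating transfer is recognized as such during later analysis. Both are assumptions introduced here, not claims from this post, and the numbers are illustrative.

```python
import math


def detection_probability(sample_rate: float, n_transfers: int) -> float:
    """Chance that at least one of n violating transfers is sampled."""
    return 1.0 - (1.0 - sample_rate) ** n_transfers


def transfers_needed(sample_rate: float, target: float) -> int:
    """Smallest number of violating transfers that reaches the target probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - sample_rate))


for rate in (0.001, 0.01, 0.1):
    n = transfers_needed(rate, 0.99)
    print(f"sample rate {rate}: {n} violating transfers for 99% detection")
```

Under this model, even sparse sampling catches workloads that need many transfers across a tapped link, while short or low-volume violations remain hard to catch. A real analysis would also need to account for an adversary who can predict or influence which transfers are sampled.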
4. Precedents and available hardware
There is strong precedent for FPGA-based network taps operating successfully in adversarial, high-stakes environments, most notably high-frequency trading, financial regulation and secure networks for national security.
These systems are deployed at scale, operate at line rate (10–400 Gb/s), introduce negligible latency, and are relied upon for regulatory compliance where evidence integrity is legally contested (e.g. Reg NMS, MiFID II). Their threat model is explicit: traders attempt to hide behavior; regulators require complete, lossless, timestamped records.
A canonical example is Metamako/Arista Networks, whose 7130 FPGA switch/tap platform is used for lossless capture and precision timestamping. The German stock exchange (Deutsche Börse) publicly disclosed deployment of dozens of such devices for full-fidelity capture of co-location traffic.
Other commercially available devices:
LDA NeoTap X: FPGA tap/aggregator with 100 ps timestamp precision, line-rate 100 GbE capture, optional in-band mode, and 60TB onboard SSDs for MiFID II / FINRA evidence retention.
Napatech SmartNICs (NT-series): FPGA NICs supporting up to 2×200 Gb/s capture with zero loss and nanosecond timestamps, widely used for trading surveillance and compliance.
Neox PacketRaven: FPGA network taps offered in model-specific speed variants (1G–400G), emphasizing physical data diodes, secure boot, and “network-invisible” passive capture.
A strong precedent for secure network cards used in high-stakes applications is Infodas SDoT, an FPGA-based NIC developed in Germany for classified networks. Its FPGA is personalized after chip fabrication in a secure facility, defending against hidden hardware Trojans. The card is approved (BSI-VSA-10331) for DEU/EU/NATO SECRET levels, underscoring its use in national-security systems.
The processing power requirements are modest, even at full line rate. A passive tap only needs to parse L2–L4 protocol headers and hash application-layer payloads. This, along with the optical and SerDes components, is a solved engineering problem with commodity hardware below $10,000 a unit.14
The bottom line: high-bandwidth passive capture, cryptographically defensible timestamps, minimal to zero software attack surface, and operation by mutually distrustful parties already exist and have been deployed successfully. Adapting available hardware (or developing new systems) for AI datacenter monitoring can build on existing technology.
5. Feasibility of taps on NS traffic
In this section, we discuss the surprising feasibility of using active network taps to capture all datacenter NS traffic comprehensively, at low cost and high reliability, using minimalistic (potentially software-free15), supply-chain secure hardware.
Active taps that introduce intentional latency are preferable to mitigate analog (light modulation) and timing side-channels. The added latency can be made negligible compared to typical inference response times of 10-100ms per language token, and much more time per generation run. Additionally, active taps can perform header scrubbing (see Security Properties 3.5.1), removing degrees of freedom in metadata that could otherwise be covert channels.
In appendix A, we present a calculation demonstrating that the North-South tap for an entire H100 SuperPOD (worth ~$10M) requires less than 1Gbps of bandwidth in a real-world grounded example.16 A cheap 10G generic NIC could monitor this with 90% headroom. An active tap with intercept and forward17 is highly feasible here, as the requirements on bandwidth and latency are lenient. The tap hardware cost would be comfortably below 1% of the monitored pod. Aggregated over a 200,000 GPU datacenter, we would have just below 800 SUs, meaning ≈ 600Gb/s of total traffic at the edge.18
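The aggregate figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that takes the per-SU estimate from Appendix A and the GPU counts stated above as inputs; the variable names are ours and purely illustrative.

```python
# Back-of-envelope check of the NS edge traffic figures quoted above.
GPUS_PER_SU = 256          # one scalable unit (SU), per Appendix A
PER_SU_GBPS = 0.77         # estimated NS traffic per SU, per Appendix A
NIC_GBPS = 10.0            # a generic 10G NIC
DATACENTER_GPUS = 200_000  # datacenter size used in the text

su_count = DATACENTER_GPUS / GPUS_PER_SU
edge_gbps = su_count * PER_SU_GBPS
nic_headroom = 1 - PER_SU_GBPS / NIC_GBPS

print(f"SUs in the datacenter: {su_count:.0f}")
print(f"Aggregate edge traffic: {edge_gbps:.0f} Gb/s")
print(f"Headroom of one 10G NIC per SU: {nic_headroom:.0%}")
```

The result is sensitive mainly to the per-token payload size assumed in Appendix A, so a different API format would shift the total roughly in proportion.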
About protocol parsing:
The network tap captures raw data and produces hashes with timestamps. At minimum, it operates at layer 1 (raw bits) with no parsing — whether a hash corresponds to a complete API request, a partial packet, or arbitrary byte boundaries would be determined by the verification protocol, not the tap. However, an active tap with re-timed forwarding of data risks slicing between Ethernet frames19, effectively crippling communication. If one wants active taps that do not cripple traffic, and wants hashes and timestamps at meaningful boundaries, one can integrate hardware-only parsing logic into FPGAs or ASICs. FPGA-based packet parsing up to L4 is proven at >400G speeds. Separately, formally verified hardware synthesis has been demonstrated for cryptographic cores. The parsing logic required for a network tap is far simpler than either.
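To illustrate what "hashes with timestamps" could look like at frame boundaries, here is a minimal software sketch. It is not a hardware design: a real tap would implement the equivalent logic in FPGA or ASIC fabric. The record layout, the all-zero starting value, and the chaining of records are our own assumptions for illustration; the text above only requires hashing and timestamping.

```python
import hashlib
import struct
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TapRecord:
    timestamp_ns: int  # capture time of the frame
    length: int        # frame length in bytes
    digest: bytes      # SHA3-256 of the frame contents
    chain: bytes       # running commitment over all records so far


def commit_frames(frames, clock=time.monotonic_ns):
    """Hash and timestamp each frame, chaining records (an assumed design choice)."""
    previous = bytes(32)  # hypothetical all-zero starting value
    records = []
    for frame in frames:
        timestamp_ns = clock()
        digest = hashlib.sha3_256(frame).digest()
        header = struct.pack(">QI", timestamp_ns, len(frame))
        # Each chain value depends on every earlier record, so removing or
        # altering one record changes all later chain values.
        chain = hashlib.sha3_256(previous + header + digest).digest()
        records.append(TapRecord(timestamp_ns, len(frame), digest, chain))
        previous = chain
    return records


frames = [b"frame-one", b"frame-two", b"frame-three"]
for record in commit_frames(frames):
    print(record.timestamp_ns, record.length, record.digest.hex()[:16])
```

Because the digests depend only on the frame bytes, a prover holding its own copy of the traffic could recompute them on its own hardware and compare.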
About encryption:
In practice, TLS is typically terminated at the provider’s CDN/POP edge, not at the datacenter. Traffic at the facility boundary is therefore often already plaintext, making tapped data directly interpretable. Where internal transport encryption is used, the prover holds the keys, and can provide them to a verification cluster after their sessions expire and the keys are no longer in active use. One exception would be if key management is fully within TEEs, making the keys unavailable even to the server host. For more details on this, see appendix D.
About reliability:
A failure of an active tap at the NS uplinks of a device (the only communication pathway(s) between it and the prover) can be more than a nuisance: an SU producing 50B output tokens per day at a price of $2 per million tokens generates $100,000 per day in revenue. For feasibility, the taps would need to be at least as reliable as the network they tap into. As reliability is a (mostly) solved problem for more complicated hardware, namely the datacenter network, we expect this not to be an issue, especially with added redundancy in the tap’s hardware.
5.1 Cost estimate
We expect the material cost to be insignificant for NS taps.
Even if the real datacenter edge traffic discussed in the previous section exceeds API/inference traffic by a factor of 100, relatively few FPGAs and switches would be needed to parse and hash at line rate (60Tb/s). ~300 transceivers for the active taps plus ~150 mid-range FPGAs are sufficient according to our estimates; the required FPGA processing at 400G (parsing and SHA-3 hashing) consumes <25% of a mid-range device such as the Kintex UltraScale+ KU15P (522,720 LUTs, ~$4,000 at volume).20 We conservatively assume $5,000 per FPGA and $450 per DR4 transceiver.
The material cost would amount to $885,000. For comparison, xAI’s Colossus 1, built using the SRS-48UGPU-AI-LCSU platform we based our calculations on, cost about $9.3B according to Epoch AI’s estimates. This means that even with 100 times more traffic than in our estimates, the material cost would be ~0.01% of the monitored datacenter.
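The cost figure follows from the quantities above. The sketch below reproduces it; the mapping of one FPGA and two transceivers (ingress and egress) per 400G link is our reading of the ~150 and ~300 counts, not something stated explicitly.

```python
# Reproduce the NS tap material cost estimate.
EDGE_GBPS = 600            # estimated edge traffic of a 200,000 GPU datacenter
SAFETY_FACTOR = 100        # allowance for non-API edge traffic
LINK_GBPS = 400            # line rate handled per tap link
FPGA_USD = 5_000           # conservative per-FPGA price from the text
TRANSCEIVER_USD = 450      # per DR4 transceiver, from the text
DATACENTER_USD = 9.3e9     # Colossus 1 cost estimate cited in the text

line_rate_gbps = EDGE_GBPS * SAFETY_FACTOR   # 60 Tb/s
links = line_rate_gbps // LINK_GBPS          # number of 400G links to tap
fpgas = links                                # assumption: one FPGA per link
transceivers = 2 * links                     # assumption: two per active tap

material_usd = fpgas * FPGA_USD + transceivers * TRANSCEIVER_USD
cost_share = material_usd / DATACENTER_USD

print(f"{fpgas} FPGAs, {transceivers} transceivers")
print(f"Material cost: ${material_usd:,}")
print(f"Share of datacenter cost: {cost_share:.4%}")
```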
6. Feasibility of taps on EW/backend traffic
East-west traffic in an AI datacenter is composed of:
The compute fabric connecting accelerators, effectively combining them into unified processors
The storage fabric between compute units and storage units
For both fabrics, we expect there to be feasible options for traffic capture: random sampling for the compute fabric, and plausibly full capture (only hashes and timestamps) of storage traffic. In contrast to NS traffic, compute communication can be self-contained within lower layers of the network hierarchy. Also, bandwidths and latencies are the most extreme here, which is why we only see random sampling strategies as practically viable.21 Generally, we found that for the compute fabric, the cost and technical difficulty of network taps increase the lower the level one aims to tap into.
For derivations of our estimates, and detailed discussions of the technical challenges, as well as our proposed workarounds, see appendices EW and B.
Compared to the TCO of the monitored datacenter, we expect such tapping to cost anywhere from ~0.3%-0.5% for the storage fabric, and ~0.2%-1.5% for the optical levels of the compute fabric, depending on the specific tapped level of the hierarchy.
Legend:
KV: key-value cache transfer
MMF and SMF: multi-mode fiber and single-mode fiber
MP: model parallelism (spreading weights across devices)
For the compute fabric, we want to note that as copper scale-up domains grow further, more and more traffic types (such as model parallel communication) are likely to fit exclusively within hard-to-access racks. We expect this trend to raise the relative importance of NS and storage taps from the verifier’s perspective.
Another noteworthy trend is the industry transition away from MMF across all network levels, as it reaches range-bandwidth limits compared to SMF.
7. Conclusion and areas for further work
Network taps for AI treaty verification appear feasible with current technology, though the strength of coverage varies significantly by network segment, and important uncertainties remain.
North-south taps are the strongest starting point. Active taps at the datacenter edge can capture all external communication at negligible cost (<0.01% of facility TCO), using commodity FPGA hardware with strong precedent in financial regulation and national security. The bandwidth requirements are modest — a single mid-range NIC could monitor an entire 256-GPU scalable unit with headroom to spare. Combined with active warden techniques to constrain covert channels in protocol metadata, NS taps provide the highest security value per dollar of any tap placement.
Storage fabric taps are the natural second priority. Full hash-and-timestamp capture of storage traffic appears feasible at sub-percent of facility cost, providing visibility into which models are loaded, which datasets are read, and when checkpoints are saved. This complements NS taps by covering data provenance. However, feasibility of the lowest-cost solutions may depend on whether passive optical splitters work within link budgets, which is not yet fully understood.
Compute fabric taps are feasible but face increasing constraints at lower network levels. Passive optical splitters on inter-SU/pod links enable random sampling of gradient and tensor traffic at ~0.2% cost, while tapping deeper into the hierarchy (spine-leaf, leaf-node) requires either inline optical-electrical-optical conversion or replacing MMF transceivers. The greatest technical challenge is the growth of copper-interconnected scale-up domains: NVL72 today, NVL576 on the horizon. Traffic that never leaves a copper rack is opaque to passive optical taps without vendor cooperation for switch-level or retimer-based solutions. We expect that this either requires increased reliance on NS and storage fabric taps, or custom copper link monitoring with vendor cooperation.
Several design choices reduce deployment complexity. Passive observability of tap outputs — letting the prover see every bit sent to the verifier — eliminates the need for mutually trusted silicon in the near term. Asynchronous verification, where taps commit hashes in real time and analysis happens later, keeps on-wire hardware minimal and verifiable.
Open questions and areas for further work:
Threat model and covert channel quantification. A more rigorous threat model is needed to characterize what covert workloads remain possible under various tap placements and covert bandwidths, including side-channel risks through North-South and East-West links, covert channels that bypass taps entirely, and steganographic channels within tapped traffic. Our Fermi estimates for constrained covert bandwidth through NS taps (~10² bps) need empirical validation against real datacenter edge traffic composition. This is the single most important unknown for assessing whether NS taps alone provide sufficient assurances.
Optical splitter feasibility. Tapping compute and storage fabric hinges on unresolved questions around link budgets — especially whether amplification is needed post-split, and whether OSNR holds up even with amplification — as well as fiber striping across multiple lanes, and more scalable photonic sampling alternatives to optical circuit switches.
Protocol design. This post identifies feasibility and constraints. An implementable verification protocol — specifying hash boundaries, sampling strategies, challenge-response mechanisms, and metadata requirements for tensor identification — is a natural next step.
Copper tapping. As scale-up domains grow, the fraction of compute traffic invisible to passive optical taps increases. Whether tapswitch or retimer approaches can be made trustworthy and cost-effective deserves focused engineering investigation, likely requiring vendor engagement.
Encrypted traffic. NVIDIA’s confidential computing roadmap will eventually make TEE-encrypted fabrics more common. Verification protocols need to be designed to securely handle encrypted traffic.
Manufacturing scalability and speed. Can secure tap hardware — particularly hardware capable of handling sufficient traffic bandwidth, and optical circuit switches — be produced at the volumes and pace needed for global deployment? OCS supply in particular looks like a potential bottleneck, though recent upward revisions in market forecasts suggest production capacity may scale with demand.
Reproducibility of network traffic. Can network traffic (excluding ML non-determinism) be reliably reproduced? Early simulation work looks promising but needs further validation.
Empirical demonstrations and pilots. The most convincing evidence for feasibility is a working prototype. A small-scale deployment could be built with off-the-shelf components and would surface engineering challenges that paper analysis misses. Starting with an inference-only retrofitting demonstration would help validate the approach and build confidence among policymakers and industry.
The underlying technology for network taps exists, but we urgently need more talented individuals, especially with networking, hardware, security or entrepreneurial backgrounds, to help derisk and build out network taps for AI compute verification. If you are interested in contributing to this work, reach out to us:
Acknowledgements
Thanks to Anjay Friedman and Mauricio Baker for substantial contributions to this post.
I also want to thank Sam Reynolds, Joe Pater, Tom Milton, Phil Bladen, Halfdan Holm and the rest of the Amodo Design team for their excellent technical feedback on the engineering challenges of network taps, and for getting involved with their own research. Lastly, I thank Jonathan Happel and Aaron Scher for their feedback and steelmanning.
Appendices
Appendix A: NS traffic throughput calculation
As a demonstrative example, we can consider the Supermicro SRS-48UGPU-AI-LCSU SU of 32 H100/H200 servers, which is a typical configuration used for AI inference and training in datacenters.22 Such an SU contains 256 Hopper-GPUs in total, enabling model parallelism similar to Deepseek’s setup for serving their v3/R1 models. Deepseek used prefill units spread over 4 nodes, and decode units spread over 18 nodes. They specify:
“Each H800 node delivers an average throughput of ~73.7k tokens/s input (including cache hits) during prefilling or ~14.8k tokens/s output during decoding.”
Keeping that ratio, we can extrapolate to 6 prefill nodes and 26 decode nodes to fill the whole capacity of the 32-node SU, and keep the same per-node throughput. This yields 26 × 14.8k ≈ 385,000 decode tokens/s.
NS traffic is not only tokens, though: API traffic is mostly JSON packaging. The OpenAI streaming format, applied to each generated token, looks as follows:
{"id":"chatcmpl-123", "object":"chat.completion.chunk", "created":1694268190,"model":"gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
228 characters in this example, for one “Hello” token. We can round up and assume 250 Bytes per token. This makes up the bulk of traffic, far exceeding ingress of ASCII symbols for prefill.23
→ 385,000 × 250B = 96MB/s ≈ 0.77Gb/s.
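The same calculation as a short script, so that the inputs can be varied. This is a sketch using only the figures quoted in this appendix; the 250-byte payload is the rounded-up assumption from above.

```python
# Per-SU north-south throughput estimate for inference traffic.
DECODE_NODES = 26                   # decode nodes in the 32-node SU
TOKENS_PER_NODE_PER_S = 14_800      # per-node decode throughput quoted above
BYTES_PER_TOKEN = 250               # rounded-up JSON chunk size per token

tokens_per_s = DECODE_NODES * TOKENS_PER_NODE_PER_S
bytes_per_s = tokens_per_s * BYTES_PER_TOKEN
gbps = bytes_per_s * 8 / 1e9

print(f"Decode tokens/s: {tokens_per_s:,}")
print(f"Egress: {bytes_per_s / 1e6:.0f} MB/s = {gbps:.2f} Gb/s")
```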
Appendix B: Passive taps and power budgets
B.1 Optical Fundamentals for Passive Tapping
Understanding why passive tapping is feasible for some datacenter links but not others requires a brief tour of fiber optic basics. Datacenters can use two fundamentally different fiber technologies:
Multimode Fiber (MMF) uses a wide glass core (~50μm) that allows light to travel in multiple paths (modes). It operates at 850nm using inexpensive VCSEL lasers that can be manufactured and tested at wafer scale, making them significantly less expensive than edge-emitting lasers. The tradeoff: Modal dispersion limits 400GBASE-SR8 to 100m over OM4, achieved via eight parallel 50Gbps lanes across 16 fibers. Also, no optical amplification technology suitable for high bandwidth PAM4 exists at 850nm, in turn making passive taps impractical here (see appendix B.6).
Single-mode Fiber (SMF) uses a narrow core (~9μm) that permits only one light path. It operates at 1310nm (O-band) or 1550nm (C-band) using more expensive laser types (EML, DFB, or silicon photonics).24 SMF offers superior reach: 10-40km at 1310nm (O-band) and 40-120km+ at 1550nm (C-band), with distances varying by data rate and transceiver type. Short-reach datacenter applications like 400G-DR4 are specified for 500m.
Link-PP:
“While MMF continues to evolve (especially OM5 for short-reach WDM), the relentless demand for higher speeds (400G, 800G, 1.6T) over increasingly longer distances within and between data centers solidifies single mode fiber as the long-term strategic choice.”
In AI clusters, the internal high-bandwidth links within a compute unit (scale-up) often use MMF for cost reasons, while the links between compute units (scale-out) and storage typically use SMF. With MMF approaching its bandwidth-distance limits at 224G per lane, standards bodies are focusing 1.6T development on single-mode fiber, suggesting future high-bandwidth datacenter fabrics will trend toward SMF-only fabrics.
B.2 Channel Insertion Loss Budgets
Modern 400G and 800G links operate with (tight) margins for signal loss between transceivers. Optical link budgets are measured in decibels (dB), a logarithmic unit of power ratio. Key reference points:
Losses are additive: a 1.5 dB fiber loss plus a 0.7 dB connector loss equals 2.2 dB total. A link fails when total loss pushes the received power below the receiver’s sensitivity threshold.
IEEE 802.3 specifies maximum channel insertion loss (IL) for each Ethernet application. This budget covers all passive losses between transmitter and receiver: fiber attenuation, connector matings, and splices. Any device inserted into the link—including a tap—must fit within this budget.
*The DR4 budget assumes connector reflectance of -50dB; worse reflectance reduces available IL to 2.7dB.
B.3 Baseline Link Loss (Without Tap)
We calculate baseline loss for representative intra-SU (multimode) and inter-SU (single-mode) links under two cabling scenarios:
4 matings: Structured cabling with 2 patch panels (NVIDIA reference design).25
2 matings: Direct point-to-point trunk without intermediate panels (potential optimization).
Component loss assumptions:
OM4 fiber: 3.0 dB/km (TIA-568)
OS2 fiber: 0.4 dB/km (ISO 11801)
MPO connector mating: 0.35 dB typical (SENKO, Fluke Networks)
B.4 Tap Insertion Loss by Split Ratio
A passive optical splitter diverts a fraction of light to the monitor leg. We assume fusion-spliced installation to minimize loss (vs. connectors).
Theoretical split loss: −10 log₁₀(fraction to live leg)
Excess loss: <0.2 dB for quality fused splitters
Fusion splices: 2 splices @ 0.05 dB = 0.10 dB
* The theoretical calculations of this table match the specifications of some commercial products well, especially for low split ratios.
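The live-leg insertion loss for any split ratio follows directly from the three components listed above. The sketch below evaluates it; it uses the 0.2 dB excess loss as a worst case (the text gives it as an upper bound), and the function names are ours.

```python
import math

EXCESS_LOSS_DB = 0.2    # upper bound for quality fused splitters
SPLICE_LOSS_DB = 0.05   # per fusion splice
SPLICE_COUNT = 2


def split_loss_db(live_fraction: float) -> float:
    """Theoretical loss on the live leg for a given power fraction."""
    return -10 * math.log10(live_fraction)


def tap_insertion_loss_db(live_fraction: float) -> float:
    """Split loss plus worst-case excess loss plus splice losses."""
    return (split_loss_db(live_fraction)
            + EXCESS_LOSS_DB
            + SPLICE_COUNT * SPLICE_LOSS_DB)


for live in (0.95, 0.90, 0.80, 0.70, 0.50):
    print(f"{live:.0%} live leg: split {split_loss_db(live):.2f} dB, "
          f"total {tap_insertion_loss_db(live):.2f} dB")
```

For a 90/10 split this gives about 0.46 dB of theoretical split loss, and roughly 0.76 dB once worst-case excess and splice losses are added.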
B.5 Feasibility Matrix (Live Leg Integrity)
Combining baseline margins with tap insertion loss to see if the production link survives.
* Not including additional losses due to more matings between SUs/pods
Legend: ✅ Feasible (≥1.5 dB margin) | ⚠️ Marginal (0.5 to 1.5 dB) | ❌ Infeasible (<0.5 dB)
Summary:
Intra-SU multi-mode: Passive taps feasible only at 95/5 with direct cabling. Infeasible with standard structured cabling.
Inter-SU single-mode: Passive taps feasible at 90/10 or 95/5 with either cabling scenario.
Production telemetry from Meta’s optical fleet (John et al., “Production Monitoring of Optics in Meta Datacenters,” DTS 2025) confirms that margin cannot be treated as uniformly available across a deployed fleet. Their Fig. 4 shows a broad distribution of minimum received optical power over a day for a representative optics type, with P1 values at -3.42 dBm against a wide tail extending below -8 dBm. Any tap insertion loss shifts this distribution’s low-performing tail into failure territory. Combined with laser degradation over the multi-year lifetime of deployed optics, even 0.5 dB of tap insertion loss (90/10 split) will measurably increase the failure rate at the fleet level. This strengthens the case for 95/5 splits on the live leg where feasible, accepting the harder monitor-leg recovery problem.
Meta also proposes Annual Interruption Rate (AIR) as a reliability metric better suited to AI infrastructure than traditional MTBF or Annual Swap Rate, since even brief link events (link flaps, error packets) can disrupt distributed training jobs that depend on all-to-all GPU communication. The cost of tap-induced reliability degradation should be evaluated in these terms, not just link-down counts.
B.6 Signal Restoration (The Monitor Leg)
With typical 400G-DR4 transmit power (~0 to +3 dBm per lane), a 90/10 tap delivers approximately -10 to -7 dBm and a 95/5 tap delivers approximately -13 to -10 dBm to the monitor leg, both possibly within direct detection range of a dedicated PIN receiver without optical pre-amplification (the monitor leg does not require a standard transceiver). Dedicated PIN receivers can achieve better sensitivity than production transceiver modules: Chen et al. (IEEE PTL, Dec 2024) demonstrated a 4-channel SiGe-BiCMOS optical receiver achieving -9.7 dBm OMA sensitivity for 56-Gbaud PAM-4.
Note that achieving this level of sensitivity requires equalization (CTLE/FFE); a bare PIN+TIA without DSP is likely 3-4 dB worse (Okamoto et al. report ~-6 dBm at 56 GBd PAM4 without equalization).
At 112 GBaud (200G per lane with PAM4), the state of the art for PIN-based direct detection is substantially worse. Declercq et al. (IEEE JLT, Feb 2025) demonstrated -4.8 dBm OMA sensitivity at KP4-FEC for 112 GBd PAM4 using a 55nm SiGe BiCMOS traveling-wave TIA with 10-tap FFE. This is the best published result we are aware of at this baud rate.
Where PIN sensitivity is insufficient, avalanche photodiodes (APDs) offer 5-8 dB improvement. Production 400G-ER4 transceivers using APDs achieve -12.1 dBm OMA receiver sensitivity per lane (Smartoptics TQD023-SL4C-SO datasheet), demonstrating that the underlying APD+TIA technology is available at volume for 53 GBaud PAM4. The MACOM MARP-BA56 is a commercially available 56 GBaud APD targeting 100G-400G Ethernet and 50G-PON applications. Bismuth-doped fiber amplifiers (BDFAs) remain an option where APDs are insufficient, but at higher cost and size.
Monitor leg feasibility at 56 GBaud (100G per lane, as in 400G-DR4), SMF 1310nm O-band:
Monitor leg power is calculated from typical 400G-DR4 transmit power (0 to +3 dBm outer OMA per lane) minus tap split loss and excess/splice losses. PIN sensitivity assumes an equalized receiver with CTLE/FFE, per Chen et al. 2024.
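The monitor-leg calculation can be sketched as follows. The receiver sensitivities are the values cited in this section; `extra_loss_db` is a parameter we introduce for excess and splice losses. It defaults to zero, which reproduces the ideal-split figures of -10 to -7 dBm quoted above; passing about 0.3 dB would include the losses from B.4.

```python
import math

PIN_SENSITIVITY_DBM = -9.7    # equalized PIN receiver, Chen et al. 2024
APD_SENSITIVITY_DBM = -12.1   # 400G-ER4 APD receiver, datasheet value cited above


def monitor_power_dbm(tx_dbm: float, monitor_fraction: float,
                      extra_loss_db: float = 0.0) -> float:
    """Optical power arriving on the monitor leg of a passive splitter."""
    return tx_dbm + 10 * math.log10(monitor_fraction) - extra_loss_db


def margin_db(tx_dbm: float, monitor_fraction: float,
              sensitivity_dbm: float, extra_loss_db: float = 0.0) -> float:
    """Positive margin means the receiver can detect the tapped signal."""
    power = monitor_power_dbm(tx_dbm, monitor_fraction, extra_loss_db)
    return power - sensitivity_dbm


for fraction in (0.10, 0.05):
    for tx in (0.0, 3.0):
        pin = margin_db(tx, fraction, PIN_SENSITIVITY_DBM)
        apd = margin_db(tx, fraction, APD_SENSITIVITY_DBM)
        print(f"monitor {fraction:.0%}, TX {tx:+.0f} dBm: "
              f"PIN margin {pin:+.1f} dB, APD margin {apd:+.1f} dB")
```

With these inputs, the 95/5 split leaves a PIN receiver below its sensitivity across the whole transmit power range, which is where the APD option discussed above becomes relevant.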
Monitor leg feasibility at 112 GBaud (200G per lane), SMF 1310nm O-band:
At 112 GBaud, the best demonstrated PIN sensitivity we found is -4.8 dBm (Declercq et al. 2025), substantially worse than at 56 GBaud. This is relevant for upcoming 1.6T transceivers and CPO.
For 850nm multimode (intra-SU links):
We do not expect passive splitting+amplification to be viable here.
850nm RX sensitivity: IEEE 802.3db SR4 spec (-4.6 dBm OMAouter); dedicated PIN estimate accounts for ~1-2 dB TIA optimization over module spec. VCSEL TX power per IEEE 802.3db.
B.7 Edge Case: Google’s Circulator Optics
While single-fiber bi-directional links are not new, Google’s fully bi-directional fiber optics, used for modern TPU pods, are. For different reasons than MMF, these are also particularly unfriendly to passive tapping: bi-directional traffic at the same wavelength means that circulators are needed to separate directions, and these introduce high insertion losses. On top of that, their Optical Circuit Switches (OCS) already consume a large portion of the transceiver’s link budget, which is in turn not available for passive taps. Pods can contain over 9,000 TPUs, enough for even the most demanding workloads, such as pretraining frontier LLMs without inter-pod scale-out. If this were to be considered a blind spot and taps into the pod compute fabric were required, new tap technologies or retrofits may be needed (see the next appendix section).
Appendix C: Transceiver Cost Estimates
For reasons stated in Appendix B, one may need to replace transceivers in the compute fabric with tap-compatible options, such as single-mode duplex optics. For the per-transceiver cost, we use retail prices to estimate the per-piece cost at high volume:
To estimate the relative cost of a transceiver retrofit compared to the cost of the monitored hardware, we use datasheets of a variety of systems from Supermicro and NVIDIA, and anchor on Epoch AI’s breakdown of total AI datacenter cost.
* We include the cost of the datacenter utilities such as power supply, cooling, land value and facility ops. In aggregate, this is assumed to be a third of the datacenter cost, with two thirds being the hardware (complete pods, storage and networking).
** Importantly, while Intra-SU scale-up competes for switch ports with inter-SU scale-out for Hopper-generation systems, the Blackwell generation can use switches with double the port number. For this table, we assume that half of the spine switches are provisioned for scale-out.
° DR4 (single mode) is also an option for any level of the fabric, and customers can choose to fit DR4 instead of MMF anywhere. In the table “SR4” only means that this level could be SR4 if a customer has chosen the small cost saving that this would bring. Pluggable transceivers are modular by design.
°° We include extra spine switches for DP training scale-out for the two HGX configurations in this table. The others are either already configured for non-blocking scaleout, or generally not intended for DP training.
^ This is an assumption, and one that probably holds for both the GH200 Supermicro setup (only 1 GPU per node) and the air-cooled, low density B200 SuperPOD.
For TPU pods, replacing transceivers would negate the OCS-halving advantage of their bi-directional technology. Emerging transceiver technology with higher bandwidths may be of help by serving the same aggregate bandwidth with fewer, higher-capacity duplex links, thereby requiring the same number of OCS switches despite the conversion to duplex.
Appendix EW: More Details on EW Network Taps
Details on Non-SMF Tap Approaches
See Appendix C for more details and derivations for our transceiver cost estimates.
Passive-tapping East-West fabric is limited by the optical link budget. Modern 400G and 800G links operate with oftentimes thin margins for signal loss, especially the multi-mode-fiber connections for short-distance communication within many NVIDIA SUs or the single-fiber fully bidirectional connections used within modern Google TPU pods. While the industry trend is moving towards SMF links all throughout, current systems likely require compromise or retrofitting along the path:
Option 1: Accept reduced intra-SU/pod visibility. This renounces the additional defense in depth that capturing tensors at this level provides. But note that copper-based scale-up (NVLink26, TPU cube) increasingly makes this traffic inaccessible to passive taps regardless. And security loss is bounded: NS and storage taps still capture workload semantics; only the tensor-level direct measurement of intra-unit parallelism is lost.
Option 2: Replace MMF with SMF infrastructure. Swap transceivers and fiber for tap-compatible options, ideally single-mode duplex. For multi-hop connections in hierarchical datacenter networks, this is only needed at one layer. Estimated cost for compute fabric: ~0.1% of TCO if specified at procurement; ~1% of TCO for full retrofit for devices not already using SMF duplex.
Option 3: Custom technical solutions. This is more speculative, but could possibly offer cost benefits over option 2, while also not needing to replace roughly a third of the cluster’s MMF transceivers with SMF. One example would be electrical retimers acting as interposers between a transceiver and its socket in the server node. Such a solution may even be applicable to copper connections such as NVLink.
Storage Network
Capturing and storing all this traffic in plaintext would require at least as much storage as the monitored datacenter itself, but one can still achieve full coverage: Only store hashes and timestamps. The prover would need to preserve the data the hashes correspond to. This could be temporary if otherwise infeasible, only until the data has been screened by trusted verification devices. Upcoming BlueField 4 DPUs handle storage traffic at 800Gb/s peak. Without requiring slowdowns on the prover’s side, a storage tap would need to match the peak rate, sustained – most notably for model weight loads. SmartNICs such as ConnectX-9 SuperNIC demonstrate that programmable NICs are scalable at cost – one would just need to offload to hash-specialized ASICs or parallel FPGA banks to hash at true wire rate. Since this traffic is seldom at peak rate – only for short bursts – one does not need processors for every individual storage fabric link, only an aggregate connection to an array, possibly handling multiple pods/SUs at once. This requires the prover to avoid saturating all storage links simultaneously, a minor27, non-disruptive28 scheduling constraint that the taps themselves can verify.
Compute Network
Within a SU/pod, many compute fabric links are copper (NVLink) instead of optical fiber29, and much higher bandwidth. To make things even more challenging, data-parallel training means that pods can no longer be treated as independent singular units with only the NS/storage links tapped. Instead, tens or even hundreds of pods share gradients in so-called all-reduce communication over the compute fabric. If a Supermicro SRS-48UGPU-AI-LCSU SU can make use of half of its spine switches for scale-out communication (the other half for SU-internal scale-up), that would still be 8 spine switches30 × 64 ports × 400Gbps = 204.8 Tbps bidirectional. Full coverage of this traffic is infeasible – the storage requirement would be astronomical – yet partial physical coverage would leave gaps for evasion. To address this, the verifier can employ a strategy of 100% physical coverage combined with temporal sampling as follows:
One can install passive optical splitters on every single scale-out link.31 This creates a “panopticon” of the multi-Tbps traffic without introducing latency or requiring active electronics on the cables. Instead of trying to process this datastream in full, the fibers connect to an optical sampler.32 The sampler acts as a random selector, physically routing one link at a time to a bank of FPGA processors while leaving the others disconnected. Such a setup can make any (not every) transmitted tensor subject to scrutiny.
Random sampling is powerful despite low capture rates: a single gradient tensor encodes one or more minibatches, and a single KV cache slice can fingerprint inference on tens of thousands of tokens. The high dimensionality of these tensors makes them effectively unique. The prover cannot predict which sample will be captured, and any captured sample can be verified against declared checkpoints via partial replay.
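A minimal sketch of the two ideas above: unpredictable link selection, and why low capture rates can still deter evasion. The detection formula assumes each transfer is captured independently with the same probability, which is a simplification we introduce here and not a property of any specific sampler hardware. The function names are hypothetical.

```python
import secrets


def next_link(num_links: int) -> int:
    """Pick the next link to route to the FPGA bank, unpredictably for the prover."""
    # secrets draws from the operating system's cryptographic random source.
    return secrets.randbelow(num_links)


def detection_probability(capture_rate: float, undeclared_transfers: int) -> float:
    """Chance that at least one undeclared transfer is captured.

    Assumes independent captures at a fixed rate (a simplifying assumption).
    """
    return 1 - (1 - capture_rate) ** undeclared_transfers


schedule = [next_link(256) for _ in range(5)]
print("Example sampling schedule:", schedule)

for transfers in (1_000, 10_000, 100_000):
    p = detection_probability(1e-4, transfers)
    print(f"{transfers:>7,} undeclared transfers at 0.01% capture: {p:.4f}")
```

Under these assumptions, a workload that needs many tensor transfers accumulates detection risk quickly even at a 0.01% capture rate, while a handful of transfers would most likely go unsampled.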
To make the captured data useful for verification purposes, it needs to be mapped to declared workloads.33 KV tensors, activations, gradients etc. are tensors and not semantically useful for verification without metadata. The network overhead from adding contextual headers to each transfer — job ID, tensor type (KV, hidden state activation, gradient), layer index, token position, training step — is negligible relative to tensor size, and replay/verification devices can then check captured samples with corresponding workloads.34 With the help of such metadata, the tap’s logic need not reconstruct a tensor’s origin or its boundaries in the bitstream, and only capture raw data. The design could follow similar principles as the NS taps explored in the previous section (timestamping, hashing, perhaps protocol parsing), just with the ability to quickly cache small data packets at the line rate of the compute fabric.
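To make the overhead claim concrete, here is one possible encoding of such a contextual header. The field widths, the byte layout, and the numeric tensor type codes are entirely hypothetical; an actual protocol would need to specify them.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: 16-byte job ID, 1-byte tensor type, 2-byte layer index,
# 8-byte token position, 8-byte training step (big-endian, no padding).
HEADER_FORMAT = ">16sBHQQ"
HEADER_BYTES = struct.calcsize(HEADER_FORMAT)
TENSOR_TYPES = {"kv": 0, "activation": 1, "gradient": 2}


@dataclass(frozen=True)
class TransferHeader:
    job_id: bytes
    tensor_type: str
    layer_index: int
    token_position: int
    training_step: int

    def pack(self) -> bytes:
        """Serialize the header for prepending to a tensor transfer."""
        return struct.pack(HEADER_FORMAT, self.job_id,
                           TENSOR_TYPES[self.tensor_type], self.layer_index,
                           self.token_position, self.training_step)


def overhead_fraction(tensor_bytes: int) -> float:
    """Header size relative to the tensor payload it describes."""
    return HEADER_BYTES / tensor_bytes


header = TransferHeader(b"job-0001", "gradient", 12, 0, 40_000)
print(f"Header size: {len(header.pack())} bytes")
print(f"Overhead on a 64 MiB tensor: {overhead_fraction(64 * 2**20):.2e}")
```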
Cost Estimate for Inter-SU EW Taps
Starting with the panopticon setup for the inter-SU compute fabric:
If we again assume 8 spine switches × 64 ports × 400Gbps scale-out for the compute fabric, we have 256 400G fibers to tap into.35 The cost of fused fiber couplers is trivial relative to other components.36
The traffic would be aggregated into an analog selector, perhaps opening 1 out of the 256 ports at a time, randomly. A practical architecture might deploy a 1x256 optical circuit switch (e.g., Agiltron 1xN series, estimated ~$10,000–$15,000 at volume based on 1x64 retail pricing of ~$5,000).
Since low tap ratios (90/10 or 95/5) yield weak monitor-leg signals, the question is whether optical pre-amplification is required before detection. As discussed in Appendix B.6, direct detection at 90/10 with PIN receivers is TX-power-dependent at 56 GBaud, with between -0.3 and +2.7 dB of margin depending on transmitter power. For 1.6T optical links at 112 GBaud, amplification is probably indispensable.37 Depending on which amplification technology is used (SOA, APD, BDFA), the cost per concurrently routed link can range from hundreds to thousands of dollars.
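The starting point for such a link budget is the splitting loss itself. The sketch below computes only the ideal loss of a passive splitter from its ratio. It deliberately ignores excess loss, connectors, splices and receiver sensitivity, so it does not reproduce the margins discussed in Appendix B.6. The function name is ours.

```python
import math


def split_loss_db(fraction: float) -> float:
    """Ideal power loss in dB on a splitter leg that receives `fraction`
    of the input power (no excess loss, connectors or splices included)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -10.0 * math.log10(fraction)


if __name__ == "__main__":
    for through, monitor in ((0.90, 0.10), (0.95, 0.05)):
        print(
            f"{int(through * 100)}/{int(monitor * 100)} tap: "
            f"through leg {split_loss_db(through):.2f} dB, "
            f"monitor leg {split_loss_db(monitor):.2f} dB"
        )
```

The asymmetry is the reason for the design tension: a low tap ratio costs the production link well under 1 dB, but leaves the monitor leg 10 dB or more below the original signal.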
A sampling rate of 0.01% of the compute fabric traffic would mean about 20Gb/s of traffic to timestamp and hash (and maybe parse). A single FPGA should be suitable for the task, and as before in the NS tap cost estimate, we assume a high per-piece price of $5,000. This would add up to ~$25,000 in total (factoring in additional components such as splices) for an upper estimate of the compute fabric panopticon’s price. This is ~0.2% of our inferred per-SU price of $12M.38
The cost of capturing all storage traffic depends on two factors: the average utilization of the installed bandwidth, and the peak utilization. By aggregating the traffic of multiple SUs or storage banks into shared tap processors, one can smooth out the per-SU throughput, avoiding the need to provision the peak rate for individual devices whose storage fabric is mostly idle. In theory, one SU can use all of its 128 200G ports simultaneously.
The DeepSeek 3FS documentation helps anchor typical traffic patterns for storage: we see that traffic is indeed bursty, with ~40GB/s peak and ~3GB/s average.
Going by the available information from DeepSeek’s 3FS file system, we assume 6% average utilization and close to 100% peak on individual links. With load balancing over multiple SUs, provisioning for 12% of the installed storage bandwidth (twice the assumed average utilization) may be enough. With 128 ports at 200Gbps, this would result in ~3Tb/s throughput per SU. Once again we conservatively assume that $15,000-$30,000 worth of FPGA processing (three to six mid-range FPGAs, 28nm) is capable of handling this throughput. Regarding the optical components, the main cost driver will be the sensors and DSP. Per SU, 64 transceivers for 400G line rate (which include an unneeded TX/laser) would cost ~$30,000 in total. Cost would be lower with dedicated receivers (PIN+TIA+DSP) only, likely ~$20,000.
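The ~3Tb/s figure follows from the port count and the provisioning share. The short sketch below makes the arithmetic explicit. The constants are the numbers quoted above, while reading the 12% as a share of installed per-SU storage bandwidth is our interpretation, and the function name is ours.

```python
PORTS_PER_SU = 128          # 200G storage ports per SU, as stated above
PORT_RATE_GBPS = 200
PROVISION_FRACTION = 0.12   # read here as a share of installed bandwidth


def provisioned_tbps(ports: int, rate_gbps: float, fraction: float) -> float:
    """Tap processing throughput to provision per SU, in Tb/s."""
    installed_gbps = ports * rate_gbps
    return installed_gbps * fraction / 1000.0


if __name__ == "__main__":
    installed = PORTS_PER_SU * PORT_RATE_GBPS / 1000.0
    needed = provisioned_tbps(PORTS_PER_SU, PORT_RATE_GBPS, PROVISION_FRACTION)
    print(f"installed storage bandwidth per SU: {installed:.1f} Tb/s")
    print(f"provisioned tap throughput per SU:  {needed:.2f} Tb/s")
```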
In total, this would be ~0.6-0.75% of the $12M per-SU cost. However, this only holds if the storage fabric already includes at least one SMF hop to tap into. If this is not the case, one would need to retrofit one hop with more tap-compatible SMF transceivers. For our SU in question, this would be 64*$585 (see appendix C) for twin transceivers, adding ~$40,000. In total:
Appendix D: The challenge of encrypted traffic
NVIDIA is increasingly prioritizing encrypted communication within its accelerator platforms, with a roadmap toward unified TEEs across CPUs, GPUs, and NVLink. To the best of our knowledge, this applies to at least NVLink and the storage fabric via BlueField DPUs.
However, this is an NVIDIA-specific trajectory, not an industry norm. Google TPUs, AWS Trainium/Inferentia, AMD Instinct, Cerebras, and Groq do not encrypt their accelerator interconnects. The majority of frontier AI compute today runs on unencrypted internal fabrics.
Where internal traffic is encrypted but not inside a TEE, session keys are typically held by the server host and can be stored and shared after the session concludes. Only TEEs hide session keys from the infrastructure operator by design. This means that for non-TEE encrypted fabrics, the operator (i.e., the prover) can retain keys to enable post-hoc decryption of tapped ciphertext. Encryption alone does not foreclose verification.
Even NVIDIA’s own confidential computing stack is not yet mature for production training. Hopper CC relies on software-encrypted bounce buffers over PCIe, incurring measurable throughput overhead on data transfers. Blackwell introduces TDISP/IDE for inline encryption that eliminates bounce buffers, but NVIDIA’s own whitepaper notes that “to enable TDISP/IDE end-to-end, both the GPU and CPU should support it”. Blackwell B200 ships with PCIe 5.0, which lacks native TDISP support — only Blackwell Ultra (B300) officially supports PCIe 6.0. Full deployment of low-overhead encrypted RDMA will likely require Rubin-generation hardware at the earliest.
Unencrypted traffic does not imply loss of customer privacy. Most privacy-focused AI inference providers protect customer data through operational controls: no logging, immediate erasure, encrypted storage of saved history.
Where TEE-encrypted traffic is deployed, making tapped data useful requires either:
Unencrypted traffic on tapped links. The tradeoff is not customer privacy but the hardware-enforced isolation TEEs provide against a malicious infrastructure operator.
Fully inaccessible client data — the core design goal of TEEs — is in tension with verification. Governance logic can in principle run inside a TEE, but trust in that logic then depends on trust in the TEE’s silicon vendor and attestation (possibly unacceptable between adversarial nation-states).
A verification protocol based on passive optical taps sidesteps this by extending the confidentiality boundary to encompass both the prover’s datacenter and a verification cluster. Network taps can defend against the very threat TEEs are designed to counter: a malicious infrastructure operator exfiltrating customer data. NS taps capture all egress, making unauthorized data extraction detectable, in turn achieving the same protective goal through off-chip, automated observability rather than opaque hardware isolation.
Post-hoc decryption. For TEE-encrypted traffic, this is fundamentally difficult: the TEE hides session keys from the infrastructure operator (the prover) by design. If keys are ephemeral and managed entirely within the TEE’s HSM, neither the prover nor the manufacturer can recover them after the session concludes. A potential workaround could be active taps acting as men-in-the-middle, though this would be more complex and likely introduce some amount of overhead.
Encrypted internal traffic is the exception for frontier AI compute today. Addressing it will grow in importance as NVIDIA’s confidential computing roadmap matures, but it is not a prerequisite for deploying network tap verification.
Appendix E: Attack Success Conditions and Defenses
E.1 Prover Attack Chain
For a prover’s attack against the tap setup to succeed and to run workloads in secret on the tapped hardware, they would need to:
Encode covert job orchestration into the NS traffic.
Via the remaining bandwidth in the timing side-channel (limited by the active tap)
Via steganography in tokens (imperfect replay, noise tolerance)
Via non-inference management traffic (SSH etc.)
Via file uploads (image, video, audio etc.)
Make the devices run the workload in secret, without creating a detectable signature in either
EW taps (note that intra-node traffic tapping is unlikely and NVL576 etc. are on the horizon)
If running the secret workload on anything other than the models known by the verifier, they’d also need to smuggle through (or past) the storage taps.
Get the results of the computation back out of the cluster, through the NS taps.39
This communication would need to fit below the detection threshold of the verification setup. Arguably the worst case for the verifier would be secret workloads running fully within a copper scale-up node/rack, where taps are likely infeasible in the near term. An NVL576-class system could plausibly pretrain what we consider “frontier” today within a single rack. Even so, NS and storage taps offer most of the security properties needed for 1, 2a, 3 and 4.
E.1.1 Physical Side-Channels
A natural concern is whether physical side channels (electromagnetic emanation, acoustic, optical, thermal, or vibration-based covert channels) could bypass network taps entirely. There are demonstrated data exfiltration methods from air-gapped computers through radio signals from RAM buses, fan noise, hard drive LEDs, and other unintentional emissions. We surveyed this literature and find these channels pose negligible risk in our context, provided some rudimentary defenses are in place.
Among the fastest demonstrated air-gap channels, electromagnetic emanation from RAM achieves ~1,000 bps at 3 meters under laboratory conditions, degrading to ~100 bps at 7 meters. Acoustic and thermal channels are slower by orders of magnitude. Most demonstrated attacks additionally rely on hardware absent from rack-mounted GPU servers (hard drive LEDs, keyboards, monitors, speakers). At these rates, even a single inference query-response pair (tens of kilobytes) would take minutes to exfiltrate.
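The "minutes" figure can be checked with a one-line calculation. In the sketch below, the channel rates are the ones quoted above, while the 20 kB payload is an assumed stand-in for "tens of kilobytes". The function name is ours.

```python
def exfiltration_seconds(payload_bytes: int, channel_bps: float) -> float:
    """Time needed to move payload_bytes over a channel of channel_bps."""
    if channel_bps <= 0:
        raise ValueError("channel_bps must be positive")
    return payload_bytes * 8 / channel_bps


if __name__ == "__main__":
    payload = 20_000  # assumed size of one query-response pair, in bytes
    for label, bps in (("RAM emanation at 3 m", 1_000), ("RAM emanation at 7 m", 100)):
        seconds = exfiltration_seconds(payload, bps)
        print(f"{label}: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```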
More broadly, physical containment is a necessary complement to network monitoring but a separable engineering problem. With broadband RF jamming and acoustic noise generation (feasible at negligible cost when no legitimate wireless or audio exists inside the facility) electromagnetic and acoustic channels can be strongly suppressed. Remaining channels (thermal, magnetic; see Table 2) are sub-meter range and have not been demonstrated in datacenter environments. The detailed design of physical containment is outside the scope of this post.
E.1.2 Constraining Covert Channel Capacity
It is possible that north-south taps could become the main, if not the only, defense against secret ML workloads where compute fabric taps are not feasible at sufficiently low levels in the network.40 The core security property of NS taps rests on a simple principle: if the verifier observes all communication into and out of a facility, any covert computation requires smuggling instructions in and results out.
For clarity, we assume a (somewhat arbitrary) threat model: for a 100,000-GPU cluster producing approximately 10⁸ tokens per second at full utilization, exfiltrating 0.1% of inference output requires moving 10⁵ tokens per second. Using efficient encoding (16-32 bit token IDs and no JSON packaging), this translates to 1.6-3.2 Mbit/s of sustained covert bandwidth. For 0.01% exfiltration, the requirement drops to 160-320 kbit/s.
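The bandwidth figures in this threat model can be reproduced as follows. The inputs are the numbers stated above, and the function name is ours.

```python
def covert_bandwidth_bps(
    tokens_per_second: float, exfil_fraction: float, bits_per_token: int
) -> float:
    """Sustained covert bandwidth needed to exfiltrate a given share of output."""
    return tokens_per_second * exfil_fraction * bits_per_token


if __name__ == "__main__":
    cluster_tokens_per_second = 1e8  # 100,000-GPU cluster at full utilization
    for fraction in (1e-3, 1e-4):    # 0.1% and 0.01% of inference output
        low = covert_bandwidth_bps(cluster_tokens_per_second, fraction, 16)
        high = covert_bandwidth_bps(cluster_tokens_per_second, fraction, 32)
        print(f"{fraction:.2%} exfiltration: {low / 1e3:.0f}-{high / 1e3:.0f} kbit/s")
```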
Since ingress can be arbitrary (user prompts), egress traffic must be explainable by ingress traffic plus declared cluster state. If the verifier can observe inputs and securely replay the observed/declared computation, they can independently derive what the outputs should have been. This applies not only to inference responses but also to deterministic interactions such as SSH, telemetry and any other derived quantities the prover reports.
However, some variables are not directly reproducible from observed traffic, for example diagnostics of detailed, opaque hardware state such as temperature or nanosecond timestamps. Such “free variables” in egress that the verifier cannot deterministically attribute to securely41 observed causes offer degrees of freedom for steganography. Free bits per event multiplied by event frequency yields total covert channel capacity.
Ergo, one can constrain covert channel bandwidth either within the events (e.g. summarizing non-inference traffic such as diagnostics, dictating rigid formatting, canonical ordering, etc.) or by limiting the frequency of such events (e.g. fewer diagnostic heartbeats). Sampling and selection (what gets sent in what order) can be tightly canonicalized in advance to combat the ordering channel as well.
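The "free bits per event times event frequency" rule and the two levers described above can be sketched as a small calculation. Every number in the example lists is a placeholder chosen purely to show the mechanics. They are not estimates from this post, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EgressEventClass:
    name: str
    free_bits_per_event: float  # bits the verifier cannot attribute to observed causes
    events_per_second: float

    @property
    def capacity_bps(self) -> float:
        return self.free_bits_per_event * self.events_per_second


def total_capacity_bps(classes: Iterable[EgressEventClass]) -> float:
    """Upper bound on covert capacity summed over all egress event classes."""
    return sum(c.capacity_bps for c in classes)


# Placeholder inputs, for illustration only.
unconstrained = [
    EgressEventClass("diagnostic heartbeat", 64, 100),
    EgressEventClass("response framing", 8, 10_000),
]
constrained = [
    # Lever 1: fewer events (heartbeats summarized to one per second).
    EgressEventClass("diagnostic heartbeat, summarized", 4, 1),
    # Lever 2: fewer free bits per event (canonical formatting and ordering).
    EgressEventClass("response framing, canonicalized", 0.1, 10_000),
]

if __name__ == "__main__":
    threshold_bps = 160_000  # lower bound of the 0.01% threat model above
    for label, classes in (("unconstrained", unconstrained), ("constrained", constrained)):
        capacity = total_capacity_bps(classes)
        print(f"{label}: {capacity:,.0f} bit/s ({capacity / threshold_bps:.1%} of threshold)")
```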
These approaches have substantial precedent. W3C’s XML Canonicalization standard and RFC 8785’s JSON Canonicalization Scheme solve a closely analogous problem for digital signatures, ensuring that logically equivalent documents produce byte-identical representations through deterministic attribute/namespace ordering and whitespace/encoding normalization. Active wardens in network steganography introduced the concept of “Minimal Requisite Fidelity”, normalizing protocol traffic to remove all degrees of freedom not strictly required for semantics. Modern enterprise Data Loss Prevention systems implement egress filtering with deep content inspection for exactly these purposes.42
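As a small illustration of the canonicalization idea, the sketch below maps logically equivalent JSON documents to identical bytes using Python's standard library. It only approximates RFC 8785: key ordering and number serialization follow Python's rules rather than the scheme's exact requirements, so it should be read as a demonstration of the principle, not a conforming implementation. The field names in the example are made up.

```python
import json


def canonical_bytes(document: str) -> bytes:
    """Re-serialize JSON with sorted keys and no optional whitespace."""
    obj = json.loads(document)
    return json.dumps(
        obj,
        sort_keys=True,           # removes freedom in key ordering
        separators=(",", ":"),    # removes freedom in whitespace
        ensure_ascii=False,       # one fixed choice for non-ASCII characters
    ).encode("utf-8")


if __name__ == "__main__":
    a = '{"temp_c": 41, "gpu": [0, 1], "status": "ok"}'
    b = '{\n  "status":"ok",\n  "gpu":[0,1],   "temp_c":41 }'
    assert canonical_bytes(a) == canonical_bytes(b)
    print(canonical_bytes(a).decode("utf-8"))
```

Canonicalization of this kind removes the formatting and ordering channels only. Any freedom in the reported values themselves remains and has to be constrained separately.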
To gain an intuition of orders of magnitude and to tie those back to the threat model, we can use Fermi estimates for the unconstrained and constrained covert bandwidths. Note that the numbers presented here are informed guesses at best, and that concrete data about typical composition of AI cluster NS edge traffic was unavailable to the authors as of writing.
Estimates based on a standard 100,000 GPU cluster handling 10,000 requests/sec:
Further work is needed to test/check if these BOTECs are sound. Traffic composition of the NS edge traffic is tacit knowledge a domain expert (not us) may have, but ultimately, empirical work would offer the most credible grounding here.
E.2 Verifier Attacks
Our scope here specifically is the attacks that may be enabled by the taps:
Exfiltrate secrets
Poison a training run or sabotage/damage the prover’s cluster
For a verifier’s attack against the tap setup to succeed and exfiltrate secrets or harm the prover’s cluster, they would need to:
Plant a backdoor in tap hardware. This backdoor would need to go undetected even in a mutually monitored (or third-party) secure facility performing inspection, or even a facility which manufactures/configures the hardware in the first place.
Alternatively, they may smuggle in “impostor taps” among the others before installation in the prover’s cluster.
A defense against this could be to keep the taps and their transportation enclosures/vehicles under surveillance from end to end.
Next, the compromised taps would need to encode secrets in hashes or otherwise low-volume data.
A straightforward defense would be to passively tap the link between the network taps and the verifier, so that the prover can passively observe everything the verifier receives. They can then verify legitimacy via recomputation.
While 2a) would legitimize the content of hashes, there may still be a covert channel via hash selection/order/timing. Defense: Fix delivery order to match original traffic timestamps. Batch hash delivery at fixed intervals.
Another attack vector does not require compromised taps at all: Dictionary attacks can decode preimages via hash lookup. To combat this, the taps would need to be configured such that hash preimage boundaries are sliced in a way that makes lookup combinatorially infeasible.
Another attack that would still require compromising the tap’s supply chain, albeit with a different goal, is a man-in-the-middle attack: If the prover can only ever communicate with their cluster through the verifier’s active tap, how do they know the tap is not editing the content rather than just the timing of the traffic? Solution: Passive taps attesting to the prover, before and after the verifier’s active tap on the NS link.
Redundant defense in depth: Supply chain security, potentially multi-supply chain redundancy for processors, with trusted comparison logic. Observability for the prover. Hashing scheme robust against advanced dictionary attacks.
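The dictionary attack mentioned above, and the effect of slicing preimages differently, can be demonstrated in a few lines. This is a toy sketch under stated assumptions: 16-bit token IDs, unsalted SHA3-256, and helper names of our own choosing. Real traffic has far less entropy than the nominal search space suggests, so chunk length alone should not be read as a sufficient defense.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def token_bytes(token_id: int) -> bytes:
    return token_id.to_bytes(2, "big")  # assumption: 16-bit token IDs


def per_token_hashes(tokens: List[int]) -> List[bytes]:
    """Weak slicing: one hash per token, so only 65,536 possible preimages."""
    return [digest(token_bytes(t)) for t in tokens]


def chunk_hashes(tokens: List[int], chunk_len: int) -> List[bytes]:
    """Coarser slicing: one hash per chunk of chunk_len tokens."""
    raw = b"".join(token_bytes(t) for t in tokens)
    step = 2 * chunk_len
    return [digest(raw[i : i + step]) for i in range(0, len(raw), step)]


def build_dictionary(vocab_size: int = 65_536) -> Dict[bytes, int]:
    """Lookup table an adversarial verifier could precompute."""
    return {digest(token_bytes(t)): t for t in range(vocab_size)}


if __name__ == "__main__":
    secret = [17, 4242, 65_535, 0]
    lookup = build_dictionary()

    recovered = [lookup[h] for h in per_token_hashes(secret)]
    print("per-token hashes recovered:", recovered == secret)

    hits = [h for h in chunk_hashes(secret, chunk_len=4) if h in lookup]
    print("chunk hashes found in dictionary:", len(hits))
    print("nominal preimages per 64-token chunk: 2^%d" % (16 * 64))
```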
E.3 Conclusion
For both attackers, defense in depth narrows attack surfaces into risky, interdependent chains of potential failure points. In the prover’s case, attacks need to succeed not only once, but in every single attempt. Against an adversarial verifier, the most powerful defense is giving the prover passive observability of any declarations.
Appendix F: Feasibility of deployment at a global scale
Unit costs of network taps and their components are one aspect of their feasibility; the quantities needed for worldwide deployment (e.g. for a treaty) are another. We begin by repeating the point that deployment of active NS taps is trivial from a quantity perspective as well, based on the calculation in the NS tap feasibility section where we artificially inflated traffic by 100x and still landed below 0.01% of datacenter cost.
For EW traffic, we first consider the scale of worldwide AI accelerator deployment. Epoch AI estimates the current stock at ~15M H100 equivalents, with the stock growing to ~100M by 2027. Note that interconnect speeds are not growing in proportion to compute density. At 256 accelerators per SU, the 15M figure corresponds to ~58,600 SU-equivalents and the 100M projection to ~390,600 SU-equivalents; the totals below correspond to the latter.
Per-SU component counts (Supermicro HGX H100 4U reference):
Storage: 64 splitters, 64 PINs, 3–6 FPGAs; retrofit adds 128 DR4 transceivers
Compute (any level): ~256 splitters, 1 OCS/selector, 1 FPGA; retrofit adds 512 DR4 transceivers
Total demand based on the Fermi estimate
Storage
~25,000,000 splitters
~25,000,000 PINs
~1,170,000–2,350,000 FPGAs
~50,000,000 DR4 transceivers (retrofit)
Compute (any level)
~100,000,000 splitters
~390,600 OCS/selectors
~390,600 FPGAs
~200,000,000 DR4 transceivers (retrofit)
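The totals above can be reproduced by multiplying the per-SU counts by the number of SU-equivalents. The sketch below uses ~390,600 SU-equivalents, a figure we infer from the OCS line (one selector per SU) and which matches ~100M H100 equivalents at 256 accelerators per SU. Dictionary keys and function names are ours.

```python
from typing import Dict

SU_EQUIVALENTS = 390_600  # inferred from the OCS total above (one per SU)

PER_SU_COUNTS: Dict[str, Dict[str, int]] = {
    "storage": {
        "splitters": 64,
        "PIN receivers": 64,
        "FPGAs (low)": 3,
        "FPGAs (high)": 6,
        "DR4 transceivers (retrofit)": 128,
    },
    "compute": {
        "splitters": 256,
        "OCS/selectors": 1,
        "FPGAs": 1,
        "DR4 transceivers (retrofit)": 512,
    },
}


def total_demand(per_su: Dict[str, int], su_count: int) -> Dict[str, int]:
    """Scale per-SU component counts to a worldwide deployment."""
    return {component: count * su_count for component, count in per_su.items()}


if __name__ == "__main__":
    for fabric, per_su in PER_SU_COUNTS.items():
        print(fabric)
        for component, total in total_demand(per_su, SU_EQUIVALENTS).items():
            print(f"  {component}: {total:,}")
```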
Annual production numbers of the components:
Mid-range/high-end FPGAs: uncertain. Likely in the low millions, based on revenue figures. (No public unit-volume data available.)
400G/800G transceivers
2024: >20M 400G/800G units shipped (Cignal AI, Jan 2025)
2025: datacom optics market +60%, >$16B revenue (Cignal AI, May 2025)
2026: ~49M 800G + ~22M 1.6T units projected (CMBI equity research, Nov 2025)
1.6T reaching 10M units/year within 4 years of introduction (LightCounting, Sep 2024)
AI cluster optics market doubling from $5B (2024) to >$10B (2026) (LightCounting, Jan 2025)
OCS: ~10K units in 2023; >50K projected by 2029 (LightCounting via fibermall). Note: Cignal AI’s Dec 2025 revision projects the OCS market at >$2.5B by 2029, ~40% above their Jan 2025 forecast, suggesting the 50K figure may already be conservative.
Passive splitters: China Mobile alone ~200.9M over 2025–27 (ICC News, Jan 2025)
PAM4 DSPs: ~42M in 2024 (back-calculated); tripling to ~127M by 2029 (LightCounting, quoted in Marvell press release, Jan 2025)
Conclusion: The most critical bottlenecks are in the supply of optical circuit switches and SMF transceivers. Both are already struggling to serve existing demand.
For verification of a global treaty, a phased rollout strategy seems the most likely: beginning with NS taps, then continuing with storage taps over the course of perhaps months if urgent, or one year. Production of OCS (or other solutions for sampling light) and of transceivers (or other solutions for tapping MMF links) would be ramped up over multiple years, or faster if urgent. At high volume, custom ASICs can replace FPGAs, and OCS technology appears primarily demand-constrained at current volumes, rather than facing fundamental supply-side bottlenecks. The rapid upward revision of OCS market forecasts (Cignal AI, Dec 2025) suggests production capacity can scale with demand.
This is true for any traffic communicating inputs and results of ML workloads in and out of individual servers.
Cost figures are only Bill of Materials estimates and do not include the physical security needed to protect against bypassing around taps. Facility cost spans the whole datacenter TCO, including accelerators.
This does not apply to rack-scale systems.
Deutsche Börse publicly disclosed deployment of dozens of such devices
Some network architectures skip some steps and scale directly from accelerators to racks (NVL72) or even to pods (CM384).
The payload (a user prompt, an inference response, a gradient tensor) gets wrapped in successive layers of metadata that tell the network how to deliver it: the device it is headed for, how to reassemble it if split into pieces, how to detect transmission errors, and more. Each layer adds its own header before passing the package down to the next. A tap observing raw traffic sees all of this: the payload plus every envelope wrapped around it. The application payload is what is semantically relevant for verification: what task was requested, what result was returned. Transport metadata is generated by networking software/firmware and cryptographic libraries, often incorporating (pseudo-)randomness for security. For example, TLS encryption requires fresh random values (nonces) in each session handshake. The TLS termination point (where encrypted North-South internet traffic is decrypted) determines who controls the layers above and below. In a typical datacenter, TLS terminates at an API gateway that is usually controlled by the facility operator (the prover, in our context).
Network hardware (using Digital Signal Processors) already does this, and parsing specific bit-combinations is possible without any software.
See section 6 and appendix B.
(involving global SDN traffic engineering across dozens of peering metros (Google Espresso), unified WAN backbones 10× larger than prior SDN deployments (Microsoft ONEWAN), and CDN networks peering with 13,000+ networks at 405+ Tbps aggregate capacity)
Replay can check if the tensors match the expectation from the concurrent NS traffic/declarations of the prover.
Tight link budgets at these levels may make passive taps infeasible. See section 6 and appendix B.
When retrofitting hardware such as switches, one can do so in a way that also physically separates networks by purpose: Compute, In-band, OOB, Storage.
Technically feasible ≠ recommended. Full hashing can be sufficient, and simply requires the prover to retain NS traffic data.
SHA-3 256 (Keccak) is a natural choice for hardware hashing at high bandwidth, achieving ~34 Gbps per core in ~1,375 slices (~5,500 LUTs) on a Virtex-7 FPGA. Twelve parallel cores sustain ~410 Gbps hashing in ~66,000 LUTs, roughly 13% of a commodity Kintex UltraScale+ KU15P (522,720 LUTs, ~$4,000 or lower at volume). L2–L4 parsing at 400G adds another 5–10% of logic (Cabal et al., FPGA’18 demonstrated >1 Tbps; Attig & Brebner, ANCS 2011 demonstrated 686 Gbps on a single FPGA). The combined footprint—under 25% of a mid-range FPGA—leaves substantial headroom and means that processing power is not expected to be the main cost driver for network taps.
Whether TLS termination and HTTP parsing can be implemented in formally-verified, software-free hardware logic remains an open question (to us). The logic is narrower than general-purpose networking (only the specific API formats the datacenter uses need be handled) but modern TLS 1.3 and HTTP/2 state machines are nontrivial. We flag this as an area for further work.
The appendix example focuses on language models. In our own experiments with Z-image on H100 GPUs, we found that (fast, lightweight) image generation produces roughly 300 times more output data per GPU hour than language models. However, this is within the order of magnitude of the JSON overhead of individually streamed language tokens, especially if image compression is used for transfer.
Against analog and timing side-channels. Potentially even “active warden” functionality to constrain or eliminate steganographic degrees of freedom.
This estimate represents a “minimum traffic for maximum inference use”, rather than practical ground truth. There may be less critical, but higher-volume communication between a datacenter and the outside world.
Also known as protocol layer 2. See section “3.2 Protocol layers”.
SHA-3 256 (Keccak) is a natural choice for hardware hashing at high bandwidth, achieving ~34 Gbps per core in ~1,375 slices (~5,500 LUTs) on a Virtex-7 FPGA. Twelve parallel cores sustain ~410 Gbps hashing in ~66,000 LUTs, roughly 13% of a commodity Kintex UltraScale+ KU15P (522,720 LUTs, ~$4,000 or lower at volume). L2–L4 parsing at 400G adds another 5–10% of logic (Cabal et al., FPGA’18 demonstrated >1 Tbps; Attig & Brebner, ANCS 2011 demonstrated 686 Gbps on a single FPGA). The combined footprint—under 25% of a mid-range FPGA—leaves substantial headroom and means that processing power is not expected to be the main cost driver for network taps.
The sampling rates are mostly bottlenecked by evidence storage and the cost of optical components.
xAI’s Colossus-1 facility uses this exact hardware.
6 prefill units at 73.7k tokens/s, four characters per token average for English language, one Byte per character → 1.8MB/s ingress for the pod. Far less json packaging, as the whole input is packaged as one
“On average, single-mode transceivers continue to cost from 1.5 to 4–5 times more than multimode transceivers, depending on the data rate” – ofsoptics
NVIDIA DGX SuperPOD Data Center Design Guide (DG-11301-001). Reference designs typically use 4 connection points per link: Transceiver → Patch Panel (Rack) → Patch Panel (Spine) → Transceiver.
The GB200 NVL72 uses copper cabling rather than optics for its 130TB/s NVLink fabric (Continuum Labs), with 13.5TB unified HBM3e per rack (HPE). Rubin NVL72 scales to ~21TB HBM4 (VideoCardz); Rubin Ultra NVL576 to ~150TB HBM4e (Next Platform). As model-parallel workloads fit within these copper-interconnected scale-up domains, externally-visible (tappable) traffic shrinks correspondingly, without (active) copper tapping.
About KV transfer from storage: This is localized and occasional, not saturating the whole fabric globally.
Checkpointing a training run hits the storage fabric, but at what rate? If there is a checkpoint every 100 training steps, it would first go into RAM. 100 training steps of time to transfer a sharded checkpoint should be manageable for taps at the storage fabric.
Avoiding tapping into copper practically means tapping into inter-rack Infiniband/Ethernet, or SU/pod scale-out. This could mean that model parallelism fully within copper-NVLink would be practically inaccessible.
We assume provisioning of the 8 additional spine switches that would be needed for non-blocking scale-out.
Making the taps purely passive also addresses a potential concern about data poisoning: The verifier injecting backdoors into a prover’s training run.
This “analog selection” removes the need to parse the full traffic and select digitally.
Or workloads whose instructions, inputs and outputs were captured by NS taps.
This drastically simplifies the verification with little added cost on the prover’s side. Without metadata, the verifier would need to search for the tensor in the enormous intermediate tensors of their replay.
If monitoring the scale-out traffic of the SU, rather than internal scale-up, is the goal. We assume non-blocking fat-tree topology.
APD-based receivers, using technology already in volume production for 400G-ER4 transceivers, provide a comfortable margin at 90/10 and are feasible at 95/5. Where APDs are unavailable, BDFA pre-amplification remains an option at higher cost. At 800G lane rates (112 GBaud), PIN receivers are insufficient for any practical tap ratio, and amplification becomes mandatory. Eliminating the amplifier where possible further reduces per-tap cost.
xAI’s Colossus 1 uses the SRS-48UGPU-AI-LCSU platform and cost approximately $9.3B for 200,000 GPUs (Epoch AI), including utilities and land. This implies ~$12M per 256-GPU SU.
We assume the tapped NS links to be the only communication channel considered within scope here, as this post is about network taps, rather than physical monitoring of a datacenter.
Taps at the storage traffic can help detect loading of undeclared models, but not covert, additional usage of declared models. Large scale data-parallel training, on the other hand, is directly measurable via all-reduce over the compute fabric. Still, enormous copper scale-up racks are on the horizon with NVIDIA’s Rubin Ultra generation, which are opaque without copper taps.
Secure, as in: Attested by the tap, not declarations or anything under the prover’s control.
DLP’s primary goal is preventing leakage of sensitive content, not eliminating covert-channel degrees of freedom. Still, it’s a reasonable “precedent for egress inspection and enforcement,” just not the same security claim.

















