Catching misreporting about ML hardware use, Part III
You do not need any (mutually) trusted chips after all. Only trust in physics and basic cryptography.
Summary
My team and I tackled the specific problem I left hanging at the end of part II: how to build network taps that both parties, the mutually distrusting prover and verifier, can trust. Since the trust and security requirements are immense, we need a solution that does not require mutual trust in complex, hard-to-inspect logic. Can you build a mutually trusted device using only unilaterally trusted processors? In this post, we describe a possible solution: Passive optical splitters enable a “watching the watchers” architecture that satisfies both parties’ assurance needs.
This part reports on joint work with Jakub Kryś, Jonathan Ng, Felix Krückel and Luke Marks conducted at the Apart Research Technical AI Governance Challenge. [Link to the paper]
Recap and Intro
In order to verify compliance with an international AI treaty, one needs more than voluntary commitments and blind trust. Verification needs to be robust enough to make the expected return of cheating negative. We distinguish four levels of assurance for treaty verification:
With one exception, my blog has so far focused on levels 2 and 3:
In part I, I showed how re-execution on identical hardware can pin down the precise conditions under which ML outputs were produced. Naturally, a problem here is that using the prover’s technology stack to catch the prover’s lies seems fallible.
In part II, I presented how you can still achieve this: by using information isolation for a clone of the prover’s system, trusted network taps, and a verifier-controlled replay device for additional (non-bitwise) sanity checks.
Let us recall that setup:
The whole setup critically relies on those network taps:
On the side of the Prover’s servers, we need a mutually trusted tap that sends confidentiality-preserving, but still untampered fingerprints (hashes) of those servers’ I/O to the verifier.
On the verification facility’s side, we need a tap that only forwards computational inputs, and compares the untrusted replay device’s[1] outputs with the prover’s claimed outputs.
I ended the post with a promise to dive deeper into the network taps that make this possible. For both parties, they are a critical point of failure, so both need strong assurance that those TCBs (as I called them) can actually be trusted. At the end of this post, I will also present how passive taps can improve the design sketched out above.
Trust In Silicon Is Hard
The naive approach would be to try to design a custom chip with a transparent design, supply chain and tooling. Supply-chain hardened FPGAs exist and are used in high-stakes applications such as air-defense communication networks.
But OC5 adversaries are extremely sophisticated and resourceful. Proving a negative is inherently challenging here, especially considering the creativity of attackers. In one example, researchers demonstrated hardware trojans that work by altering dopant distributions during fabrication: no added gates, no changed layout, and nearly impossible to detect even with electron microscopy. Nor is the required logic as simple as it may seem: if a network tap intercepts light, it needs a complex digital signal processor to turn light amplitudes into bits. Supply-chain hardening such processors takes years and expensive infrastructure, even when only unilateral, national-security-grade trust is required. For three decades, nuclear arms control researchers have tried to build mutually trusted electronic systems for nuclear treaty verification, and failed. As one review put it, “there is at present no accepted method for proving the absence of vulnerabilities in an electronic system.”
Watching the Watchers
We could try to solve the hard problem that has been elusive for decades (supply-chain verification of digital logic). Instead, my team and I propose a different approach: “Watching the watchers” by having mutually distrusting chips check each other’s inputs and outputs. The key enabler here is passive signal splitting, which works particularly well for optical fiber:
Passive optical fiber splitters divide light without any logic or digital processing. FBT[2] splitters are simply two strands of glass fused together. PLCs[3] are etched glass chips. Neither contains transistors, firmware, or updatable logic. They are straightforward to fabricate and inspect, especially given that glass is transparent. In principle, this gives us a rudimentary, physics-based method for cloning the signal going in and out of any device fitted with a transceiver.[4]
Optical isolators are devices that transmit light in only one direction. They are commonly used in fiber optics to protect lasers against back-reflection. In principle, this gives us a simple, physics-based method for ensuring that light signals travel only one way.
In short, instead of trusting any processor to perform its task correctly, we can make its inputs and outputs passively observable by transmitting them via optical fiber. With this, we can eliminate the need for any mutually trusted processor and design a self-monitoring system.
The Border Patrol Device
Let us consider the network tap on the prover’s servers first. Its responsibilities are:
Ensure that all traffic between the prover and their servers is hashed.
Send these hashes to the verifier, and nothing else.
Erase any potential communication channels between the prover and their servers that cannot be retroactively verified. This includes analog and timing side channels, but also ephemeral (short-lived) nonces (single-use numbers) in metadata.
Do all this without needing any mutually trusted digital processing.
This is how our team pictures the solution for the first and second goals (at the macro-level):
The most important points about the macroarchitecture (placement of the border patrol device):
The facility is under physical monitoring, ensuring that any traffic in and out of the cluster needs to pass through the border patrol chokepoint.
Every bit of traffic sent to the verifier (hashes only) can be passively observed by the prover. This places the root of trust in physics: beam splitters combined with optical isolators ensure that the prover can observe and double-check that hashes have been computed exactly as they expect, without being able to interfere with the verifier’s monitor leg. Both parties can compute timestamps upon receiving the hashes at practically the same time.
For unfamiliar readers: TLS is the encryption between the outside world and the datacenter. Unencrypted communication within datacenters is not unusual. We expect the proposed setup to be adaptable to encrypted traffic as well, for example with key escrow or retroactive zero-knowledge proofs, but we leave this out of scope here for simplicity.
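To make the "hash, forward, and let the prover double-check" idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the implementation from our paper: SHA-256 as the hash function, the record framing, and the function names `fingerprint`, `border_patrol_emit` and `prover_audit` are all hypothetical choices made for this example.

```python
import hashlib


def fingerprint(record: bytes) -> str:
    """Hash one I/O record. SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(record).hexdigest()


def border_patrol_emit(traffic: list) -> list:
    """What the hashing processor puts on the fiber towards the verifier:
    one fingerprint per record, and nothing else."""
    return [fingerprint(record) for record in traffic]


def prover_audit(own_traffic_copy: list, observed_on_tap: list) -> bool:
    """The prover recomputes the hashes from its own copy of the traffic and
    compares them with what it passively observes on the verifier's leg."""
    expected = [fingerprint(record) for record in own_traffic_copy]
    return expected == observed_on_tap


traffic = [b"user prompt", b"model completion"]
emitted = border_patrol_emit(traffic)
print(prover_audit(traffic, emitted))                    # honest hashing device
print(prover_audit(traffic, emitted + [b"x".hex()]))     # extra data smuggled out
```

The point of the second call is that anything on the verifier's leg beyond the expected hashes shows up as a mismatch on the prover's passive copy.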
Closing the Remaining Channels
Now that the border patrol sits at the datacenter edge network chokepoint, the question is: what exactly happens inside the “patrol box”?
In the paper, we call our solution the “border patrol device”, as its responsibility goes beyond simply fingerprinting inputs and outputs of the prover’s cluster: It also actively decontaminates the traffic from any possible covert channels that could not be closed off by post-hoc verification via replay (see part I and part II of this series). For those channels, hashes would be insufficient:
Timing. The precise time at which a packet is sent can carry information by itself. On the maximum theoretical capacity, Lee et al. state: “if an overt channel sends at 1 Gbps with 1518 byte packets, the maximum capacity of the covert channel is 81,913 bits per second (bps).” Interestingly, the covert bandwidth decreases on higher-bandwidth links, as the timing headroom between packets diminishes at higher packet density.
Analog side channels. These are light modulations that the monitor leg (the verifier’s receiver) cannot detect as suspicious. In optical fiber, these may be different wavelengths, polarizations, etc. A straightforward mitigation is to receive, digitize, and re-transmit the signal, which re-timing does anyway.
Steganography. This is where solutions are less obvious. Steganography hides information in secret encodings that are only meaningful to someone who knows their heuristics. Post-hoc verification can defend against it by tightly constraining outputs, proving precisely which inputs and computations produced them (see part I). However, not all steganographic degrees of freedom (arbitrary choices that can hide information) are constrained by strict computational causality. Our paper goes into more detail on this.
Active Wardens Without Mutually Trusted Chips
Post-hoc verification via replay constrains steganographic degrees of freedom in payload data, but not necessarily in the metadata wrapped around it. Steganographic degrees of freedom can be erased by overwriting arbitrary fields with canonical, strictly formatted bits, for example by using only zeros to pad IP packets. Devices that actively replace such information are commonly called active wardens.
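As a toy illustration of canonicalization, the following Python sketch overwrites two free-form fields with zeros while leaving the payload untouched. The `Packet` model and its field names (`padding`, `reserved_bits`) are hypothetical simplifications for this example and do not correspond to a real protocol parser.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    payload: bytes       # constrained by post-hoc replay, left untouched here
    padding: bytes       # free-form bytes: a steganographic degree of freedom
    reserved_bits: int   # header bits without functional meaning in this toy model


def scrub(pkt: Packet) -> Packet:
    """Overwrite the arbitrary fields with canonical zeros, keeping lengths."""
    return replace(pkt, padding=bytes(len(pkt.padding)), reserved_bits=0)


def is_canonical(pkt: Packet) -> bool:
    """Check that no information is left in the scrubbed fields."""
    return pkt.reserved_bits == 0 and not any(pkt.padding)


suspicious = Packet(payload=b"hello", padding=b"\x13\x37", reserved_bits=5)
clean = scrub(suspicious)
print(is_canonical(suspicious), is_canonical(clean))
```

A device that performs this kind of rewriting at line rate is what the literature calls an active warden.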
However, this introduces yet another device whose silicon and software (if there is any) would be challenging to establish mutual trust in. So far, every existing implementation assumes unilateral trust in whoever made and operates the warden device.
To make matters worse, some network protocol fields cannot be overwritten with canonicalized numbers (such as zeros), as they need to be random and unpredictable for security reasons. One example is the initial sequence numbers (ISNs) used in TCP handshakes between the cluster and its remote users.[5] How can two mutually distrusting parties produce random numbers that neither party can predict?[6]
There is an established answer to this question: Coin flip protocols. Two mutually distrusting parties first commit to numbers by sharing hashes, then reveal. These numbers are combined as seeds into a function that generates an output neither party could have predicted, as they committed their seeds before seeing the other’s seeds.
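The commit-then-reveal flow described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: SHA-256 for both the commitment and the combining function, 32-byte seeds, and the names `commit`, `verify_reveal` and `combine` are choices made for this example rather than details taken from our paper.

```python
import hashlib
import secrets

SEED_BYTES = 32  # assumption: fixed-length, high-entropy seeds


def commit(seed: bytes) -> str:
    """Commitment = hash of the seed, shared before any seed is revealed."""
    return hashlib.sha256(seed).hexdigest()


def verify_reveal(seed: bytes, commitment: str) -> bool:
    """Check that a revealed seed matches the earlier commitment."""
    return secrets.compare_digest(commit(seed), commitment)


def combine(seed_a: bytes, seed_b: bytes) -> bytes:
    """Mix both seeds into an output neither party could have chosen alone."""
    return hashlib.sha256(b"coinflip" + seed_a + seed_b).digest()


# 1. Each party draws a secret seed and publishes only its commitment.
prover_seed = secrets.token_bytes(SEED_BYTES)
verifier_seed = secrets.token_bytes(SEED_BYTES)
prover_commitment = commit(prover_seed)
verifier_commitment = commit(verifier_seed)

# 2. After both commitments are exchanged, the seeds are revealed and checked.
assert verify_reveal(prover_seed, prover_commitment)
assert verify_reveal(verifier_seed, verifier_commitment)

# 3. Both sides derive the same shared random value.
shared_random = combine(prover_seed, verifier_seed)
print(shared_random.hex())
```

Because each party is bound to its seed before seeing the other's, changing a seed after the fact would fail the check in step 2.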
Putting It Together: The Device Architecture
With all this context now established, we can introduce the design of the border patrol device:
The most important points about the microarchitecture (the border patrol device itself):
Here we illustrate the functionality of the border patrol device: an active warden without supply chain trust. We combine a coin flip protocol and passive observability of the warden’s behaviour to eliminate the requirement of mutual trust in any device. The Prover and Verifier each integrate a device that they alone trust to represent them. Both devices participate in the coin flip and check whether the warden scrubs data packets correctly.[7]
In the illustration above, we colour-code the minimum device trust assumptions. This does not mean that transparent design and supply chains could not also establish mutual trust in components, which would further add defense in depth.[8]
Re-timing eliminates both analog side channels (e.g. light modulation in optical fiber) and timing modulation at any relevant bandwidth. The active warden performs re-timing by buffering all incoming packets (e.g. for 10 ms) and then sending them out in their original order at maximum bandwidth, to combat attacks such as PHY covert channels.
We present an enhanced design in the appendix.
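The re-timing step from the list above (buffer, then release in order at full rate) can be sketched as follows. This is a simplified model under assumptions: fixed 10 ms windows, a 400 Gb/s line rate, and the function name `retime` are illustrative choices, and the simulation in our repository may be structured differently.

```python
def retime(arrivals, window_s=0.010, line_rate_bps=400e9):
    """Return departure times for packets given as (arrival_s, size_bytes).

    Each packet is held until the end of the buffering window it arrived in,
    then sent back to back with its neighbours at the full line rate.
    Arrival order is preserved.
    """
    departures = []
    link_free_at = 0.0
    for arrival_s, size_bytes in arrivals:
        # End of the window in which the packet arrived.
        release_s = (int(arrival_s // window_s) + 1) * window_s
        # Wait for the previous packet to finish serializing.
        start_s = max(release_s, link_free_at)
        departures.append(start_s)
        link_free_at = start_s + size_bytes * 8 / line_rate_bps
    return departures


# Two senders encode different "messages" in their inter-packet gaps ...
gaps_a = [(0.0011, 1500), (0.0047, 1500), (0.0123, 64)]
gaps_b = [(0.0029, 1500), (0.0090, 1500), (0.0188, 64)]
# ... but the re-timed output is identical.
print(retime(gaps_a) == retime(gaps_b))
```

In this model, departure times depend only on the window a packet falls into and on packet sizes, so timing modulation finer than the window is erased. Coarser signals, such as how many packets arrive per window, are not addressed by this sketch.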
While our proposal is conceptual and tested in simulation rather than real-world conditions, we emphasize that all of the individual components[9] for the border patrol device are established: passive optical fiber splitters; coin flip protocols; hash functions and timestamps[10]; supply-chain hardened FPGAs for computing at high line rates; protocol header scrubbing and traffic normalization; and re-timing devices against timing and analog side channels.
Another reason why we expect the development, testing and deployment of such devices to be high-return “low-hanging fruit” for technical AI governance is the surprisingly low bandwidth needed to serve millions of customers with hundreds of thousands of GPUs in a gigacluster such as xAI’s Colossus. In our paper, we present an order-of-magnitude calculation of the tokens per second such a datacenter receives and sends at full capacity. For 100,000 Hopper GPUs streaming 200 million tokens per second, with each token sent in its own, inefficient JSON packet, 400 Gb/s of bandwidth should be sufficient. This is eight household Ethernet cables, and commercially available NICs and FPGAs can process >800 Gb/s in single devices that cost a couple of thousand dollars.
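For readers who want to check the order of magnitude themselves, here is the arithmetic as a short Python snippet. The token rate and GPU count come from the text above. The per-packet size of 250 bytes is an assumption made for this illustration (chosen so that the result lands at the quoted 400 Gb/s); the paper's own calculation may use different inputs.

```python
TOKENS_PER_SECOND = 200_000_000  # from the text
GPU_COUNT = 100_000              # from the text
BYTES_PER_PACKET = 250           # ASSUMPTION: one token per JSON packet, incl. headers


def required_gbps(tokens_per_s: int, bytes_per_packet: int) -> float:
    """Bandwidth in Gb/s if every token travels in its own packet."""
    return tokens_per_s * bytes_per_packet * 8 / 1e9


print(TOKENS_PER_SECOND / GPU_COUNT)                        # tokens per second per GPU
print(required_gbps(TOKENS_PER_SECOND, BYTES_PER_PACKET))   # Gb/s for the whole cluster
```

Batching several tokens per packet would only lower the requirement, so the one-token-per-packet case acts as a pessimistic bound under these assumptions.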
The particular combination of these components that we propose is novel, while the components themselves are not. We simulate their behaviour using software written in Python, which is available on GitHub.
Conclusion, Implications and Further Work
Mutually trusted (or observable) network taps can be built without requiring mutual trust in complex processors. We do not expect our proposed design to be definitive, but the principle of passive observation and independent comparison works not only on the macro-scale (the verifier re-executing the prover’s claimed computations in a dedicated cluster), but also on the micro-scale (the monitoring devices checking each other against tampering or exfiltration, using re-execution).
One important clarification is that the proposed border patrol device can only robustly defend against covert egress out of the cluster, not against steganography in ingress. The reason is that inputs, such as user prompts, are more difficult to rewrite in real time while preserving semantics and service quality.[11] We leave this to further work. The implication of constrained egress, but not ingress, is that the border patrol device does not defend against the orchestration and execution of covert workloads, only against the exfiltration of their results. Threat modeling for significant computations with extremely low egress bandwidth is also left to future work. However, we expect potential for synergy with other approaches that focus on verifying computation directly, rather than communication, such as the floating point fingerprinting proposed in the first part of this series.
Passive optical splitters are an under-appreciated asset. The “watching the watchers” architecture enables verification setups where neither party needs to trust the other’s processors. This sidesteps the tough problem of building mutually trusted silicon under nation-state threat models.
Hardware demonstration. We encourage further work on a physical implementation of the proposed border patrol device, and are open to contributing and collaborating. We expect the bill of materials to cost less than the researcher and engineer time required to build it, given that no new supply chains are needed and most if not all required components are on the market already.
If you find the proposal compelling and would like to engage in/support further work, please do not hesitate to reach out:
naci.c@protonmail.com.
I want to thank Jakub Kryś, Jonathan Ng, Felix Krückel and Luke Marks for having been an amazing team to work with. I am proud of, and impressed by, everyone’s contributions.
Bonus: An update for the TRD/URD design from part II
Motivated by the new insights presented in this post, we can update the previous post’s TRD/URD system using passive taps. The colour-coding was changed to match the trust assumptions better, with orange denoting the verifier’s trust, blue for the prover and green for the parts that need physical monitoring and inspection (which is more tractable for cables, buildings and optical splitters than for chips).
The most important change here is the re-labeling of the network tap from “network tap trusted by both parties” to “border patrol device”, highlighting how trust assumptions can be loosened thanks to the observability that passive, one-way taps provide. The prover now also need not trust the verifier’s TRD anymore, as they can observe every bit it communicates to the verifier, which drastically improves near-term feasibility. One could even integrate a re-timer controlled by the prover (timing warden) and passively monitored by a verifier-controlled device, as in the appendix below. One other new detail is the elimination of the TCB/network tap at the URD. After all, if the verifier already has the TRD, why not let it control and instruct the URD?
Appendix: An enhanced design for the border patrol device
The architecture shown above combats timing and analog modulation side-channels between the verifier and their hash processor in the border patrol device. We refer to Uttarwar et al. for more details on the timing warden. Another enhancement here is passive observation on both sides of the active warden.
[1] Remember that this one is identical to the system (hardware and software) used by the Prover. Hence, it is not trusted by the Verifier.
[2] Fused Biconical Taper (FBT). These splitters work via evanescent coupling between closely positioned fiber cores, which is a fancy way of saying that the light wave of one core leaks into the other.
[3] Planar Lightwave Circuit (PLC). These splitters rely on multimode interference within a lithographed glass chip.
[4] We avoid the complexities of transceiver link budgets here, as we are talking about devices that are designed and built for having their fiber links duplicated.
[5] Modern TCP implementations use random ISNs to prevent connection hijacking attacks. If ISNs were predictable (as they were in early TCP implementations), attackers could guess sequence numbers and inject malicious data into active connections or spoof connections.
[6] Since datacenters communicate with the internet, the ability to predict session IDs would allow the verifier to compromise the prover’s TCP sessions or the prover to secretly communicate with the cluster.
[7] To prevent latency penalties, random nonces are not generated on demand. Instead, the coin flip protocol populates a pre-computed entropy buffer. The active warden consumes these numbers in strict First-In-First-Out (FIFO) order to overwrite protocol headers at line rate.
[8] The information path between the verifier and the hashing processor in particular is precarious if the processor is controlled by them alone. While the prover can observe and double-check this information, photonic or timing modulation on the outputs themselves may be dangerous channels. In the appendix, we present an extended version of the border patrol device designed to defend against such channels.
[9] With the possible exception of the coin flip devices. However, since only unilateral trust is required, off-the-shelf CPUs, perhaps TEEs, may suffice.
[10] Timestamps can of course be computed after the verifier and prover have received their hashes. We do not expect nanosecond-to-microsecond precision to be necessary here.
[11] Potentially, active wardens could even rewrite this, while preserving semantics. For images, there are a variety of publications about erasing steganography by adding (barely) perceptible noise.







