Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces), yet ML-as-a-Service reveals little about what actually ran or whether the returned outputs faithfully reflect the intended inputs. Users have no recourse against service downgrades such as model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings. Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic, and existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs falling within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until a single operator remains, at which point adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments computation graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090 GPUs, empirical thresholds are $10^2$-$10^3\times$ tighter than theoretical bounds, and bound-aware adversarial attacks achieve a 0% success rate. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
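To make the operator-level acceptance test concrete, the sketch below (not the paper's implementation) shows the core check implied by the abstract: a reproduced operator output is accepted if its deviation from the claimed output stays within the calibrated empirical threshold or, failing that, within the sound IEEE-754 worst-case bound. The function and parameter names (`within_acceptance_region`, `theoretical_bound`, `empirical_threshold`) are illustrative assumptions, not NAO's actual API.

```python
# Minimal sketch, assuming a max-absolute-deviation acceptance region per operator;
# NAO's real thresholds and metrics may differ.
import torch


def within_acceptance_region(claimed: torch.Tensor,
                             reproduced: torch.Tensor,
                             theoretical_bound: float,
                             empirical_threshold: float) -> bool:
    """Accept the claimed output if its deviation falls inside either error model."""
    deviation = (claimed - reproduced).abs().max().item()
    # Empirical percentile thresholds are typically far tighter than the sound
    # IEEE-754 bound; the sound bound serves as the worst-case fallback.
    return deviation <= max(empirical_threshold, theoretical_bound)


# Toy usage: the same matmul evaluated twice, mimicking nondeterministic kernels.
a, b = torch.randn(64, 64), torch.randn(64, 64)
claimed = a @ b
reproduced = a @ b + 1e-6 * torch.randn(64, 64)  # simulated FP jitter
print(within_acceptance_region(claimed, reproduced,
                               theoretical_bound=1e-3,
                               empirical_threshold=1e-4))
```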