We present a holistic design for GPU-accelerated computation in the TrustZone TEE. Rather than pulling the complex GPU software stack into the TEE, we follow a simple approach: record the CPU/GPU interactions ahead of time, and replay them in the TEE at run time. This paper addresses the approach's key missing piece: the recording environment, which needs both strong security and access to diverse mobile GPUs. To this end, we present a novel architecture called CODY, in which a mobile device (which possesses the GPU hardware) and a trustworthy cloud service (which runs the GPU software) exercise the GPU hardware/software in a collaborative, distributed fashion. To overcome the numerous network round trips and long delays, CODY contributes optimizations specific to mobile GPUs: register access deferral, speculation, and metastate-only synchronization. With these optimizations, recording a compute workload takes only tens of seconds, up to 95% less than a naive approach; replay incurs 25% lower delays than insecure, native execution.
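The record-then-replay idea can be illustrated with a minimal sketch. It is not CODY's implementation: `FakeGPU`, `Recorder`, `replay`, and the register addresses are hypothetical names invented for illustration, and the sketch leaves out the device/cloud split, register access deferral, speculation, and metastate-only synchronization. It only shows that a log of register reads and writes captured during one run can later drive the hardware without the original driver code.

```python
# Hypothetical register map for a toy GPU (assumption, not a real device).
JOB_CONFIG, JOB_START, JOB_STATUS = 0x00, 0x10, 0x14
DONE = 0x1


class FakeGPU:
    """Stand-in for memory-mapped GPU registers."""

    def __init__(self):
        self.regs = {}

    def write(self, addr, value):
        self.regs[addr] = value
        # Toy behaviour: writing 1 to JOB_START completes the job at once.
        if addr == JOB_START and value == 1:
            self.regs[JOB_STATUS] = DONE

    def read(self, addr):
        return self.regs.get(addr, 0)


class Recorder:
    """Forwards register accesses to the GPU and logs each one."""

    def __init__(self, gpu):
        self.gpu = gpu
        self.log = []

    def write(self, addr, value):
        self.gpu.write(addr, value)
        self.log.append(("w", addr, value))

    def read(self, addr):
        value = self.gpu.read(addr)
        self.log.append(("r", addr, value))
        return value


def run_job(dev):
    """Stand-in for the GPU software stack: configure, start, poll status."""
    dev.write(JOB_CONFIG, 0xABCD)
    dev.write(JOB_START, 1)
    return dev.read(JOB_STATUS)


def replay(gpu, log):
    """Re-issue logged accesses; fail if a read differs from the recording."""
    for op, addr, value in log:
        if op == "w":
            gpu.write(addr, value)
        elif gpu.read(addr) != value:
            raise RuntimeError(f"register {addr:#x} diverged from recording")


# Record once with the full software stack, then replay without it.
recorder = Recorder(FakeGPU())
run_job(recorder)
log = recorder.log

replay_gpu = FakeGPU()
replay(replay_gpu, log)
```

In this sketch the replayer needs only the log and raw register access, which is why the complex GPU stack can stay outside the TEE; checking recorded read values during replay is one simple way to detect that the hardware has diverged from the recorded run.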