This study identifies two key bottlenecks to executing deep neural networks in trusted execution environments (TEEs) and proposes techniques to alleviate them: page thrashing during the execution of convolutional layers, and the decryption of large weight matrices in fully-connected layers. For the former, we propose a novel partitioning scheme, y-plane partitioning, designed to (i) provide consistent execution time when the layer output is large relative to the TEE's secure memory and (ii) significantly reduce the memory footprint of convolutional layers. For the latter, we leverage quantization and compression. In our evaluation, the proposed optimizations incurred latency overheads of 1.09X to 2X over baseline for a wide range of TEE sizes; in contrast, an unmodified implementation incurred overheads of up to 26X when running inside the TEE.
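The abstract does not spell out the mechanics of either optimization, so the following is a minimal NumPy sketch of one plausible reading of y-plane partitioning: the output feature map is computed in horizontal bands of rows, so that only the band and the input rows it touches need to reside in secure memory at once, bounding the enclave footprint regardless of total layer size. The function names (`conv2d_full`, `y_plane_conv2d`) and the `tile_rows` parameter are hypothetical, not taken from the paper.

```python
import numpy as np

def conv2d_full(x, w):
    """Reference direct convolution (valid padding, stride 1).

    x: input feature map, shape (H, W, C_in)
    w: filter bank, shape (KH, KW, C_in, C_out)
    """
    H, W, C_in = x.shape
    KH, KW, _, C_out = w.shape
    out = np.zeros((H - KH + 1, W - KW + 1, C_out), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + KH, j:j + KW, :]
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def y_plane_conv2d(x, w, tile_rows):
    """Hypothetical y-plane partitioned convolution: produce the output in
    bands of `tile_rows` rows; only the input rows feeding the current band
    (a y-plane) would need to be resident in secure memory at a time."""
    H, W, C_in = x.shape
    KH, KW, _, C_out = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    out = np.zeros((OH, OW, C_out), dtype=x.dtype)
    for y0 in range(0, OH, tile_rows):
        y1 = min(y0 + tile_rows, OH)
        # Only input rows y0 .. y1 + KH - 2 are touched for this band.
        x_band = x[y0:y1 + KH - 1, :, :]
        out[y0:y1] = conv2d_full(x_band, w)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 32, 3)).astype(np.float32)
    w = rng.standard_normal((3, 3, 3, 8)).astype(np.float32)
    assert np.allclose(conv2d_full(x, w), y_plane_conv2d(x, w, tile_rows=4))
    print("y-plane partitioned output matches full convolution")
```

For the fully-connected layers, a similarly hedged sketch of the quantization-and-compression idea: shrinking the stored weight matrix shrinks the ciphertext that must be decrypted inside the TEE. The int8 scheme and the use of zlib here are illustrative assumptions, not the paper's stated pipeline.

```python
import numpy as np
import zlib

def quantize_and_compress(w, num_bits=8):
    """Illustrative pre-processing of a fully-connected weight matrix:
    symmetric linear quantization to int8, then zlib compression."""
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return zlib.compress(q.tobytes()), scale, w.shape

def decompress_and_dequantize(blob, scale, shape):
    """Inverse step, which would run inside the enclave after decryption."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((1024, 1024)).astype(np.float32)
    blob, scale, shape = quantize_and_compress(w)
    w_hat = decompress_and_dequantize(blob, scale, shape)
    print(f"stored size: {len(blob) / w.nbytes:.1%} of float32, "
          f"max abs error {np.abs(w - w_hat).max():.4f}")
```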