The large number of ReLU non-linearity operations in existing deep neural networks makes them ill-suited for latency-efficient private inference (PI). Existing techniques to reduce ReLU operations often involve manual effort and sacrifice significant accuracy. In this paper, we first present a novel measure of a non-linear layer's ReLU sensitivity, eliminating the time-consuming manual effort of identifying sensitive layers. Based on this sensitivity, we then present SENet, a three-stage training method that, for a given ReLU budget, automatically assigns per-layer ReLU counts, decides the ReLU locations within each layer's activation map, and trains a model with significantly fewer ReLUs, potentially yielding latency- and communication-efficient PI. Experimental evaluations with multiple models on various datasets show SENet's superior performance, both in reduced ReLU count and in improved classification accuracy, compared to existing alternatives. In particular, SENet can yield models that require up to ~2x fewer ReLUs while maintaining similar accuracy. For a similar ReLU budget, SENet can yield models with ~2.32% higher classification accuracy, evaluated on CIFAR-100.
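To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a "partial ReLU" layer: only the activation-map positions selected by a binary mask apply ReLU, while the remaining positions pass through linearly, reducing the per-layer ReLU count. The names `PartialReLU` and `make_mask` are hypothetical, and the mask-selection heuristic (keeping the highest mean-magnitude positions of a calibration activation) is an assumed stand-in for SENet's sensitivity-driven per-layer assignment.

```python
# Sketch of per-position ReLU masking for ReLU-budgeted private inference.
# Assumptions: the importance heuristic below is illustrative only and is
# NOT SENet's sensitivity measure.
import torch
import torch.nn as nn


class PartialReLU(nn.Module):
    """Applies ReLU only where mask == 1; identity (linear) elsewhere."""

    def __init__(self, mask: torch.Tensor):
        super().__init__()
        self.register_buffer("mask", mask.float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Masked positions use ReLU; unmasked positions pass through linearly.
        return self.mask * torch.relu(x) + (1.0 - self.mask) * x


def make_mask(calib_act: torch.Tensor, budget: int) -> torch.Tensor:
    # Hypothetical importance proxy: keep the `budget` positions with the
    # largest mean |activation| over a calibration batch.
    importance = calib_act.abs().mean(dim=0)   # per-position score (C, H, W)
    flat = importance.flatten()
    idx = flat.topk(budget).indices
    mask = torch.zeros_like(flat)
    mask[idx] = 1.0
    return mask.view_as(importance)


if __name__ == "__main__":
    calib = torch.randn(32, 16, 8, 8)          # calibration activations
    mask = make_mask(calib, budget=256)        # keep 256 of 16*8*8 = 1024 ReLUs
    layer = PartialReLU(mask)
    y = layer(torch.randn(4, 16, 8, 8))
    print(f"ReLU positions kept: {int(mask.sum())} / {mask.numel()}")
```

Under PI protocols, each retained ReLU position incurs non-linear protocol cost (e.g., garbled circuits), while the pass-through positions stay within the cheap linear computation, which is why trimming ReLU counts this way can reduce latency and communication.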