Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs), which manipulates the amount of non-linearity in models to improve their hardware-awareness and efficiency. First, we propose RAN-explicit (RAN-e) -- a new hardware-aware search space and a semi-automatic search algorithm -- to replace inefficient blocks with hardware-aware blocks. Next, we propose a training-free model scaling method called RAN-implicit (RAN-i), where we theoretically prove the link between network topology and its expressivity in terms of the number of non-linear units. We demonstrate that our networks achieve state-of-the-art results on ImageNet at different scales and for several types of hardware. For example, compared to EfficientNet-Lite-B0, RAN-e achieves similar accuracy while improving Frames-Per-Second (FPS) by 1.5x on Arm micro-NPUs. Meanwhile, RAN-i demonstrates up to a 2x reduction in #MACs over ConvNeXt with similar or better accuracy. We also show that RAN-i achieves nearly 40% higher FPS than ConvNeXt on Arm-based datacenter CPUs. Finally, RAN-i-based object detection networks achieve similar or higher mAP and up to 33% higher FPS on datacenter CPUs compared to ConvNeXt-based models.
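To make the RAN-i idea concrete, the minimal sketch below tallies the non-linear units in a PyTorch module tree, since the abstract ties a network's expressivity to its number of non-linear units. This is an illustrative assumption of how such a count could be computed, not the paper's actual scaling procedure; the helper name count_nonlinear_units and the chosen set of activation types are hypothetical.

import torch.nn as nn

def count_nonlinear_units(model: nn.Module) -> int:
    # Illustrative proxy only: count activation modules in the tree.
    # RAN-i links expressivity to the number of non-linear units;
    # this sketch simply tallies activation layers.
    activation_types = (nn.ReLU, nn.GELU, nn.SiLU, nn.Sigmoid, nn.Tanh)
    return sum(1 for m in model.modules() if isinstance(m, activation_types))

# Example: a small conv block with two non-linear units.
block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)
print(count_nonlinear_units(block))  # -> 2

Under this reading, a training-free scaling method could compare candidate architectures by such a count instead of by trained accuracy, which is what makes the search cheap.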