The choice of activation function is crucial for modern deep neural networks. Popular hand-designed activation functions such as the Rectified Linear Unit (ReLU) and its variants show promising performance across a wide range of tasks and models. Swish, an automatically discovered activation function, has been proposed and outperforms ReLU on many challenging datasets. However, it has two main drawbacks. First, the tree-based search space is highly discrete and restricted, which makes searching difficult. Second, the sample-based search method is inefficient, making it infeasible to find specialized activation functions for each dataset or neural architecture. To tackle these drawbacks, we propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method. It can learn specialized activation functions and achieves state-of-the-art (SOTA) performance on large-scale datasets such as ImageNet and COCO. For example, on the ImageNet classification dataset, PWLU improves top-1 accuracy over Swish by 0.9%/0.53%/1.0%/1.7%/1.0% for ResNet-18/ResNet-50/MobileNet-V2/MobileNet-V3/EfficientNet-B0. PWLU is also easy to implement and efficient at inference, so it can be widely applied in real-world applications.
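To give a concrete picture of the idea, below is a minimal PyTorch sketch of a learnable piecewise linear activation. It is only an illustration under stated assumptions, not the paper's exact formulation: the segment count, the bounded input region, the uniform breakpoint spacing, and the ReLU-like initialization are all assumptions made for this example.

import torch
import torch.nn as nn

class PWLUSketch(nn.Module):
    """Illustrative learnable piecewise linear activation (not the paper's exact method).

    Breakpoints are fixed and uniformly spaced in [-bound, bound];
    the output value at each breakpoint is a learnable parameter.
    Inputs outside the region follow the boundary segments linearly.
    """

    def __init__(self, num_segments: int = 16, bound: float = 3.0):
        super().__init__()
        self.num_segments = num_segments
        self.bound = bound
        xs = torch.linspace(-bound, bound, num_segments + 1)
        self.register_buffer("xs", xs)
        # ReLU-like initialization of the learnable breakpoint values (an assumption).
        self.ys = nn.Parameter(torch.relu(xs).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seg_len = 2 * self.bound / self.num_segments
        x_clamped = x.clamp(-self.bound, self.bound)
        # Index of the segment each input falls into.
        idx = ((x_clamped + self.bound) / seg_len).floor().long()
        idx = idx.clamp(max=self.num_segments - 1)
        x_left = self.xs[idx]
        y_left = self.ys[idx]
        slope = (self.ys[idx + 1] - y_left) / seg_len
        # Using x (not x_clamped) extrapolates the boundary segments
        # linearly outside [-bound, bound].
        return y_left + slope * (x - x_left)

# Usage: drop it in wherever a fixed activation would be used.
act = PWLUSketch()
out = act(torch.randn(8, 64))

Because the learnable parameters are just the per-breakpoint output values, such an activation can in principle adapt its shape per layer or per channel during training, which is the kind of specialization the abstract refers to.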