Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, two properties were found to be highly predictive of performance: the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization, and the activation function's output distribution. Third, the surrogate was used to discover improved activation functions on CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
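To make the two surrogate features concrete, the following is a minimal sketch, not the paper's AQuaSurF implementation: it estimates (1) a candidate activation's output distribution under standard-normal inputs and (2) the eigenvalue spectrum of an empirical Fisher information matrix of a toy classifier's predictive distribution at initialization. The Swish candidate, network sizes, and sample counts are illustrative assumptions.

```python
# Hypothetical sketch of the two surrogate features; sizes and the Swish
# candidate are illustrative, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

def swish(x):
    # Example candidate activation: f(x) = x * sigmoid(x).
    return x * torch.sigmoid(x)

# Feature 1: output distribution of the activation under x ~ N(0, 1).
x = torch.randn(100_000)
y = swish(x)
output_stats = torch.stack([y.mean(), y.std(), y.min(), y.max()])

# Feature 2: empirical FIM spectrum at initialization.
# For a classifier p(y|x; theta), F = E[s s^T], where s is the score
# grad_theta log p(y|x; theta) and y is sampled from the model itself.
class TinyNet(nn.Module):
    def __init__(self, act):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 10)
        self.act = act

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

model = TinyNet(swish)  # freshly initialized, never trained
params = list(model.parameters())
n_params = sum(p.numel() for p in params)

def score_vector(x):
    # Score of one input: gradient of log p(y|x) w.r.t. all parameters,
    # with the label y drawn from the model's own predictive distribution.
    logits = model(x.unsqueeze(0))
    probs = torch.softmax(logits, dim=-1)
    y = torch.multinomial(probs.squeeze(0), 1).item()
    logp = torch.log_softmax(logits, dim=-1)[0, y]
    grads = torch.autograd.grad(logp, params)
    return torch.cat([g.reshape(-1) for g in grads])

# Monte Carlo estimate: F ~= (1/N) * sum_i s_i s_i^T over random inputs.
N = 256
F = torch.zeros(n_params, n_params)
for _ in range(N):
    s = score_vector(torch.randn(16))
    F += torch.outer(s, s) / N

eigvals = torch.linalg.eigvalsh(F)  # FIM spectrum, ascending order
print("activation output stats:", output_stats)
print("top FIM eigenvalues:", eigvals[-5:])
```

In this sketch, both quantities are cheap to compute relative to full training, which is what makes them plausible surrogate features: they can be extracted from a model at initialization, before any gradient updates.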