Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
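To make the second step concrete, the following is a minimal sketch (not the paper's implementation) of the two surrogate features mentioned above: the output distribution of a candidate activation function under unit-Gaussian inputs, and the spectrum of the Fisher information matrix of a model's predictive distribution at random initialization. The toy one-layer softmax model, the Swish candidate, and all dimensions are illustrative assumptions rather than details taken from the paper or the aquasurf code.

```python
# Minimal sketch of the two surrogate features, under the assumptions above.
import numpy as np

rng = np.random.default_rng(0)

def swish(x):
    """Candidate activation function f(x) = x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

# --- Feature 1: the activation's output distribution under N(0, 1) inputs ---
z = rng.standard_normal(100_000)
f_z = swish(z)
output_moments = (f_z.mean(), f_z.std())  # summary statistics of the output distribution

# --- Feature 2: spectrum of the empirical Fisher information matrix of the
# model's predictive distribution at initialization.  For a toy softmax model
# p(y|x) = softmax(W f(x)), the per-example Fisher with respect to W is
# (diag(p) - p p^T) kron (f(x) f(x)^T), averaged over inputs.
d_in, n_classes, n_samples = 8, 4, 2_000
W = rng.standard_normal((n_classes, d_in)) / np.sqrt(d_in)  # random initialization

F = np.zeros((n_classes * d_in, n_classes * d_in))
for _ in range(n_samples):
    x = rng.standard_normal(d_in)
    a = swish(x)                                  # activated features
    logits = W @ a
    p = np.exp(logits - logits.max())
    p /= p.sum()
    F += np.kron(np.diag(p) - np.outer(p, p), np.outer(a, a))
F /= n_samples

fim_eigenvalues = np.linalg.eigvalsh(F)           # the FIM spectrum used as a feature
print(output_moments, fim_eigenvalues[-5:])
```

In this sketch, both feature vectors depend only on the activation function and the network at initialization, so they can be computed without any training, which is what makes a surrogate of this kind cheap to evaluate over many candidate activation functions.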