Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions on CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
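To make the Fisher-information surrogate concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of computing the FIM spectrum at initialization. For tractability it uses a plain softmax classifier, where the gradient of the log-likelihood is available in closed form; the function name `empirical_fim_spectrum` and the toy data are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def empirical_fim_spectrum(W, X):
    """Eigenvalue spectrum of the empirical Fisher information matrix
    for a softmax classifier p(y|x) = softmax(W x), evaluated at the
    given (e.g. randomly initialized) weights W.

    F = E_x E_{y ~ p(y|x)} [ g g^T ],  g = d log p(y|x) / d vec(W)
    """
    k, d = W.shape
    F = np.zeros((k * d, k * d))
    for x in X:
        p = softmax(W @ x)
        for y in range(k):
            # analytic gradient of log-softmax w.r.t. W, flattened
            g = np.outer(np.eye(k)[y] - p, x).ravel()
            F += p[y] * np.outer(g, g)
    F /= len(X)
    return np.linalg.eigvalsh(F)  # ascending eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))        # toy input batch
W = rng.normal(size=(3, 8)) * 0.1   # "initialization"
eigs = empirical_fim_spectrum(W, X)
```

Because F is an average of positive semi-definite rank-one terms, all eigenvalues are nonnegative; in the surrogate described above, features of this spectrum at initialization would serve as a cheap predictor of how an activation function performs after full training.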