Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
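To make the second step concrete, the following is a minimal sketch (not the paper's implementation) of the two surrogate features mentioned above: the output distribution of a candidate activation function under unit-Gaussian inputs, and the spectrum of the Fisher information matrix of a model's predictive distribution at random initialization. The toy one-layer softmax model, the Swish candidate, and all dimensions are illustrative assumptions rather than details taken from the paper or the aquasurf code.

```python
# Minimal sketch of the two surrogate features, under the assumptions above.
import numpy as np

rng = np.random.default_rng(0)

def swish(x):
    """Candidate activation function f(x) = x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

# --- Feature 1: the activation's output distribution under N(0, 1) inputs ---
z = rng.standard_normal(100_000)
f_z = swish(z)
output_moments = (f_z.mean(), f_z.std())  # summary statistics of the output distribution

# --- Feature 2: spectrum of the empirical Fisher information matrix of the
# model's predictive distribution at initialization.  For a toy softmax model
# p(y|x) = softmax(W f(x)), the per-example Fisher with respect to W is
# (diag(p) - p p^T) kron (f(x) f(x)^T), averaged over inputs.
d_in, n_classes, n_samples = 8, 4, 2_000
W = rng.standard_normal((n_classes, d_in)) / np.sqrt(d_in)  # random initialization

F = np.zeros((n_classes * d_in, n_classes * d_in))
for _ in range(n_samples):
    x = rng.standard_normal(d_in)
    a = swish(x)                                  # activated features
    logits = W @ a
    p = np.exp(logits - logits.max())
    p /= p.sum()
    F += np.kron(np.diag(p) - np.outer(p, p), np.outer(a, a))
F /= n_samples

fim_eigenvalues = np.linalg.eigvalsh(F)           # the FIM spectrum used as a feature
print(output_moments, fim_eigenvalues[-5:])
```

In this sketch, both feature vectors depend only on the activation function and the network at initialization, so they can be computed without any training, which is what makes a surrogate of this kind cheap to evaluate over many candidate activation functions.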