To improve how neural networks function, it is crucial to understand their learning process. The information bottleneck theory of deep learning proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task. However, empirical evidence for this theory is conflicting, as compression was only observed when networks used saturating activation functions; networks with non-saturating activation functions achieved comparable task performance but did not show compression. In this paper we develop more robust mutual information estimation techniques that adapt to the hidden activity of neural networks and produce more sensitive measurements of activations from all activation functions, especially unbounded ones. Using these adaptive estimation techniques, we explore compression in networks with a range of different activation functions. With two improved estimation methods, we first show that saturation of the activation function is not required for compression, and that the amount of compression varies between different activation functions. We also find that compression varies considerably across different network initializations. Second, we see that L2 regularization leads to significantly increased compression while preventing overfitting. Finally, we show that only compression of the last layer is positively correlated with generalization.
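To make the idea of an adaptive mutual information estimator concrete, here is a minimal sketch of one way such an estimator could look, using quantile-based bins so that unbounded activations (e.g. ReLU) are covered as well as saturating ones. The function name, the quantile-binning scheme, and the plug-in entropy estimates are illustrative assumptions for this sketch, not the exact estimators used in the paper.

```python
import numpy as np

def mutual_information_adaptive_bins(hidden, labels, n_bins=30):
    """Estimate I(T; Y) between hidden-layer activations T and labels Y
    by discretizing each unit's activations with quantile (adaptive) bins.

    hidden: (n_samples, n_units) array of activations
    labels: (n_samples,) array of integer class labels in {0, ..., K-1}
    """
    n_samples, n_units = hidden.shape

    # Adaptive binning: bin edges follow the empirical quantiles of each
    # unit's activations, so the bins track the observed range of the
    # activations instead of assuming a fixed, bounded interval.
    digitized = np.empty((n_samples, n_units), dtype=np.int64)
    for j in range(n_units):
        edges = np.quantile(hidden[:, j], np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        digitized[:, j] = np.digitize(hidden[:, j], np.unique(edges))

    # Treat each distinct discretized activation pattern as one state of T.
    _, t_states = np.unique(digitized, axis=0, return_inverse=True)

    def entropy(states):
        # Plug-in (empirical) entropy in bits.
        _, counts = np.unique(states, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # I(T; Y) = H(T) + H(Y) - H(T, Y), all from empirical estimates.
    labels = np.asarray(labels, dtype=np.int64)
    joint = t_states.astype(np.int64) * (labels.max() + 1) + labels
    return entropy(t_states) + entropy(labels) - entropy(joint)
```

Tracking such an estimate of I(T; Y) (and, analogously, I(X; T)) over the course of training is the kind of measurement used to decide whether a given layer compresses its representation.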