Physics-informed machine learning uses governing ordinary and/or partial differential equations to train neural networks to represent the solution field. As in any machine learning problem, the choice of activation function influences the characteristics and performance of the solution obtained from physics-informed training. Several studies have compared common activation functions on benchmark differential equations and have unanimously found that the rectified linear unit (ReLU) is outperformed by competitors such as the sigmoid, hyperbolic tangent, and swish activation functions. In this work, we diagnose the poor performance of ReLU on physics-informed machine learning problems. While it is well known that the piecewise linear form of ReLU prevents it from being used on second-order differential equations, we show that ReLU fails even on variational problems involving only first derivatives. We identify the cause of this failure as second derivatives of the activation, which arise not in the formulation of the loss but in the process of training. Specifically, we show that automatic differentiation in PyTorch fails to characterize the derivatives of discontinuous fields, which causes the gradient of the physics-informed loss to be mis-specified, thus explaining the poor performance of ReLU.
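To make the mechanism concrete, the following minimal sketch (ours, not taken from the paper; the sample points and variable names are illustrative) probes how PyTorch's autograd treats the second derivative of ReLU. Autograd reports a second derivative of exactly zero at every point, including the kink at x = 0, where the true distributional second derivative is a Dirac delta.

```python
# Illustrative sketch: autograd's second derivative of ReLU is zero everywhere,
# so the delta-function contribution at the kink is silently dropped.
import torch

# Sample points straddling the ReLU kink at x = 0.
x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
y = torch.relu(x)

# First derivative via autograd: 0 for x < 0, 1 for x > 0
# (PyTorch assigns 0 at x = 0 itself).
(dy,) = torch.autograd.grad(y.sum(), x, create_graph=True)

# Second derivative via autograd: reported as zero at every sample point.
(d2y,) = torch.autograd.grad(dy.sum(), x)

print(dy.detach())   # values: [0., 0., 0., 1., 1.]
print(d2y)           # values: [0., 0., 0., 0., 0.]
```

Because the training gradient of a first-derivative (variational) loss differentiates terms like du/dx with respect to the network parameters, such mixed second derivatives of the activation enter the parameter gradient; the zeros reported above illustrate the kind of mis-specification the abstract attributes the poor performance of ReLU to.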