Operator learning techniques have recently emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces. Trained under appropriate constraints, they can also be effective in learning the solution operator of partial differential equations (PDEs) in an entirely self-supervised manner. In this work we analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory, and reveal a bias that favors the approximation of functions with larger magnitudes. To correct this bias we propose to adaptively re-weight the importance of each training example, and demonstrate how this procedure can effectively balance the magnitude of back-propagated gradients during training via gradient descent. We also propose a novel network architecture that is more resilient to vanishing gradient pathologies. Taken together, our developments provide new insights into the training of DeepONets and consistently improve their predictive accuracy by a factor of 10-50x, demonstrated in the challenging setting of learning PDE solution operators in the absence of paired input-output observations. All code and data accompanying this manuscript are publicly available at \url{https://github.com/PredictiveIntelligenceLab/ImprovedDeepONets}.
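To make the adaptive re-weighting idea concrete, the following is a minimal JAX sketch, not the authors' implementation: it assigns each training example a weight inversely proportional to the norm of its back-propagated loss gradient, so that large-magnitude target functions no longer dominate the descent direction. The forward pass, the helper names (\texttt{deeponet}, \texttt{update\_weights}), and the toy shapes are illustrative assumptions; the released repository above contains the actual NTK-guided scheme.
\begin{verbatim}
# Minimal sketch (assumption-based, not the authors' exact scheme) of
# gradient-norm-based example re-weighting for a DeepONet regression loss.
import jax
import jax.numpy as jnp

def deeponet(params, u, y):
    # Placeholder DeepONet: branch acts on input-function samples u,
    # trunk acts on the query location y; output is their dot product.
    branch_params, trunk_params = params
    b = jnp.tanh(u @ branch_params)        # (p,)
    t = jnp.tanh(y @ trunk_params)         # (p,)
    return jnp.sum(b * t)

def weighted_loss(params, weights, U, Y, S):
    # Per-example squared errors, scaled by adaptive weights lambda_i.
    preds = jax.vmap(deeponet, in_axes=(None, 0, 0))(params, U, Y)
    return jnp.mean(weights * (preds - S) ** 2)

def update_weights(params, U, Y, S, eps=1e-8):
    # Weight each example inversely to the norm of its loss gradient,
    # so every example contributes gradients of comparable scale.
    def per_example_grad_norm(u, y, s):
        g = jax.grad(lambda p: (deeponet(p, u, y) - s) ** 2)(params)
        leaves = jax.tree_util.tree_leaves(g)
        return jnp.sqrt(sum(jnp.sum(l ** 2) for l in leaves))
    norms = jax.vmap(per_example_grad_norm)(U, Y, S)
    weights = 1.0 / (norms + eps)
    return weights / jnp.mean(weights)      # normalize to unit mean

# Toy usage with random data (shapes are illustrative).
key = jax.random.PRNGKey(0)
k1, k2, k3, k4, k5 = jax.random.split(key, 5)
m, d, p, n = 16, 2, 8, 32   # sensors, query dim, latent dim, examples
params = (jax.random.normal(k1, (m, p)), jax.random.normal(k2, (d, p)))
U = jax.random.normal(k3, (n, m))
Y = jax.random.normal(k4, (n, d))
S = jax.random.normal(k5, (n,))

weights = update_weights(params, U, Y, S)
loss, grads = jax.value_and_grad(weighted_loss)(params, weights, U, Y, S)
\end{verbatim}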