In this work, we present an analysis of the generalization of Neural Operators (NOs) and derived architectures. We propose a family of networks, which we name ${\textit{s}}{\text{NO}}+\varepsilon$. These networks change the layout of NOs so that it resembles a Transformer. In particular, we replace the Attention module with the Integral Operator of NOs. The resulting network preserves universality, generalizes better to unseen data, and has a similar number of parameters to NOs. On the one hand, we study generalization numerically. We gradually transform NOs into ${\textit{s}}{\text{NO}}+\varepsilon$ and verify that the test loss decreases on a dataset of time-harmonic waves with different frequencies. We make the following changes to NOs: (a) we split the (non-local) Integral Operator and the (local) feed-forward network (MLP) into different layers, producing a {\it sequential} structure that we call the sequential Neural Operator (${\textit{s}}{\text{NO}}$); (b) we add skip connections and layer normalization to ${\textit{s}}{\text{NO}}$; and (c) we incorporate dropout and stochastic depth, which allow us to train deep networks. In each case, we observe a decrease in the test loss across a wide variety of initializations, indicating that each change improves on the NO. On the other hand, we build on infinite-dimensional statistics, and in particular on Dudley's theorem, to provide bounds on the Rademacher complexity of NOs and ${\textit{s}}{\text{NO}}$. We find the following relationship: the upper bound on the Rademacher complexity of ${\textit{s}}{\text{NO}}$ is a lower bound for that of NOs. Hence, the generalization error bound of ${\textit{s}}{\text{NO}}$ is smaller than that of NO, which further supports our numerical results.
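The NumPy sketch below illustrates modifications (a) to (c) and is not the authors' implementation. It makes several assumptions. The integral operator is taken to be an FNO-style Fourier spectral convolution on a 1D grid. The layout is pre-norm, as in many Transformers. The activations are ReLU, the layer normalization has no learned scale or shift, and the stochastic-depth rate is constant. All function names (`spectral_conv`, `no_layer`, `sno_block`, `init_params`), sizes, and initializations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    # Normalize each grid point over its channels; x has shape [n_points, channels].
    # (No learned scale/shift, to keep the sketch short.)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def spectral_conv(x, weights, n_modes):
    # Assumed FNO-style integral operator: FFT along the grid axis, mix channels
    # on the lowest n_modes frequencies with complex weights, then inverse FFT.
    x_hat = np.fft.rfft(x, axis=0)  # [n_points // 2 + 1, channels]
    out_hat = np.zeros_like(x_hat)
    out_hat[:n_modes] = np.einsum("kc,kcd->kd", x_hat[:n_modes], weights)
    return np.fft.irfft(out_hat, n=x.shape[0], axis=0)


def dropout(h, p, training):
    # Inverted dropout on individual activations (identity at inference).
    if not training or p == 0.0:
        return h
    return h * (rng.random(h.shape) >= p) / (1.0 - p)


def drop_path(y, p, training):
    # Stochastic depth: drop the entire residual branch with probability p.
    if not training or p == 0.0:
        return y
    return y * (rng.random() >= p) / (1.0 - p)


def init_params(c, hidden, modes):
    # Hypothetical random initialization for one layer/block.
    k = rng.standard_normal((modes, c, c)) + 1j * rng.standard_normal((modes, c, c))
    return {
        "K": k / c,
        "modes": modes,
        "W": rng.standard_normal((c, c)) / np.sqrt(c),
        "W1": rng.standard_normal((c, hidden)) / np.sqrt(c),
        "W2": rng.standard_normal((hidden, c)) / np.sqrt(hidden),
    }


def no_layer(x, prm):
    # Standard NO layer for contrast: the local linear map and the integral
    # operator are summed inside one nonlinearity, sigma(W x + K x).
    return np.maximum(x @ prm["W"] + spectral_conv(x, prm["K"], prm["modes"]), 0.0)


def sno_block(x, prm, p_drop=0.1, p_path=0.1, training=False):
    # (a) Sequential layout: the non-local integral operator and the local MLP
    # are separate sublayers; (b) each is pre-normalized and wrapped in a skip
    # connection; (c) dropout and stochastic depth act only during training.
    k = spectral_conv(layer_norm(x), prm["K"], prm["modes"])
    x = x + drop_path(k, p_path, training)
    h = dropout(np.maximum(layer_norm(x) @ prm["W1"], 0.0), p_drop, training)
    x = x + drop_path(h @ prm["W2"], p_path, training)
    return x


n, c, hidden, modes, depth = 64, 8, 32, 12, 6
blocks = [init_params(c, hidden, modes) for _ in range(depth)]
u = rng.standard_normal((n, c))  # a discretized input function with c channels

out = u
for prm in blocks:
    out = sno_block(out, prm, training=True)
print(out.shape)  # (64, 8)
```

In this sketch, the integral operator and the MLP sit in separate residual sublayers. This mirrors how a Transformer alternates attention and feed-forward sublayers, with the integral operator in the role of attention. In `no_layer`, by contrast, both maps share a single nonlinearity and have no skip connection.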