Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

A common lens for theoretically studying neural net architectures is to analyze the functions they can approximate. However, the constructions from approximation theory often have unrealistic aspects, for example, reliance on infinite precision to memorize target function values, which make these results potentially less meaningful. To address these issues, this work proposes a formal definition of statistically meaningful approximation, which requires the approximating network to exhibit good statistical learnability. We present case studies on statistically meaningful approximation for two classes of functions: boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can statistically meaningfully approximate boolean circuits with sample complexity depending only polynomially on the circuit size, not on the size of the approximating network. In addition, we show that transformers can statistically meaningfully approximate Turing machines with computation time bounded by $T$, requiring sample complexity polynomial in the alphabet size, the state space size, and $\log (T)$. Our analysis introduces new tools for generalization bounds that provide much tighter sample complexity guarantees than the typical VC-dimension or norm-based bounds, which may be of independent interest.
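As a rough illustration of the representational half of the boolean-circuit claim, the sketch below evaluates a circuit using only fixed-weight ReLU expressions. Each gate becomes a constant-size unit, so a circuit with $s$ gates maps to a network with $O(s)$ units. This is a minimal sketch under stated assumptions: inputs are exactly $0$ or $1$, and the gate encodings, the `eval_circuit` helper, and the `XOR_CIRCUIT` example are hypothetical illustrations. It is not the paper's construction, and it says nothing about the statistical learnability that the paper's definition actually requires.

```python
"""Minimal sketch (hypothetical, not the paper's construction):
evaluating a boolean circuit with exact ReLU units on {0, 1} inputs."""
from typing import Dict, List, Tuple


def relu(z: float) -> float:
    return max(0.0, z)


def and_gate(x: float, y: float) -> float:
    # ReLU(x + y - 1) equals 1 only when both inputs are 1.
    return relu(x + y - 1.0)


def or_gate(x: float, y: float) -> float:
    # x + y - ReLU(x + y - 1) clips the sum at 1.
    return x + y - relu(x + y - 1.0)


def not_gate(x: float) -> float:
    # An affine map suffices; no nonlinearity is needed.
    return 1.0 - x


# Assumed encoding: a circuit is a topologically ordered list of
# (output name, gate type, input names).
Gate = Tuple[str, str, List[str]]


def eval_circuit(circuit: List[Gate], inputs: Dict[str, float]) -> Dict[str, float]:
    """Evaluate gates in order; every gate is a constant-size ReLU unit."""
    values = dict(inputs)
    for name, op, args in circuit:
        a = [values[arg] for arg in args]
        if op == "AND":
            values[name] = and_gate(a[0], a[1])
        elif op == "OR":
            values[name] = or_gate(a[0], a[1])
        elif op == "NOT":
            values[name] = not_gate(a[0])
        else:
            raise ValueError(f"unknown gate {op!r}")
    return values


# Hypothetical example: XOR(x1, x2) = OR(AND(x1, NOT x2), AND(NOT x1, x2)).
XOR_CIRCUIT: List[Gate] = [
    ("n1", "NOT", ["x1"]),
    ("n2", "NOT", ["x2"]),
    ("a1", "AND", ["x1", "n2"]),
    ("a2", "AND", ["n1", "x2"]),
    ("out", "OR", ["a1", "a2"]),
]

if __name__ == "__main__":
    # Print the truth table computed by the ReLU units.
    for x1 in (0.0, 1.0):
        for x2 in (0.0, 1.0):
            out = eval_circuit(XOR_CIRCUIT, {"x1": x1, "x2": x2})["out"]
            print(int(x1), int(x2), "->", int(out))
```

Exactness here relies on the inputs being exactly $0$ or $1$. The abstract's point is that representability alone is not enough: the approximating network should also be learnable from a number of samples polynomial in the circuit size, which this sketch does not address.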