A common lens for the theoretical study of neural net architectures is to analyze the functions they can approximate. However, constructions from approximation theory may be unrealistic and therefore less meaningful. For example, a common unrealistic trick is to encode target function values using infinite precision. To address these issues, this work proposes a formal definition of statistically meaningful (SM) approximation, which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: Boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can SM approximate Boolean circuits with sample complexity depending only polynomially on the circuit size, not on the size of the network. In addition, we show that transformers can SM approximate Turing machines with computation time bounded by $T$, with sample complexity polynomial in the alphabet size, the state space size, and $\log(T)$. We also introduce new tools for analyzing generalization that provide much tighter sample complexities than typical VC-dimension or norm-based bounds, which may be of independent interest.
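To convey the flavor of this requirement, here is a rough illustrative sketch (the notation $\mathcal{G}$, $\mathcal{H}$, $A$, $\ell$, and $n(\epsilon,\delta)$ is introduced here for exposition; the exact quantifiers, loss, and parameter dependence are fixed by the formal definition in the paper): a class of networks $\mathcal{H}$ SM approximates a target class $\mathcal{G}$ when some learning algorithm $A$, returning networks in $\mathcal{H}$, satisfies
$$
\forall g \in \mathcal{G},\ \forall \text{ input distributions } P,\ \forall \epsilon, \delta \in (0,1): \quad
\Pr_{S \sim (P,\, g)^{n(\epsilon,\delta)}}\!\Big[ \mathbb{E}_{x \sim P}\, \ell\big(A(S)(x),\, g(x)\big) \le \epsilon \Big] \ge 1 - \delta,
$$
where $S$ consists of $n(\epsilon,\delta)$ labeled samples $(x_i, g(x_i))$ with $x_i \sim P$, and the sample complexity $n(\epsilon,\delta)$ is required to scale polynomially in the complexity of the target $g$ (e.g., circuit size for Boolean circuits, or alphabet size, state space size, and $\log(T)$ for Turing machines) rather than in the size of the approximating network.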