We present Sprecher Networks (SNs), a family of trainable neural architectures inspired by the classical Kolmogorov-Arnold-Sprecher (KAS) construction for approximating multivariate continuous functions. In contrast to Multi-Layer Perceptrons (MLPs), which use fixed node activations, and Kolmogorov-Arnold Networks (KANs), which feature learnable edge activations, SNs utilize shared, learnable splines (monotonic and general) within structured blocks incorporating explicit shift parameters and mixing weights. Our approach directly realizes Sprecher's specific 1965 sum-of-shifted-splines formula in its single-layer variant and extends it to deeper, multi-layer compositions. We further enhance the architecture with optional lateral mixing connections that enable intra-block communication between output dimensions, providing a parameter-efficient alternative to full attention mechanisms. Beyond parameter efficiency, with $O(LN + LG)$ scaling (where $L$ is the depth, $N$ the width, and $G$ the knot count of the shared splines) versus the $O(LN^2)$ of MLPs, SNs admit a sequential evaluation strategy that reduces the peak memory for forward-pass intermediates from $O(N^2)$ to $O(N)$ (treating batch size as constant), making much wider architectures feasible under memory constraints. We demonstrate empirically that composing these blocks into deep networks leads to highly parameter- and memory-efficient models, discuss theoretical motivations, and compare SNs with related architectures (MLPs, KANs, and networks with learnable node activations).
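For context, Sprecher's 1965 representation is commonly stated in the following form (the exact constants and normalization conventions vary between sources): a continuous $f:[0,1]^n \to \mathbb{R}$ is written using a single monotone inner function $\psi$, a single outer function $\Phi$ depending on $f$, fixed mixing constants $\lambda_p$, and a fixed shift $\eta$,

$$
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi\!\left( \sum_{p=1}^{n} \lambda_p\, \psi(x_p + q\,\eta) + q \right).
$$

This matches the abstract's description of the single-layer variant, where the shared monotonic and general learnable splines play the roles of $\psi$ and $\Phi$, and the explicit shift parameters and mixing weights correspond to the shifts $q\,\eta$ and the constants $\lambda_p$.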