Constant bit-size Transformers are known to be Turing complete, but existing constructions require $\Omega(s(n))$ chain-of-thought (CoT) steps per simulated Turing machine (TM) step, leading to impractically long reasoning chains. In this paper, we significantly narrow this efficiency gap by proving that any $(t(n),s(n))$-bounded multi-tape TM can be simulated by a constant bit-size Transformer with a context window of optimal length $O(s(n))$ and only $O(s(n)^c)$ CoT steps per TM step, where $c>0$ can be made arbitrarily small by making the Transformer's head-layer product sufficiently large. In addition, our construction shows that sparse attention with fixed geometric offsets suffices for efficient universal computation. Our proof leverages multi-queue TMs as a bridge. The main technical novelty is a more efficient simulation of multi-tape TMs by synchronous multi-queue TMs, improving both time and space complexity under stricter model assumptions.
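To make the per-step overhead concrete, the Python sketch below illustrates the textbook way a single FIFO queue can store a TM tape circularly, with the scanned cell at the front: a right head move is one dequeue/enqueue, while a left move must rotate the entire queue, which is exactly the kind of $\Omega(s(n))$ per-step cost that a more careful multi-queue simulation aims to avoid. This is an illustrative sketch only, not the paper's construction; the class and method names (`Tape`, `move_left`, `move_right`) are hypothetical.

```python
from collections import deque

# Illustrative sketch only (not the paper's construction): a TM tape stored
# "circularly" in a single FIFO queue, with the scanned cell at the front.
# Moving the head right costs O(1) queue operations, but moving left rotates
# the whole queue, i.e. Theta(s(n)) operations for a tape of length s(n) --
# the kind of per-step overhead a multi-queue simulation is designed to reduce.

class Tape:
    def __init__(self, contents, blank="_"):
        # Finite tape for illustration; the scanned cell is always cells[0].
        self.cells = deque(contents) if contents else deque([blank])
        self.blank = blank

    def read(self):
        return self.cells[0]

    def write(self, symbol):
        self.cells[0] = symbol

    def move_right(self):
        # O(1): send the scanned cell to the back; the next cell to the right
        # (circularly) comes to the front.
        self.cells.append(self.cells.popleft())

    def move_left(self):
        # O(len(cells)): rotate every other cell past the back once so that
        # the cell to the left of the head ends up at the front.
        for _ in range(len(self.cells) - 1):
            self.cells.append(self.cells.popleft())


if __name__ == "__main__":
    t = Tape(list("abc"))       # head on 'a'
    t.move_right()              # head on 'b'
    t.write("X")                # tape is now a X c, head on 'X'
    t.move_left()               # head back on 'a', after a full rotation
    print("".join(t.cells))     # -> "aXc"
```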