State-space models (SSMs), and Mamba in particular, are increasingly adopted for long-context sequence modeling because they provide linear-time aggregation through an input-dependent, causal selective-scan operation. Building on this, recent "Mamba-for-vision" variants largely explore multiple scan orders to relax strict causality for non-sequential signals (e.g., images). However, the conventional formulation of Mamba's selective scan does not preserve cross-block memory. It reinitializes each block's state-space dynamics from zero and discards the terminal state-space representation (SSR) of the previous block. We propose Arcee, a cross-block recurrent state chain that reuses each block's terminal SSR as the initial condition of the next block. The cross-block handoff is constructed as a differentiable boundary map whose Jacobian enables end-to-end gradient flow across block boundaries. Arcee is practical: it is parameter-free, compatible with all prior Mamba-for-vision variants, and incurs a constant, negligible cost. From a modeling perspective, we view the terminal SSR as a mild directional prior induced by a causal pass over the input, rather than as an estimator of the non-sequential signal itself. For unconditional generation on CelebA-HQ (256$\times$256) with Flow Matching, Arcee reduces FID$\downarrow$ from $82.81$ to $15.33$ ($5.4\times$ lower) on a single-scan-order Zigzag Mamba baseline. Efficient CUDA kernels and training code will be released to support rigorous and reproducible research.
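To make the difference between zero reinitialization and a cross-block state chain concrete, the following is a minimal pure-Python sketch. It is not the Arcee implementation. The sketch makes several assumptions. It uses a single channel with a scalar state. It discretizes Mamba-style, with $\bar{A}_t = e^{\Delta_t A}$ and the simplified input term $\Delta_t B_t x_t$. The block and helper names are hypothetical, and each block includes a residual connection. Because the abstract only states that the boundary map is differentiable and parameter-free, the handoff here is assumed to be the identity map $h_0^{(l+1)} = h_T^{(l)}$.

```python
import math
from typing import List, Sequence, Tuple

def softplus(z: float) -> float:
    # Numerically stable softplus: max(z, 0) + log(1 + exp(-|z|)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def selective_scan(x: Sequence[float], delta: Sequence[float], A: float,
                   B: Sequence[float], C: Sequence[float],
                   h0: float = 0.0) -> Tuple[List[float], float]:
    """Scalar selective scan: h_t = exp(delta_t*A)*h_{t-1} + delta_t*B_t*x_t, y_t = C_t*h_t.

    Returns the outputs y and the terminal state h_T.
    """
    h = h0
    ys: List[float] = []
    for x_t, d_t, b_t, c_t in zip(x, delta, B, C):
        a_bar = math.exp(d_t * A)        # discretized decay; in (0, 1) when A < 0
        h = a_bar * h + d_t * b_t * x_t  # simplified (Euler-style) input term
        ys.append(c_t * h)
    return ys, h

def toy_block(x: Sequence[float], params: Tuple[float, float, float, float],
              h0: float) -> Tuple[List[float], float]:
    # Hypothetical block: delta, B, C are input-dependent ("selective").
    w_d, w_b, w_c, A = params
    delta = [softplus(w_d * v) for v in x]
    B = [w_b * v for v in x]
    C = [w_c * v for v in x]
    ys, h_T = selective_scan(x, delta, A, B, C, h0)
    return [v + y for v, y in zip(x, ys)], h_T  # residual connection (assumption)

def run_stack(x: Sequence[float], layers: Sequence[Tuple[float, float, float, float]],
              chain_state: bool) -> List[float]:
    out, h_T = list(x), 0.0
    for params in layers:
        # chain_state=True: identity handoff of the previous block's terminal state.
        # chain_state=False: conventional zero reinitialization.
        h0 = h_T if chain_state else 0.0
        out, h_T = toy_block(out, params, h0)
    return out

if __name__ == "__main__":
    x = [1.0, -0.5, 0.25, 2.0]
    layers = [(0.5, 1.0, 1.0, -1.0), (0.3, -0.5, 0.8, -0.5)]
    print("zero-init:", run_stack(x, layers, chain_state=False))
    print("chained:  ", run_stack(x, layers, chain_state=True))
```

The first block behaves identically in both modes because it always starts from zero. From the second block on, the chained variant's outputs also depend on the previous block's terminal state. Since the identity handoff is differentiable with a unit Jacobian, an autodiff framework would propagate gradients through it without adding any parameters.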