We present the Recurrent Interface Network (RIN), a neural net architecture that allocates computation adaptively to the input according to the distribution of information, allowing it to scale to iterative generation of high-dimensional data. Hidden units of RINs are partitioned into the interface, which is locally connected to inputs, and latents, which are decoupled from inputs and can exchange information globally. The RIN block selectively reads from the interface into latents for high-capacity processing, with incremental updates written back to the interface. Stacking multiple blocks enables effective routing across local and global levels. While routing adds overhead, the cost can be amortized in recurrent computation settings where inputs change gradually while more global context persists, such as iterative generation using diffusion models. To this end, we propose a latent self-conditioning technique that "warm-starts" the latents at each iteration of the generation process. When applied to diffusion models operating directly on pixels, RINs yield state-of-the-art image and video generation without cascades or guidance, while being domain-agnostic and up to 10$\times$ more efficient compared to specialized 2D and 3D U-Nets.
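The read–process–write pattern of a RIN block described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: single-head attention with random matrices standing in for learned projections, and the MLPs, layer norms, and multi-head structure of the real architecture are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (m,d) x (n,d) x (n,d) -> (m,d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def rin_block(interface, latents, rng):
    """One RIN block: read interface -> latents, process latents, write back.

    Random projections stand in for learned weights (illustration only).
    """
    d = interface.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Read: latents selectively cross-attend to the locally connected interface
    latents = latents + attention(latents @ Wq, interface @ Wk, interface @ Wv)
    # Process: self-attention among latents (the high-capacity global compute)
    latents = latents + attention(latents @ Wq, latents @ Wk, latents @ Wv)
    # Write: incremental update from latents back to the interface
    interface = interface + attention(interface @ Wq, latents @ Wk, latents @ Wv)
    return interface, latents
```

Because the latents are far fewer than the interface tokens, the bulk of the computation is decoupled from input size; stacking such blocks gives the local/global routing the abstract refers to, and latent self-conditioning would simply initialize `latents` from the previous diffusion step instead of from scratch.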