Designing networks that attain better performance when given a larger inference budget is important for generalizing to harder problem instances. Recent work has shown promising results in this direction using depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models displays strong upwards generalization, and find that stronger performance on harder examples (which require more iterations of inference to get correct) strongly correlates with the path independence of the system: its tendency to converge to the same steady-state behaviour regardless of initialization, given enough computation. Experimental interventions made to promote path independence result in improved generalization on harder problem instances, while those that penalize it degrade this ability. Path independence analyses are also useful on a per-example basis: for equilibrium models that have good in-distribution performance, path independence on out-of-distribution samples strongly correlates with accuracy. Our results help explain why equilibrium models are capable of strong upwards generalization and motivate future work that harnesses path independence as a general modelling principle to facilitate scalable test-time usage.
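To make the notion of path independence concrete, the following is a minimal sketch and not the models or metric used in this work. It assumes a toy equilibrium layer `z = tanh(W z + x)` whose weight matrix is rescaled to have spectral norm below 1, so the update is a contraction with a unique fixed point. All function names and parameters here are hypothetical, chosen only for illustration. The sketch runs the iteration from several random initializations and reports how far apart the final states are.

```python
import numpy as np


def make_contractive_weights(dim, spectral_norm=0.5, seed=0):
    """Random square matrix rescaled to a chosen spectral norm (assumed < 1)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim))
    # np.linalg.norm(W, 2) is the largest singular value of a 2-D array.
    W *= spectral_norm / np.linalg.norm(W, 2)
    return W


def iterate_layer(W, x, z0, n_iters):
    """Apply the toy update z <- tanh(W z + x) for n_iters steps."""
    z = z0
    for _ in range(n_iters):
        z = np.tanh(W @ z + x)
    return z


def path_independence_gap(W, x, n_inits=5, n_iters=200, seed=1):
    """Largest distance between final states reached from random initializations.

    A gap near zero indicates that the final state does not depend on
    where the iteration started, i.e. the toy system is path independent.
    """
    rng = np.random.default_rng(seed)
    finals = [
        iterate_layer(W, x, 3.0 * rng.standard_normal(x.shape), n_iters)
        for _ in range(n_inits)
    ]
    reference = finals[0]
    return max(float(np.linalg.norm(z - reference)) for z in finals)
```

Calling `path_independence_gap(W, x, n_iters=1)` and then again with `n_iters=200` on the same `W` and `x` shows the gap shrinking as the inference budget grows, which is the "given enough computation" qualifier in the definition above. A trained equilibrium model offers no such contraction guarantee, which is why path independence there has to be measured empirically.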