Visual long-range interaction refers to modeling dependencies between distant feature points or blocks within an image, and it can significantly enhance a model's robustness. Both CNNs and Transformers can establish long-range interactions through layer stacking and patch-based computation. However, the underlying mechanism of long-range interaction in visual space remains unclear. We propose mode-locking theory as this underlying mechanism: it constrains the phase and wavelength relationships among waves so that they produce a mode-locked interference waveform. We verify the theory through simulation experiments and demonstrate the mode-locking pattern in models of real-world scenes. The proposed theory provides a comprehensive account of the mechanism behind long-range interaction in artificial neural networks, and it may inspire integrating the mode-locking pattern into models to enhance their robustness.
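The abstract does not spell out the authors' formulation. As a generic illustration only, the sketch below uses the textbook (laser-physics) picture of mode locking, which is an assumption here and not the paper's simulation. It sums N unit-amplitude waves whose frequencies are evenly spaced (fixed wavelength ratios). When their phases are locked (all equal), the waves interfere constructively into a sharp periodic peak of intensity N². When the phases are random, the average intensity stays the same but the energy is spread out. The helper names (`field`, `intensity_trace`, `peak_to_mean`) and all parameter values are hypothetical.

```python
import math
import random


def field(t, n_modes, base_freq, spacing, phases):
    """Sum of n_modes unit-amplitude cosine waves evaluated at time t.

    Mode k has frequency base_freq + k * spacing (an evenly spaced,
    commensurate set, i.e. fixed wavelength ratios) and phase phases[k].
    """
    return sum(
        math.cos(2 * math.pi * (base_freq + k * spacing) * t + phases[k])
        for k in range(n_modes)
    )


def intensity_trace(n_modes, phases, base_freq=10.0, spacing=1.0, samples=2000):
    """Sample the intensity |E(t)|^2 over one repetition period T = 1 / spacing."""
    period = 1.0 / spacing
    return [
        field(i * period / samples, n_modes, base_freq, spacing, phases) ** 2
        for i in range(samples)
    ]


def peak_to_mean(trace):
    """Ratio of peak intensity to mean intensity: large for a sharp, locked peak."""
    return max(trace) / (sum(trace) / len(trace))


if __name__ == "__main__":
    n = 16
    # Locked phases: every mode peaks together at t = 0, giving |E|^2 = n^2.
    locked = intensity_trace(n, [0.0] * n)
    # Unlocked: independent random phases (seeded for reproducibility).
    rng = random.Random(0)
    unlocked = intensity_trace(n, [rng.uniform(0, 2 * math.pi) for _ in range(n)])
    print(f"locked   peak/mean: {peak_to_mean(locked):.1f}")  # equals 2 * n
    print(f"unlocked peak/mean: {peak_to_mean(unlocked):.1f}")
```

Over one full period the cross terms average to zero, so the mean intensity is N/2 for any choice of phases. Only phase locking concentrates that fixed energy into a peak of N², which is why the locked peak-to-mean ratio reaches its maximum of 2N.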