Modern deep residual networks perform substantial redundant computation by evaluating every residual block for every input, even when an identity mapping would suffice. We introduce CosineGate, an end-to-end differentiable architecture for dynamic routing in residual networks that uses the cosine incompatibility between identity and residual feature representations as a self-supervised skip signal. CosineGate measures semantic redundancy through the Cosine Incompatibility Ratio (CIR), defined as 1 - cos(x, F(x)), and uses a Gumbel-Softmax relaxation to enable per-sample, per-block gating during training. A progressive FLOPs regularization term controls average compute usage without destabilizing optimization. On CIFAR-10, CosineGate spans the accuracy-efficiency Pareto frontier: an aggressive configuration achieves 89.9 percent accuracy with 24.1 percent FLOPs savings; a balanced configuration achieves 91.3 percent accuracy with 28.5 percent savings at epoch 160; and a conservative configuration reaches a peak of 93.2 percent accuracy with minimal compute reduction. These results match or exceed ResNet-20 (91.3 percent) while reducing computation, without auxiliary supervision, distillation, or task-specific heuristics. Simple geometric measures of feature incompatibility thus provide a principled and effective signal for dynamic residual routing.
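To make the gating mechanism concrete, the following is a minimal PyTorch-style sketch of a cosine-gated residual block, assuming a shape-preserving residual branch. The class name CosineGateBlock, the temperature argument tau, and the two-way logit construction are illustrative assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CosineGateBlock(nn.Module):
    """Illustrative cosine-gated residual block (not the authors' released code).

    Per sample, the block either executes its residual branch F(x) or falls
    back to the identity, driven by the Cosine Incompatibility Ratio
    CIR(x) = 1 - cos(x, F(x)).
    """

    def __init__(self, residual_fn: nn.Module, tau: float = 1.0):
        super().__init__()
        self.residual_fn = residual_fn  # F(x): any shape-preserving module
        self.tau = tau  # Gumbel-Softmax temperature (assumed hyperparameter)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fx = self.residual_fn(x)

        # Self-supervised skip signal: cosine similarity between identity and
        # residual features, computed per sample over flattened activations.
        cos = F.cosine_similarity(x.flatten(1), fx.flatten(1), dim=1)
        cir = 1.0 - cos  # high CIR => residual carries new information

        # Two-way logits (skip, execute); Gumbel-Softmax with hard=True yields
        # a near-binary yet differentiable per-sample gate during training.
        logits = torch.stack([-cir, cir], dim=1)
        gate = F.gumbel_softmax(logits, tau=self.tau, hard=True)[:, 1]

        # Broadcast the scalar gate over the feature dimensions.
        gate = gate.view(-1, *([1] * (x.dim() - 1)))
        return x + gate * fx


# Hypothetical usage with a small convolutional residual branch:
block = CosineGateBlock(nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU()))
y = block(torch.randn(8, 16, 32, 32))
```

Note that in this sketch the residual branch is still evaluated during training so the gate remains differentiable; compute savings arise at inference, where blocks with a hard gate of 0 can be skipped outright, and the progressive FLOPs regularizer described above would plausibly correspond to penalizing the mean gate activation.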