Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and computational resources: over-allocating on trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance. To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. At inference, the routing network adaptively activates only the most relevant experts, improving detection performance while minimizing computational overhead. Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP with 1.62 ms latency, outperforming YOLOv13-N by +0.8% mAP while running 17.8% faster. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code will be made available.
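The sparse routing mechanism described above can be illustrated with a minimal sketch: a lightweight router scores all experts from an input feature, only the top-k experts are executed, and a simple load-balancing surrogate stands in for the diversity-enhancing objective. All class and function names here (`SparseMoE`, `diversity_loss`, the linear router and experts) are illustrative assumptions, not the paper's actual ES-MoE implementation, whose details the abstract does not specify.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class SparseMoE:
    """Toy sparse Mixture-of-Experts layer: a linear router scores each
    expert from an input feature vector, and only the top-k experts run."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one weight vector per expert (a hypothetical stand-in
        # for the paper's lightweight dynamic routing network).
        self.router = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Experts: each a simple linear map dim -> dim.
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # 1. Route: score every expert, then keep only the top-k.
        logits = [sum(w * xi for w, xi in zip(wv, x)) for wv in self.router]
        probs = softmax(logits)
        chosen = sorted(range(len(probs)),
                        key=lambda i: probs[i], reverse=True)[:self.top_k]
        # 2. Renormalize gate weights over the chosen experts only.
        z = sum(probs[i] for i in chosen)
        out = [0.0] * len(x)
        for i in chosen:
            gate = probs[i] / z
            # 3. Run the selected expert and accumulate its gated output;
            #    unselected experts contribute no computation at all.
            y = [sum(w * xi for w, xi in zip(row, x))
                 for row in self.experts[i]]
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen, probs


def diversity_loss(batch_probs):
    """Penalize imbalanced average routing probabilities across a batch,
    a common load-balancing surrogate that pushes experts toward
    complementary specialization (the paper's exact diversity-enhancing
    objective is not given in the abstract)."""
    n = len(batch_probs[0])
    mean = [sum(p[i] for p in batch_probs) / len(batch_probs)
            for i in range(n)]
    uniform = 1.0 / n
    return sum((m - uniform) ** 2 for m in mean)
```

In this sketch, compute scales with `top_k` rather than with the total number of experts, which mirrors the abstract's goal of spending more capacity on complex scenes (via routing) without paying for all experts on every input.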