We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. During training, object boxes diffuse from ground-truth boxes to a random distribution, and the model learns to reverse this noising process. At inference, the model progressively refines a set of randomly generated boxes into the output results. Extensive evaluations on standard benchmarks, including MS-COCO and LVIS, show that DiffusionDet achieves favorable performance compared to previous well-established detectors. Our work brings two important findings to object detection. First, random boxes, although drastically different from pre-defined anchors or learned queries, are also effective object candidates. Second, object detection, one of the representative perception tasks, can be solved in a generative way. Our code is available at https://github.com/ShoufaChen/DiffusionDet.
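To make the two phases concrete, the following is a minimal NumPy sketch of the idea, not the released implementation. It rests on several assumptions that the abstract does not state: boxes are plain 4-vectors, the noise follows a cosine schedule, and inference uses a deterministic DDIM-style update. The names `cosine_alpha_bar`, `diffuse_boxes`, `refine_boxes`, and `predict_x0` are hypothetical; `predict_x0` stands in for the learned detector, which in practice would also take image features as input.

```python
import numpy as np


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level at step t under a cosine schedule (an assumption)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)


def diffuse_boxes(gt_boxes, t, T, rng):
    """Training side: blend ground-truth boxes with Gaussian noise at step t."""
    a = cosine_alpha_bar(t, T)
    noise = rng.standard_normal(gt_boxes.shape)
    return np.sqrt(a) * gt_boxes + np.sqrt(1.0 - a) * noise


def refine_boxes(predict_x0, num_boxes, T, steps, rng):
    """Inference side: start from random boxes and refine them progressively.

    predict_x0(boxes, t) is a hypothetical stand-in for the trained model;
    it returns an estimate of the clean boxes given the noisy ones.
    """
    boxes = rng.standard_normal((num_boxes, 4))  # purely random candidates
    times = np.linspace(T, 0, steps + 1)
    for t_now, t_next in zip(times[:-1], times[1:]):
        x0 = predict_x0(boxes, t_now)
        a_now = cosine_alpha_bar(t_now, T)
        a_next = cosine_alpha_bar(t_next, T)
        # Noise implied by the current boxes and the predicted clean boxes.
        eps = (boxes - np.sqrt(a_now) * x0) / np.sqrt(1.0 - a_now)
        # Deterministic (DDIM-style) move to the next, less noisy step.
        boxes = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps
    return boxes
```

In this sketch, a training step would call `diffuse_boxes` on the ground-truth boxes at a sampled step `t` and train the model to recover the originals, while `refine_boxes` mirrors inference: the number of random boxes and the number of refinement steps are free parameters chosen at call time.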