We address a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics: (1) amorphous shape with indistinct boundaries; (2) similarity to the surroundings; (3) absence of color. Consequently, distinguishing insubstantial objects in a single static frame is far more challenging, and the collaborative representation of spatial and temporal information is crucial. To this end, we construct an IOD-Video dataset comprising 600 videos (141,017 frames) that cover various distances, sizes, visibility levels, and scenes captured across different spectral ranges. In addition, we develop a spatio-temporal aggregation framework for IOD, in which different backbones are deployed and a spatio-temporal aggregation loss (STAloss) is elaborately designed to leverage consistency along the time axis. Experiments conducted on the IOD-Video dataset demonstrate that spatio-temporal aggregation significantly improves IOD performance. We hope our work will attract further research into this valuable yet challenging task. The code will be available at: \url{https://github.com/CalayZhou/IOD-Video}.
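To make the idea of "leveraging consistency along the time axis" concrete, the following is a minimal illustrative sketch of a temporal-consistency penalty, not the paper's actual STAloss. The tensor layout (T frames by 4 box parameters), the smooth-L1 choice, and the weighting scheme are assumptions made for this example only.

\begin{verbatim}
# Illustrative sketch only (not the paper's STAloss): a temporal-consistency
# penalty that encourages per-frame box predictions to vary smoothly in time.
import torch
import torch.nn.functional as F

def temporal_consistency_loss(boxes: torch.Tensor) -> torch.Tensor:
    """boxes: (T, 4) predicted (cx, cy, w, h) for T consecutive frames."""
    # Differences between adjacent frames; small values mean the detections
    # are coherent along the time axis.
    diffs = boxes[1:] - boxes[:-1]  # (T-1, 4)
    return F.smooth_l1_loss(diffs, torch.zeros_like(diffs))

# Hypothetical usage: combine with a per-frame detection loss,
# weighted by a hyper-parameter lambda_t.
# total_loss = detection_loss + lambda_t * temporal_consistency_loss(pred_boxes)
\end{verbatim}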