Detecting objects of interest through language is often challenging, particularly for objects that are uncommon or hard to describe, because automated models and human annotators perceive them differently. These challenges highlight the need for comprehensive datasets that go beyond standard object labels by incorporating detailed attribute descriptions. To address this need, we introduce the Objects365-Attr dataset, an extension of the existing Objects365 dataset distinguished by its attribute annotations. The dataset reduces inconsistencies in object detection by integrating a broad spectrum of attributes, including color, material, state, texture, and tone. It contains 5.6M object-level attribute descriptions, meticulously annotated across 1.4M bounding boxes. To validate the dataset's effectiveness, we conduct a rigorous evaluation of YOLO-World models at different scales, measuring their detection performance and demonstrating the dataset's contribution to advancing object detection.