In this paper, we focus on a scenario where a single image contains objects of the same category but of varying sizes, and we propose a lightweight approach that recognizes not only their category labels but also their real sizes. Our approach uses commonsense knowledge to help a coarse-grained object detector based on a deep neural network (DNN) achieve accurate size-related fine-grained detection. Specifically, we introduce a commonsense knowledge inference module (CKIM) that maps the coarse-grained labels produced by the DNN detector to size-related fine-grained labels. Experimental results demonstrate that, compared with baseline methods, our approach achieves accurate fine-grained detection with less annotated data and a smaller model size. Our code is available at: https://github.com/ZJLAB-AMMI/CKIM.
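
As a rough illustration of the label-mapping idea, the sketch below refines coarse-grained detections into size-related fine-grained labels with a single hand-written rule. It is not the CKIM implementation from the repository: the `Detection` class, the `refine_labels` function, the `ratio` threshold, and the rule itself (an object is "large" if its bounding-box area is at least `ratio` times the largest box of the same category in the image) are hypothetical and chosen only to make the mapping concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    label: str   # coarse-grained category from the detector, e.g. "cup"
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels


def box_area(box):
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def refine_labels(detections, ratio=0.6):
    """Map coarse labels to "large <label>" or "small <label>".

    Hypothetical rule (an assumption, not the paper's rule): within one
    image, an object is "large" if its box area is at least `ratio` times
    the largest box area among objects of the same category.
    """
    # First pass: largest box area observed for each coarse category.
    max_area = defaultdict(float)
    for det in detections:
        max_area[det.label] = max(max_area[det.label], box_area(det.box))

    # Second pass: compare each box against its category's largest box.
    refined = []
    for det in detections:
        largest = max_area[det.label]
        is_large = largest == 0 or box_area(det.box) >= ratio * largest
        size = "large" if is_large else "small"
        refined.append(f"{size} {det.label}")
    return refined
```

Because the rule operates only on the detector's output, the detector itself needs no size-specific annotations in this sketch; a rule this simple ignores factors such as object distance from the camera, which a practical system would have to account for.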