In this paper, we consider fine-grained image object detection in resource-constrained settings such as edge computing. Deep learning (DL), namely learning with deep neural networks (DNNs), has become the dominant approach to object detection. Accurate fine-grained detection typically requires a sufficiently large DNN model and a vast amount of data annotations, which makes modern DL object detectors difficult to use in resource-constrained settings. To this end, we propose an approach that leverages commonsense knowledge to help a coarse-grained object detector obtain accurate fine-grained detection results. Specifically, we introduce a commonsense knowledge inference module (CKIM) that processes the coarse-grained labels given by a benchmark DL detector to produce fine-grained labels. Our CKIM supports both crisp-rule and fuzzy-rule based inference; the latter is used to handle ambiguity in the target semantic labels. We implement our method on top of several modern DL detectors, namely YOLOv4, Mobilenetv3-SSD and YOLOv7-tiny. Experimental results show that our approach outperforms the benchmark detectors remarkably in terms of accuracy, model size and processing latency.
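To make the label-refinement idea concrete, the following is a minimal sketch of how a crisp-rule and a fuzzy-rule CKIM could map a coarse detector label, together with a simple box attribute, to a fine-grained label. The class names, the `width_ratio` attribute, the thresholds and the triangular membership functions are illustrative assumptions only, not the rules actually used in the paper.

```python
# Illustrative sketch of rule-based label refinement on top of a coarse detector.
# The attribute "width_ratio", the thresholds and the membership functions are
# assumptions for illustration; they are not the paper's actual CKIM rules.

def crisp_ckim(coarse_label: str, width_ratio: float) -> str:
    """Refine a coarse label into a fine-grained one with a crisp (hard-threshold) rule."""
    if coarse_label == "vehicle":
        # Crisp rule: a hard threshold on the relative box width decides the subclass.
        return "truck" if width_ratio > 0.5 else "car"
    return coarse_label


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular fuzzy membership function that peaks at b and is zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def fuzzy_ckim(coarse_label: str, width_ratio: float) -> str:
    """Fuzzy variant: compute a membership degree per subclass and pick the largest."""
    if coarse_label != "vehicle":
        return coarse_label
    memberships = {
        "car": triangular(width_ratio, 0.0, 0.25, 0.6),
        "truck": triangular(width_ratio, 0.4, 0.75, 1.0),
    }
    return max(memberships, key=memberships.get)


if __name__ == "__main__":
    print(crisp_ckim("vehicle", 0.48))  # "car" (0.48 is below the 0.5 threshold)
    print(fuzzy_ckim("vehicle", 0.48))  # "car" (memberships ~0.34 for car vs ~0.23 for truck)
```

The fuzzy variant illustrates why soft rules help with ambiguous labels: near a class boundary, the decision is made by comparing graded membership degrees rather than by a single hard cut-off.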