The detection of object states in images (State Detection, SD) is a problem of both theoretical and practical importance, and it is tightly interwoven with other important computer vision problems, such as action recognition and affordance detection. It is also highly relevant to any entity that needs to reason and act in dynamic domains, such as robotic systems and intelligent agents. Despite its importance, research on this problem has so far been limited. In this paper, we attempt a systematic study of the SD problem. First, we introduce the Object State Detection Dataset (OSDD), a new publicly available dataset consisting of more than 19,000 annotations for 18 object categories and 9 state classes. Second, using a standard deep learning framework for Object Detection (OD), we conduct a number of appropriately designed experiments towards an in-depth study of the behavior of the SD problem. This study establishes a baseline for SD performance and measures it relative to OD performance in a variety of scenarios. Overall, the experimental outcomes confirm that SD is harder than OD and that tailored SD methods need to be developed to address this significant problem effectively.