In human-robot collaboration, one challenging task is teaching a robot new, previously unknown objects so that it can interact with them. Gaze can provide valuable information for this purpose. We investigate whether it is possible to detect objects (object vs. no object) merely from gaze data and to determine their bounding-box parameters. To this end, we explore different sizes of temporal windows, which serve as the basis for computing heatmaps, i.e., the spatial distribution of the gaze data. Additionally, we analyze different grid sizes of these heatmaps and demonstrate the feasibility of the approach in a proof of concept using different machine learning techniques. Compared to conventional object detectors, our method is characterized by its speed and resource efficiency. To generate the required data, we conducted a study with five subjects who could move freely and thus turn towards arbitrary objects, making the data-collection scenario as realistic as possible. Since the subjects move while facing objects, the heatmaps also contain gaze trajectories, which complicates detection and parameter regression. We make our data set publicly available to the research community for download.
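As a minimal illustration of the heatmap step described above, the sketch below bins gaze samples from a temporal window onto a coarse grid. The function name, window length, grid size, and normalization are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def gaze_heatmap(gaze_xy, timestamps, t_end, window_s=1.0, grid=(16, 16)):
    """Bin gaze samples from the last `window_s` seconds onto a coarse grid.

    gaze_xy    : (N, 2) array of normalized gaze coordinates in [0, 1].
    timestamps : (N,) array of sample times in seconds.
    t_end      : end of the temporal window.
    Returns a `grid`-shaped heatmap normalized to sum to 1 (all zeros if
    no samples fall inside the window).
    """
    # Select only the samples inside the temporal window (t_end - window_s, t_end].
    mask = (timestamps > t_end - window_s) & (timestamps <= t_end)
    pts = gaze_xy[mask]

    # Accumulate gaze points into grid cells; row index comes from y, column from x.
    heat, _, _ = np.histogram2d(
        pts[:, 1], pts[:, 0], bins=grid, range=[[0, 1], [0, 1]]
    )

    total = heat.sum()
    return heat / total if total > 0 else heat
```

Under this reading, the flattened heatmap could then serve as the input feature vector for both tasks mentioned in the abstract: a binary classifier (object vs. no object) and a regressor for the bounding-box parameters.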