Interactive Machine Teaching (IMT) systems allow non-experts to easily create Machine Learning (ML) models. However, existing vision-based IMT systems either ignore annotations on the objects of interest or require users to annotate in a post-hoc manner. Without annotations on the objects, the model may rely on unrelated features and misinterpret the objects. Post-hoc annotations impose additional workload, which diminishes the usability of the overall model-building process. In this paper, we develop LookHere, which integrates in-situ object annotations into vision-based IMT. LookHere exploits users' deictic gestures to segment the objects of interest in real time, and the resulting segmentation can additionally be used for training. To achieve reliable object segmentation, we built a custom dataset called HuTics, comprising 2,040 front-facing images of deictic gestures toward various objects performed by 170 people. The quantitative results of our user study showed that participants created models 16.3 times faster with our system than with a standard IMT system using a post-hoc annotation process, while achieving comparable model accuracies. Additionally, models created with our system showed a significant accuracy improvement ($\Delta\mathrm{mIoU}=0.466$) in segmenting the objects of interest compared to those created without annotations.
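For reference, the reported $\Delta\mathrm{mIoU}$ follows the standard mean Intersection-over-Union metric for segmentation; a minimal sketch of its definition (notation ours, not taken from the paper) is:
\begin{equation*}
\mathrm{mIoU} = \frac{1}{N}\sum_{i=1}^{N} \frac{\lvert P_i \cap G_i \rvert}{\lvert P_i \cup G_i \rvert},
\end{equation*}
where $P_i$ and $G_i$ denote the predicted and ground-truth masks for the $i$-th image, averaged over the $N$ evaluation images; $\Delta\mathrm{mIoU}$ is the difference in this score between models trained with and without the in-situ annotations.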