In recent years, we have seen significant progress in the development of self-driving cars. Multiple companies have begun rolling out impressive systems that work in a variety of settings. These systems can sometimes give the impression that full self-driving is just around the corner and that we will soon build cars without even a steering wheel. The increasing level of autonomy and control given to an AI opens up opportunities for new modes of human-vehicle interaction. However, surveys have shown that giving more control to an AI in self-driving cars is accompanied by a degree of uneasiness among passengers. In an attempt to alleviate this issue, recent works have adopted a natural-language-oriented approach that allows the passenger to give commands referring to specific objects in the visual scene. Nevertheless, this is only half the task, as the car should also understand the physical destination of the command, which is what we focus on in this paper. We propose an extension in which we annotate the 3D destination that the car needs to reach after executing the given command, and we evaluate multiple baselines on predicting this destination location. Additionally, we introduce a model that outperforms prior works adapted to this particular setting.