With the increasing demand for service robots and automated inspection, agents need to localize themselves in their surrounding environment to achieve more natural communication with humans through shared context. In this work, we propose a novel but straightforward task of precise target view localization for look-around agents, called the FindView task. This task imitates the movements of PTZ cameras or user interfaces for 360-degree media, where the observer must "look around" to find a view that exactly matches the target. To solve this task, we introduce a rule-based agent that heuristically finds the optimal view and a policy learning agent that employs reinforcement learning to learn by interacting with the 360-degree scene. Through extensive evaluations and benchmarks, we conclude that learned methods have many advantages: in particular, they achieve precise localization that is robust to corruption, and they can be easily deployed in novel scenes.