In recent years, a number of models that learn the relation between vision and language from large datasets have been released. These models perform a variety of tasks, such as answering questions about images, retrieving the sentences that best correspond to images, and finding the image regions that correspond to phrases. Although some examples exist, the connection between these pre-trained vision-language models and robotics remains weak. If the models are connected directly to robot motions, they lose their versatility because of the robot's embodiment and the difficulty of data collection, and they become inapplicable to a wide range of bodies and situations. Therefore, in this study, we categorize and summarize methods for utilizing pre-trained vision-language models flexibly and easily, in a form the robot can understand, without connecting them directly to robot motions. We discuss how to use these models for robot motion selection and motion planning without re-training them. We consider five types of methods for extracting information understandable to robots, and we show results for state recognition, object recognition, affordance recognition, relation recognition, and anomaly detection based on combinations of these five methods. We expect that this study will add flexibility and ease of use, as well as new applications, to the recognition behavior of existing robots.