Demand for image editing has grown along with users' desire for self-expression. However, most users find image editing tools hard to use, because the tools assume expertise in photo effects and have complex interfaces. Users may therefore need someone to help edit their images, but providing a dedicated human assistant for every user does not scale. For that reason, an automated assistant system for image editing is desirable. Users also want more image sources for diverse image editing work, and integrating image search functionality into the editing tool is a potential remedy for this demand. We therefore propose a dataset for an automated Conversational Agent for Image Search and Editing (CAISE). To our knowledge, this is the first dataset that provides conversational image search and editing annotations, in which the agent holds a grounded conversation with users and helps them search and edit images according to their requests. To build such a system, we first collect image search and editing conversations between pairs of annotators. The assistant-annotators are equipped with a customized image search and editing tool to address the requests from the user-annotators. The functions that the assistant-annotators perform with the tool are recorded as executable commands, so that a trained system can be used for real-world application execution. We also introduce a generator-extractor baseline model for this task, which adaptively selects the source of the next token (i.e., from the vocabulary or from textual/visual contexts) for the executable command. This serves as a strong starting point while still leaving a large human-machine performance gap for future work. Our code and dataset are publicly available at: https://github.com/hyounghk/CAISE
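To make the idea of choosing the source of the next command token more concrete, the following is a minimal sketch, not the baseline's actual implementation. It assumes a soft, pointer-generator-style mixture: a gate weighs a "generate" distribution over a fixed vocabulary against an "extract" distribution over tokens taken from the context. The function name, the command words, the context tokens, and all scores are hypothetical placeholders chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(vocab, vocab_logits,
                            context_tokens, context_scores,
                            gate_logits):
    """Mix a vocabulary distribution with a context (copy) distribution.

    gate_logits holds two scores, [generate, extract]. All inputs are
    hypothetical stand-ins for what a trained decoder would produce.
    """
    p_generate, p_extract = softmax(gate_logits)
    generate_probs = softmax(vocab_logits)
    extract_probs = softmax(context_scores)

    mixed = {}
    # Probability mass for tokens produced from the vocabulary.
    for token, p in zip(vocab, generate_probs):
        mixed[token] = mixed.get(token, 0.0) + p_generate * p
    # Probability mass for tokens copied from the textual/visual context.
    for token, p in zip(context_tokens, extract_probs):
        mixed[token] = mixed.get(token, 0.0) + p_extract * p
    return mixed


# Hypothetical command vocabulary and context tokens (e.g. words from
# the user's request); none of these come from the dataset itself.
vocab = ["search", "adjust", "brightness", "<eos>"]
vocab_logits = [0.1, 0.2, 0.0, 0.3]
context_tokens = ["sunset", "beach"]
context_scores = [2.0, 0.5]
gate_logits = [-1.0, 1.0]  # the gate favors extraction at this step

dist = next_token_distribution(vocab, vocab_logits,
                               context_tokens, context_scores,
                               gate_logits)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

In this sketch the gate leans toward extraction, so the most likely next token is copied from the context rather than generated from the vocabulary. A hard selection between sources, rather than a soft mixture, would be an equally plausible reading of the description above.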