Conversational agents show promise for allowing users to interact with mobile devices through language. However, to perform diverse UI tasks with natural language, developers typically need to create separate datasets and models for each specific task, which is expensive and labor-intensive. Recently, pre-trained large language models (LLMs) have been shown to generalize to various downstream tasks when prompted with a handful of examples from the target task. This paper investigates the feasibility of enabling versatile conversational interactions with mobile UIs using a single LLM. We propose a design space to categorize conversations between the user and the agent when collaboratively accomplishing mobile tasks. We design prompting techniques to adapt an LLM to conversational tasks on mobile UIs. Our experiments show that our approach enables various conversational interactions with decent performance, demonstrating its feasibility. We discuss the use cases of our work and its implications for language-based mobile interaction.
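The few-shot adaptation the abstract refers to can be illustrated with a minimal sketch: a handful of exemplars from the target task are serialized and prepended to the test input before it is sent to the LLM. The function names, the screen serialization, and the "Screen/Question/Answer" template below are illustrative assumptions, not the paper's actual prompt design.

```python
# Hypothetical sketch of few-shot prompting for a mobile-UI task.
# The screen format and template here are assumptions for illustration,
# not the prompt structure used in the paper.

def format_exemplar(screen, question, answer=None):
    """Serialize one (screen, question[, answer]) triple as prompt text.
    Leaving `answer` as None yields a trailing 'Answer:' cue for the LLM."""
    lines = [f"Screen: {screen}", f"Question: {question}"]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)

def build_prompt(exemplars, test_screen, test_question):
    """Concatenate answered exemplars, then the unanswered test case."""
    parts = [format_exemplar(s, q, a) for s, q, a in exemplars]
    parts.append(format_exemplar(test_screen, test_question))
    return "\n\n".join(parts)

# One exemplar from the (hypothetical) target task, then a new test screen.
exemplars = [
    ("<button 'Send'> <input 'Message'>",
     "How do I send a message?",
     "Type into the 'Message' field, then tap 'Send'."),
]
prompt = build_prompt(exemplars,
                      "<button 'Save'> <input 'Note'>",
                      "How do I save a note?")
```

The resulting string would be passed as-is to a pre-trained LLM; because the exemplars already demonstrate the task format, no task-specific fine-tuning or dataset is needed.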