This paper introduces DroidBot-GPT, a tool that uses GPT-like large language models (LLMs) to automate interactions with Android mobile applications. Given a natural-language description of a desired task, DroidBot-GPT automatically generates and executes actions that navigate the app to complete the task. It works by translating the app's GUI state and the actions available on the smartphone screen into natural-language prompts, then asking the LLM to choose an action. Because LLMs are typically trained on large amounts of data, including how-to manuals for diverse software applications, they can make reasonable action choices from the information provided. We evaluate DroidBot-GPT on a self-created dataset of 33 tasks collected from 17 Android applications spanning 10 categories. It successfully completes 39.39% of the tasks (13 of 33), and its average partial completion progress is about 66.76%. Given that our method is fully unsupervised (it requires no modification to either the app or the LLM), we believe there is great potential to improve automation performance through better app development paradigms and/or custom model training.
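The prompt-and-choose step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DroidBot-GPT's actual implementation: the simplified screen model (`UIElement`), the helper names (`build_prompt`, `parse_choice`, `choose_action`), the extra Back action, and the prompt wording are all hypothetical. The LLM is passed in as a plain function from prompt text to reply text, so any model client could be plugged in.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UIElement:
    # Hypothetical, simplified view of one interactive widget on the screen.
    kind: str    # e.g. "button", "text field"
    text: str    # visible label or content description
    action: str  # e.g. "click", "input text"


def build_prompt(task: str, app: str, elements: List[UIElement]) -> str:
    """Render the task, current screen and numbered candidate actions as natural language."""
    lines = [
        f"You are operating the Android app '{app}'.",
        f"Task: {task}",
        "The current screen offers the following actions:",
    ]
    for i, el in enumerate(elements):
        lines.append(f"{i}. {el.action} the {el.kind} '{el.text}'")
    # Assumed extra navigation action, always listed last.
    lines.append(f"{len(elements)}. press the Back button")
    lines.append("Reply with the number of the single action that best advances the task.")
    return "\n".join(lines)


def parse_choice(reply: str, num_actions: int) -> Optional[int]:
    """Take the first integer in the LLM reply; return None if missing or out of range."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    idx = int(match.group())
    return idx if 0 <= idx < num_actions else None


def choose_action(task: str, app: str, elements: List[UIElement],
                  llm: Callable[[str], str]) -> Optional[int]:
    """Ask the LLM to pick one action for the current screen."""
    prompt = build_prompt(task, app, elements)
    # +1 accounts for the Back action appended by build_prompt.
    return parse_choice(llm(prompt), len(elements) + 1)


if __name__ == "__main__":
    screen = [UIElement("button", "New note", "click"),
              UIElement("text field", "Search", "input text")]
    fake_llm = lambda prompt: "I would choose 0."  # stand-in for a real model call
    print(choose_action("Create a new note", "Notes", screen, fake_llm))  # prints 0
```

In a full loop, the chosen index would be mapped back to a concrete GUI event, executed on the device, and the new screen state would be rendered into the next prompt until the task is judged complete. Returning `None` for unparseable or out-of-range replies leaves the retry or fallback policy to the caller.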