Improving the accessibility and automation capabilities of mobile devices can have a significant positive impact on the daily lives of countless users. To stimulate research in this direction, we release a human-annotated dataset with approximately 500k unique annotations aimed at increasing the understanding of the functionality of UI elements. This dataset augments images and view hierarchies from RICO, a large dataset of mobile UIs, with annotations for icons based on their shapes and semantics, and with associations between different elements and their corresponding text labels, resulting in a significant increase in the number of UI elements and the categories assigned to them. We also release models trained on image-only and multimodal inputs, experimenting with various architectures and studying the benefits of multimodal inputs on the new dataset. Our models demonstrate strong performance on an evaluation set of unseen apps, indicating that they generalize to newer screens. These models, combined with the new dataset, can enable innovative functionalities such as referring to UI elements by their labels and improved coverage and semantics for icons, which would go a long way toward making UIs more usable for everyone.