Automated understanding of user interfaces (UIs) from their pixels can improve accessibility, enable task automation, and facilitate interface design without relying on developers to comprehensively provide metadata. A first step is to infer what UI elements exist on a screen, but current approaches offer limited support for inferring how those elements are semantically grouped into structured interface definitions. In this paper, we motivate the problem of screen parsing: the task of predicting UI elements and their relationships from a screenshot. We describe our implementation of screen parsing and provide an effective training procedure that optimizes its performance. In an evaluation comparing the accuracy of the generated output, we find that our implementation significantly outperforms current systems (by up to 23%). Finally, we show three example applications that are facilitated by screen parsing: (i) UI similarity search, (ii) accessibility enhancement, and (iii) code generation from UI screenshots.