A Graphical User Interface (GUI) is not merely a collection of individual and unrelated widgets. Rather, it partitions discrete widgets into groups through various visual cues, forming higher-order perceptual units such as tabs, menus, cards, or lists. The ability to automatically segment a GUI into perceptual groups of widgets is a fundamental component of the visual intelligence needed to automate GUI design, implementation, and automation tasks. Although humans can partition a GUI into meaningful perceptual groups of widgets highly reliably, perceptual grouping remains an open challenge for computational approaches. Existing methods rely on ad-hoc heuristics or on supervised machine learning that depends on specific GUI implementations and runtime information. Research in psychology and biological vision has formulated a set of principles (i.e., the Gestalt theory of perception) that describe how humans group elements in visual scenes based on visual cues such as connectivity, similarity, proximity, and continuity. These principles are domain-independent and have been widely adopted by practitioners to structure content on GUIs and improve aesthetic appeal and usability. Inspired by these principles, we present a novel unsupervised, image-based method for inferring perceptual groups of GUI widgets. Our method requires only GUI pixel images, is independent of the GUI implementation, and does not require any training data. An evaluation on a dataset of 1,091 GUIs collected from 772 mobile apps and 20 UI design mockups shows that our method significantly outperforms the state-of-the-art ad-hoc heuristics-based baseline. Our perceptual grouping method creates opportunities for improving UI-related software engineering tasks.
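To make the Gestalt cues concrete, here is a minimal sketch of how two of them, proximity and similarity, could link widgets into groups. This is not the authors' method, whose details the abstract does not give; it assumes widgets have already been detected as bounding boxes with a coarse type label. The names `Widget`, `box_gap`, `similar`, and `group_widgets`, and the thresholds `max_gap` and `size_tol`, are hypothetical. Two widgets are linked when they share a type, have comparable sizes, and sit close together, and groups are the transitive closure of those links, computed here with union-find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Widget:
    """Hypothetical widget record: pixel bounding box plus a coarse type label."""
    x: int
    y: int
    w: int
    h: int
    kind: str


def box_gap(a: Widget, b: Widget) -> int:
    """Proximity cue: pixel gap along the larger axis; 0 if the boxes touch or overlap."""
    dx = max(0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    dy = max(0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    return max(dx, dy)


def similar(a: Widget, b: Widget, size_tol: float) -> bool:
    """Similarity cue: same type label and width/height within a relative tolerance."""
    if a.kind != b.kind:
        return False
    return (abs(a.w - b.w) <= size_tol * max(a.w, b.w)
            and abs(a.h - b.h) <= size_tol * max(a.h, b.h))


def group_widgets(widgets: list[Widget], max_gap: int = 16,
                  size_tol: float = 0.2) -> list[list[int]]:
    """Group widget indices that are transitively linked by proximity + similarity."""
    parent = list(range(len(widgets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Link every pair of widgets that is both similar and close enough.
    for i in range(len(widgets)):
        for j in range(i + 1, len(widgets)):
            a, b = widgets[i], widgets[j]
            if similar(a, b, size_tol) and box_gap(a, b) <= max_gap:
                parent[find(i)] = find(j)

    # Collect members by root; indices are appended in ascending order.
    groups: dict[int, list[int]] = {}
    for i in range(len(widgets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Example: a row of three buttons 8 px apart, an image below them, and a distant label.
ui = [
    Widget(0, 0, 80, 40, "button"),
    Widget(88, 0, 80, 40, "button"),
    Widget(176, 0, 80, 40, "button"),
    Widget(0, 48, 256, 200, "image"),
    Widget(0, 400, 120, 20, "text"),
]
print(group_widgets(ui))  # [[0, 1, 2], [3], [4]]
```

The three buttons form one group even though the first and last are 96 px apart, because grouping is transitive through the middle button. Fixed pixel thresholds like these are exactly the kind of ad-hoc heuristic the abstract contrasts with its approach, so the sketch only illustrates the cues, not a robust grouping rule.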