To satisfy various user needs, different subtasks of graphic layout generation have been explored intensively in recent years. Existing studies usually propose task-specific methods with diverse input-output formats, dedicated model architectures, and different learning methods. However, these specialized approaches make adaptation to unseen subtasks difficult, hinder knowledge sharing between subtasks, and run contrary to the trend of devising general-purpose models. In this work, we propose UniLayout, which handles different subtasks of graphic layout generation in a unified manner. First, we uniformly represent the diverse inputs and outputs of the subtasks as sequences of tokens. Then, based on the unified sequence format, we naturally leverage an identical Transformer-based encoder-decoder architecture for all subtasks. Moreover, building on these two kinds of unification, we further develop a single model that supports all subtasks concurrently. Experiments on two public datasets demonstrate that, while simple, UniLayout significantly outperforms previous task-specific methods.
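To illustrate the unified sequence format described above, here is a minimal sketch of serializing layout elements into a flat token sequence. This is an assumption-laden illustration, not the paper's actual implementation: the function name, tuple layout, and quantization grid are all hypothetical, chosen only to show how heterogeneous layout data could share one token vocabulary across subtasks.

```python
# Hypothetical sketch (not UniLayout's real code): each layout element
# contributes one category token plus four discretized geometry tokens,
# so every subtask's inputs and outputs live in the same token space.

def layout_to_tokens(elements, grid=128):
    """Serialize layout elements into a token sequence.

    elements: list of (category, x, y, w, h) with coordinates in [0, 1].
    Geometry is quantized into `grid` bins and emitted as string tokens.
    All names and choices here are illustrative assumptions.
    """
    tokens = []
    for category, x, y, w, h in elements:
        tokens.append(category)  # categorical token
        # quantize each coordinate to an integer bin in [0, grid - 1]
        tokens.extend(str(int(round(v * (grid - 1)))) for v in (x, y, w, h))
    return tokens

seq = layout_to_tokens([("title", 0.1, 0.05, 0.8, 0.1),
                        ("image", 0.1, 0.2, 0.8, 0.5)])
```

Such a sequence could then be consumed and produced by a single encoder-decoder Transformer, with subtask differences expressed purely in which tokens appear on the input versus the output side.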