Designing visually appealing layouts for multimedia documents containing text, graphs and images requires a form of creative intelligence. Modelling the generation of layouts has recently gained attention due to its importance in aesthetics and communication style. In contrast to standard prediction tasks, there is a range of acceptable layouts that depend on user preferences. For example, one poster designer may prefer logos in the top-left corner, while another prefers them in the bottom-right. Both are correct choices, yet existing machine learning models treat layout generation as a single-choice prediction problem. In such situations, these models simply average over all possible choices for the same input, producing a degenerate sample. In the above example, this would yield an unacceptable layout with the logo in the centre. In this paper, we present an auto-regressive neural network architecture, called LayoutMCL, that uses multi-choice prediction and a winner-takes-all loss to effectively stabilise layout generation. LayoutMCL avoids the averaging problem by using multiple predictors to learn a range of possible options for each layout object. This enables LayoutMCL to generate multiple, diverse layouts from a single input, in contrast to existing approaches, which yield similar layouts with only minor variations. Through quantitative benchmarks on real data (magazine, document and mobile app layouts), we demonstrate that LayoutMCL reduces Fréchet Inception Distance (FID) by 83-98% and generates significantly more diverse layouts than existing approaches.
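The averaging problem and the winner-takes-all remedy described above can be illustrated with a toy sketch. This is not the LayoutMCL architecture: it assumes constant (input-independent) predictors, a squared-error loss, and plain gradient descent, and the names `wta_loss`, `fit_heads` and `init_heads` are hypothetical. The two targets stand for normalised logo positions in the top-left and bottom-right corners.

```python
import numpy as np


def wta_loss(heads, target):
    """Winner-takes-all loss: squared error of the closest head only.

    heads:  (K, D) array, one row per predictor (hypothetical toy setup).
    target: (D,) array.
    Returns the winning head's error and its index.
    """
    errors = np.sum((heads - target) ** 2, axis=1)
    winner = int(np.argmin(errors))
    return errors[winner], winner


def fit_heads(targets, init_heads, steps=200, lr=0.1):
    """Gradient descent in which only the winning head is updated."""
    heads = np.array(init_heads, dtype=float)  # copy, so the caller's data is untouched
    for _ in range(steps):
        for t in targets:
            _, w = wta_loss(heads, t)
            # Gradient of ||h - t||^2 with respect to h is 2 (h - t).
            heads[w] -= lr * 2.0 * (heads[w] - t)
    return heads


# Two equally valid logo positions: top-left and bottom-right.
targets = np.array([[0.1, 0.1], [0.9, 0.9]])

# A single predictor trained with mean squared error converges to the mean
# of the targets, i.e. the centre of the page: [0.5, 0.5].
single = targets.mean(axis=0)

# Two heads trained with the winner-takes-all loss each settle on one mode.
heads = fit_heads(targets, init_heads=[[0.4, 0.4], [0.6, 0.6]])
```

In this sketch the single predictor lands on the centre, which matches neither designer's preference, while each of the two heads specialises on one acceptable position because only the closest head receives a gradient for a given target.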