Automatically converting text descriptions into images using transformer architectures has recently received considerable attention. Such advances have implications for many applied design disciplines, including fashion, art, architecture, urban planning, and landscape design, as well as for the future tools available to these disciplines. However, a detailed analysis of the capabilities of such models, specifically with a focus on the built environment, has not been performed to date. In this work, we investigate in detail the capabilities and biases of text-to-image methods as they apply to the built environment. We use a systematic grammar to generate queries related to the built environment and evaluate the resulting generated images. We generate 1020 different images and find that text-to-image transformers are robust at generating realistic images across different domains for this use case. Generated imagery can be found on GitHub: https://github.com/sachith500/DALLEURBAN
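The systematic grammar described above can be thought of as a set of slots whose vocabularies are combined into the full cross product of text queries. The sketch below illustrates this idea only; the slot names and vocabularies shown are hypothetical and are not the ones used in the paper.

```python
from itertools import product

# Hypothetical grammar slots -- illustrative only, not the paper's actual vocabularies.
SUBJECTS = ["a house", "an apartment building", "a public park", "a train station"]
STYLES = ["in a modernist style", "in a brutalist style", "with a green roof"]
SETTINGS = ["in a dense city centre", "in a suburban neighbourhood", "by the waterfront"]


def generate_prompts():
    """Expand the grammar into the full cross product of text queries."""
    for subject, style, setting in product(SUBJECTS, STYLES, SETTINGS):
        yield f"{subject} {style} {setting}"


if __name__ == "__main__":
    prompts = list(generate_prompts())
    print(f"{len(prompts)} prompts generated")
    print(prompts[0])  # e.g. "a house in a modernist style in a dense city centre"
```

Each generated prompt would then be submitted to the text-to-image model, and the resulting images evaluated.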