Large-scale Text-to-image Generation Models (LTGMs) (e.g., DALL-E), self-supervised deep learning models trained on huge datasets, have demonstrated the capacity to generate high-quality, open-domain images from multi-modal input. Although they can produce anthropomorphized versions of objects and animals, combine unrelated concepts in plausible ways, and generate variations of any user-provided image, we witnessed that such rapid technological advancement has left many visual artists disoriented about how to leverage LTGMs more actively in their creative work. Our goal in this work is to understand how visual artists would adopt LTGMs to support their creative work. To this end, we conducted an interview study as well as a systematic literature review of 72 system/application papers for a thorough examination. A total of 28 visual artists, covering 35 distinct visual art domains, acknowledged LTGMs' versatile roles and high usability in supporting creative work by automating the creation process (i.e., automation), expanding their ideas (i.e., exploration), and facilitating or arbitrating communication (i.e., mediation). We conclude by providing four design guidelines that future researchers can refer to when building intelligent user interfaces with LTGMs.