The growing interest in diffusion models has opened up opportunities for advances in generative text modeling. These models can produce impressive images when given a well-crafted prompt, but writing a powerful or meaningful prompt is hit-or-miss. To address this, we have created a large-scale dataset that is derived and synthesized from real prompts and indexed with popular image-text datasets such as MS-COCO and Flickr. We also introduce stages that progressively reduce context and increase complexity; the more complex annotations this produces are intended to further improve model output. The dataset, called MTTN, includes over 2.4 million sentences across 5 stages, for a total of over 12 million pairs and a vocabulary of over 300,000 unique words, providing ample variation. The original 2.4 million pairs are designed to reflect how language is used on the internet globally, making models trained on the dataset more robust.