The explosive popularity of diffusion models [1][2][3] has provided a huge stage for further development in generative text modelling. Prompt-based models are highly nuanced: a carefully crafted prompt can produce truly breathtaking images, yet producing a powerful, or even meaningful, prompt is hit-or-miss. To build on this, we introduce a large-scale derived and synthesized dataset built on real prompts and indexed against popular image-text datasets such as MS-COCO [4] and Flickr [5]. We also introduce staging for these sentences that sequentially reduces context and increases complexity, which further strengthens the output through the complex annotations being created. MTTN consists of over 2.4M sentences divided over 5 stages, creating a combination amounting to over 12M pairs, along with a vocabulary of more than 300 thousand unique words that yields an abundance of variation. The original 2.4M pairs are broken down in such a manner that they reproduce the internet lingo used globally, thereby heightening the robustness of the dataset and of any model trained on it.
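The abstract does not specify the exact transformations applied at each stage. The following is a minimal Python sketch of one plausible staged context-reduction pipeline; the stage heuristics (whitespace normalization, punctuation stripping, modifier truncation, stopword removal, word-count capping) and all names here are illustrative assumptions, not the authors' actual method. It also shows how pairing each sentence with its 5 staged variants yields the 2.4M × 5 ≈ 12M pair count stated above.

```python
import string

# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "and"}

def stage1(prompt: str) -> str:
    """Stage 1 (assumed): the raw prompt with whitespace normalized."""
    return " ".join(prompt.split())

def stage2(prompt: str) -> str:
    """Stage 2 (assumed): lowercase and strip punctuation."""
    text = stage1(prompt).lower()
    return text.translate(str.maketrans("", "", string.punctuation))

def stage3(prompt: str) -> str:
    """Stage 3 (assumed): drop trailing comma-separated style modifiers
    common in diffusion prompts (e.g. '4k, trending on artstation')."""
    return stage1(prompt).split(",")[0].strip()

def stage4(prompt: str) -> str:
    """Stage 4 (assumed): remove stopwords, keeping only content words."""
    return " ".join(w for w in stage2(prompt).split() if w not in STOPWORDS)

def stage5(prompt: str) -> str:
    """Stage 5 (assumed): keep only the first few content words,
    the most compressed, lowest-context form."""
    return " ".join(stage4(prompt).split()[:4])

def build_pairs(prompt: str) -> list[tuple[str, str]]:
    """Pair the source sentence with each of its 5 staged variants.
    Over 2.4M sentences this produces ~12M pairs, matching the abstract."""
    stages = [stage1, stage2, stage3, stage4, stage5]
    return [(prompt, s(prompt)) for s in stages]

if __name__ == "__main__":
    demo = "A majestic castle on a hill, 4k, trending on artstation"
    for src, variant in build_pairs(demo):
        print(repr(variant))
```

Under these assumptions, each successive stage discards more surrounding context while making the recovery task harder, which is one way to realize the "reduce the context and increase the complexity" staging described above.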