Text generation models have become central to many research tasks, especially the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools for examining the properties of generated text corpora. Applying these tools to various generated corpora gave us new insights into the properties of the generative models that produced them. As part of this characterization process, we found remarkable differences between the corpora generated by two leading generative technologies.