Generative Adversarial Networks (GANs) have been shown to aid in the creation of artificial data in situations where large amounts of real data are difficult to come by. This data scarcity is especially salient in computational linguistics, where researchers are often tasked with modeling the complex morphological and grammatical processes of low-resource languages. This paper discusses the implementation and testing of a GAN that attempts to model and reproduce the graphotactics of a language using only 100 example strings. These artificial, yet graphotactically compliant, strings are meant to aid in modeling the morphological inflection of low-resource languages.