In this paper, we propose a method to generate personalized filled pauses (FPs) with group-wise prediction models. Compared with fluent text generation, disfluent text generation has not been widely explored. To generate more human-like text, we address disfluent text generation. The usage of disfluencies, such as FPs, rephrases, and word fragments, differs from speaker to speaker, so personalized FP generation is required. However, FPs are difficult to predict because their positions are sparse and because frequently used FPs occur far more often than less frequently used ones. Moreover, adapting an FP prediction model to each speaker is sometimes difficult because the tendency to use FPs can vary widely even within a single speaker. To address these issues, we propose a method to build group-dependent prediction models by grouping speakers on the basis of their tendency to use FPs. This method does not require a large amount of data or time to train a model for each speaker. We further introduce a loss function and a word embedding model suited to FP prediction. Our experimental results demonstrate that the group-dependent models predict FPs with higher scores than a non-personalized model, and that the introduced loss function and word embedding model improve prediction performance.
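The abstract does not state which features describe a speaker's FP tendency or which clustering method forms the groups. As a minimal sketch, assuming each speaker is represented by per-type FP rates (FPs per 100 tokens) and speakers are grouped with plain k-means, the grouping step and the pooling of training data per group might look like this. `FP_TYPES`, `fp_profile`, `kmeans`, `group_speakers`, and `pool_by_group` are hypothetical names, and this is not the authors' implementation.

```python
from collections import Counter

# Hypothetical FP inventory; the abstract does not list the FPs that are modeled.
FP_TYPES = ("uh", "um", "er")


def fp_profile(tokens):
    """Assumed tendency feature: rate of each FP type per 100 tokens."""
    counts = Counter(t for t in tokens if t in FP_TYPES)
    n = max(len(tokens), 1)
    return [100.0 * counts[fp] / n for fp in FP_TYPES]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means; initializes centroids with the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_speakers(corpus, k):
    """Map speaker id -> group id by clustering the speakers' FP profiles."""
    speakers = sorted(corpus)
    profiles = [fp_profile(corpus[s]) for s in speakers]
    return dict(zip(speakers, kmeans(profiles, k)))


def pool_by_group(corpus, groups):
    """Concatenate the token data of all speakers in each group."""
    pooled = {}
    for spk, g in groups.items():
        pooled.setdefault(g, []).extend(corpus[spk])
    return pooled


# Toy usage: s1/s2 favor "uh", s3/s4 favor "um".
corpus = {
    "s1": "uh i think uh we should go".split(),
    "s2": "uh so uh maybe uh".split(),
    "s3": "um well um i guess".split(),
    "s4": "um yes um no".split(),
}
groups = group_speakers(corpus, k=2)
pooled = pool_by_group(corpus, groups)
```

Under this assumption, one FP prediction model would be trained per group on `pooled[g]` instead of one model per speaker, which is what lets group-dependent models avoid the per-speaker data and training-time cost. A real setup would likely use a more robust initialization (e.g., k-means++) and richer tendency features such as FP positions.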