Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda, and Yorùbá. Each dataset consists of 1,500 turns, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models, DialoGPT and BlenderBot, and we compare these models against a simple seq2seq baseline using perplexity. In addition, we conduct human evaluation of single-turn conversations using majority votes and measure inter-annotator agreement (IAA). We find support for the hypothesis that deep monolingual models learn abstractions that generalize across languages: we observe human-like conversations, to varying degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% of the votes are unanimous. We freely provide the datasets and host the model checkpoints and demos on the HuggingFace hub for public access.
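Below is a minimal, hypothetical sketch of the transfer-learning setup the abstract describes: loading a pretrained DialoGPT checkpoint with the HuggingFace `transformers` library and scoring a single-turn exchange by perplexity. The base model id is a real public checkpoint, but the example turns are placeholders; the authors' actual fine-tuned per-language checkpoints and data live on the HuggingFace hub and are not named here.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# "microsoft/DialoGPT-small" is a real public base checkpoint; the fine-tuned
# per-language checkpoints mentioned in the abstract would be substituted here.
model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Placeholder strings standing in for a translated context/response pair
# from one of the 6 target languages.
context = "I would like to book a hotel for two nights."
response = "Sure, which area of the city do you prefer?"

# DialoGPT separates dialogue turns with the EOS token.
text = context + tokenizer.eos_token + response + tokenizer.eos_token
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # With labels == input_ids, the returned loss is the mean token-level
    # cross-entropy, so exp(loss) is the perplexity of the exchange.
    loss = model(input_ids, labels=input_ids).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```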