Current multilingual semantic parsing (MSP) datasets are almost all collected by translating the utterances of existing datasets from a resource-rich language into the target language. However, manual translation is costly. To reduce the translation effort, this paper proposes the first active learning procedure for MSP (AL-MSP). AL-MSP selects only a subset of the existing datasets to be translated. We also propose a novel selection method that prioritizes examples that diversify the logical form structures and offer more lexical choices, and a novel hyperparameter tuning method that incurs no extra annotation cost. Our experiments show that, with ideal selection methods, AL-MSP significantly reduces translation costs. With proper hyperparameters, our selection method yields better parsing performance than the baselines on two multilingual datasets.
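The idea of translating only a selected subset can be illustrated with a minimal sketch. This is not the selection method proposed in the paper, whose details the abstract does not give. It is a simple greedy heuristic written under stated assumptions: each example is an (utterance, logical form) pair, the helper `lf_template` (hypothetical) approximates "logical form structure" by masking quoted constants, and "lexical choices" are approximated by counting utterance words not yet covered.

```python
import re
from typing import List, Set, Tuple

Example = Tuple[str, str]  # (source-language utterance, logical form)


def lf_template(logical_form: str) -> str:
    """Hypothetical structure abstraction: mask quoted constants."""
    return re.sub(r'"[^"]*"', '"<const>"', logical_form)


def select_for_translation(pool: List[Example], budget: int) -> List[int]:
    """Greedily pick up to `budget` example indices to send for translation.

    Assumed scoring (illustrative only): prefer examples whose logical-form
    template has not been selected yet, then examples that add the most
    unseen utterance words; ties go to the lowest index.
    """
    seen_templates: Set[str] = set()
    seen_words: Set[str] = set()
    remaining = set(range(len(pool)))
    chosen: List[int] = []

    while remaining and len(chosen) < budget:
        def score(i: int) -> Tuple[bool, int, int]:
            utterance, logical_form = pool[i]
            new_structure = lf_template(logical_form) not in seen_templates
            new_words = len(set(utterance.lower().split()) - seen_words)
            return (new_structure, new_words, -i)

        best = max(remaining, key=score)
        utterance, logical_form = pool[best]
        seen_templates.add(lf_template(logical_form))
        seen_words.update(utterance.lower().split())
        remaining.remove(best)
        chosen.append(best)

    return chosen
```

In an active learning loop, the selected examples would be sent for manual translation, the parser retrained on the translated subset, and selection repeated until the translation budget is exhausted.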