Several recent works have proposed representing semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks - namely, QA-SRL, QANom and QADiscourse, each targeting a certain type of predication - and propose to regard them as jointly providing a comprehensive representation of textual information. To promote this goal, we investigate how best to utilize the power of sequence-to-sequence (seq2seq) pre-trained language models within the unique setup of semi-structured outputs, consisting of an unordered set of question-answer pairs. We examine different input and output linearization strategies, and assess the effect of multitask learning and of simple data augmentation techniques in the setting of imbalanced training data. Consequently, we release the first unified QASem parsing tool, practical for downstream applications that can benefit from an explicit, QA-based account of the information units in a text.
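To make the notion of output linearization concrete, the following is a minimal Python sketch of one possible strategy for serializing an unordered set of QA pairs into a single seq2seq target string. The separator tokens, field layout, and canonical-ordering choice here are illustrative assumptions, not the paper's actual format.

```python
# A minimal sketch (not the paper's exact format) of linearizing an
# unordered set of QA pairs into a single seq2seq target string.
# Separator tokens and ordering are illustrative assumptions.

from typing import List, Tuple

QA_SEP = " | "      # hypothetical separator between QA pairs
FIELD_SEP = " ? "   # question-answer boundary within a pair


def linearize_qa_pairs(qa_pairs: List[Tuple[str, str]]) -> str:
    """Turn an unordered set of (question, answer) pairs into one target string.

    Sorting imposes a canonical order, so training does not penalize the
    model for producing a valid permutation of the same set.
    """
    canonical = sorted(qa_pairs)  # one possible canonicalization strategy
    return QA_SEP.join(f"{q}{FIELD_SEP}{a}" for q, a in canonical)


def delinearize(target: str) -> List[Tuple[str, str]]:
    """Parse a generated target string back into QA pairs."""
    pairs = []
    for chunk in target.split(QA_SEP):
        if FIELD_SEP in chunk:
            q, a = chunk.split(FIELD_SEP, 1)
            pairs.append((q.strip() + "?", a.strip()))
    return pairs


# Example: a QA-SRL-style decomposition of
# "The chef prepared the meal quickly."
qas = [("who prepared something", "The chef"),
       ("what did someone prepare", "the meal"),
       ("how did someone prepare something", "quickly")]
print(linearize_qa_pairs(qas))
```

Any linearization like this trades off ease of decoding against faithfulness to the set semantics of the output; the paper's contribution includes comparing such strategies empirically.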