We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token by token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models, outperforming current state-of-the-art methods.
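For concreteness, the training objective and the likelihood-based filtering score can be sketched as follows; the notation here ($c$ for the passage, $q$ for the question, $a$ for the answer, $y$ for the concatenated decoder target, and $\theta$ for the model parameters) is introduced for illustration rather than taken from the original:
\[
\mathcal{L}(\theta) \;=\; -\sum_{(c,\,q,\,a)} \sum_{t=1}^{|y|} \log p_\theta\big(y_t \mid y_{<t},\, c\big),
\qquad
s(c, q, a) \;=\; p_\theta(y \mid c) \;=\; \prod_{t=1}^{|y|} p_\theta\big(y_t \mid y_{<t},\, c\big),
\]
where $y$ denotes the question and answer serialized into a single target sequence. The sequence likelihood $s$ produced during generation serves directly as the filtering score for the synthesized QA pairs.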