Current end-to-end machine reading and question answering (Q\&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q\&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions. On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving accuracy equivalent to recurrent models. This speed-up allows us to train the model on much more data, so we combine our model with data generated by backtranslation from a neural machine translation model. On the SQuAD dataset, our single model, trained with augmented data, achieves an 84.6 F1 score on the test set, which is significantly better than the best published F1 score of 81.8.
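To make the encoder design concrete, the following is a minimal sketch of one QANet-style encoder block in PyTorch: a stack of convolutions (local interactions) followed by self-attention (global interactions) and a feed-forward layer, with no recurrence. The layer width, head count, kernel size, and number of convolutions are illustrative placeholders rather than the paper's settings, and details such as positional encodings and layer dropout are omitted.

```python
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """One encoder block: convolutions, then self-attention, then a
    feed-forward layer, each sub-layer wrapped as layernorm -> sublayer
    -> residual. Hyperparameters below are illustrative, not the paper's."""

    def __init__(self, d_model=128, num_heads=8, kernel_size=7, num_convs=4):
        super().__init__()
        self.conv_norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(num_convs)])
        # Depthwise-separable 1D convolutions over the sequence dimension.
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(d_model, d_model, kernel_size,
                          padding=kernel_size // 2, groups=d_model),  # depthwise
                nn.Conv1d(d_model, d_model, 1),                       # pointwise
                nn.ReLU(),
            )
            for _ in range(num_convs)
        ])
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x, key_padding_mask=None):
        # x: (batch, seq_len, d_model)
        for norm, conv in zip(self.conv_norms, self.convs):
            y = norm(x).transpose(1, 2)       # Conv1d expects (batch, channels, seq_len)
            x = x + conv(y).transpose(1, 2)   # residual connection
        y = self.attn_norm(x)
        attn_out, _ = self.attn(y, y, y, key_padding_mask=key_padding_mask)
        x = x + attn_out                      # residual connection
        x = x + self.ffn(self.ffn_norm(x))    # residual connection
        return x


# Example: encode a batch of 2 sequences of length 50.
block = EncoderBlock()
out = block(torch.randn(2, 50, 128))
print(out.shape)  # torch.Size([2, 50, 128])
```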
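As a rough illustration of the backtranslation augmentation, the sketch below paraphrases passages by translating English to French and back. The paper trains its own NMT models; publicly available MarianMT checkpoints from the Hugging Face transformers library are substituted here as an assumption, and the additional step of re-locating answer spans inside the paraphrased passage is omitted.

```python
# Sketch of backtranslation-based paraphrasing, assuming the Hugging Face
# `transformers` library and MarianMT checkpoints (a stand-in for the
# paper's own English<->French NMT models).
from transformers import MarianMTModel, MarianTokenizer

EN_FR = "Helsinki-NLP/opus-mt-en-fr"
FR_EN = "Helsinki-NLP/opus-mt-fr-en"
en_fr_tok, en_fr = MarianTokenizer.from_pretrained(EN_FR), MarianMTModel.from_pretrained(EN_FR)
fr_en_tok, fr_en = MarianTokenizer.from_pretrained(FR_EN), MarianMTModel.from_pretrained(FR_EN)


def translate(texts, tokenizer, model):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch, num_beams=4, max_length=512)
    return tokenizer.batch_decode(out, skip_special_tokens=True)


def backtranslate(texts):
    # English -> French -> English round trip yields paraphrased passages
    # that can be added to the training data.
    return translate(translate(texts, en_fr_tok, en_fr), fr_en_tok, fr_en)


print(backtranslate(["Tesla was the fourth of five children."]))
```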