Recently, pre-trained models have significantly improved the performance of various NLP tasks by leveraging large-scale text corpora to strengthen the contextual representation ability of neural networks. Large pre-trained language models have also been applied to table semantic parsing. However, existing pre-training approaches have not carefully explored the explicit interaction between a question and the corresponding database schema, which is a key ingredient for uncovering their semantic and structural correspondence. Furthermore, question-aware representation learning in the schema-grounding context has received little attention in pre-training objectives. To alleviate these issues, this paper designs two novel pre-training objectives that impose the desired inductive bias on the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to mitigate the impact of noise and to learn effectively from the pre-training data in an easy-to-hard manner. We evaluate our pre-trained framework by fine-tuning it on two benchmarks, Spider and SQUALL. The results demonstrate the effectiveness of our pre-training objectives and curriculum compared with a variety of baselines.