We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.
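The filtered auto-paraphraser described above can be illustrated with a minimal sketch: candidate paraphrases of a training sentence are kept only when a parser maps them back to the same logical form as the original. The `toy_paraphraser` and `toy_parser` functions below are hypothetical stand-ins for illustration only (the actual system uses a neural paraphrase model and the trained semantic parser).

```python
def toy_paraphraser(sentence):
    """Hypothetical paraphrase generator: returns candidate rewrites,
    some of which may drift in meaning."""
    return [
        sentence.replace("restaurants with", "restaurants that have"),
        sentence.replace("restaurants", "hotels"),  # meaning-changing drift
    ]

def toy_parser(sentence):
    """Hypothetical parser: maps a sentence to a toy logical form."""
    table = "hotels" if "hotels" in sentence else "restaurants"
    return f"SELECT * FROM {table} WHERE rating > 4"

def filtered_paraphrases(sentence, logical_form, paraphraser, parser):
    """Keep only paraphrases whose parse matches the original logical form."""
    return [p for p in paraphraser(sentence) if parser(p) == logical_form]

original = "show restaurants with rating above 4"
lf = toy_parser(original)
kept = filtered_paraphrases(original, lf, toy_paraphraser, toy_parser)
# The meaning-drifting "hotels" rewrite parses to a different logical
# form and is filtered out; the faithful rewrite is kept.
```

This filtering step is what makes automatically generated paraphrases safe to use as training data: paraphrases that change the meaning would otherwise teach the parser incorrect mappings.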