Creating challenging tabular inference data is essential for learning complex reasoning. Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is difficult to scale. The second is synthetic generation, which is scalable and cost-effective but lacks linguistic creativity. In this work, we present a framework for semi-automatically recasting existing tabular data to combine the benefits of both approaches. We use our framework to build tabular NLI instances from five datasets that were originally intended for tasks such as table-to-text generation, tabular question answering, and semantic parsing. We demonstrate that the recast data can serve both as evaluation benchmarks and as augmentation data to improve performance on tabular NLI tasks. Furthermore, we investigate the effectiveness of models trained on recast data in the zero-shot setting, and analyse trends in performance across different types of recast datasets.