Tabular question answering (TQA) is a challenging setting for neural systems because it requires joint reasoning over natural language and large amounts of semi-structured data. Unlike humans, who use programmatic tools such as filters to transform data before processing it, language models in TQA process tables directly, which results in information loss as table size increases. In this paper, we propose ToolWriter, which generates query-specific programs and detects when to apply them to transform tables and align them with the TQA model's capabilities. Focusing ToolWriter on generating row-filtering tools improves the state of the art on WikiTableQuestions and WikiSQL, with the largest gains on long tables. By investigating headroom, our work highlights the broader potential of combining programmatic tools with neural components to manipulate large amounts of structured data.
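To make the idea of a query-specific row-filtering tool concrete, the following is a minimal sketch in Python. It is not the ToolWriter implementation: the table layout, the function names `apply_row_filter` and `keep_2008`, the example question, and the fallback rule (keep the full table if the filter raises an error or removes every row) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]


def apply_row_filter(table: Table, keep: Callable[[Row], bool]) -> Table:
    """Apply a query-specific row filter before the table reaches a TQA model.

    Assumed fallback rule (hypothetical): if the filter raises an error or
    removes every row, return the original table unchanged.
    """
    try:
        filtered = [row for row in table if keep(row)]
    except Exception:
        return table
    return filtered if filtered else table


# Hypothetical generated filter for a question such as
# "Which races took place in 2008?"
def keep_2008(row: Row) -> bool:
    return row["Year"] == "2008"


table: Table = [
    {"Year": "2007", "Race": "Monaco", "Position": "2"},
    {"Year": "2008", "Race": "Monza", "Position": "1"},
    {"Year": "2008", "Race": "Spa", "Position": "3"},
]

# The reduced table contains only the 2008 rows and would be passed to the
# downstream TQA model in place of the full table.
reduced = apply_row_filter(table, keep_2008)
```

In this sketch, the fallback stands in for the step the abstract describes as detecting when to apply a tool: a filter that fails or discards everything leaves the table untouched rather than hiding the answer from the model.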