Toward a Unified Framework for Unsupervised Complex Tabular Reasoning

Structured tabular data exist across nearly all fields. Reasoning tasks over such data aim to answer questions or determine the truthfulness of hypothesis sentences by understanding the semantic meaning of a table. While previous works have devoted significant effort to tabular reasoning, they typically assume that sufficient labeled data are available. However, constructing reasoning samples over tables (and related text) is labor-intensive, especially when the reasoning process is complex. When labeled data are insufficient, model performance declines severely. In this paper, we propose a unified framework for unsupervised complex tabular reasoning (UCTR), which generates sufficient and diverse synthetic data with complex logic for tabular reasoning tasks, assuming no human-annotated data at all. We first use a random sampling strategy to collect diverse programs of different types and execute them on tables with a "Program-Executor" module. To bridge the gap between programs and natural language sentences, we design an "NL-Generator" module that generates natural language sentences with complex logic from these programs. Since a table often appears together with its surrounding text, we further propose novel "Table-to-Text" and "Text-to-Table" operators to handle joint table-text reasoning scenarios. In this way, we can fully exploit unlabeled table resources to obtain a well-performing reasoning model in an unsupervised setting. Our experiments cover different tasks (question answering and fact verification) and different domains (general and specific), showing that our unsupervised methods can reach up to 93% of the performance of supervised models. We also find that, used as a data augmentation technique, UCTR can substantially boost supervised performance in low-resource domains. Our code is available at https://github.com/leezythu/UCTR.
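To make the "sample a program, then execute it on a table" idea concrete, the following is a minimal sketch and not the authors' implementation. The toy table, the three operators (`sum`, `argmax`, `count_greater`), and the helper names `execute`, `sample_program`, and `synthesize` are all assumptions introduced for illustration. The NL-Generator step, which would turn each program into a natural language sentence, is not shown.

```python
import random

# Hypothetical toy table (not from the paper): one dict per row.
TABLE = [
    {"team": "A", "wins": 10, "losses": 2},
    {"team": "B", "wins": 7, "losses": 5},
    {"team": "C", "wins": 3, "losses": 9},
]


def execute(program, table):
    """Run a tiny program, given as a tuple (operator, *args), on the table."""
    op, *args = program
    if op == "sum":
        (col,) = args
        return sum(row[col] for row in table)
    if op == "argmax":
        key_col, col = args
        return max(table, key=lambda row: row[col])[key_col]
    if op == "count_greater":
        col, threshold = args
        return sum(1 for row in table if row[col] > threshold)
    raise ValueError(f"unknown operator: {op}")


def sample_program(table, rng):
    """Randomly pick an operator and fill its arguments from the table."""
    numeric = [c for c, v in table[0].items() if isinstance(v, (int, float))]
    textual = [c for c in table[0] if c not in numeric]
    op = rng.choice(["sum", "argmax", "count_greater"])
    col = rng.choice(numeric)
    if op == "sum":
        return (op, col)
    if op == "argmax":
        return (op, rng.choice(textual), col)
    # count_greater: use a value that occurs in the column as the threshold.
    threshold = rng.choice([row[col] for row in table])
    return (op, col, threshold)


def synthesize(table, n, seed=0):
    """Produce n (program, answer) pairs without any human annotation."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        program = sample_program(table, rng)
        samples.append({"program": program, "answer": execute(program, table)})
    return samples


if __name__ == "__main__":
    for sample in synthesize(TABLE, 3):
        print(sample)
```

Each resulting pair carries an answer that is correct by construction, because it comes from executing the program rather than from a human label. In this sketch, a question answering sample would pair a generated sentence with that answer; a fact verification sample could pair a sentence with a label derived from comparing a stated value against the executed result.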
