To facilitate conversational question answering (CQA) over hybrid contexts in finance, we present a new dataset named PACIFIC. Compared with existing CQA datasets, PACIFIC exhibits three key features: (i) proactivity, (ii) numerical reasoning, and (iii) a hybrid context of tables and text. We accordingly define a new task, Proactive Conversational Question Answering (PCQA), which combines clarification question generation with CQA. In addition, we propose a novel method, UniPCQA, which casts the hybrid input and output formats of PCQA as a Seq2Seq problem and reformulates the numerical reasoning process as code generation. UniPCQA performs multi-task learning over all sub-tasks of PCQA and incorporates a simple ensemble strategy that cross-validates the top-$k$ sampled Seq2Seq outputs to alleviate error propagation in multi-task learning. We benchmark the PACIFIC dataset with extensive baselines and provide comprehensive evaluations on each sub-task of PCQA.
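The two ideas of treating numerical reasoning as code generation and cross-validating top-$k$ sampled outputs can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each sampled Seq2Seq output is a plain arithmetic expression string and that cross-validation amounts to executing every candidate and taking a majority vote over the results. The names `eval_arithmetic` and `vote_answer` and the sample expressions are hypothetical and are not taken from UniPCQA.

```python
import ast
import operator
from collections import Counter

# Binary operators allowed in a generated expression (illustrative subset).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arithmetic(expr: str) -> float:
    """Safely evaluate a +, -, *, / expression over numeric literals."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval"))


def vote_answer(candidates, ndigits=4):
    """Execute each candidate; return the most common result, or None."""
    results = []
    for expr in candidates:
        try:
            results.append(round(eval_arithmetic(expr), ndigits))
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # drop candidates that fail to parse or execute
    if not results:
        return None
    return Counter(results).most_common(1)[0][0]


# Hypothetical top-k samples for a "percentage change" style question.
sampled = [
    "(120 - 100) / 100",
    "(120 - 100) / 100",
    "120 - 100 / 100",    # operator-precedence mistake
    "(120 - 100 / 100",   # malformed output, discarded
]
print(vote_answer(sampled))  # 0.2
```

Voting over executed results rather than raw strings means that a single malformed or mis-parenthesized sample does not determine the final answer, which is the intuition behind using several sampled outputs to limit error propagation.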