Event forecasting is a challenging yet important task, as humans constantly seek to plan for the future. Existing automated forecasting studies rely mostly on structured data, such as time series or event-based knowledge graphs, to help predict future events. In this work, we aim to formulate a task, construct a dataset, and provide benchmarks for developing methods for event forecasting with large volumes of unstructured text data. To simulate the forecasting scenario on temporal news documents, we formulate the problem as a restricted-domain, multiple-choice question-answering (QA) task. Unlike existing QA tasks, our task limits the information accessible to a model, so the model has to make a forecasting judgement. To showcase the usefulness of this task formulation, we introduce ForecastQA, a question-answering dataset consisting of 10,392 event forecasting questions, which have been collected and verified via crowdsourcing efforts. We present our experiments on ForecastQA using BERT-based models and find that our best model achieves 60.1% accuracy on the dataset, still about 19% below human performance. We hope ForecastQA will support future research efforts in bridging this gap.