Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and obtaining unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world social science setting and introduce the IndiaPoliceEvents corpus: all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. Our trained annotators read and label every document for mentions of police activity events, which allows for unbiased recall evaluations. In contrast to other datasets with structured event representations, we gather annotations by posing natural questions, and we evaluate off-the-shelf models on three tasks: sentence classification, document ranking, and temporal aggregation of target events. We present baseline results from zero-shot BERT-based models fine-tuned on natural language inference and passage retrieval tasks. Our novel corpus-level evaluations and annotation approach can guide the creation of similar social-science-oriented resources in the future.
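As an illustration of what corpus-level evaluation can look like when every sentence carries a gold label, the following is a minimal sketch and not the authors' evaluation code. The records, the field order, and the helper names `recall` and `docs_per_day` are all hypothetical; the sketch computes sentence-level recall over the full corpus and aggregates positive documents per day for gold and predicted labels.

```python
from collections import defaultdict
from datetime import date

# Toy records (hypothetical data, not from IndiaPoliceEvents):
# (article date, document id, gold label, predicted label)
sentences = [
    (date(2002, 3, 1), "doc1", 1, 1),
    (date(2002, 3, 1), "doc1", 0, 0),
    (date(2002, 3, 1), "doc2", 1, 0),
    (date(2002, 3, 2), "doc3", 1, 1),
    (date(2002, 3, 2), "doc3", 0, 1),
]


def recall(records):
    """Sentence-level recall over all gold-positive sentences.

    Because every sentence is annotated, the denominator is the full
    set of gold positives rather than a sampled estimate.
    """
    gold_positive = [r for r in records if r[2] == 1]
    if not gold_positive:
        return None
    hits = sum(1 for r in gold_positive if r[3] == 1)
    return hits / len(gold_positive)


def docs_per_day(records, label_index):
    """Count documents per day with at least one positive sentence.

    label_index selects the label column: 2 for gold, 3 for predicted.
    """
    docs = defaultdict(set)
    for rec in records:
        if rec[label_index] == 1:
            docs[rec[0]].add(rec[1])
    return {day: len(docs[day]) for day in sorted(docs)}


if __name__ == "__main__":
    print("recall:", recall(sentences))
    print("gold docs per day:", docs_per_day(sentences, 2))
    print("predicted docs per day:", docs_per_day(sentences, 3))
```

Comparing the two per-day series shows how sentence-level errors propagate into the temporal aggregate, which is the kind of quantity a social science analysis would typically consume.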