This paper describes in detail the design and development of a novel annotation framework and of annotated resources for Internal Displacement. The work is the outcome of a collaboration with the Internal Displacement Monitoring Centre and aims to improve the accuracy of its monitoring platform, IDETECT. The schema includes a multi-faceted description of displacement events, covering the cause, the number of people displaced, the location, and the date. We also propose higher-order facets intended to improve information extraction, such as document relevance and document type. We also report a case study applying machine learning to the document classification tasks. Finally, we discuss the importance of standardized schemas in benchmark dataset development and their impact on building reliable disaster monitoring infrastructure.