Although biographies are widespread on the Semantic Web, resources and approaches for automatically extracting biographical events are limited. This limitation reduces the amount of structured, machine-readable biographical information, especially about people belonging to underrepresented groups. Our work addresses this limitation by providing a set of guidelines for the semantic annotation of life events. The guidelines are designed to be interoperable with existing ISO standards for semantic annotation: ISO-TimeML (ISO-24617-1) and SemAF (ISO-24617-4). The guidelines were tested through an annotation task on Wikipedia biographies of underrepresented writers, namely authors born in non-Western countries, migrants, or members of ethnic minorities. Four annotators annotated 1,000 sentences, reaching an average Inter-Annotator Agreement of 0.825. The resulting corpus was mapped onto OntoNotes. This mapping allowed us to expand our corpus, showing that existing resources may be exploited for the biographical event extraction task.