In natural language processing (NLP), labeling regions of text, such as words, sentences and paragraphs, is a basic task. In this paper, a label is defined as a map between the mention of an entity in a region of text and the context of that entity in a broader region of text containing the mention. This definition naturally introduces links between entities, induced by the inclusion relation between regions, and the connected entities form a graph representing the information flow defined by the maps. It also enables the calculation of the information lost through a map using entropy, and the entropy lost is regarded as the distance between two entities along a path in the graph.
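The abstract does not give formulas, so the following is only a minimal sketch of one possible reading. It is not the paper's own formulation. It rests on several assumptions:

- Each map is deterministic and stored as a plain `dict` from source labels to target labels.
- Entropy is estimated from an empirical sample of mentions.
- The entropy lost through a map `f` is `H(X) - H(f(X))`.
- The distance along a path is the sum of the losses on its edges.

The names `entropy_lost` and `path_distance`, and the word/sentence/paragraph labels in the example, are hypothetical illustrations.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def entropy_lost(samples, mapping):
    """Entropy lost by applying a deterministic map: H(X) - H(f(X)).

    `mapping` is a hypothetical dict from source labels (mentions)
    to target labels (contexts in a broader region)."""
    return entropy(samples) - entropy([mapping[s] for s in samples])


def path_distance(samples, maps):
    """Assumed distance along a path: the sum of entropy lost on each map."""
    total = 0.0
    current = list(samples)
    for mapping in maps:
        total += entropy_lost(current, mapping)
        current = [mapping[s] for s in current]  # push labels one region up
    return total


# Hypothetical example: words -> sentence-level labels -> paragraph-level label.
words = ["cat", "dog", "car", "bus"]
word_to_sentence = {"cat": "animals", "dog": "animals",
                    "car": "vehicles", "bus": "vehicles"}
sentence_to_paragraph = {"animals": "topic", "vehicles": "topic"}

print(path_distance(words, [word_to_sentence, sentence_to_paragraph]))
```

Under these assumptions each map loses one bit, so the example path has length 2 bits. For deterministic maps the per-edge losses telescope, so the path length equals `H(X_0) - H(X_k)`, the entropy of the starting labels minus that of the final ones. Because each loss is non-negative, the distance grows as more information is discarded along the path.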