The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries. To tackle some of the challenges in this field, two main tasks are addressed: word sense alignment and translation inference. The first task aims to find an optimal alignment between the senses of a headword, given their definitions in two different monolingual dictionaries. This is a challenging task, especially due to differences in sense granularity, coverage and description between the two resources. After describing the characteristics of various lexical semantic resources, we introduce a benchmark containing 17 datasets in 15 languages, in which monolingual word senses and definitions are manually annotated across different resources by experts. In the creation of the benchmark, lexicographers' knowledge is incorporated through the annotations, where a semantic relation, namely exact, narrower, broader, related or none, is selected for each sense pair. This benchmark can be used to evaluate word sense alignment systems. The performance of a few alignment techniques based on textual and non-textual semantic similarity detection and semantic relation induction is evaluated using the benchmark. Finally, we extend this work to translation inference, where translation pairs are induced in an unsupervised way to generate bilingual lexicons, using various approaches based on graph analysis. This task is of particular interest for the creation of lexicographical resources for less-resourced and under-represented languages, and it also helps increase the coverage of existing resources. From a practical point of view, the techniques and methods developed in this thesis are implemented within a tool that can facilitate the alignment task.
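To make the word sense alignment task concrete, the following is a minimal sketch of a purely textual baseline, assuming that each dictionary entry is given as a list of sense definitions for one headword. It is not the method evaluated in the thesis: the functions `tokenize`, `cosine` and `align_senses` and the `threshold` parameter are hypothetical and serve only as illustration. Definitions are compared with bag-of-words cosine similarity, and sense pairs are then matched greedily one-to-one. Pairs scoring below the threshold are left unaligned, which corresponds roughly to the "none" relation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; no stemming, so "deposits" and "deposited" differ.
    return re.findall(r"\w+", text.lower())


def cosine(def_a, def_b):
    # Bag-of-words cosine similarity between two sense definitions.
    va, vb = Counter(tokenize(def_a)), Counter(tokenize(def_b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_senses(senses_a, senses_b, threshold=0.15):
    """Hypothetical greedy one-to-one alignment of two sense inventories.

    Returns (i, j, score) triples. Senses left out are unaligned ("none").
    """
    scored = sorted(
        ((cosine(da, db), i, j)
         for i, da in enumerate(senses_a)
         for j, db in enumerate(senses_b)),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs score lower
        if i in used_a or j in used_b:
            continue  # each sense is aligned at most once
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j, score))
    return pairs


dict_a = ["a financial institution that accepts deposits",
          "the sloping land beside a river"]
dict_b = ["land alongside a river or lake",
          "an institution where money is deposited"]
for i, j, score in align_senses(dict_a, dict_b):
    print(f"A[{i}] <-> B[{j}]  similarity={score:.2f}")
```

In this toy example, the river sense of the first entry aligns with the first sense of the second entry, and the financial sense aligns with the second, but only because a single shared word ("institution") clears the low threshold. This illustrates why lexical overlap alone is brittle. The one-to-one constraint is also a simplification: differences in sense granularity mean that real alignments can be one-to-many, which is what the narrower and broader relations in the benchmark capture. This sketch does not predict those relation labels.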