In today's multilingual lexical databases, the majority of the world's languages are under-represented. Beyond a mere issue of resource incompleteness, we show that existing lexical databases have structural limitations that reduce their expressivity for culturally specific words and for mapping such words across languages. In particular, the lexical meaning space of dominant languages, such as English, is represented more accurately, while linguistically or culturally diverse languages are mapped only approximately. Our paper assesses state-of-the-art multilingual lexical databases and evaluates their strengths and limitations with respect to their expressivity on lexical phenomena of linguistic diversity.