LEXpander: applying colexification networks to automated lexicon expansion

Recent approaches to text analysis from social media and other corpora rely on word lists to detect topics, measure meaning, or select relevant documents. These lists are often generated by applying computational lexicon expansion methods to small, manually curated sets of root words. Despite the wide use of this approach, we still lack an exhaustive comparative analysis of the performance of lexicon expansion methods and of how they can be improved with additional linguistic data. In this work, we present LEXpander, a method for lexicon expansion that leverages novel data on colexification, i.e. semantic networks connecting words based on shared concepts and translations to other languages. We evaluate LEXpander in a benchmark that includes widely used methods for lexicon expansion based on various word embedding models and synonym networks. We find that LEXpander outperforms existing approaches in a variety of tests, both in the precision of the generated word lists and in their trade-off between precision and recall. Our benchmark includes several linguistic categories and sentiment variables in English and German. We also show that the expanded word lists constitute a high-performing text analysis method when applied to various corpora. In this way, LEXpander provides a systematic, automated solution for expanding short lists of words into exhaustive and accurate word lists that closely approximate those generated by experts in psychology and linguistics.
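
The general idea of expanding seed words through a word network and scoring the result against an expert-made list can be illustrated with a minimal sketch. This is not the LEXpander implementation: the toy network, its edges, and the function names `expand` and `precision_recall_f1` are all hypothetical, and the neighbour-based expansion rule is an assumption chosen for simplicity.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy network (made-up edges, not real colexification data):
# two words are linked if they are assumed to share a concept or translation.
TOY_NETWORK: Dict[str, Set[str]] = {
    "happy": {"glad", "lucky"},
    "glad": {"happy", "joyful"},
    "lucky": {"happy", "fortunate"},
    "joyful": {"glad"},
    "fortunate": {"lucky"},
}


def expand(seeds: Iterable[str], network: Dict[str, Set[str]], hops: int = 1) -> Set[str]:
    """Return the seeds plus every word reachable within `hops` edges."""
    expanded = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        # Collect neighbours of the current frontier that are not yet included.
        frontier = {n for w in frontier for n in network.get(w, ())} - expanded
        expanded |= frontier
    return expanded


def precision_recall_f1(predicted: Set[str], reference: Set[str]) -> Tuple[float, float, float]:
    """Score an expanded list against a reference (e.g. expert-made) list."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    expanded = expand({"happy"}, TOY_NETWORK, hops=1)
    reference = {"happy", "glad", "joyful"}  # hypothetical expert list
    print(sorted(expanded))
    print(precision_recall_f1(expanded, reference))
```

Raising `hops` in this sketch adds more distant words, which would typically raise recall at the cost of precision; that is the trade-off the benchmark in the abstract measures.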

