In this paper we present a new ensemble method, Continuous Bag-of-Skip-grams (CBOS), that produces high-quality word representations, with an emphasis on the modern Greek language. CBOS combines the pioneering approaches for learning word representations: Continuous Bag-of-Words (CBOW) and Continuous Skip-gram. These methods are compared through intrinsic and extrinsic evaluation tasks on three data sources: the English Wikipedia corpus, the modern Greek Wikipedia corpus, and the modern Greek Web Content corpus. Across these tasks and datasets, the CBOS method achieves state-of-the-art performance.