This paper describes our system for the SemEval-2022 task of matching dictionary glosses to word embeddings. We focus on the Reverse Dictionary track of the competition, which maps multilingual glosses to reconstructed vector representations. More specifically, models convert input sentences into three types of embeddings: SGNS, Char, and Electra. We propose several experiments that apply neural network cells, general multilingual and multitask structures, and language-agnostic tricks to the task. We also provide comparisons across different types of word embeddings, along with ablation studies, to suggest helpful strategies. Our initial Transformer-based model achieves relatively low performance; however, trials with different retokenization methodologies yield improvements. Our proposed ELMo-based monolingual model achieves the best results, and its multitask and multilingual variants are competitive as well.