Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored. The field of Computer Vision has begun to tackle encoder ranking, with promising forays into Natural Language Processing; however, these lack coverage of linguistic tasks such as structured prediction. We propose probing to rank LMs, specifically for parsing dependencies in a given language, by measuring the degree to which labeled trees are recoverable from an LM's contextualized embeddings. Across 46 typologically and architecturally diverse LM-language pairs, our probing approach predicts the best LM choice in 79% of cases while using orders of magnitude less compute than training a full parser. Within this study, we identify and analyze one recently proposed decoupled LM, RemBERT, and find that it strikingly contains less inherent dependency information, yet often yields the best parser after full fine-tuning. Without this outlier, our approach identifies the best LM in 89% of cases.
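The ranking idea above can be sketched in miniature: fit a cheap probe on each candidate LM's frozen embeddings, score how well the target labels are recoverable, and rank the LMs by that score. The sketch below is purely illustrative and uses synthetic vectors with a nearest-centroid probe in place of real contextualized embeddings and the paper's actual probing architecture; the LM names `lm_a` and `lm_b` and the separation parameter are invented for the example.

```python
import random

def probe_accuracy(embeddings, labels, train_frac=0.8):
    """Fit a nearest-centroid 'probe' on frozen embeddings and return
    held-out accuracy: a cheap proxy for how recoverable labels are."""
    n = len(embeddings)
    idx = list(range(n))
    random.shuffle(idx)
    split = int(n * train_frac)
    train, test = idx[:split], idx[split:]
    dim = len(embeddings[0])
    # One centroid per label, estimated on the training split.
    sums, counts = {}, {}
    for i in train:
        lab = labels[i]
        s = sums.setdefault(lab, [0.0] * dim)
        for d in range(dim):
            s[d] += embeddings[i][d]
        counts[lab] = counts.get(lab, 0) + 1
    centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}
    # Classify held-out points by nearest centroid (squared Euclidean).
    correct = 0
    for i in test:
        pred = min(centroids, key=lambda lab: sum(
            (embeddings[i][d] - centroids[lab][d]) ** 2 for d in range(dim)))
        correct += pred == labels[i]
    return correct / len(test)

random.seed(0)

def fake_embeddings(sep):
    """Synthetic stand-in for an LM's embeddings: two toy dependency
    relations whose clusters are `sep` apart (hypothetical data)."""
    vecs, labs = [], []
    for lab in (0, 1):
        for _ in range(200):
            vecs.append([random.gauss(lab * sep, 1.0) for _ in range(8)])
            labs.append(lab)
    return vecs, labs

# "lm_a" encodes the label more separably than "lm_b" by construction.
scores = {}
for name, sep in [("lm_a", 3.0), ("lm_b", 0.2)]:
    emb, lab = fake_embeddings(sep)
    scores[name] = probe_accuracy(emb, lab)

ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Ranking by probe score requires no gradient updates to the LMs themselves, which is what makes the comparison orders of magnitude cheaper than fine-tuning a full parser per candidate.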