Improvements in machine-learning-based NLP performance are often presented alongside bigger models and more complex code. This presents a trade-off: better scores come at the cost of larger tools, since bigger models tend to require more resources during both training and inference. We present multiple methods for measuring the size of a model and for comparing it with the model's performance. In a case study on part-of-speech tagging, we then apply these techniques to taggers for eight languages and present a novel analysis identifying which taggers are size-performance optimal. Results indicate that some classical taggers place on the size-performance skyline across languages. Further, although deep models achieve the highest performance on multiple scores, it is often not the most complex of these that reaches peak performance.
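To make the notion of size-performance optimality concrete, the following minimal Python sketch marks a tagger as lying on the skyline (Pareto frontier) when no other tagger is both at least as small and at least as accurate, with one of the two strictly better; the tagger names, sizes, and scores below are invented for illustration and are not results from the paper.

```python
from typing import List, Tuple

def skyline(points: List[Tuple[str, float, float]]) -> List[str]:
    """Return names of taggers on the size-performance skyline.

    Each point is (name, size, score); smaller size and higher score
    are both better. A tagger is on the skyline if no other tagger
    dominates it (is <= in size and >= in score, strictly better in one).
    """
    on_skyline = []
    for name, size, score in points:
        dominated = any(
            (s <= size and a >= score) and (s < size or a > score)
            for n, s, a in points
            if n != name
        )
        if not dominated:
            on_skyline.append(name)
    return on_skyline

# Hypothetical taggers: (name, model size in MB, tagging accuracy)
taggers = [
    ("hmm",         2.0,   0.942),
    ("perceptron",  15.0,  0.958),
    ("bilstm",      90.0,  0.965),
    ("bert",        420.0, 0.972),
    ("bert-large",  1300.0, 0.971),  # bigger but not better: dominated
]

print(skyline(taggers))  # ['hmm', 'perceptron', 'bilstm', 'bert']
```

In this toy example the largest model is dominated by a smaller one with a higher score, illustrating how a classical, compact tagger can sit on the skyline while the most complex model does not.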