Contextualized representations give significantly improved results for a wide range of NLP tasks. Much work has been dedicated to analyzing the features captured by representative models such as BERT. Existing work finds that syntactic, semantic and word sense knowledge is encoded in BERT. However, little work has investigated word features for character-based languages such as Chinese. We investigate Chinese BERT using both attention weight distribution statistics and probing tasks, finding that (1) word information is captured by BERT; (2) word-level features lie mostly in the middle representation layers; (3) downstream tasks make different use of word features in BERT, with POS tagging and chunking relying the most on word features, and natural language inference relying the least on such features.
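The abstract names attention weight distribution statistics as one of the two analysis tools. As an illustration only, the sketch below computes one such statistic under assumed details: it loads HuggingFace's bert-base-chinese, feeds in a toy sentence with a hand-supplied word segmentation (not the paper's data), and reports, per layer, the share of attention mass that stays inside the same word. The sentence, the segmentation, and the specific statistic are hypothetical choices for demonstration, not the paper's exact procedure.

```python
# Minimal sketch: per-layer intra-word attention share in Chinese BERT.
# Assumes each Chinese character maps to one token (typical for bert-base-chinese).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese", output_attentions=True)
model.eval()

# Toy sentence with a hand-made word segmentation (illustrative only).
words = ["北京", "大学", "坐落", "在", "海淀区"]
text = "".join(words)

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map each character token to its word id; positions 0 and -1 are [CLS]/[SEP].
word_ids = []
for w_idx, w in enumerate(words):
    word_ids.extend([w_idx] * len(w))
n_tokens = len(word_ids)

for layer, attn in enumerate(outputs.attentions, start=1):
    # attn: (batch=1, heads, seq, seq); average over heads, drop special tokens.
    a = attn[0].mean(dim=0)[1:1 + n_tokens, 1:1 + n_tokens]
    a = a / a.sum(dim=-1, keepdim=True)  # renormalize rows after dropping [CLS]/[SEP]
    same_word = torch.tensor(
        [[word_ids[i] == word_ids[j] for j in range(n_tokens)] for i in range(n_tokens)]
    )
    # Mean, over query tokens, of attention mass directed at tokens in the same word.
    intra = ((a * same_word).sum() / n_tokens).item()
    print(f"layer {layer:2d}: intra-word attention share = {intra:.3f}")
```

If word boundaries shape BERT's attention, this share should noticeably exceed what uniform attention over the sentence would give, and its variation across layers gives a rough picture of where word-level information concentrates.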