Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models

Xinnian Liang, Zefan Zhou, Hui Huang, Shuangzhi Wu, Tong Xiao, Muyun Yang, Zhoujun Li, Chao Bian

From arXiv (preprint)

Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks. Most Chinese PLMs simply treat an input text as a sequence of characters, and completely ignore word information. Although Whole Word Masking can alleviate this, the semantics in words is still not well represented. In this paper, we revisit the segmentation granularity of Chinese PLMs. We propose a mixed-granularity Chinese BERT (MigBERT) by considering both characters and words. To achieve this, we design objective functions for learning both character and word-level representations. We conduct extensive experiments on various Chinese NLP tasks to evaluate existing PLMs as well as the proposed MigBERT. Experimental results show that MigBERT achieves new SOTA performance on all these tasks. Further analysis demonstrates that words are semantically richer than characters. More interestingly, we show that MigBERT also works with Japanese. Our code and model have been released at https://github.com/xnliang98/MigBERT.
