The ubiquity of contemporary language understanding tasks makes it important to develop generalized yet highly efficient models that exploit all of the knowledge provided by the data source. In this work, we present SocialBERT, the first model that uses knowledge of the author's position in a social network during text analysis. We investigate possible models for learning social network information and successfully inject it into the baseline BERT model. The evaluation shows that embedding this information preserves good generalization while improving the quality of the probabilistic model for a given author by up to 7.5%. The proposed model has been trained on the majority of groups in the chosen social network and is still able to work with previously unseen groups. The resulting model, as well as the code for our experiments, is available for download and use in applied tasks.
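To make the idea of injecting an author's network position into BERT concrete, here is a minimal sketch, not the authors' exact architecture: a per-author embedding (which could be initialized from graph embeddings such as Node2Vec) is added to the token embeddings, analogous to how BERT adds segment embeddings. The class name `SocialBertSketch` and parameters such as `num_authors` and `author_ids` are illustrative assumptions, not names from the paper.

```python
import torch.nn as nn
from transformers import BertModel


class SocialBertSketch(nn.Module):
    """Hypothetical sketch: condition BERT on the author's network position."""

    def __init__(self, num_authors: int, bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # One learnable vector per author node; in practice this could be
        # initialized from precomputed social-graph embeddings.
        self.author_emb = nn.Embedding(num_authors, hidden)

    def forward(self, input_ids, attention_mask, author_ids):
        # Look up BERT's own token embeddings.
        tok = self.bert.embeddings.word_embeddings(input_ids)
        # Broadcast the author vector over the sequence dimension and add it,
        # the same way BERT adds segment (token-type) embeddings.
        tok = tok + self.author_emb(author_ids).unsqueeze(1)
        # Feed the combined embeddings through the rest of the model.
        return self.bert(inputs_embeds=tok, attention_mask=attention_mask)
```

This additive injection keeps the pretrained BERT weights and interface intact, so the model degrades gracefully to plain BERT behavior for authors whose embeddings carry little signal.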