UserBERT: Contrastive User Model Pre-training

User modeling is critical for personalized web applications. Existing user modeling methods usually train user models from user behaviors with task-specific labeled data. However, labeled data in a target task may be insufficient for training accurate user models. Fortunately, there is usually abundant unlabeled user behavior data, which encodes rich information about user characteristics and interests. Thus, pre-training user models on unlabeled user behavior data has the potential to improve user modeling for many downstream tasks. In this paper, we propose a contrastive user model pre-training method named UserBERT. UserBERT incorporates two self-supervision tasks for pre-training user models on unlabeled user behavior data. The first is masked behavior prediction, which aims to model the relatedness between user behaviors. The second is behavior sequence matching, which aims to capture inherent user interests that remain consistent across different periods. In addition, we propose a medium-hard negative sampling framework to select informative negative samples for better contrastive pre-training. We maintain a synchronously updated candidate behavior pool and an asynchronously updated candidate behavior sequence pool to efficiently select the locally hardest negative behaviors and behavior sequences. Extensive experiments on two real-world datasets in different tasks show that UserBERT can effectively improve various user models.
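The contrastive objective and negative sampling described in the abstract can be illustrated with a small sketch. This is a hypothetical example under stated assumptions, not the UserBERT implementation: it assumes dot-product similarity between embeddings, an InfoNCE-style loss, and that "locally hardest" means the highest-scoring candidates within a small, randomly sampled pool. The function names (`select_hard_negatives`, `info_nce_loss`, `refresh_pool`) and the `temperature` parameter are illustrative, and the distinction between the synchronously and asynchronously updated pools is not modeled.

```python
# Hypothetical sketch of contrastive matching with pool-based hard negatives.
# Not the UserBERT reference code; similarity, loss, and pool handling are assumptions.
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Similarity score between two embeddings (assumed: plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def select_hard_negatives(anchor: Vector, pool: List[Vector], k: int) -> List[Vector]:
    """Return the k pool candidates that score highest against the anchor.

    Because the pool is only a small subset of all candidates, these are the
    locally hardest negatives, not the globally hardest ones.
    """
    ranked = sorted(pool, key=lambda cand: dot(anchor, cand), reverse=True)
    return ranked[:k]


def info_nce_loss(anchor: Vector, positive: Vector, negatives: List[Vector],
                  temperature: float = 1.0) -> float:
    """InfoNCE-style loss: -log of the softmax probability of the positive."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


def refresh_pool(all_candidates: List[Vector], pool_size: int,
                 rng: random.Random) -> List[Vector]:
    """Re-sample the candidate pool (a stand-in for periodic pool updates)."""
    return rng.sample(all_candidates, pool_size)


if __name__ == "__main__":
    rng = random.Random(0)
    anchor = [1.0, 0.0]    # e.g. embedding of a user's earlier behavior sequence
    positive = [0.9, 0.1]  # the same user's later behavior sequence
    others = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
    pool = refresh_pool(others, pool_size=10, rng=rng)
    negatives = select_hard_negatives(anchor, pool, k=3)
    print(info_nce_loss(anchor, positive, negatives, temperature=0.1))
```

Scoring only a small pool keeps negative selection cheap, while taking the top-scoring members of that pool biases the sample toward informative negatives without always picking the single hardest candidate in the whole dataset.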