User modeling is crucial to understanding user behavior and essential for improving user experience and personalized recommendations. When users interact with software, logging and analytics systems generate vast numbers of command sequences. These command sequences contain clues to the users' goals and intents. However, this data is highly unstructured and unlabeled, making it difficult for standard predictive systems to learn from. We propose SimCURL, a simple yet effective contrastive self-supervised deep learning framework that learns user representations from unlabeled command sequences. Our method introduces a user-session network architecture, as well as session dropout as a novel form of data augmentation. We train and evaluate our method on a real-world command sequence dataset of more than half a billion commands. Our method shows significant improvement over existing methods when the learned representations are transferred to downstream tasks such as experience and expertise classification.
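To make the session dropout idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a user is represented as a list of sessions, each session being a list of command names, and that an augmented view is formed by dropping each session independently with some probability. The function names, the `drop_prob` parameter, the keep-at-least-one-session rule, and the example commands are all hypothetical, chosen only for illustration.

```python
import random
from typing import List, Optional, Tuple

Session = List[str]  # one session = an ordered list of command names


def session_dropout(
    sessions: List[Session],
    drop_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Session]:
    """Return a random subset of a user's sessions, preserving their order.

    Assumption: each session is dropped independently with probability
    drop_prob, and at least one session is always kept.
    """
    if rng is None:
        rng = random.Random()
    if not sessions:
        return []
    kept = [s for s in sessions if rng.random() >= drop_prob]
    if not kept:
        # Avoid an empty view: fall back to a single randomly chosen session.
        kept = [rng.choice(sessions)]
    return kept


def make_positive_pair(
    sessions: List[Session],
    drop_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Session], List[Session]]:
    """Build two independently augmented views of the same user.

    In a contrastive setup, two views of one user form a positive pair,
    while views of other users in the batch act as negatives.
    """
    if rng is None:
        rng = random.Random()
    return (
        session_dropout(sessions, drop_prob, rng),
        session_dropout(sessions, drop_prob, rng),
    )


# Hypothetical command log for a single user (three sessions).
user_sessions: List[Session] = [
    ["open", "sketch", "extrude"],
    ["open", "fillet", "save"],
    ["import", "measure"],
]
view_a, view_b = make_positive_pair(user_sessions, drop_prob=0.5, rng=random.Random(0))
```

Each view would then be passed through the encoder, and a contrastive loss would pull the two views of the same user together while pushing apart views of different users. Passing a seeded `random.Random` keeps the augmentation reproducible during debugging.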