User-generated data on social media contain rich information about who we are, what we like and how we make decisions. In this paper, we survey representative work on learning a concise latent user representation (a.k.a. user embedding) that can capture the main characteristics of a social media user. The learned user embeddings can then support various downstream user analysis tasks such as personality modeling, suicide risk assessment and purchase decision prediction. The temporal nature of user-generated data on social media has been overlooked in much of the existing user embedding literature. In this survey, we focus on research that bridges this gap by incorporating temporal/sequential information into user representation learning. We categorize relevant papers along several key dimensions, identify limitations in current work and suggest future research directions.