We consider the task of automatically linking social media accounts that belong to the same author, using the content and metadata of their corresponding document streams. We focus on learning an embedding that maps variable-sized samples of user activity, ranging from single posts to entire months of activity, to a vector space in which samples by the same author map to nearby points. The approach does not require human-annotated training data, which allows us to leverage large amounts of social media content. The proposed model outperforms several competitive baselines under a novel evaluation framework modeled after established recognition benchmarks in other domains. Our method achieves high linking accuracy even with small samples from accounts not seen at training time, which is a prerequisite for practical applications of the proposed linking framework.
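To make the linking step concrete, the following is a minimal sketch and not the model proposed here. It assumes each post has already been turned into a fixed-length feature vector. The sketch then uses mean pooling plus L2 normalization as a hypothetical stand-in for the learned embedding of a variable-sized sample, and ranks candidate accounts by cosine similarity to a query sample. The names `embed_sample`, `rank_candidates`, and the toy account ids are illustrative assumptions only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def embed_sample(post_vectors: Sequence[Sequence[float]]) -> Vector:
    """Map a variable-sized sample (one or more posts) to one fixed-size vector.

    Hypothetical stand-in for a learned encoder: mean-pool the per-post
    feature vectors, then L2-normalize the result.
    """
    if not post_vectors:
        raise ValueError("sample must contain at least one post")
    dim = len(post_vectors[0])
    mean = [sum(p[i] for p in post_vectors) / len(post_vectors) for i in range(dim)]
    return normalize(mean)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a plain dot product because inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(query: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Return candidate account ids sorted from most to least similar to the query."""
    return sorted(candidates, key=lambda acct: cosine(query, candidates[acct]), reverse=True)


# Toy 3-dimensional "post features", made up purely for illustration.
query = embed_sample([[1.0, 0.1, 0.0]])  # a sample consisting of a single post
candidates = {
    "account_a": embed_sample([[0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]),  # two posts
    "account_b": embed_sample([[0.0, 1.0, 0.0]]),
}
print(rank_candidates(query, candidates))  # → ['account_a', 'account_b']
```

Because every sample, whatever its size, is reduced to a single unit-norm vector, a one-post query and a multi-post account history can be compared directly. A retrieval-style evaluation would then check where the true author's account falls in this ranking.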