The term idiolect refers to an individual's unique and distinctive use of language, and it is the theoretical foundation of Authorship Attribution. In this paper we focus on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be regarded as stylistic fingerprints of the authors. We explore the performance of the two main flavours of distributed representations, namely embeddings produced by neural probabilistic language models (such as word2vec) and embeddings produced by matrix factorization (such as GloVe).
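To make the matrix-factorization flavour concrete, the sketch below builds one vector per user from a user-by-feature count matrix. It is only an illustration under stated assumptions, not the method evaluated in the paper: character trigrams stand in for stylistic features, a plain SVD of log-scaled counts stands in for a GloVe-style weighted factorization, and the names `char_ngrams`, `user_embeddings` and `cosine`, as well as the toy posts, are hypothetical.

```python
from collections import Counter

import numpy as np


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased text (assumed style feature)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def user_embeddings(posts_by_user, dim=2, n=3):
    """Factorize a user-by-n-gram matrix with SVD and return one row per user.

    Assumption: SVD of log(1 + count) is used as a simple stand-in for a
    GloVe-style factorization; it is not the paper's training procedure.
    """
    users = sorted(posts_by_user)
    counts = {
        u: Counter(g for post in posts_by_user[u] for g in char_ngrams(post, n))
        for u in users
    }
    vocab = sorted(set().union(*counts.values()))
    matrix = np.array([[counts[u][g] for g in vocab] for u in users], dtype=float)
    matrix = np.log1p(matrix)  # dampen very frequent n-grams
    u_mat, sing, _ = np.linalg.svd(matrix, full_matrices=False)
    k = min(dim, len(sing))
    return users, u_mat[:, :k] * sing[:k]  # keep the k strongest directions


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy data: two users with a similar style, one with a different style.
toy_posts = {
    "alice": ["lol omg so good", "lol omg so cool"],
    "bob": ["lol omg so good"],
    "carol": ["indeed; quite remarkable"],
}

if __name__ == "__main__":
    names, vectors = user_embeddings(toy_posts, dim=3)
    for name, vec in zip(names, vectors):
        print(name, np.round(vec, 3))
```

Under these assumptions, users who share many character trigrams end up with nearby vectors, which is the sense in which an embedding can act as a stylistic fingerprint. A neural alternative in the spirit of word2vec would learn the user vectors by prediction rather than by factorizing counts.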