Stylistic analysis of text is a key task in research areas ranging from authorship attribution to forensic analysis and personality profiling. Existing approaches to stylistic analysis are plagued by issues such as topic influence, a lack of discriminability when the number of authors is large, and the requirement for large amounts of diverse data. In this paper, the sources of these issues are identified, along with the need for a cognitive perspective on authorial style in addressing them. A novel feature representation, called Trajectory-based Style Estimation (TraSE), is introduced for this purpose. Authorship attribution experiments with over 27,000 authors and 1.4 million samples in a cross-domain scenario resulted in 90% attribution accuracy, suggesting that the feature representation is immune to such negative influences and is an excellent candidate for stylistic analysis. Finally, a qualitative analysis is performed on TraSE using physical human characteristics, such as age, to validate the claim that it captures cognitive traits.