Stylometry is mostly applied to authorial style. Recently, researchers have begun investigating the style of characters, finding that the variation remains within authorial bounds. We address the stylistic distinctiveness of characters in drama. Our primary contribution is methodological: we introduce and evaluate two non-parametric methods that produce a summary statistic for character distinctiveness, which can be usefully applied and compared across languages and periods. The first method is based on bootstrap distances between 3-gram probability distributions; the second, reminiscent of 'unmasking' techniques, is based on word keyness curves. We validate and explore both methods by applying them to a reasonably large corpus (a subset of DraCor), analysing 3301 characters drawn from 2324 works that cover five centuries and four languages (French, German, Russian, and the works of Shakespeare). Both methods appear useful: the 3-gram method is statistically more powerful, whereas the word keyness method offers rich interpretability. Both methods capture phonological differences such as accent or dialect, as well as broad differences in topic and lexical richness. Exploratory analysis suggests that smaller characters tend to be more distinctive and that women are cross-linguistically more distinctive than men; we interrogate this latter finding carefully using multiple regression. This greater distinctiveness stems from a historical tendency for female characters to be restricted to an 'internal narrative domain' covering mainly direct discourse and family/romantic themes. It is hoped that direct, comparable statistical measures will form a basis for more sophisticated future studies and for advances in theory.
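The abstract does not spell out how a bootstrap distance between 3-gram distributions becomes a single score, so the following is only a minimal sketch of one way it could work, not the authors' procedure. It assumes character-level 3-grams, Jensen-Shannon divergence as the distance, and a null distribution built by resampling, with replacement, as many 3-grams from the other characters' lines as the character under study speaks. The function names (`char_trigrams`, `js_divergence`, `bootstrap_distinctiveness`) are hypothetical.

```python
import math
import random
from collections import Counter


def char_trigrams(text):
    """Overlapping character 3-grams of one line (assumption: character-level, not word-level)."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (log base 2, so 0 <= JS <= 1) between two count tables."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    js = 0.0
    for gram in set(p_counts) | set(q_counts):
        p = p_counts[gram] / p_total  # Counter yields 0 for unseen grams
        q = q_counts[gram] / q_total
        m = (p + q) / 2
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


def bootstrap_distinctiveness(char_lines, other_lines, n_boot=200, seed=0):
    """Hypothetical distinctiveness score for one character.

    Returns (observed distance, share of bootstrap null distances below it).
    Each null draw resamples, with replacement, as many 3-grams as the
    character speaks from the other characters' text, so a small character
    is compared against equally small random samples.
    """
    char_grams = [g for line in char_lines for g in char_trigrams(line)]
    other_grams = [g for line in other_lines for g in char_trigrams(line)]
    if not char_grams or not other_grams:
        raise ValueError("both texts need at least one 3-gram")
    other_counts = Counter(other_grams)
    observed = js_divergence(Counter(char_grams), other_counts)

    rng = random.Random(seed)
    n = len(char_grams)
    below = 0
    for _ in range(n_boot):
        pseudo = rng.choices(other_grams, k=n)  # same-size pseudo-character
        if js_divergence(Counter(pseudo), other_counts) < observed:
            below += 1
    return observed, below / n_boot
```

For example, `bootstrap_distinctiveness(["z" * 50], ["the cat sat on the mat"] * 20)` returns a score of 1.0, because no same-size resample of the other lines is as far from them as a character who only says "zzz". The same-size null matters because the divergence between a small sample and a large reference is inflated by sparse sampling alone. Resampling individual overlapping 3-grams ignores their mutual dependence; resampling whole lines would be a more conservative variant.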