There is a growing tradition at the intersection of network studies and drama history of deriving interpretations from the character networks of plays. The appeal of such interpretations is that network diagrams represent the relationships between characters differently than reading the text or watching a performance does. Our aim is to create a method that clusters texts with similar structures on the basis of simple, readily interpretable properties of each play, independent of the number of characters in the drama, that is, of the size of the network. Identifying these features is the most important part of our research, together with establishing an appropriate statistical procedure for calculating the similarities between texts. Our data were downloaded from the DraCor database (we use the GerDraCor and ShakeDraCor sub-collections) and analyzed in R. We propose a robust method based on the distribution of words among characters, the distribution of characters across scenes, the average length of speech acts, and character-specific and macro-level network properties such as the clustering coefficient and network density. Based on these metrics, a supervised classification procedure using the Support Vector Machine (SVM) method is applied to the sub-collections to classify comedies and tragedies. Our research shows that this approach can produce reliable results even with a small sample size.
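
To make the macro-level network properties concrete, the following is a minimal sketch in plain Python (not the R code used in the study) of how network density and the average clustering coefficient could be computed for one play. It assumes, purely for illustration, that two characters are linked whenever they appear in the same scene; the function names and the toy scene list are hypothetical and are not taken from DraCor or from the original analysis.

```python
from itertools import combinations


def build_cooccurrence(scenes):
    """Build an undirected adjacency map from a list of scene casts.

    Assumption: two characters are connected if they share at least one scene.
    """
    adj = {}
    for cast in scenes:
        for name in cast:
            adj.setdefault(name, set())
        for a, b in combinations(sorted(set(cast)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def density(adj):
    """Share of possible character pairs that are actually connected."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(neighbors) for neighbors in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def local_clustering(adj, node):
    """Share of a character's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all characters."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical toy play with two scenes and four characters.
toy_scenes = [["A", "B", "C"], ["C", "D"]]
toy_network = build_cooccurrence(toy_scenes)
toy_features = (density(toy_network), average_clustering(toy_network))
```

Both measures are normalized by the number of possible pairs, so they stay comparable across plays with different cast sizes. A feature vector of this kind, computed per play, is the sort of input a supervised classifier such as an SVM could be trained on.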