The availability of quantitative methods for text analysis has provided new ways of examining literature that were not possible before the information era. Here we apply a comprehensive machine learning analysis to the works of William Shakespeare. The analysis shows a clear change in writing style over time, with the most significant changes in sentence length, the frequency of adjectives and adverbs, and the sentiments expressed in the text. Using machine learning to predict the year of each play from its stylometrics yields a Pearson correlation of 0.71 between the actual and predicted years, indicating that Shakespeare's writing style, as reflected by the quantitative measurements, changed over time. The analysis also shows that the stylometrics of some plays are more similar to those of plays written before or after them. For instance, Romeo and Juliet is dated 1596, but its stylometrics are more similar to those of plays Shakespeare wrote after 1600. The source code for the analysis is available for free download.
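The year-prediction result above is summarized as a Pearson correlation between actual and predicted years. The following is a minimal illustrative sketch, not the authors' released code. It assumes a single crude stylometric feature (mean sentence length in words, using a simple punctuation-based sentence splitter) and shows how Pearson's r between two sequences, such as actual and predicted years, would be computed. The function names `mean_sentence_length` and `pearson_r` are hypothetical.

```python
import math
import re


def mean_sentence_length(text):
    """Average number of words per sentence (hypothetical feature helper).

    Sentences are split on runs of '.', '!' or '?', a crude heuristic that
    ignores abbreviations and other edge cases.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = [len(s.split()) for s in sentences]
    return sum(counts) / len(counts) if counts else 0.0


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Covariance and variances, all un-normalized (the 1/n factors cancel).
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)
```

For example, `mean_sentence_length("To be, or not to be. That is the question.")` returns `5.0` (six words, then four). In practice a library routine such as `statistics.correlation` (Python 3.10+) or `scipy.stats.pearsonr` would compute the same coefficient.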