In this technical note we suggest a novel approach to discover temporal (related and unrelated to language dilation) and personality (authorship attribution) in historical datasets. We exemplify our approach on the State of the Union speeches given by the past 42 US presidents: this dataset is known for its relatively small amount of data, and high variability of the amount and style of texts. Nevertheless we manage to achieve about 95\% accuracy on the authorship attribution task, and pin down the date of writing to a single presidential term.
翻译:暂无翻译