While recent studies have focused on quantifying word usage to find the overall shapes of narrative emotional arcs, certain features of narratives within narratives remain to be explored. Here, we characterize the narrative time scale of sub-narratives by finding the length of text at which fluctuations in word usage begin to be relevant. We represent more than 30,000 Project Gutenberg books as time series using ousiometrics, a power-danger framework for essential meaning, itself a reinterpretation of the valence-arousal-dominance framework derived from semantic differentials. We decompose each book's power and danger time series using empirical mode decomposition into a sum of constituent oscillatory modes and a non-oscillatory trend. By comparing the decomposition of the original power and danger time series with those derived from shuffled text, we find that shorter books exhibit only a general trend, while longer books have fluctuations in addition to the general trend, similar to how subplots have arcs within an overall narrative arc. These fluctuations typically have a period of a few thousand words regardless of the book length or library classification code, but vary depending on the content and structure of the book. Our method provides a data-driven denoising approach that works for text of various lengths, in contrast to the more traditional approach of using large window sizes that may inadvertently smooth out relevant information, especially for shorter texts.
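The shuffled-text comparison described above can be illustrated with a minimal sketch. It is not the empirical mode decomposition pipeline used in this work: it swaps in a much simpler statistic, the standard deviation of windowed mean scores, and asks whether the original word order fluctuates more at a given scale than word-shuffled copies of the same text. The function names, the window size, and the synthetic per-word scores are all hypothetical and chosen only for illustration; real input would be per-word power or danger scores from an ousiometric lexicon.

```python
import random
import statistics


def window_means(scores, window):
    """Mean score in consecutive non-overlapping windows of `window` words."""
    return [
        statistics.fmean(scores[i:i + window])
        for i in range(0, len(scores) - window + 1, window)
    ]


def fluctuation(scores, window):
    """Standard deviation of the windowed means (a crude stand-in for an
    oscillatory mode's amplitude; this is NOT empirical mode decomposition)."""
    return statistics.pstdev(window_means(scores, window))


def shuffled_baseline(scores, window, n_shuffles=200, seed=0):
    """Fluctuation values for word-shuffled copies of the same scores.
    Shuffling keeps the word distribution but destroys word order."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_shuffles):
        shuffled = scores[:]
        rng.shuffle(shuffled)
        values.append(fluctuation(shuffled, window))
    return values


def exceeds_shuffled(scores, window, n_shuffles=200, seed=0):
    """True if the original ordering fluctuates more than every shuffled copy."""
    baseline = shuffled_baseline(scores, window, n_shuffles, seed)
    return fluctuation(scores, window) > max(baseline)


def toy_scores(n_words=4000, period=1000, seed=1):
    """Synthetic per-word scores (hypothetical data): a square wave with the
    given period in words, plus Gaussian noise on every word."""
    rng = random.Random(seed)
    half = period // 2
    return [
        (0.5 if (i // half) % 2 == 0 else -0.5) + rng.gauss(0, 1)
        for i in range(n_words)
    ]


if __name__ == "__main__":
    scores = toy_scores()
    print("original fluctuation:", fluctuation(scores, window=100))
    print("largest shuffled fluctuation:", max(shuffled_baseline(scores, window=100)))
    print("structure beyond shuffled text:", exceeds_shuffled(scores, window=100))
```

Because shuffling preserves which words occur and changes only their order, any excess fluctuation in the original series can be attributed to ordering, which is the same logic that motivates comparing decompositions of original and shuffled text.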