Many complex systems exhibit intricate structure at several scales of time and space. Multiscale characterizations of such systems have been used in applications including the study of diseases, the characterization of transportation systems, and the comparison of cities. Texts also have a hierarchical structure that can be approached with multiscale concepts and methods, and their multiscale properties deserve further investigation. In addition, text characterization and analysis can be made more effective by emphasizing words with potentially higher informational content. The present work develops these possibilities while focusing on mesoscopic representations of networks. More specifically, we adopt an extension of the mesoscopic approach to representing text narratives, in which only the recurrent relationships among tagged parts of speech (subject, verb and direct object) are used to establish connections among sequential pieces of text (e.g., paragraphs). The texts were then characterized with complementary scale-dependent methods: accessibility, symmetry and recurrence signatures. To evaluate the potential of these concepts and methods, we addressed the problem of distinguishing between two literary genres (fiction and non-fiction). A set of 300 books organized into the two genres was considered, and the books were compared using the aforementioned approaches. All the methods were capable of differentiating between the two genres to some extent. The accessibility and symmetry reflected the narrative asymmetries, while the recurrence signature provided a more direct indication of the non-sequential semantic connections taking place along the narrative.
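As a rough illustration of the mesoscopic representation described above, the following minimal sketch treats each paragraph as a node and links two paragraphs when they share tagged terms. The linking rule (at least `min_shared` terms in common), the function name `build_mesoscopic_network`, and the toy input are assumptions made for illustration only; the abstract does not specify how recurrence is defined, and the terms are assumed to have been extracted beforehand by a part-of-speech or dependency tagger.

```python
from itertools import combinations


def build_mesoscopic_network(paragraph_terms, min_shared=1):
    """Link paragraphs i < j that share at least `min_shared` tagged terms.

    `paragraph_terms` is a list of sets, one per paragraph, in narrative order.
    Returns a dict mapping (i, j) to the number of shared terms.
    This linking rule is a hypothetical simplification, not the paper's method.
    """
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(paragraph_terms), 2):
        shared = a & b  # terms that recur in both paragraphs
        if len(shared) >= min_shared:
            edges[(i, j)] = len(shared)
    return edges


# Hypothetical toy input: subject/verb/object lemmas for four paragraphs.
paragraphs = [
    {"captain", "sail", "ship"},
    {"storm", "hit", "ship"},
    {"captain", "write", "letter"},
    {"merchant", "sell", "cloth"},
]
edges = build_mesoscopic_network(paragraphs)
```

Edges between non-adjacent paragraphs, such as the one joining paragraphs 0 and 2 here, are the kind of non-sequential connection that the recurrence signature is meant to capture.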