DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels

Text mining is now applied to corpora in many languages, but most applications work on prose, and few handle poetry. One application to poetry uses features derived from individual words to capture lexical, sublexical and interlexical meaning and to infer the General Affective Meaning (GAM) of a text. Although this approach has proven useful for poetry in some languages, studies are lacking both for Spanish poetry and for highly structured poetic forms such as sonnets. This article presents a study of an annotated corpus of Spanish sonnets that analyses whether features built from individual words can predict their GAM, with the aim of modelling sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and their content. To do so, we consider the content from a psychological perspective, tagging each sonnet with the psychological terms it relates to, and then study how GAM varies across those terms. The corpus contains 274 Spanish sonnets by authors from the 15th to the 19th century. Domain experts annotated the poems with affective and lexico-semantic features, as well as with domain concepts from psychology. As a result, the corpus can be used in applications such as poetry recommender systems, personality text mining studies of the authors, or the use of poetry for therapeutic purposes.
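The idea of building features from a sonnet's individual words can be illustrated with a minimal sketch. It is not the article's method: the lexicon `TOY_LEXICON`, its valence and arousal values, the 1-9 scale, and the function names `tokenize` and `word_level_features` are all assumptions made for illustration. The sketch only aggregates per-word affective scores into text-level features that a GAM predictor could consume.

```python
import re
from statistics import mean

# Hypothetical affective lexicon: word -> (valence, arousal) on a 1-9 scale.
# These values are invented for illustration and do not come from the corpus.
TOY_LEXICON = {
    "amor": (8.0, 6.5),
    "muerte": (1.5, 6.0),
    "dolor": (2.0, 6.0),
    "dulce": (7.5, 4.0),
    "noche": (5.0, 3.5),
}


def tokenize(text):
    """Lowercase the text and return its runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_level_features(text, lexicon):
    """Aggregate per-word affective scores into text-level features.

    Words missing from the lexicon are ignored; `coverage` reports the
    fraction of tokens that were found.
    """
    tokens = tokenize(text)
    scored = [lexicon[t] for t in tokens if t in lexicon]
    if not scored:
        # No known words: return an explicit "no information" record.
        return {
            "coverage": 0.0,
            "mean_valence": None,
            "mean_arousal": None,
            "valence_span": None,
        }
    valences = [valence for valence, _ in scored]
    arousals = [arousal for _, arousal in scored]
    return {
        "coverage": len(scored) / len(tokens),
        "mean_valence": mean(valences),
        "mean_arousal": mean(arousals),
        # Distance between the most positive and most negative word.
        "valence_span": max(valences) - min(valences),
    }


if __name__ == "__main__":
    print(word_level_features("Dulce amor, en la noche muerte", TOY_LEXICON))
```

In a real setting the toy dictionary would be replaced by a validated Spanish affective lexicon, and the resulting feature vectors would be compared against the expert annotations rather than used on their own.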
