Italian is a Romance language with roots in Vulgar Latin. Modern Italian emerged in Tuscany around the 14th century, and its birth is mainly attributed to the works of Dante Alighieri, Francesco Petrarca and Giovanni Boccaccio, who are among the most acclaimed authors of the medieval age in Tuscany. However, because of the past fragmentation of its territory, Italy has been characterized by a wide variety of dialects, which are often only loosely related to each other. Italian has absorbed influences from many of these dialects, as well as from other languages, owing to the dominion of other nations, such as Spain and France, over portions of the country. In this work we present Vulgaris, a project aimed at studying a corpus of Italian textual resources by authors from different regions, spanning the period from 1200 to 1600. Each composition is associated with its author, and authors are also grouped into families, i.e., groups sharing similar stylistic or chronological characteristics. Hence, the dataset is not only a valuable resource for studying the diachronic evolution of Italian and the differences between its dialects, but is also useful for investigating stylistic differences between individual authors. We provide a detailed statistical analysis of the data, and a corpus-driven study of dialectal and diachronic varieties.