This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. It focuses on quotation annotation, co-reference resolution, and sentiment annotation in the Croatian SETimes news corpus, and on an analysis of the language-specific differences between Croatian and English. From this analysis, a list of phenomena that require special attention when performing these annotations is derived. The resulting corpus, with its quotation-feature annotations, can be used for multiple tasks in the field of Natural Language Processing.