Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews to determine (a) the target entity being reviewed, (b) the high-level aspect to which it belongs, and (c) the sentiment expressed toward the targets and the aspects. ABSA corpora are numerous but scattered, which makes it difficult for researchers to quickly identify the corpora best suited to a specific ABSA subtask. This study presents a database of corpora that can be used to train and assess autonomous ABSA systems. Additionally, we provide an overview of the major corpora for ABSA and its subtasks and highlight several features that researchers should consider when selecting a corpus. Finally, we discuss the advantages and disadvantages of current collection approaches and make recommendations for future corpus creation. This survey examines 65 publicly available ABSA datasets covering over 25 domains, including 45 English datasets and 20 datasets in other languages.