Opinion summarization aims to profile a target by extracting opinions from multiple documents. Most existing work approaches the task in a semi-supervised manner because high-quality annotations are difficult to obtain across thousands of documents. Some of these methods use aspect and sentiment analysis as a proxy for identifying opinions. In this work, we propose a new framework, FineSum, which advances this frontier in three respects: (1) minimal supervision, where only aspect names and a few aspect/sentiment keywords are available; (2) fine-grained opinion analysis, where sentiment analysis drills down to the sub-aspect level; and (3) phrase-based summarization, where opinions are summarized in the form of phrases. FineSum automatically identifies opinion phrases from the raw corpus, classifies them into different aspects and sentiments, and constructs multiple fine-grained opinion clusters under each aspect/sentiment pair. Each cluster consists of semantically coherent phrases that express a uniform opinion toward a certain sub-aspect or characteristic (e.g., positive feelings for ``burgers'' in the ``food'' aspect). An opinion-oriented spherical word embedding space is trained to provide weak supervision for the phrase classifier, and phrase clustering is performed using the aspect-aware contextualized embeddings generated by the phrase classifier. Both automatic evaluation on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
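To make the weak-supervision idea concrete, the following is a minimal sketch, not the FineSum implementation. It assumes unit-length embeddings are already available and assigns each phrase a weak aspect label by cosine similarity to the centroid of a few seed keywords, abstaining when no aspect is close enough. The toy 3-dimensional vectors, the seed lists, the `weak_label` helper, and the `min_sim` threshold are all hypothetical; training the spherical embedding space itself is not shown.

```python
import math


def normalize(v):
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(a * b for a, b in zip(u, v))


def centroid(vectors):
    """Mean direction of a set of unit vectors, re-projected to the sphere."""
    dims = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return normalize(mean)


def weak_label(phrase_vec, seed_centroids, min_sim=0.5):
    """Return the closest aspect, or None if no aspect reaches min_sim."""
    scores = {label: cosine(phrase_vec, c) for label, c in seed_centroids.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_sim else None


# Hypothetical toy embeddings standing in for a trained embedding space.
EMB = {
    "burger": normalize([0.9, 0.1, 0.0]),
    "pizza": normalize([0.8, 0.2, 0.1]),
    "waiter": normalize([0.1, 0.9, 0.0]),
    "staff": normalize([0.2, 0.8, 0.1]),
    "juicy burgers": normalize([0.85, 0.15, 0.05]),
    "rude waiters": normalize([0.1, 0.95, 0.05]),
    "parking lot": normalize([0.0, 0.05, 1.0]),
}

# A few seed keywords per aspect: the only supervision in this sketch.
SEEDS = {"food": ["burger", "pizza"], "service": ["waiter", "staff"]}
CENTROIDS = {a: centroid([EMB[w] for w in ws]) for a, ws in SEEDS.items()}

labels = {
    phrase: weak_label(EMB[phrase], CENTROIDS)
    for phrase in ["juicy burgers", "rude waiters", "parking lot"]
}
print(labels)
```

In this sketch, phrases that receive a label could serve as noisy training examples for a phrase classifier, while phrases that fall below the threshold are simply left unlabeled.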