A typical product or place often has hundreds of reviews, and summarizing these texts is an important and challenging problem. Recent progress on abstractive summarization in domains such as news has been driven by supervised systems trained on hundreds of thousands of news articles paired with human-written summaries. However, for opinion texts, such large-scale datasets are rarely available. Unsupervised methods, self-training, and few-shot learning approaches bridge that gap. In this work, we present a novel self-training approach, OpineSum, for abstractive opinion summarization. The summaries in this approach are built using a novel application of textual entailment and capture the consensus of opinions across the various reviews for an item. This method can be used to obtain silver-standard summaries on a large scale and to train both unsupervised and few-shot abstractive summarization systems. OpineSum achieves state-of-the-art performance in both settings.
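To make the entailment-based consensus idea concrete, the sketch below scores candidate sentences by how many reviews entail them and keeps the top-voted ones as a silver-standard summary. This is a minimal illustration, not the paper's exact procedure: the NLI model (`roberta-large-mnli`), the simple majority-vote ranking, the candidate pool (here, the review sentences themselves), and the cutoff `k` are all assumptions made for this example.

```python
# Minimal sketch of entailment-based consensus selection for silver summaries.
# Assumptions (not from the paper): off-the-shelf roberta-large-mnli as the
# entailment model, candidates drawn from the reviews, simple vote counting.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

MODEL_NAME = "roberta-large-mnli"  # assumption: any premise-hypothesis NLI model works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

# roberta-large-mnli maps ENTAILMENT to class id 2.
ENTAILMENT = model.config.label2id.get("ENTAILMENT", 2)

def entails(premise: str, hypothesis: str) -> bool:
    """Return True if the NLI model predicts `premise` entails `hypothesis`."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == ENTAILMENT

def silver_summary(reviews: list[str], candidates: list[str], k: int = 3) -> list[str]:
    """Rank candidates by how many reviews entail them (a consensus vote)
    and keep the top-k as a silver-standard summary for self-training."""
    votes = [(sum(entails(r, c) for r in reviews), c) for c in candidates]
    votes.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in votes[:k]]

# Usage: statements supported by many reviews surface as the consensus.
reviews = [
    "The rooms were spotless and the staff could not have been friendlier.",
    "Friendly staff, clean rooms, but the breakfast was mediocre.",
    "Great location and very clean rooms overall.",
]
print(silver_summary(reviews, candidates=reviews, k=2))
```

Silver summaries produced this way could then serve as training targets for a sequence-to-sequence summarizer, which is the self-training step the abstract refers to.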