Opinion summarization is the task of automatically generating summaries that capture information from multiple user reviews. We present the Semantic Autoencoder (SemAE), which performs extractive opinion summarization in an unsupervised manner. SemAE uses dictionary learning to implicitly capture semantic information from reviews and learns a latent representation of each sentence over semantic units, where each semantic unit is intended to capture an abstract semantic concept. Our extractive summarization algorithm leverages these representations to identify representative opinions among hundreds of reviews. SemAE can also perform controllable summarization to generate aspect-specific summaries. We report strong performance on the SPACE and AMAZON datasets and run experiments to investigate how the model functions. Our code is publicly available at https://github.com/brcsomnath/SemAE.
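The idea of representing each sentence over semantic units and then picking representative sentences can be illustrated with a minimal sketch. This is not the released SemAE code: the function names, the fixed (unlearned) dictionary, the softmax over dot products, and the KL-divergence score against the mean representation are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_representations(sentence_vecs, dictionary):
    """Map each sentence vector (N x d) to a distribution over K
    dictionary elements (K x d), standing in for semantic units.
    Assumption: similarity is a plain dot product."""
    scores = sentence_vecs @ dictionary.T          # shape (N, K)
    return softmax(scores, axis=1)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) along the last axis, with clipping to avoid log(0).
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def select_summary(sentence_vecs, dictionary, k=2):
    """Return the indices of the k sentences whose representation is
    closest to the mean representation over all sentences.
    Assumption: closeness is KL(mean || sentence)."""
    reps = latent_representations(sentence_vecs, dictionary)
    mean_rep = reps.mean(axis=0)                   # shape (K,)
    divergences = kl_divergence(mean_rep, reps)    # shape (N,)
    order = np.argsort(divergences, kind="stable")
    return order[:k].tolist(), reps

# Toy data: three sentences about one concept, one outlier.
toy_dictionary = np.eye(3)
toy_sentences = np.array([
    [2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
])
selected, toy_reps = select_summary(toy_sentences, toy_dictionary, k=2)
print(selected)
```

In this toy setup the outlier sentence diverges most from the mean representation, so it is ranked last. A real system would learn the dictionary and sentence encoder jointly rather than fixing them by hand.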