Automatic comment generation is a challenging task that tests a model's ability to comprehend news content and generate language. Comments not only convey salient and interesting information from news articles, but also reflect diverse reader characteristics, which we treat as essential clues for diversity. However, most comment generation approaches focus only on extracting salient information and neglect the reader-aware factors implied by comments. To address this issue, we propose a unified framework for reader-aware topic modeling and salient information detection to improve the quality of generated comments. For reader-aware topic modeling, we design a variational generative clustering algorithm for latent semantic learning and topic mining from reader comments. For salient information detection, we estimate Bernoulli distributions over the news content to select salient information. The resulting topic representations and the selected salient information are incorporated into the decoder to generate diverse and informative comments. Experimental results on three datasets show that our framework outperforms existing baseline methods in both automatic metrics and human evaluation. Potential ethical issues are also discussed in detail.
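To make the Bernoulli-based selection step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each news token already has a real-valued saliency score (the names `bernoulli_select`, `apply_gates`, and the example scores are hypothetical), maps each score to a Bernoulli parameter with a sigmoid, and then either samples a 0/1 gate per token or uses the probability itself as a soft gate. The abstract does not say how the Bernoulli parameters are computed or trained, so the sigmoid scorer here is an assumption.

```python
import math
import random


def bernoulli_select(token_scores, rng, sample=True):
    """Return (probs, gates), one selection gate per token.

    token_scores: hypothetical real-valued saliency scores, one per token.
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Assumed parameterization: p_i = sigmoid(score_i).
    probs = [1.0 / (1.0 + math.exp(-s)) for s in token_scores]
    if sample:
        # Draw z_i ~ Bernoulli(p_i): the token is kept when z_i == 1.
        gates = [1.0 if rng.random() < p else 0.0 for p in probs]
    else:
        # Use the expectation p_i as a soft gate instead of sampling.
        gates = list(probs)
    return probs, gates


def apply_gates(token_vectors, gates):
    """Scale each token vector by its gate; a gate of 0 zeroes the token out."""
    return [[g * x for x in vec] for vec, g in zip(token_vectors, gates)]


# Illustrative usage with made-up scores and 2-dimensional token vectors.
rng = random.Random(0)
scores = [2.0, -3.0, 0.5, 1.5]
vectors = [[1.0, 0.5], [0.2, 0.1], [0.7, 0.3], [0.9, 0.4]]
probs, gates = bernoulli_select(scores, rng)
selected = apply_gates(vectors, gates)
```

In a setup like this, the gated vectors would be what a decoder attends to, so tokens with low selection probability contribute little or nothing to the generated comment.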