Document summarization condenses a long document into a short version that retains the salient information and an accurate semantic description. The central issue is how to keep the output summary semantically consistent with the input document. To this end, researchers have recently focused on supervised end-to-end hybrid approaches that combine an extractor module and an abstractor module: the extractor identifies the salient sentences in the input document, and the abstractor generates a summary from those sentences. Such models maintain consistency between the generated summary and the reference summary via various strategies (e.g., reinforcement learning). However, training a hybrid model involves two semantic gaps (one between the document and the extracted sentences, the other between the extracted sentences and the summary), and existing methods do not consider them explicitly, which often results in a semantic bias in the summary. To mitigate this issue, this paper proposes a new \textbf{r}einforcing s\textbf{e}mantic-\textbf{sy}mmetry learning \textbf{m}odel for document summarization (\textbf{ReSyM}). ReSyM introduces a semantic-consistency reward in the extractor to bridge the first gap, and a semantic dual-reward in the abstractor to bridge the second gap. The whole summarization process is trained via reinforcement learning with a hybrid reward mechanism that combines the two rewards. Moreover, a comprehensive sentence-representation learning method is presented to fully capture the information in the original document. Experiments on two widely used benchmark datasets, CNN/Daily Mail and BigPatent, show the superiority of ReSyM over state-of-the-art baselines across various evaluation metrics.
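The abstract does not give the reward formulas, so the following is a minimal sketch of how such a hybrid reward signal might be assembled. All names (`semantic_consistency_reward`, `semantic_dual_reward`, `hybrid_reward`), the weights `alpha` and `lam`, and the use of cosine similarity over sentence embeddings are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def semantic_consistency_reward(doc_emb: np.ndarray, extracted_emb: np.ndarray) -> float:
    # Extractor-side reward (gap 1): encourage the extracted sentences
    # to stay semantically close to the full document.
    return cosine_similarity(doc_emb, extracted_emb)

def semantic_dual_reward(extracted_emb: np.ndarray, summary_emb: np.ndarray,
                         rouge_score: float, alpha: float = 0.5) -> float:
    # Abstractor-side reward (gap 2): mix semantic closeness between the
    # generated summary and the extracted sentences with a reference-based
    # overlap score (e.g., ROUGE). `alpha` is an assumed mixing weight.
    return alpha * cosine_similarity(extracted_emb, summary_emb) + (1 - alpha) * rouge_score

def hybrid_reward(doc_emb: np.ndarray, extracted_emb: np.ndarray,
                  summary_emb: np.ndarray, rouge_score: float,
                  lam: float = 0.5) -> float:
    # Hybrid RL training signal: an assumed weighted sum of the
    # extractor-side and abstractor-side rewards.
    r_ext = semantic_consistency_reward(doc_emb, extracted_emb)
    r_abs = semantic_dual_reward(extracted_emb, summary_emb, rouge_score)
    return lam * r_ext + (1 - lam) * r_abs

# Toy usage with random vectors standing in for learned representations.
rng = np.random.default_rng(0)
d, e, s = rng.normal(size=(3, 128))
print(hybrid_reward(d, e, s, rouge_score=0.35))
```

In such a scheme, the two reward terms act symmetrically on the two halves of the pipeline: the extractor is scored against the document it reads, and the abstractor against the sentences it receives, so neither gap is left unsupervised.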