Online discussion forums are prevalent and easily accessible, allowing people to share ideas and opinions by posting messages in discussion threads. As threads grow long, it becomes difficult for participants, both newcomers and existing members, to grasp the main ideas. This study aims to mitigate this problem by creating an automatic text summarizer for online forums. We present a framework based on hierarchical attention networks that unifies a Bidirectional Long Short-Term Memory (Bi-LSTM) network and a Convolutional Neural Network (CNN) to build sentence and thread representations for forum summarization. In this scheme, the Bi-LSTM derives a representation that captures information from the whole sentence and the whole thread, whereas the CNN recognizes high-level patterns of dominant units with respect to the sentence and thread context. An attention mechanism is applied on top of the CNN to further highlight the high-level representations that capture the important units contributing to a desirable summary. An extensive performance evaluation on three datasets, two drawn from real-life online forums and one a news dataset, reveals that the proposed model outperforms several competitive baselines.
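The hierarchical pipeline described above can be sketched at toy scale in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A hypothetical `Encoder` block is reused at two levels: at the word level it builds sentence vectors, and at the sentence level it builds a thread vector. Each block chains a Bi-LSTM, a "same"-padded 1-D convolution with ReLU, and additive attention pooling of the form score = u · tanh(W f). The following details are all assumptions: the dimensions, the LSTM gate layout, the padding scheme, the attention form and the helper names. The weights are random rather than trained, so the outputs only demonstrate the data flow. Training and sentence selection for the summary are not shown.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    # Hypothetical small random weights; a real model would learn these.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lstm(seq, W, b, hidden):
    """Unidirectional LSTM; W has 4*hidden rows over the concatenation [x; h_prev].
    Assumed gate order: input, forget, output, candidate."""
    h, c, outs = [0.0] * hidden, [0.0] * hidden, []
    for x in seq:
        g = [s + bias for s, bias in zip(matvec(W, x + h), b)]
        i = [sigmoid(v) for v in g[:hidden]]
        f = [sigmoid(v) for v in g[hidden:2 * hidden]]
        o = [sigmoid(v) for v in g[2 * hidden:3 * hidden]]
        cand = [math.tanh(v) for v in g[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outs.append(h)
    return outs

def bilstm(seq, fwd, bwd, hidden):
    """Concatenate forward states with backward states re-aligned to input order."""
    f_out = lstm(seq, *fwd, hidden)
    b_out = lstm(seq[::-1], *bwd, hidden)[::-1]
    return [hf + hb for hf, hb in zip(f_out, b_out)]

def conv1d_relu(seq, filters, width):
    """'Same'-padded 1-D convolution + ReLU: one output per input position,
    each computed from a window of `width` neighbouring vectors."""
    dim = len(seq[0])
    left = (width - 1) // 2
    padded = [[0.0] * dim] * left + seq + [[0.0] * dim] * (width - 1 - left)
    out = []
    for t in range(len(seq)):
        window = [x for vec in padded[t:t + width] for x in vec]
        out.append([max(0.0, s) for s in matvec(filters, window)])
    return out

def attention_pool(feats, W, u):
    """Additive attention over CNN outputs: alpha = softmax(u . tanh(W f_t))."""
    scores = [sum(a * b for a, b in zip(u, map(math.tanh, matvec(W, f)))) for f in feats]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    pooled = [sum(a * f[j] for a, f in zip(alpha, feats)) for j in range(len(feats[0]))]
    return pooled, alpha

class Encoder:
    """One hypothetical level of the hierarchy: Bi-LSTM -> CNN -> attention pooling."""
    def __init__(self, in_dim, hidden, n_filters, width, att_dim, rng):
        self.hidden, self.width = hidden, width
        self.fwd = (rand_matrix(4 * hidden, in_dim + hidden, rng), [0.0] * (4 * hidden))
        self.bwd = (rand_matrix(4 * hidden, in_dim + hidden, rng), [0.0] * (4 * hidden))
        self.filters = rand_matrix(n_filters, width * 2 * hidden, rng)
        self.att_W = rand_matrix(att_dim, n_filters, rng)
        self.att_u = [rng.uniform(-0.1, 0.1) for _ in range(att_dim)]

    def __call__(self, seq):
        states = bilstm(seq, self.fwd, self.bwd, self.hidden)
        feats = conv1d_relu(states, self.filters, self.width)
        return attention_pool(feats, self.att_W, self.att_u)

rng = random.Random(0)
EMB, HID, FILT, WIDTH, ATT = 8, 6, 5, 3, 4  # toy sizes, chosen arbitrarily
word_enc = Encoder(EMB, HID, FILT, WIDTH, ATT, rng)   # words -> sentence vector
sent_enc = Encoder(FILT, HID, FILT, WIDTH, ATT, rng)  # sentences -> thread vector

# A toy "thread" of four sentences, each a list of random stand-in word embeddings.
thread = [[[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(n)] for n in (5, 7, 4, 6)]
sent_vecs, word_weights = zip(*(word_enc(s) for s in thread))
thread_vec, sent_weights = sent_enc(list(sent_vecs))
print(len(thread_vec), [round(w, 3) for w in sent_weights])
```

Because the convolution uses "same" padding, each thread-level attention weight lines up with one sentence position. This makes it easy to inspect which parts of a thread a trained version of such a model would emphasize.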