The ubiquitous availability of computing devices and the widespread use of the internet continuously generate large amounts of data. As a result, the information available on any given topic far exceeds what humans can properly process, causing what is known as information overload. To cope efficiently with large amounts of information and generate content of significant value to users, we need to identify, merge and summarise information. Data summaries can gather related information into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results demonstrate the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.