The ever-increasing number of threats and the existing diversity of information sources pose challenges for Computer Emergency Response Teams (CERTs). To respond to emerging threats, CERTs must gather information in a timely and comprehensive manner. But the volume of sources and information leads to information overload. This paper contributes to the question of how to reduce information overload for CERTs. We propose clustering incoming information as scanning this information is one of the most tiresome, but necessary, manual steps. Based on current studies, we establish conditions for such a framework. Different types of evaluation metrics are used and selected in relation to the framework conditions. Furthermore, different document embeddings and distance measures are evaluated and interpreted in combination with clustering methods. We use three different corpora for the evaluation, a novel ground truth corpus based on threat reports, one security bug report (SBR) corpus, and one with news articles. Our work shows, it is possible to reduce the information overload by up to 84.8% with homogeneous clusters. A runtime analysis of the clustering methods strengthens the decision of selected clustering methods. The source code and dataset will be made publicly available after acceptance.
翻译:暂无翻译