Security incidents and data breaches are increasing rapidly, and only a fraction of them are reported. Public vulnerability databases, e.g., the National Vulnerability Database (NVD) and Common Vulnerabilities and Exposures (CVE), have led the effort to document vulnerabilities and share them to aid defenses. Both have many known issues, including overly brief vulnerability descriptions. Those descriptions play an important role in communicating vulnerability information to security analysts so that they can develop appropriate countermeasures. Many resources provide additional information about vulnerabilities; however, they are not utilized to enrich public repositories. In this paper, we devise a pipeline to augment vulnerability descriptions through third-party reference (hyperlink) scraping. To normalize the descriptions, we build a natural language summarization pipeline utilizing a pretrained language model that is fine-tuned using labeled instances, and we evaluate its performance against both human evaluation (gold standard) and computational metrics, showing promising initial results in terms of summary fluency, completeness, correctness, and understanding.
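The augmentation step described above can be illustrated with a minimal sketch. It assumes the referenced pages have already been downloaded as HTML strings, and it only extracts visible text and appends it to the original description. The names `html_to_text` and `augment_description` are hypothetical and are not taken from the paper; the fine-tuned summarization model that would then normalize the augmented text is not shown.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace so chunks join cleanly.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML page as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def augment_description(description, reference_pages):
    """Append text from each reference page (hypothetical helper).

    Pages whose extracted text is empty or already included are skipped.
    """
    base = description.strip()
    parts = [base]
    seen = {base.lower()}
    for html in reference_pages:
        text = html_to_text(html)
        if text and text.lower() not in seen:
            seen.add(text.lower())
            parts.append(text)
    return " ".join(parts)
```

In a full pipeline, the string returned by `augment_description` would be the input to the summarization model; a real scraper would also need to handle fetching, dead links, and page boilerplate such as navigation menus, which this sketch ignores.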