Active measurements can be used to collect server characteristics on a large scale. This kind of metadata can help discover hidden relations and commonalities among server deployments, offering new possibilities to cluster and classify them. For example, identifying previously unknown cybercriminal infrastructure can be a valuable source of cyber-threat intelligence. We propose an active measurement-based methodology for acquiring Transport Layer Security (TLS) metadata from servers and leveraging it to fingerprint them. Our fingerprints capture the characteristic behavior of the TLS stack, which is primarily determined by the implementation, configuration, and hardware support of the underlying server. Using an empirical optimization strategy that maximizes the information gain from every handshake to minimize measurement costs, we generated 10 general-purpose Client Hellos that serve as scanning probes to build a large database of TLS configurations for classifying servers. We fingerprinted 28 million servers from the Alexa and Majestic toplists and two Command and Control (C2) blocklists over a period of 30 weeks with weekly snapshots, as the foundation for two long-term case studies: the classification of Content Delivery Network servers and of C2 servers. The proposed methodology achieves a precision of more than 99% and enables stable identification of new servers over time. This study describes a new opportunity for active measurements to provide valuable insights into the Internet that can be used in security-relevant use cases.
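The idea of choosing probes that maximize information gain per handshake can be illustrated with a minimal sketch. It is not the optimization strategy used in this work, whose details are not given here. It assumes a simple greedy selection that maximizes the joint Shannon entropy of observed responses. The names `joint_entropy` and `select_probes`, the probe labels `ch1` to `ch3`, and the toy response values are all hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(responses, probes):
    """Shannon entropy (bits) of the combined responses to `probes` across servers."""
    if not responses:
        return 0.0
    # Each server is summarized by the tuple of its responses to the given probes.
    counts = Counter(tuple(r[p] for p in probes) for r in responses.values())
    total = len(responses)
    return -sum(c / total * log2(c / total) for c in counts.values())


def select_probes(responses, candidates, budget):
    """Greedily pick up to `budget` probes, each adding the most joint entropy."""
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < budget:
        current = joint_entropy(responses, chosen)
        best = max(remaining, key=lambda p: joint_entropy(responses, chosen + [p]))
        gain = joint_entropy(responses, chosen + [best]) - current
        if gain <= 1e-12:
            break  # no remaining probe separates the servers any further
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: server -> {probe name -> observed TLS response label}.
toy_responses = {
    "server-a": {"ch1": "x", "ch2": "k", "ch3": "m"},
    "server-b": {"ch1": "x", "ch2": "k", "ch3": "n"},
    "server-c": {"ch1": "y", "ch2": "k", "ch3": "m"},
    "server-d": {"ch1": "y", "ch2": "k", "ch3": "n"},
}
selected = select_probes(toy_responses, ["ch1", "ch2", "ch3"], budget=2)
```

In this toy setting, `ch2` is never selected because every server answers it identically, so sending it would cost a handshake without helping to distinguish servers.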