Protocol reverse engineering based on traffic traces infers the behavior of unknown network protocols by analyzing observable network messages. Accurate message type identification is an essential first step toward correctly deducing message semantics or analyzing protocol behavior. However, identifying message types is particularly difficult for binary protocols, whose structural features are hidden in their densely packed data representation. We leverage the intrinsic structural features of binary protocols and propose an accurate method for discriminating message types. Our approach uses a similarity measure with a continuous value range. It compares feature vectors whose elements correspond to the fields of a message rather than to discrete byte values. This enables better recognition of structural patterns that remain hidden when only exact value matches are considered. We combine Hirschberg alignment with DBSCAN as the clustering algorithm to yield a novel inference mechanism. By applying novel autoconfiguration schemes, our method does not require the manually configured parameters that earlier approaches need to analyze an unknown protocol. Our evaluations show that our approach has considerable advantages over previous approaches in both the quality of message type identification and execution performance.
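To make the pipeline described above more concrete, here is a minimal Python sketch built on several assumptions that are ours, not the paper's. Messages are assumed to arrive already split into field segments, and the segmentation step is out of scope. The continuous field dissimilarity is a normalized Canberra distance, with fields of different length treated as maximally dissimilar (a simplification). The gap penalty `GAP = -0.5` and all function names are hypothetical. DBSCAN's `eps` and `min_samples` are passed explicitly instead of being derived by an autoconfiguration scheme. `alignment_score` computes only the Needleman-Wunsch score row by row in linear space. This is the forward pass that Hirschberg's algorithm applies recursively to recover the full alignment, and clustering here needs only the score.

```python
from typing import List, Optional, Sequence

Segment = Sequence[int]   # one inferred field: its byte values (0..255)
Message = List[Segment]   # a message as an ordered list of field segments

GAP = -0.5  # hypothetical gap penalty for the alignment


def segment_dissimilarity(a: Segment, b: Segment) -> float:
    """Continuous dissimilarity in [0, 1] between two fields (normalized Canberra distance).

    Simplification (assumption): fields of different length are maximally dissimilar.
    """
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    # Bytes are non-negative, so |x| + |y| == x + y; 0/0 terms count as 0.
    total = sum(abs(x - y) / (x + y) for x, y in zip(a, b) if x + y > 0)
    return total / len(a)


def alignment_score(m1: Message, m2: Message) -> float:
    """Needleman-Wunsch global alignment score of two segment sequences, in linear space.

    Hirschberg's algorithm reuses this forward pass, divide-and-conquer style,
    to recover the alignment itself; only the score is computed here.
    """
    prev = [j * GAP for j in range(len(m2) + 1)]
    for i in range(1, len(m1) + 1):
        cur = [i * GAP] + [0.0] * len(m2)
        for j in range(1, len(m2) + 1):
            match = 1.0 - segment_dissimilarity(m1[i - 1], m2[j - 1])  # similarity in [0, 1]
            cur[j] = max(prev[j - 1] + match, prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def message_dissimilarity(m1: Message, m2: Message) -> float:
    """Map the alignment score to [0, 1]; 0.0 means structurally identical."""
    longest = max(len(m1), len(m2), 1)
    return 1.0 - max(alignment_score(m1, m2), 0.0) / longest


def dbscan(dist: List[List[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN on a precomputed dissimilarity matrix; label -1 marks noise."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]  # includes p itself
        if len(neighbors) < min_samples:
            labels[p] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        seeds = list(neighbors)
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_samples:  # q is a core point: keep expanding
                seeds.extend(q_neighbors)
    return [label if label is not None else -1 for label in labels]


def cluster_message_types(messages: List[Message], eps: float, min_samples: int) -> List[int]:
    """Pairwise alignment-based dissimilarities, then DBSCAN over the matrix."""
    dist = [[message_dissimilarity(a, b) for b in messages] for a in messages]
    return dbscan(dist, eps, min_samples)


# Hypothetical pre-segmented messages: two structural types plus one outlier.
msgs = [
    [[1], [0, 16], [170, 187, 204, 221]],
    [[1], [0, 17], [171, 186, 200, 220]],
    [[1], [0, 18], [169, 190, 210, 222]],
    [[3], [77], [5, 6]],
    [[3], [78], [5, 7]],
    [[3], [79], [4, 6]],
    [[250, 1, 99, 42, 7]],
]
print(cluster_message_types(msgs, eps=0.1, min_samples=2))  # → [0, 0, 0, 1, 1, 1, -1]
```

Because the field dissimilarity is continuous, messages whose field values differ only slightly (for example a counter field holding 16, 17, and 18) still come out as close neighbors. This mirrors the abstract's point about comparing fields instead of requiring exact byte matches.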