Cost-aware Feature Selection for IoT Device Classification

Classifying IoT devices into different types is of paramount importance from multiple perspectives, including security and privacy. Recent works have explored machine learning techniques for fingerprinting (or classifying) IoT devices, with promising results. However, existing works assume that the features used to build the machine learning models are readily available or can be easily extracted from the network traffic; in other words, they do not consider the costs associated with feature extraction. In this work, we take a more realistic approach and argue that feature extraction has a cost, and that the cost differs from feature to feature. We also move beyond the current practice of treating the misclassification loss as a binary value, and make a case for different losses based on the misclassification performance. More importantly, this lets us introduce the notion of risk for IoT device classification. We define and formulate the problem of cost-aware IoT device classification. Since this is a combinatorial optimization problem, we develop a novel algorithm that solves it quickly and effectively using the Cross-Entropy (CE) based stochastic optimization technique. Using traffic from real devices, we demonstrate the capability of the CE-based algorithm in selecting features with minimal risk of misclassification while keeping the cost of feature extraction within a specified limit.
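To make the idea of CE-based selection under a cost budget concrete, the following is a minimal sketch of the generic cross-entropy method applied to binary feature masks. It is not the authors' algorithm: the function name `ce_feature_selection`, the `risk` callable, the per-feature `costs`, the `budget`, and all hyperparameters are assumptions chosen for illustration. Each feature gets a Bernoulli inclusion probability; subsets are sampled, those over budget are rejected, and the probabilities are pulled toward the lowest-risk feasible samples.

```python
import numpy as np

def ce_feature_selection(risk, costs, budget, n_samples=200, elite_frac=0.1,
                         smoothing=0.7, n_iters=50, seed=0):
    """Generic cross-entropy search over feature subsets (illustrative only).

    risk   -- hypothetical callable: boolean mask -> float, lower is better
    costs  -- hypothetical per-feature extraction costs
    budget -- maximum total extraction cost allowed
    """
    rng = np.random.default_rng(seed)
    costs = np.asarray(costs, dtype=float)
    m = costs.size
    p = np.full(m, 0.5)  # Bernoulli inclusion probability per feature
    n_elite = max(1, int(elite_frac * n_samples))
    best_mask, best_score = np.zeros(m, dtype=bool), np.inf

    for _ in range(n_iters):
        # Sample candidate subsets from the current distribution.
        masks = rng.random((n_samples, m)) < p
        scores = np.empty(n_samples)
        for i, mask in enumerate(masks):
            # Subsets that exceed the budget are treated as infeasible.
            over_budget = costs[mask].sum() > budget
            scores[i] = np.inf if over_budget else risk(mask)

        order = np.argsort(scores)
        if scores[order[0]] < best_score:
            best_score = scores[order[0]]
            best_mask = masks[order[0]].copy()

        # Keep only the feasible samples among the top-scoring ones.
        top = order[:n_elite]
        feasible = np.isfinite(scores[top])
        if feasible.any():
            elite = masks[top][feasible]
            # Smoothed update toward the elite inclusion frequencies.
            p = smoothing * elite.mean(axis=0) + (1 - smoothing) * p
        else:
            # Nothing feasible was sampled: favor smaller subsets next round.
            p = p * 0.5

    return best_mask, best_score
```

In practice `risk` would be something like a cross-validated, loss-weighted misclassification estimate for a classifier trained on the selected features; here it is left as a caller-supplied function so the sketch stays independent of any particular model or dataset.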


Related content

Feature selection

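Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specific system metric is optimized. It picks the most effective of the original features in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.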
