Millions of vulnerable consumer IoT devices in home networks enable cybercrimes that put user privacy and Internet security at risk. Internet service providers (ISPs) are well positioned to play a key role in mitigating these risks by automatically inferring the active IoT devices in each household and notifying users of vulnerable ones. Developing a scalable inference method that performs robustly across thousands of home networks is non-trivial. This paper focuses on the challenges of developing and applying data-driven inference models when labeled data on device behavior is limited and the data distribution changes (concept drift) across time and space. Our contributions are three-fold: (1) we collect and analyze network traffic of 24 types of consumer IoT devices from 12 real homes over six weeks to highlight the challenge of temporal and spatial concept drift in the network behavior of IoT devices; (2) we analyze the performance of two inference strategies in the presence of concept drift, namely "global inference" (a single model trained on the combined labeled data from all training homes) and "contextualized inference" (several models, each trained on the labeled data of one training home); and (3) to manage concept drift, we develop a method that dynamically applies the "closest" model from a set to the network traffic of unseen homes during the testing phase, yielding better performance in 20% of scenarios.
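To make the difference between the two strategies and the "closest model" idea concrete, here is a minimal sketch under stated assumptions. The abstract does not say which traffic features, classifier, or closeness measure the authors use, so everything below is hypothetical: each model is a simple nearest-centroid classifier over numeric traffic feature vectors, and a model's "closeness" to an unseen home is assumed to be the Euclidean distance between the mean of its training home's traffic and the mean of the unseen home's (unlabeled) traffic. The names `CentroidModel`, `global_model`, `contextual_models`, and `closest_model` are illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, str]  # (hypothetical traffic feature vector, device-type label)


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


class CentroidModel:
    """Hypothetical stand-in classifier: one centroid per device type."""

    def __init__(self, samples: List[Sample]):
        by_label: Dict[str, List[Vector]] = defaultdict(list)
        for x, y in samples:
            by_label[y].append(x)
        self.centroids = {y: mean(xs) for y, xs in by_label.items()}
        # Assumed "profile" of the training traffic, used to measure closeness.
        self.profile = mean([x for x, _ in samples])

    def predict(self, x: Vector) -> str:
        # Assign the device type whose centroid is nearest to x.
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def global_model(homes: Dict[str, List[Sample]]) -> CentroidModel:
    """Global inference: one model trained on labeled data pooled from all homes."""
    pooled = [s for samples in homes.values() for s in samples]
    return CentroidModel(pooled)


def contextual_models(homes: Dict[str, List[Sample]]) -> Dict[str, CentroidModel]:
    """Contextualized inference: one model per training home."""
    return {home: CentroidModel(samples) for home, samples in homes.items()}


def closest_model(models: Dict[str, CentroidModel], unlabeled: List[Vector]) -> str:
    """Pick the per-home model whose training profile is nearest the unseen home's traffic."""
    target = mean(unlabeled)
    return min(models, key=lambda h: math.dist(models[h].profile, target))


if __name__ == "__main__":
    # Toy data: the same device types produce shifted features in each home (spatial drift).
    homes = {
        "A": [((1.0, 1.0), "camera"), ((5.0, 5.0), "plug")],
        "B": [((11.0, 11.0), "camera"), ((15.0, 15.0), "plug")],
    }
    unseen = [(10.5, 11.0), (15.0, 14.5)]  # unlabeled traffic from a new home

    models = contextual_models(homes)
    chosen = closest_model(models, unseen)
    print(chosen)                                       # B
    print(models[chosen].predict((10.5, 11.0)))         # camera
    print(global_model(homes).predict((10.5, 11.0)))    # plug
```

On this toy data, pooling both homes blurs the per-device centroids, so the global model mislabels the unseen home's camera, while the selected per-home model labels it correctly. This only illustrates how drift can hurt a pooled model; it is not evidence for the paper's reported results, and a real system would use richer features and a stronger classifier.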