This work introduces a novel data augmentation method for few-shot website fingerprinting (WF) attacks, where only a handful of training samples per website are available for deep learning model optimization. Moving beyond earlier WF methods that rely on manually engineered feature representations, more advanced deep learning alternatives demonstrate that learning feature representations automatically from training data is superior. Nonetheless, this advantage rests on the unrealistic assumption that many training samples exist per website; without them, it disappears. To address this, we introduce a model-agnostic, efficient, and Harmonious Data Augmentation (HDA) method that can significantly improve deep WF attack methods. HDA involves both intra-sample and inter-sample data transformations that can be used in a harmonious manner to expand a tiny training dataset into an arbitrarily large collection, thereby effectively and explicitly addressing the intrinsic data scarcity problem. We conducted extensive experiments to validate HDA for boosting state-of-the-art deep learning WF attack models in both closed-world and open-world attacking scenarios, in the absence and presence of strong defenses. For instance, in the more challenging and realistic evaluation scenario with a WTF-PAD based defense, our HDA method surpasses the previous state-of-the-art results by more than 4% in absolute classification accuracy in the 20-shot learning case.
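To make the intra-sample/inter-sample distinction concrete, the following is a minimal illustrative sketch only; the abstract does not specify HDA's exact transformations, so the particular operations here (random segment masking as an intra-sample transform, random splicing of two same-class traces as an inter-sample transform) and all function names are assumptions for demonstration, not the authors' implementation.

```python
# Hypothetical sketch of intra-sample and inter-sample augmentation for
# website-fingerprinting traces (sequences of packet directions, +1/-1).
# The specific transforms are assumed for illustration, not taken from the paper.
import numpy as np

def intra_sample_mask(trace, mask_ratio=0.1, rng=np.random):
    """Intra-sample transform: zero out a random contiguous segment of one trace."""
    trace = trace.copy()
    n = len(trace)
    seg = max(1, int(n * mask_ratio))
    start = rng.randint(0, n - seg + 1)
    trace[start:start + seg] = 0
    return trace

def inter_sample_splice(trace_a, trace_b, rng=np.random):
    """Inter-sample transform: join the prefix of one trace with the suffix of
    another trace from the same website to form a new synthetic sample."""
    n = min(len(trace_a), len(trace_b))
    cut = rng.randint(1, n)
    return np.concatenate([trace_a[:cut], trace_b[cut:n]])

# Example: expand a 5-shot set for one website into 50 synthetic samples.
rng = np.random.RandomState(0)
shots = [rng.choice([-1, 1], size=5000) for _ in range(5)]  # toy traces
augmented = []
for _ in range(50):
    a, b = rng.choice(len(shots), size=2, replace=False)
    spliced = inter_sample_splice(shots[a], shots[b], rng)
    augmented.append(intra_sample_mask(spliced, rng=rng))
```

Because both transforms operate directly on raw trace sequences, such an augmentation pipeline can be applied in front of any deep WF classifier, consistent with the model-agnostic property claimed for HDA.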