This work proposes two statistical approaches for synthesizing keystroke biometric data, based on Universal and User-dependent Models. Both approaches are validated on the bot detection task, where the synthetic keystroke data is used to better train the detection systems. Our experiments use a dataset with 136 million keystroke events from 168,000 subjects. We analyze the performance of the two synthesis approaches through qualitative and quantitative experiments. Different bot detectors are considered, based on two supervised classifiers (Support Vector Machine and Long Short-Term Memory network) and a learning framework that includes both human and generated samples. Our results show that the proposed statistical approaches are able to generate realistic, human-like synthetic keystroke samples. The classification results also suggest that in scenarios with large amounts of labeled data, these synthetic samples can be detected with high accuracy. In few-shot learning scenarios, however, detecting them remains an important challenge.