Voice authentication has been widely used on smartphones. However, it remains vulnerable to spoofing attacks, where the attacker replays recorded voice samples from authentic humans using loudspeakers to bypass the voice authentication system. In this paper, we present MagLive, a robust voice liveness detection scheme designed for smartphones to mitigate such spoofing attacks. MagLive leverages the differences in magnetic pattern changes generated by different speakers (i.e., humans or loudspeakers) when speaking for liveness detection, which are captured by the built-in magnetometer on smartphones. To extract effective and robust magnetic features, MagLive utilizes a TF-CNN-SAF model as the feature extractor, which includes a time-frequency convolutional neural network (TF-CNN) combined with a self-attention-based fusion (SAF) model. Supervised contrastive learning is then employed to achieve user-irrelevance, device-irrelevance, and content-irrelevance. MagLive imposes no additional burden on users and does not rely on active sensing or specialized hardware. We conducted comprehensive experiments with various settings to evaluate the security and robustness of MagLive. Our results demonstrate that MagLive effectively distinguishes between humans and attackers (i.e., loudspeakers), achieving an average balanced accuracy (BAC) of 99.01% and an equal error rate (EER) of 0.77%.
翻译:暂无翻译