Smartphones increasingly rely on biometric verification systems to secure highly sensitive applications. Audio-visual biometrics are gaining popularity because they are easy to use and, owing to their multimodal nature, difficult to spoof. In this work, we present an audio-visual smartphone dataset captured with five different recent smartphones. The dataset contains 103 subjects captured in three different sessions that reflect different real-world scenarios. Speech in three different languages is included so that the language dependency of speaker recognition systems can be studied. These characteristics make the dataset suitable for developing novel unimodal or audio-visual speaker recognition systems. We also report the performance of benchmark biometric verification systems on our dataset. Through extensive experiments, we evaluate the robustness of biometric algorithms to multiple dependencies, such as signal noise, device, and language, and to presentation attacks such as replay and synthesized signals. The results raise many concerns about the generalization properties of state-of-the-art biometric methods on smartphones.