Biometric verification systems have been deployed on smartphones to secure highly sensitive applications. Audio-visual biometrics are gaining popularity because of their usability, and their multimodal nature makes them challenging to spoof. In this work, we present an audio-visual smartphone dataset captured with five different recent smartphones. The dataset contains 103 subjects recorded in three sessions that cover different real-world scenarios. Recordings were acquired in three languages so that the language dependency of speaker recognition systems can be studied. These characteristics of the dataset will pave the way for novel state-of-the-art unimodal and audio-visual speaker recognition systems. We also report the performance of benchmark biometric verification systems on the dataset. Extensive experiments evaluate the robustness of biometric algorithms to multiple dependencies, such as signal noise, device, and language, and to presentation attacks, such as replayed and synthesized signals. The results raise many concerns about how well state-of-the-art biometric methods generalize on smartphones.
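The abstract does not name the metric used to report benchmark verification performance. A common choice for speaker and face verification is the equal error rate (EER), the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). The following is a minimal sketch, assuming that each system outputs a similarity score per trial (higher means more likely genuine) and that a trial is accepted when its score is at or above the threshold. The function names `far_frr` and `equal_error_rate` are hypothetical and are not taken from the paper. The sketch approximates the EER by sweeping every observed score as a threshold; evaluation toolkits typically interpolate the ROC curve instead.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when trials with score >= threshold are accepted (assumed rule)."""
    # FAR: fraction of impostor trials wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine trials wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate the EER by testing every observed score as a threshold."""
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best_gap = float("inf")
    best_eer = 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Hypothetical scores with partial overlap between the two classes.
    genuine_scores = [0.4, 0.6, 0.8, 0.9]
    impostor_scores = [0.1, 0.3, 0.5, 0.7]
    print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
```

Because the EER is threshold-free, it is a convenient single number for comparing robustness across conditions, such as different devices, languages, or noise levels. Presentation-attack evaluations usually report attack-specific error rates as well.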