In this paper, we provide a series of multi-task benchmarks for simultaneously detecting spoofing at the segmental and utterance levels on the PartialSpoof database. First, we propose the SELCNN network, which inserts squeeze-and-excitation (SE) blocks into a light convolutional neural network (LCNN) to enhance its capacity for hidden-feature selection. We then implement multi-task learning (MTL) frameworks using SELCNN followed by bidirectional long short-term memory (Bi-LSTM) as the base model, and we examine MTL on PartialSpoof step by step in terms of architecture (uni-branch/multi-branch) and training strategy (from-scratch/warm-up). Experiments show that multi-task models perform better than single-task models. Moreover, within MTL, a binary-branch architecture utilizes information from the two levels more fully than a uni-branch model, and for the binary-branch architecture, fine-tuning from a warm-up model works better than training from scratch. Overall, models under a binary-branch multi-task architecture can handle segment-level and utterance-level predictions simultaneously. Furthermore, the multi-task model fine-tuned from a segmental warm-up model performs better at both levels, except for segmental detection on the evaluation set; segmental detection should therefore be explored further.
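To make the SE-block idea concrete, below is a minimal NumPy sketch of the squeeze-and-excitation channel-reweighting mechanism that SELCNN inserts into the LCNN. The weight shapes, reduction ratio, and variable names here are illustrative assumptions, not the paper's actual SELCNN configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    Squeeze: global average pooling collapses each channel to a scalar.
    Excitation: a bottleneck FC-ReLU-FC-sigmoid produces per-channel
    weights in (0, 1), which rescale the original channels.
    """
    z = x.mean(axis=(1, 2))                      # squeeze -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))    # excitation -> (C,)
    return x * s[:, None, None]                  # channel-wise reweighting

# Toy example: C=4 channels, reduction ratio r=2 (both hypothetical).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))  # C -> C/r bottleneck
w2 = rng.standard_normal((4, 2))  # C/r -> C expansion
y = se_block(x, w1, w2)
```

Because the sigmoid gate lies in (0, 1), the block can only attenuate channels, letting the network emphasize informative feature maps and suppress uninformative ones before later layers.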