Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks. The Res2Net approach designs a residual-like connection between feature groups within one block, which increases the possible receptive fields and improves the system's detection generalizability. However, this residual-like connection is a direct addition between feature groups, with no channel-wise priority. We argue that the information across channels may not contribute equally to spoofing cues, and that less relevant channels should be suppressed before they are added to the next feature group, so that the system can generalize better to unseen attacks. This argument motivates the current work, which presents a novel channel-wise gated Res2Net (CG-Res2Net) that modifies Res2Net to enable a channel-wise gating mechanism in the connection between feature groups. The gating mechanism dynamically selects channel-wise features based on the input, suppressing the less relevant channels and enhancing detection generalizability. Three gating mechanisms with different structures are proposed and integrated into Res2Net. Experiments on ASVspoof 2019 logical access (LA) show that the proposed CG-Res2Net significantly outperforms Res2Net on both the overall LA evaluation set and individual difficult unseen attacks. It also outperforms other state-of-the-art single systems, demonstrating the effectiveness of our method.
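The gated connection between feature groups can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and it rests on several assumptions. The gate shown here (global average pooling followed by one linear layer and a sigmoid) is an assumed squeeze-and-excitation style form and is not necessarily one of the three proposed mechanisms. The per-group transform is a `tanh` placeholder standing in for the convolution that Res2Net applies to each group. The function names and the (channels, time, frequency) layout are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_gate(y_prev, w, b):
    """Return one gate value in (0, 1) per channel of y_prev (shape C x T x F).

    Assumed gate form: average pooling, one linear layer, sigmoid.
    """
    pooled = y_prev.mean(axis=(1, 2))  # one summary value per channel, shape (C,)
    return sigmoid(w @ pooled + b)     # shape (C,)


def gated_res2net_groups(x, scale, transforms, gate_params):
    """Split x into `scale` channel groups and chain them with gated additions.

    transforms[i] is the per-group transform (placeholder for a convolution);
    gate_params[i] is a (w, b) pair used for the connection into group i.
    Entries that are not used (index 0, and gate_params[1]) may be None.
    """
    groups = np.split(x, scale, axis=0)  # requires channels divisible by scale
    outputs = [groups[0]]                # first group passes through unchanged
    y_prev = None
    for i in range(1, scale):
        inp = groups[i]
        if y_prev is not None:
            w, b = gate_params[i]
            gate = channel_gate(y_prev, w, b)
            # Plain Res2Net would compute inp + y_prev; here each channel of
            # y_prev is scaled by its gate before the addition.
            inp = inp + gate[:, None, None] * y_prev
        y_prev = transforms[i](inp)
        outputs.append(y_prev)
    return np.concatenate(outputs, axis=0)


# Usage with random data: 8 channels split into 4 groups of 2 channels each.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 4))
transforms = [None, np.tanh, np.tanh, np.tanh]
w = 0.1 * rng.standard_normal((2, 2))
gate_params = [None, None, (w, np.zeros(2)), (w, np.zeros(2))]
out = gated_res2net_groups(x, 4, transforms, gate_params)
print(out.shape)
```

Because the gate is computed from the previous group's output, its values change with the input. A gate near 0 removes a channel from the connection, while a gate near 1 recovers the ungated Res2Net addition for that channel.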