Speech emotion recognition (SER) processes speech signals to detect and characterize perceived emotions expressed by speakers. SER application systems often acquire and transmit speech data collected at the client side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about the emotions conveyed in vocal expressions, but also about other sensitive demographic traits such as gender, age, and language background. Consequently, it is desirable for SER systems to be able to classify emotion constructs while preventing unintended or improper inference of sensitive demographic information. Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing their local data. This training approach appears secure and can improve privacy for SER. However, recent work has demonstrated that FL approaches remain vulnerable to various privacy attacks, such as reconstruction attacks and membership inference attacks. Although most of these studies have focused on computer vision applications, such information leakage also exists in SER systems trained using FL. To assess the information leakage of SER systems trained using FL, we propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters, corresponding to the FedSGD and FedAvg training algorithms, respectively. As a use case, we empirically evaluate our approach for predicting the client's gender information using three SER benchmark datasets: IEMOCAP, CREMA-D, and MSP-Improv. We show that the attribute inference attack is achievable for SER systems trained using FL. We further identify that most of the information leakage likely comes from the first layer of the SER model.
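To make the attack setting concrete, the sketch below illustrates the general idea of an attribute inference attack on shared FedSGD gradients: an attacker observes the first-layer gradient of a client's update and trains a separate classifier to predict the client's gender from it. This is not the authors' implementation; the SER model architecture, feature dimensions, hyperparameters, and data are hypothetical placeholders used only to show the mechanics.

```python
# Illustrative sketch (assumed setup, not the paper's code): infer a client's
# gender from the first-layer gradient it would share under FedSGD.
import torch
import torch.nn as nn

FEAT_DIM, HID_DIM, NUM_EMOTIONS = 40, 64, 4  # hypothetical acoustic-feature setup

class SERModel(nn.Module):
    """Toy SER classifier standing in for the shared global model."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(FEAT_DIM, HID_DIM)
        self.fc2 = nn.Linear(HID_DIM, NUM_EMOTIONS)
    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def client_first_layer_gradient(model, x, y):
    """Simulate one FedSGD step on a client and return the first-layer weight
    gradient, i.e., the quantity the server (or an eavesdropper) observes."""
    loss = nn.functional.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return grads[0].flatten()  # gradient of fc1.weight

# Attack classifier: maps a flattened first-layer gradient to a gender prediction.
attack_model = nn.Sequential(
    nn.Linear(FEAT_DIM * HID_DIM, 128), nn.ReLU(), nn.Linear(128, 2)
)

# Attacker training loop over (gradient, gender) pairs collected from shadow
# clients whose gender labels are known (random tensors stand in for real data).
opt = torch.optim.Adam(attack_model.parameters(), lr=1e-3)
ser_model = SERModel()
for _ in range(100):
    x = torch.randn(16, FEAT_DIM)              # fake utterance features for one client
    y = torch.randint(0, NUM_EMOTIONS, (16,))  # fake emotion labels
    gender = torch.randint(0, 2, (1,))         # fake gender label for this client
    grad_feat = client_first_layer_gradient(ser_model, x, y).detach()
    logits = attack_model(grad_feat.unsqueeze(0))
    loss = nn.functional.cross_entropy(logits, gender)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Restricting the attack input to the first-layer gradient mirrors the finding that most of the leakage comes from that layer; an attack on FedAvg would analogously use the difference between the client's uploaded model parameters and the previous global parameters.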