Contextual word representations generated by language models (LMs) capture spurious associations present in their training corpora. Recent findings show that adversaries can exploit these associations to reverse-engineer the private attributes of entities mentioned in those corpora. Such findings have spurred efforts to minimize the privacy risks of language models. However, existing approaches lack interpretability, compromise data utility, and fail to provide formal privacy guarantees. The goal of my doctoral research is therefore to develop interpretable approaches to privacy preservation of text representations that retain data utility while guaranteeing privacy. To this end, I aim to study and develop methods that incorporate steganographic modifications within the vector geometry to obfuscate underlying spurious associations while preserving the distributional semantic properties learnt during training.
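To make the core idea concrete, the following is a minimal sketch of one possible instantiation of "obfuscating an association in the vector geometry while preserving semantic structure". It uses mean-difference linear nulling (in the spirit of linear concept-erasure methods) as a stand-in; this is not the method proposed above, and all function names, data, and parameters here are hypothetical illustrations.

```python
# Hypothetical sketch: remove one spurious private-attribute direction from
# embedding geometry via linear nulling, then spot-check that pairwise
# similarity structure (a proxy for distributional semantics) survives.
import numpy as np

def null_attribute_direction(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Project out the direction separating the two attribute classes.

    embeddings: (n, d) matrix of contextual word representations.
    labels:     (n,) binary private-attribute labels (0/1).
    """
    # Estimate the attribute direction as the difference of class means.
    direction = embeddings[labels == 1].mean(axis=0) - embeddings[labels == 0].mean(axis=0)
    direction /= np.linalg.norm(direction)
    # Remove each vector's component along that direction.
    return embeddings - np.outer(embeddings @ direction, direction)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy usage: random embeddings with a planted spurious attribute signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, size=200)
X[y == 1] += 0.5  # plant an association with the private attribute
X_clean = null_attribute_direction(X, y)

# Utility check: cosine similarity of one pair before and after obfuscation.
print(cosine(X[0], X[1]), cosine(X_clean[0], X_clean[1]))
```

A single linear projection is only illustrative: it shows the general shape of the goal (an interpretable, geometry-level edit that suppresses the private signal while perturbing semantic similarities as little as possible), whereas the proposed research targets steganographic modifications with explicit privacy guarantees.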