Recent rapid advances in machine learning depend largely on the richness of the data available today, in both quantity and content. For example, biometric data such as images and voices can reveal people's attributes such as age, gender, sentiment, and origin, whereas location and motion data can be used to infer people's activity levels, transportation modes, and life habits. Alongside the new services and applications enabled by these advances, various governmental policies have been put in place to regulate such data usage and protect people's privacy and rights. As a result, data owners often opt for simple data obfuscation (e.g., blurring people's faces in images) or withhold data altogether, which severely degrades data quality and greatly limits the data's potential utility. Aiming for a more sophisticated mechanism that gives data owners fine-grained control while retaining as much data utility as possible, we propose Multi-attribute Selective Suppression, or MaSS, a general framework for performing precisely targeted data surgery that simultaneously suppresses any selected set of attributes while preserving the rest for downstream machine learning tasks. MaSS learns a data modifier through adversarial games between two sets of networks: one aims to suppress the selected attributes, and the other ensures retention of the remaining attributes via a general contrastive loss as well as explicit classification metrics. We carried out an extensive evaluation of our proposed method using multiple datasets from different domains, including facial images, voice audio, and video clips, and obtained promising results regarding MaSS's generalizability and its capability to suppress targeted attributes without negatively affecting the data's usability in other downstream ML tasks.
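To make the structure of the objective described above more concrete, the following is a minimal sketch of how a suppression term, a preservation term, and a contrastive term might be combined into a single scalar for the data modifier. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not specify the exact losses, so the entropy-based suppression term, the InfoNCE form of the contrastive loss, the weights, and all function and parameter names here are hypothetical. The adversarial training loop and the networks themselves are omitted; the inputs are assumed to be predictions and features already computed on the modified data.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))


def entropy(probs):
    """Shannon entropy (in nats) of a predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form): the anchor should be
    more similar to its positive candidate than to the other candidates."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / max(norm, 1e-12)

    logits = [cosine(anchor, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[positive_index]


def modifier_loss(suppress_preds, preserve_preds, preserve_labels,
                  modified_feat, original_feats, positive_index,
                  w_suppress=1.0, w_preserve=1.0, w_contrast=1.0):
    """Hypothetical combined objective for the data modifier (lower is better).

    suppress_preds: adversary predictions for each attribute to suppress.
    preserve_preds / preserve_labels: classifier predictions and true labels
        for each annotated attribute to preserve.
    modified_feat / original_feats: feature of one modified sample and the
        features of original samples; original_feats[positive_index] is its
        own original.
    """
    # Assumption: suppression is rewarded when the adversary is maximally
    # uncertain, so the negative entropy is minimized.
    suppress_term = -sum(entropy(p) for p in suppress_preds)
    # Explicit classification term for the annotated attributes to keep.
    preserve_term = sum(cross_entropy(p, y)
                        for p, y in zip(preserve_preds, preserve_labels))
    # Contrastive term that keeps the modified sample close to its original.
    contrast_term = info_nce(modified_feat, original_feats, positive_index)
    return (w_suppress * suppress_term
            + w_preserve * preserve_term
            + w_contrast * contrast_term)


# A modification that hides the suppressed attribute and keeps the rest ...
good = modifier_loss([[0.5, 0.5]], [[0.9, 0.1]], [0],
                     [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0)
# ... versus one that leaks the suppressed attribute and damages the rest.
bad = modifier_loss([[0.99, 0.01]], [[0.2, 0.8]], [0],
                    [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], 0)
```

In this sketch the first call should score lower than the second, since the adversary is left at chance on the suppressed attribute while the preserved label and the sample's similarity to its original are both retained. In an adversarial setup, the adversary would be updated in alternation to minimize its own classification loss on the suppressed attributes.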