As personal devices connected to the Internet become ubiquitous for tasks such as search and shopping, protecting the privacy of the users of these services has become a requirement for building and maintaining customer trust. Existing text privatization methods require a trusted party that collects user data before applying a privatization method to preserve users' privacy. In this work, we propose an efficient mechanism that provides metric differential privacy for text data on-device. With our solution, sensitive data never leaves the device, and service providers only have access to privatized data for training models and analysis. We compare our algorithm to the state of the art in text privatization, showing similar or better utility for the same privacy guarantees while reducing storage costs by orders of magnitude, which enables on-device text privatization.
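For readers unfamiliar with the privacy notion named above: a mechanism M satisfies metric differential privacy with respect to a distance d if, for any two inputs x and x' and any output y, Pr[M(x) = y] <= exp(epsilon * d(x, x')) * Pr[M(x') = y]. The sketch below illustrates this guarantee with a generic exponential-mechanism-style word replacement over a tiny vocabulary. It is not the mechanism proposed in this work, whose details the abstract does not give. The vocabulary, the 2-D embedding values, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy embeddings (illustrative values only, not from the paper).
EMBEDDINGS = {
    "london": (0.9, 0.1),
    "paris": (0.8, 0.2),
    "pizza": (0.1, 0.9),
    "pasta": (0.2, 0.8),
}


def distance(a, b):
    """Euclidean distance between the embeddings of two vocabulary words."""
    return math.dist(EMBEDDINGS[a], EMBEDDINGS[b])


def output_distribution(word, epsilon):
    """Probability of releasing each vocabulary word in place of `word`.

    Weights are proportional to exp(-epsilon * d(word, w) / 2). The factor
    1/2 accounts for the normalizer changing with the input, so the overall
    mechanism satisfies epsilon * d metric differential privacy.
    """
    weights = {
        w: math.exp(-epsilon * distance(word, w) / 2.0) for w in EMBEDDINGS
    }
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def privatize(word, epsilon, rng=random):
    """Sample a replacement word; this step would run on the user's device."""
    dist = output_distribution(word, epsilon)
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]


def satisfies_metric_dp(epsilon, tol=1e-12):
    """Check Pr[M(x)=y] <= exp(epsilon * d(x, x')) * Pr[M(x')=y] for all x, x', y."""
    for x in EMBEDDINGS:
        px = output_distribution(x, epsilon)
        for x2 in EMBEDDINGS:
            px2 = output_distribution(x2, epsilon)
            bound = math.exp(epsilon * distance(x, x2))
            if any(px[y] > bound * px2[y] + tol for y in EMBEDDINGS):
                return False
    return True
```

Calling `privatize("london", epsilon=5.0)` returns one of the four vocabulary words, with nearby words more likely than distant ones. This toy version stores an embedding for every vocabulary word, which is the kind of storage cost the abstract says the proposed mechanism reduces.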