Privacy is a worldwide concern for activities and processes that involve sensitive data. For this reason, many countries and territories have recently approved regulations controlling the extent to which organizations may exploit data provided by people. Areas of artificial intelligence, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may compromise the security of users and be penalized under data protection laws. This work aims to highlight and discuss open challenges for privacy in the recent IR literature, focusing on tasks featuring user-generated text data. Our contribution is threefold: first, we present an overview of privacy threats to IR tasks; second, we discuss applicable privacy-preserving mechanisms that may be employed in solutions to mitigate privacy hazards; finally, we offer insights into the tradeoffs between privacy preservation and utility for IR tasks.