在网上数据库上以排名为基础的推论 (Rank-Based Inference over Web Databases)

In recent years, there has been much research in Ranked Retrieval model in structured databases, especially those in web databases. With this model, a search query returns top-k tuples according to not just exact matches of selection conditions, but a suitable ranking function. This paper studies a novel problem on the privacy implications of database ranking. The motivation is a novel yet serious privacy leakage we found on real-world web databases which is caused by the ranking function design. Many such databases feature private attributes - e.g., a social network allows users to specify certain attributes as only visible to him/herself, but not to others. While these websites generally respect the privacy settings by not directly displaying private attribute values in search query answers, many of them nevertheless take into account such private attributes in the ranking function design. The conventional belief might be that tuple ranks alone are not enough to reveal the private attribute values. Our investigation, however, shows that this is not the case in reality. To address the problem, we introduce a taxonomy of the problem space with two dimensions, (1) the type of query interface and (2) the capability of adversaries. For each subspace, we develop a novel technique which either guarantees the successful inference of private attributes, or does so for a significant portion of real-world tuples. We demonstrate the effectiveness and efficiency of our techniques through theoretical analysis, extensive experiments over real-world datasets, as well as successful online attacks over websites with tens to hundreds of millions of users - e.g., Amazon Goodreads and Renren.com.

翻译：近年来,在结构化数据库中,特别是在网络数据库中,对高级检索模型进行了大量研究。使用这一模型,搜索查询查询不仅根据选择条件的准确匹配性,而且根据适当的排名函数,返回了顶部图普尔。本文研究了数据库排名的隐私影响方面的新问题。动机是我们在真实世界网络数据库中发现的新颖而严重的隐私渗漏,这是排名函数设计造成的。许多这类数据库都具有私人属性,例如,社交网络允许用户指定某些特性,只有他/她自己能看得见,而其他人不能看得见。虽然这些网站通常尊重隐私设置,不直接显示搜索查询答案的私人属性值,但其中许多网站在排序函数设计中也考虑到了这类私人属性。传统的信念可能是,单靠图普尔的排名不足以揭示私人属性值。然而,我们的调查表明,事实并非如此。为了解决这个问题,我们引入了问题空间的分类,有两种层面:(1) 查询接口的类型,以及(2) 对手的能力。对于每个子空间的隐私用户来说,我们开发了一种新的技术,要么是成功的,要么是真实的,要么是真实的。