Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search

Embedding-based retrieval is used in a variety of search applications, such as e-commerce and social network search. While the approach has proven effective in tasks like semantic matching and contextual search, it suffers from uncontrollable relevance. In this paper, we analyze the embedding-based retrieval launched in early 2021 on our social network search engine and define two main categories of failures that it introduces: integrity and junkiness. The former refers to issues such as hate speech and offensive content that can severely harm the user experience, while the latter covers irrelevant results such as fuzzy text matches or language mismatches. We further propose efficient methods that resolve these issues at model inference time, including indexing treatments and targeted user cohort treatments. Though simple, the methods yield gains in both offline NDCG and online A/B test metrics in practice. We analyze the reasons for the improvements, note that our methods are only preliminary attempts at this important but challenging problem, and put forward potential future directions to explore.
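For readers unfamiliar with the offline metric named above, the following is a minimal sketch of how NDCG can be computed for one ranked result list. The abstract does not state which NDCG variant, gain function, or cutoff the authors used, so the linear gain, the log2 discount, the function names, and the example relevance labels here are all assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount (assumed variant)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG of a ranked list of graded relevance labels, optionally cut off at k.

    `relevances` holds hypothetical labels in the order the results were returned
    (higher means more relevant; 0 could stand for a junk result).
    """
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    # A list with no relevant results has no meaningful ideal ordering; score it 0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: the same three results, with the junk result ranked first vs. last.
junk_first = ndcg([0, 3, 2])
junk_last = ndcg([3, 2, 0])
```

Under these assumptions, a ranking that places an irrelevant result at the top scores lower than one that pushes it to the bottom, which is the kind of effect an offline NDCG comparison is meant to capture.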

