As online dating has become more popular in the past few years, an efficient and effective algorithm to match users is needed. In this project, we proposed a new dating matching algorithm that uses Kendall-Tau distance to measure the similarity between users based on their ranking for items in a list. (e.g., their favourite sports, music, etc.) To increase the performance of the search process, we applied a tree-based searching structure, Cascading Metric Tree (CMT), on this metric. The tree is built on ranked lists from all the users; when a query target and a radius are provided, our algorithm can return users within the radius of the target. We tested the scaling of this searching method on a synthetic dataset by varying list length, population size, and query radius. We observed that the algorithm is able to query the best matching people for the user in a practical time, given reasonable parameters. We also provided potential future improvements that can be made to this algorithm based on the limitations. Finally, we offered more use cases of this search structure on Kendall-Tau distance and new insight into real-world applications of distance search structures.
翻译:随着在线约会在过去几年变得更加流行,需要一种有效率的算法来匹配用户。在本项目中,我们提出了一种新的约会匹配算法,它使用Kendall-Tau距离来衡量用户之间基于排名在列表中的项目(例如他们最喜欢的运动、音乐等)的相似度。为了提高搜索过程的性能,我们在这个度量上使用了一种基于树形结构的搜索方法,级联度量树(CMT)。该树根据所有用户的排名列表建立;当提供查询目标和半径时,我们的算法可以返回距离目标半径内的用户。我们通过改变列表长度、种群大小和查询半径来测试该搜索方法的伸缩性。我们观察到,鉴于合理的参数,该算法能够查询出最符合用户匹配的人,且在实践中用时合理。我们还提出了基于局限性可以进行的该算法的潜在改进,并提供了更多在Kendall-Tau距离上使用该搜索结构和对距离搜索结构在实际应用中的新见解。