In modern ranking problems, multiple disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the small amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. Indeed, as we demonstrate in simulation settings and real data examples, when semantically similar queries are available, it is possible to use them gainfully when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance tradeoff and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.