Building effective information retrieval systems for popular entertainment media is a rapidly growing area of investigation for companies and researchers alike. We study information retrieval for podcasts. In Spotify's Podcast Challenge, we are given a user's query together with a description, and must find the most relevant short segment in a dataset of podcasts. Earlier approaches that rely solely on classical Information Retrieval (IR) techniques perform poorly on descriptive queries. Models that rely exclusively on large neural networks tend to perform better, but they require considerable time and computing power at inference. We experiment with two hybrid models that first shortlist the best podcasts for the user's query with a classical IR technique, and then re-rank the shortlisted documents against the detailed description using a transformer-based model.
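The two-stage idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the classical IR technique, so Okapi BM25 is used as a common stand-in, and the names `bm25_scores`, `overlap_scorer`, and `hybrid_search` are hypothetical. The re-ranking scorer is a simple token-overlap placeholder that marks where a transformer-based model would be plugged in; it is not the model used in the experiments.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Stage 1: score every document against the short query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_scorer(description, segment):
    """Placeholder re-ranker: Jaccard overlap of token sets.

    In a real system this would be replaced by a transformer-based
    relevance model; it is only here to keep the sketch self-contained.
    """
    a, b = set(tokenize(description)), set(tokenize(segment))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_search(query, description, segments, top_k=2, scorer=overlap_scorer):
    """Shortlist with BM25 on the query, then re-rank on the description."""
    first_stage = bm25_scores(query, segments)
    shortlist = sorted(
        range(len(segments)), key=lambda i: first_stage[i], reverse=True
    )[:top_k]
    # Stage 2: only the shortlisted segments reach the expensive scorer.
    return sorted(
        shortlist, key=lambda i: scorer(description, segments[i]), reverse=True
    )


segments = [
    "the hosts discuss marathon training plans and recovery",
    "a recipe segment about baking sourdough bread",
    "interview on marathon nutrition and hydration during long runs",
    "news roundup about the stock market",
]
print(hybrid_search(
    "marathon training",
    "advice on nutrition and hydration for long runs",
    segments,
))  # prints [2, 0]
```

In this toy run, BM25 ranks segment 0 first because it matches both query terms, while the description-based second stage promotes segment 2. The design point is that the costly scorer runs on only `top_k` candidates rather than on the whole collection.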