Rapidly growing online podcast archives contain diverse content on a wide range of topics. These archives are an important resource for entertainment and professional use, but their value can be realized only if users can rapidly and reliably locate content of interest. Search for relevant content can be based on metadata provided by content creators, or on transcripts of the spoken content itself. Excavating relevant content from deep within these audio streams for diverse types of information needs requires varying the approach to systems prototyping. We describe a set of diverse podcast information needs and different approaches to assessing retrieved content for relevance. We use these information needs to investigate the utility and effectiveness of the two information sources, creator-provided metadata and transcripts. Based on our analysis, we recommend approaches for indexing and retrieving podcast content for ad hoc search.