The essential task of Topic Detection and Tracking (TDT) is to organize a collection of news media into clusters of stories that pertain to the same real-world event. To use TDT models in practical applications such as search engines and discovery tools, human guidance is needed to pin down the scope of an "event" for the corpus of interest. In this work in progress, we explore a human-in-the-loop method that helps users iteratively fine-tune TDT algorithms so that both the algorithms and the users themselves better understand the nature of the events. We generate a visual overview of the entire corpus, let the user select regions of interest from it, and then ask the user a series of questions to confirm or reject that the selected documents belong to the same event. The answers to these questions supplement the training data for the event similarity model that underlies the system.