To make sense of the tremendous amount of multimedia data uploaded daily to social media platforms, effective topic modeling techniques are needed. Existing work tends to apply topic models to written text datasets. In this paper, we propose a topic extractor for video transcripts. By exploiting neural word embeddings through graph-based clustering, we aim to improve usability and semantic coherence. Unlike most topic models, this approach does not require the true number of topics to be known in advance, which is important when no such assumption can or should be made. Experimental results on the real-life multimodal dataset MuSe-CaR demonstrate that our approach, GraphTMT, extracts coherent and meaningful topics and outperforms baseline methods. Furthermore, we successfully demonstrate the applicability of our approach on the popular Citysearch corpus.
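The abstract does not say which graph-clustering algorithm GraphTMT uses, so the following is only a minimal sketch of the general idea under stated assumptions. Toy 3-dimensional vectors stand in for real neural word embeddings. Two words are linked when their cosine similarity reaches a hypothetical `threshold`, and weighted label propagation (a swapped-in community-detection method, not necessarily the one used in the paper) groups the linked words into topics. As in the approach described, no number of topics is passed in. The names `EMBEDDINGS`, `build_graph`, `label_propagation`, and `extract_topics` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy "embeddings"; a real system would use learned word vectors.
EMBEDDINGS = {
    "engine": (0.9, 0.1, 0.0),
    "horsepower": (0.85, 0.15, 0.05),
    "torque": (0.95, 0.05, 0.0),
    "seat": (0.1, 0.9, 0.1),
    "leather": (0.05, 0.95, 0.1),
    "dashboard": (0.15, 0.85, 0.2),
}


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(embeddings, threshold):
    """Undirected weighted graph: link words whose similarity >= threshold."""
    words = sorted(embeddings)
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def label_propagation(graph, max_iter=100):
    """Each word repeatedly adopts the label carrying the largest summed edge
    weight among its neighbours; communities emerge without fixing their count."""
    labels = {w: w for w in graph}  # every word starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed visiting order keeps results deterministic
            if not graph[node]:
                continue  # isolated words keep their own label
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            # Break ties by picking the lexicographically smallest label.
            new_label = min(label for label, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def extract_topics(embeddings, threshold=0.8):
    """Return topics as sorted word lists, one list per detected community."""
    labels = label_propagation(build_graph(embeddings, threshold))
    topics = defaultdict(list)
    for word, label in labels.items():
        topics[label].append(word)
    return sorted(sorted(words) for words in topics.values())


if __name__ == "__main__":
    for topic in extract_topics(EMBEDDINGS):
        print(topic)
```

With these toy vectors the sketch separates the engine-related words from the interior-related words. Raising `threshold` makes the graph sparser and tends to yield more, smaller topics, so the number of topics follows from the graph structure rather than being fixed beforehand.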