We make decisions by reacting to changes in the real world, in particular the emergence and disappearance of impermanent entities such as events, restaurants, and services. To avoid missing opportunities or taking fruitless actions after an entity has disappeared, it is important to learn of its disappearance as early as possible. We therefore tackle the task of detecting disappearing entities in a timely manner from microblogs, whose posts mention a wide variety of entities. The major challenge is detecting the uncertain contexts of disappearing entities in noisy microblog posts. To collect these disappearing contexts, we design time-sensitive distant supervision, which utilizes entities from a knowledge base together with time-series posts, and use it to build large-scale Twitter datasets\footnote{We will release the datasets (tweet IDs) used in the experiments to promote reproducibility.} for English and Japanese. To ensure robust detection in noisy environments, we refine the pretrained word embeddings of the detection model on the microblog stream of the target day. Experimental results on the Twitter datasets confirm the effectiveness of the collected labeled data and the refined word embeddings; more than 70\% of the detected disappearing entities that are in Wikipedia are discovered earlier than the corresponding update on Wikipedia, with an average lead time of over one month.