This paper studies how different types of newspapers differ in expressing temporal information, a topic that has received little attention. We investigate it with techniques from temporal processing and pattern mining. First, we create a corpus annotated with temporal information. Next, we extract sequences of temporal information tags mixed with part-of-speech tags from the corpus and use the TKS algorithm to mine skip-gram patterns from these sequences. From the mined patterns we obtain a signature for each of the four newspapers. So that each signature uniquely characterizes its newspaper, we revise the signatures by removing reference patterns. By examining the number of patterns in the original and revised signatures, the proportion of patterns that contain temporal information tags, and the specific patterns that contain such tags, we find that the newspapers differ in how they express temporal information.
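The pipeline described in the abstract (mine skip-gram patterns from tag sequences, collect them into a signature, then revise the signature by removing reference patterns) can be illustrated with a minimal sketch. The sketch rests on several assumptions that the abstract does not confirm: it uses a brute-force top-k count in place of the TKS algorithm, it treats "reference patterns" as patterns mined from a separate reference set of sequences, and the tag names (`TIMEX3`, `EVENT`, `IN`, `NN`) as well as all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical temporal tag inventory; the paper's actual tag set is not given.
TEMPORAL_TAGS = {"TIMEX3", "EVENT", "SIGNAL"}


def skip_patterns(seq, max_len=3):
    """Return all distinct order-preserving subsequences of length 2..max_len.

    Gaps between the chosen positions are allowed, which is what makes
    these skip-gram patterns rather than contiguous n-grams.
    """
    found = set()
    for n in range(2, max_len + 1):
        for idx in combinations(range(len(seq)), n):
            found.add(tuple(seq[i] for i in idx))
    return found


def top_k_patterns(sequences, k, max_len=3):
    """Brute-force stand-in for TKS: keep the k patterns with highest support.

    Support is the number of sequences that contain the pattern.
    Ties are broken lexicographically so the result is deterministic.
    """
    support = Counter()
    for seq in sequences:
        support.update(skip_patterns(seq, max_len))
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return {pattern for pattern, _ in ranked[:k]}


def revise(signature, reference_patterns):
    """Drop every pattern that also occurs among the reference patterns."""
    return signature - reference_patterns


def temporal_share(patterns, temporal_tags=TEMPORAL_TAGS):
    """Proportion of patterns that contain at least one temporal tag."""
    if not patterns:
        return 0.0
    hits = sum(1 for p in patterns if any(tag in temporal_tags for tag in p))
    return hits / len(patterns)


# Toy tag sequences, invented purely for illustration.
newspaper = [["IN", "TIMEX3", "NN", "EVENT"], ["IN", "TIMEX3", "EVENT"]]
reference = [["IN", "TIMEX3"], ["IN", "TIMEX3", "NN"]]

signature = top_k_patterns(newspaper, k=3, max_len=2)
reference_patterns = top_k_patterns(reference, k=1, max_len=2)
revised = revise(signature, reference_patterns)

print(sorted(signature))
print(sorted(revised))
print(temporal_share(revised))
```

Enumerating every subsequence grows combinatorially with sequence length, so this sketch is only practical for short sequences and small `max_len`; a dedicated top-k sequential pattern miner such as TKS avoids that exhaustive enumeration.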