Understanding time expressions involves two sub-tasks: recognition and normalization. In recent years, significant progress has been made on recognition, while research on normalization has lagged behind. Existing state-of-the-art (SOTA) normalization methods rely heavily on rules or grammars designed by experts, which limits their performance on emerging corpora such as social media texts. In this paper, we model time expression normalization as a sequence of operations that constructs the normalized temporal value, and we present a novel method called ARTime, which automatically generates normalization rules from training data without expert intervention. Specifically, ARTime captures possible operation sequences from annotated data and generates normalization rules for time expressions with common surface forms. Experimental results show that ARTime significantly surpasses SOTA methods on the Tweets benchmark and achieves results competitive with existing expert-engineered rule methods on the TempEval-3 benchmark.
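To make the idea of normalization as a sequence of operations concrete, the following minimal Python sketch applies a list of operations to a reference date to build the normalized value. It is an illustration only, not the ARTime implementation: the operation names (`shift_days`, `to_weekday`), the `RULES` table, and the `normalize` function are all hypothetical, and the rules are written by hand here, whereas ARTime is described as generating them from annotated data.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# An operation maps a partially constructed date to a new date.
Operation = Callable[[date], date]


def shift_days(n: int) -> Operation:
    """Hypothetical operation: move the date by n days (n may be negative)."""
    return lambda d: d + timedelta(days=n)


def to_weekday(target: int) -> Operation:
    """Hypothetical operation: move forward to the next date whose weekday
    is `target` (Monday = 0), strictly after the current date."""
    def op(d: date) -> date:
        delta = (target - d.weekday() - 1) % 7 + 1  # always in 1..7
        return d + timedelta(days=delta)
    return op


# Hand-written stand-in for a rule set: surface form -> operation sequence.
# Reading "next monday" as the first Monday after the reference date is a
# simplifying assumption.
RULES: Dict[str, List[Operation]] = {
    "today": [],
    "tomorrow": [shift_days(1)],
    "yesterday": [shift_days(-1)],
    "next monday": [to_weekday(0)],
}


def normalize(expression: str, reference: date) -> str:
    """Apply the operation sequence for `expression`, starting from the
    reference date, and return the result as an ISO 8601 date string."""
    value = reference
    for op in RULES[expression.lower().strip()]:
        value = op(value)
    return value.isoformat()


if __name__ == "__main__":
    ref = date(2013, 3, 22)  # a Friday
    for text in ("today", "tomorrow", "yesterday", "next Monday"):
        print(text, "->", normalize(text, ref))
```

Keeping each rule as a plain list of small operations means a rule can be built by composing operations, which is the property that would let such sequences be searched for or induced from data instead of being authored by hand.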