Temporal expressions in text play a significant role in language understanding, and correctly identifying them is fundamental to various retrieval and natural language processing systems. Previous works have gradually shifted from rule-based to neural architectures, which are capable of tagging expressions with higher accuracy. However, neural models cannot yet distinguish between different expression types at the same level as their rule-based counterparts. In this work, we aim to identify the most suitable transformer architecture for joint temporal tagging and type classification, and to investigate the effect of semi-supervised training on the performance of these systems. After studying variants of token classification and encoder-decoder architectures, we ultimately present a transformer encoder-decoder model based on the RoBERTa language model as our best performing system. By supplementing training resources with weakly labeled data from rule-based systems, our model surpasses previous works in temporal tagging and type classification, especially on rare classes. Additionally, we make our code and pre-trained experiments publicly available.
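To make the architectural choice concrete, the sketch below shows one way a RoBERTa-based encoder-decoder for temporal tagging could be assembled with the Hugging Face transformers library. This is a minimal illustration under stated assumptions, not the authors' released code: the "roberta-base" checkpoints and the tag-annotated output format are illustrative choices, and the model would still need to be fine-tuned before it produces meaningful tags.

```python
# Minimal sketch (assumed setup, not the paper's code): pair two RoBERTa
# checkpoints into an encoder-decoder with Hugging Face transformers.
from transformers import EncoderDecoderModel, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "roberta-base", "roberta-base"
)

# Decoder-side special tokens are required before generation can run.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Toy input: after fine-tuning, the decoder would be trained to emit the
# sentence with TIMEX3-style type tags, e.g.
# "She arrived <DATE>last Friday</DATE>." (hypothetical target format).
text = "She arrived last Friday."
inputs = tokenizer(text, return_tensors="pt")

# Untrained cross-attention weights mean this output is gibberish here;
# the call only demonstrates the end-to-end generation interface.
outputs = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Framing the task as sequence generation rather than per-token classification is what allows a single model to jointly mark expression spans and assign their types in one decoded output.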