In essence, both tagging methods (direct tagging and tagging with sentence compression) tag the information we need using regular expressions based on the inherent patterns of natural language. Although direct tagging has many advantages for extracting regular data, it is not applicable in every situation. If the data we need to extract is not regular but its surrounding words are relatively regular, we can use information compression to cut the information we do not need before tagging the data we need. In this way we can increase precision without undermining recall.
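The contrast between the two methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the original work: the patterns `DATE`, `NOISE`, and `CONTEXT`, the tag names, and the sample sentence are all hypothetical. Direct tagging is shown on a target that is itself regular (a date), while compression is shown on an irregular target (a job title) whose surrounding words are regular.

```python
import re

# Hypothetical pattern: a regular target (ISO-style date) that can be tagged directly.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def tag_direct(text):
    """Direct tagging: wrap every match of a regular target in a tag."""
    return DATE.sub(lambda m: f"<DATE>{m.group(0)}</DATE>", text)


# Hypothetical "noise" patterns removed during compression:
# parenthetical asides and comma-delimited who/which clauses.
NOISE = [
    re.compile(r"\s*\([^)]*\)"),
    re.compile(r",\s*(?:who|which)\b[^,]*,"),
]


def compress(sentence):
    """Sentence compression: cut information we do not need before tagging."""
    for pattern in NOISE:
        sentence = pattern.sub("", sentence)
    # Collapse any doubled spaces left behind by the deletions.
    return re.sub(r"\s{2,}", " ", sentence).strip()


# The target itself is irregular, but its context is regular:
# "was appointed <target> of".
CONTEXT = re.compile(r"(?<=was appointed )(.+?)(?= of\b)")


def tag_with_compression(sentence):
    """Compress first, then tag the target by its regular surroundings."""
    compressed = compress(sentence)
    return CONTEXT.sub(lambda m: f"<TITLE>{m.group(1)}</TITLE>", compressed)


if __name__ == "__main__":
    sample = ("Ms. Lee, who joined in 2010, was appointed chief financial "
              "officer (CFO) of Acme on 2021-03-15.")
    print(tag_direct(sample))
    print(tag_with_compression(sample))
```

In this sketch, applying `CONTEXT` to the uncompressed sentence would capture the parenthetical "(CFO)" together with the title. Removing such material first is what lets the context pattern stay simple, which is the precision gain the paragraph describes. Because the noise patterns only delete text outside the target, the target itself is still found, so recall should not suffer under these assumptions.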