Target-oriented Opinion Words Extraction (TOWE) is a fine-grained sentiment analysis task that aims to extract the opinion words corresponding to a given opinion target from a sentence. Recently, deep learning approaches have made remarkable progress on this task. Nevertheless, TOWE still suffers from the scarcity of training data due to the expensive annotation process. Limited labeled data increases the risk of distribution shift between test data and training data. In this paper, we propose exploiting massive unlabeled data to reduce this risk by increasing the model's exposure to varying distribution shifts. Specifically, we propose a novel Multi-Grained Consistency Regularization (MGCR) method to make use of unlabeled data and design two filters specifically for TOWE that remove noisy data at different granularities. Extensive experimental results on four TOWE benchmark datasets indicate the superiority of MGCR compared with current state-of-the-art methods. In-depth analysis also demonstrates the effectiveness of the different-granularity filters. Our code is available at https://github.com/TOWESSL/TOWESSL.
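To make the core idea concrete, the following is a minimal, illustrative sketch of consistency regularization with a confidence-based filter on unlabeled data. It is not the paper's actual MGCR implementation; the function names, the KL-based loss, and the confidence threshold are all assumptions introduced for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(pred_weak, pred_strong, threshold=0.8):
    """Penalize disagreement between model predictions on two augmented
    views of the same unlabeled sentence.

    A simple confidence filter discards examples whose weak-view
    prediction is not confident enough -- analogous in spirit (though
    not in detail) to filtering noisy unlabeled data before applying
    consistency regularization.
    """
    if max(pred_weak) < threshold:
        return 0.0  # filtered out: the unlabeled example is too noisy
    return kl_divergence(pred_weak, pred_strong)

# Identical predictions on both views incur no loss; confident but
# disagreeing predictions are penalized; unconfident ones are skipped.
```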