Multimodal named entity recognition and relation extraction (MNER and MRE) are fundamental and crucial tasks in information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images are incorporated into texts. To deal with this issue, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming to achieve more effective and robust performance. Specifically, we regard visual representations as a pluggable visual prefix that guides the textual representations toward error-insensitive prediction decisions. We further propose a dynamic gated aggregation strategy to obtain hierarchical multi-scale visual features, which serve as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. Code is available at https://github.com/zjunlp/HVPNeT.
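To make the two mechanisms named above concrete (a dynamic gate that aggregates multi-scale visual features, and the fused result used as a pluggable prefix for text attention), here is a minimal PyTorch sketch. It is an illustrative toy under stated assumptions, not the authors' implementation: the module name GatedVisualPrefix, the fixed three visual scales, the prefix length, and all dimensions are hypothetical choices for demonstration.

```python
# Minimal sketch, assuming PyTorch. All names and sizes are illustrative,
# not taken from the HVPNeT repository.
import torch
import torch.nn as nn


class GatedVisualPrefix(nn.Module):
    """Aggregates multi-scale visual features with a dynamic gate and
    projects the fused result into a key/value prefix that text
    self-attention can attend to, leaving the text encoder unchanged."""

    def __init__(self, vis_dim: int, txt_dim: int, num_scales: int = 3,
                 prefix_len: int = 4):
        super().__init__()
        # Text-conditioned gate: one logit per visual scale.
        self.gate = nn.Linear(txt_dim, num_scales)
        # Map the fused visual feature to a key prefix and a value prefix.
        self.proj = nn.Linear(vis_dim, 2 * prefix_len * txt_dim)
        self.prefix_len = prefix_len
        self.txt_dim = txt_dim

    def forward(self, vis_feats: torch.Tensor, txt_cls: torch.Tensor):
        # vis_feats: (batch, num_scales, vis_dim), e.g. pooled outputs of
        # several CNN blocks; txt_cls: (batch, txt_dim) sentence feature.
        weights = torch.softmax(self.gate(txt_cls), dim=-1)       # (B, S)
        fused = (weights.unsqueeze(-1) * vis_feats).sum(dim=1)    # (B, vis_dim)
        kv = self.proj(fused).view(-1, 2, self.prefix_len, self.txt_dim)
        prefix_k, prefix_v = kv[:, 0], kv[:, 1]  # each (B, prefix_len, txt_dim)
        return prefix_k, prefix_v


# Usage: concatenate the returned prefixes to each attention layer's keys and
# values so every text token can attend to the visual prefix.
module = GatedVisualPrefix(vis_dim=2048, txt_dim=768)
vis = torch.randn(2, 3, 2048)   # e.g. three pooled ResNet block outputs
cls = torch.randn(2, 768)
k, v = module(vis, cls)
print(k.shape, v.shape)         # torch.Size([2, 4, 768]) for both
```

Because the prefix only extends the keys and values of attention, irrelevant visual content can be down-weighted by the attention scores and the gate, which is one plausible reading of how the design reduces error sensitivity.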