We investigate how sentence-level transformers can be turned into effective token-level sequence labelers without any direct token-level supervision. Existing approaches to zero-shot sequence labeling do not perform well when applied to transformer-based architectures. Because transformers contain multiple layers of multi-head self-attention, information in the sentence gets distributed across many tokens, negatively affecting zero-shot token-level performance. We find that a soft attention module which explicitly encourages sharpness of attention weights can significantly outperform existing methods.
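As a minimal sketch of the kind of module described above, the following PyTorch code pools token representations with soft attention and adds an auxiliary penalty that encourages a peaked attention distribution. The class name, the linear scorer, and the entropy-based sharpness term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SharpSoftAttention(nn.Module):
    """Soft attention over token states with a sharpness penalty.

    Hypothetical sketch: the entropy regularizer stands in for whatever
    sharpness-encouraging objective the method actually uses.
    """

    def __init__(self, hidden_dim: int, sharpness_weight: float = 0.1):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)  # per-token attention score
        self.sharpness_weight = sharpness_weight

    def forward(self, token_states: torch.Tensor, mask: torch.Tensor):
        # token_states: (batch, seq_len, hidden_dim); mask: (batch, seq_len)
        scores = self.scorer(token_states).squeeze(-1)        # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)                  # token-level weights

        # Sentence representation as the attention-weighted sum of token states.
        sentence = torch.bmm(attn.unsqueeze(1), token_states).squeeze(1)

        # Minimizing the entropy of the attention distribution pushes the
        # weights toward a sharp, peaked distribution over few tokens.
        entropy = -(attn.clamp_min(1e-9).log() * attn).sum(dim=-1).mean()
        sharpness_loss = self.sharpness_weight * entropy
        return sentence, attn, sharpness_loss
```

In a zero-shot setting, only the sentence-level output would be supervised; the per-token attention weights `attn` would then be read off as token-level label scores.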