A naive application of state-of-the-art bidirectional encoders to streaming sequence tagging would require re-encoding every token from scratch each time a new token arrives in an incremental streaming input (such as transcribed speech). Because earlier computation cannot be reused, this raises the number of floating-point operations (FLOPs) and causes unnecessary label flips. The additional FLOPs increase wall-clock time, and the additional label flips degrade streaming performance. In this work, we present a Hybrid Encoder with Adaptive Restart (HEAR) that addresses both issues: it maintains the performance of bidirectional encoders on offline (complete) inputs while improving performance on streaming (incomplete) inputs. HEAR combines a hybrid unidirectional-bidirectional encoder architecture for sequence tagging with an Adaptive Restart Module (ARM) that selectively guides when the bidirectional portion of the encoder is restarted. Across four sequence tagging tasks, HEAR saves up to 71.1% of FLOPs in streaming settings and outperforms bidirectional encoders on streaming predictions by up to +10% streaming exact match.
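The cost argument can be illustrated with a toy count of token encodings. This is a minimal sketch under stated assumptions, not the HEAR implementation: the function names are hypothetical, cost is measured in "one token encoded once" units (ignoring the length-dependent cost of attention), and the restart steps are supplied by hand, whereas in HEAR the ARM decides when to restart.

```python
from typing import Iterable


def naive_restart_cost(n_tokens: int) -> int:
    """Token encodings when every prefix is re-encoded from scratch.

    After token t arrives, a bidirectional encoder re-encodes all t tokens,
    so the total is 1 + 2 + ... + n_tokens.
    """
    return n_tokens * (n_tokens + 1) // 2


def hybrid_cost(n_tokens: int, restart_steps: Iterable[int]) -> int:
    """Token encodings for a toy hybrid encoder (assumed cost model).

    The unidirectional part encodes each new token exactly once.
    The bidirectional part re-encodes the whole prefix of length t,
    but only at the steps t listed in restart_steps.
    """
    unidirectional = n_tokens
    bidirectional = sum(t for t in set(restart_steps) if 1 <= t <= n_tokens)
    return unidirectional + bidirectional


def relative_savings(baseline: int, improved: int) -> float:
    """Fraction of the baseline cost that is avoided."""
    return 1.0 - improved / baseline


if __name__ == "__main__":
    n = 10
    naive = naive_restart_cost(n)
    # Hypothetical restart schedule: restart only at steps 5 and 10.
    hybrid = hybrid_cost(n, restart_steps=[5, 10])
    print(naive, hybrid, round(relative_savings(naive, hybrid), 3))
```

In this toy model, fewer restarts mean fewer re-encodings, which is the intuition behind restarting the bidirectional portion selectively rather than at every new token. The numbers it produces are illustrative only and are unrelated to the savings reported above.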