Weighted finite-state automata (WFSAs) are commonly used in NLP. Failure transitions are a useful extension for compactly representing backoff or interpolation in $n$-gram models and CRFs, which are special cases of WFSAs. The pathsum of an ordinary acyclic WFSA is computed efficiently by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions. However, the backward algorithm does not allow failure transitions, and preprocessing the WFSA to eliminate them could greatly increase $|E|$. We extend the backward algorithm to handle failure transitions directly. Our approach is efficient when the average state has outgoing arcs for only a small fraction $s \ll 1$ of the alphabet $\Sigma$. We propose an algorithm for general acyclic WFSAs that runs in time $O{\left(|E| + s |\Sigma| |Q| T_\text{max} \log{|\Sigma|}\right)}$, where $Q$ is the set of states and $T_\text{max}$ is the size of the largest connected component of failure transitions. When the failure transition topology satisfies a condition exemplified by CRFs, the $T_\text{max}$ factor can be dropped, and when the weight semiring is a ring, the $\log{|\Sigma|}$ factor can be dropped. In the latter case (ring-weighted acyclic WFSAs), we also give an alternative algorithm with complexity $O{\left(|E| + |\Sigma| |Q| \min(1,s\pi_\text{max}) \right)}$, where $\pi_\text{max}$ is the length of the longest failure path.
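For orientation, the ordinary backward algorithm mentioned above can be sketched as follows. This is a minimal illustration and not the extended algorithm proposed here: it does not handle failure transitions. It also assumes, for simplicity, that weights live in the real semiring $(+, \times)$ and that states are numbered in topological order. The function name `backward_pathsum` and the arc representation are hypothetical choices made for this sketch.

```python
from collections import defaultdict


def backward_pathsum(num_states, arcs, initial, final):
    """Pathsum of an acyclic WFSA over the real (+, *) semiring.

    Assumptions made for this sketch (not taken from the paper):
      - states are integers 0..num_states-1 in topological order,
        so every arc satisfies src < dst;
      - arcs is a list of (src, symbol, dst, weight) tuples;
      - initial and final map states to their initial/final weights;
      - there are no failure transitions.
    """
    outgoing = defaultdict(list)
    for src, _symbol, dst, weight in arcs:
        if not src < dst:
            raise ValueError("states must be topologically ordered")
        outgoing[src].append((dst, weight))

    # beta[q] is the total weight of all paths from q to a final state.
    beta = [0.0] * num_states
    for q in reversed(range(num_states)):
        total = final.get(q, 0.0)
        for dst, weight in outgoing[q]:
            total += weight * beta[dst]  # each arc is visited exactly once
        beta[q] = total

    # Combine the backward values with the initial weights.
    return sum(weight * beta[q] for q, weight in initial.items())


# Toy automaton with two accepting paths: 0 -b-> 2 and 0 -a-> 1 -a-> 2.
toy_arcs = [(0, "a", 1, 0.5), (0, "b", 2, 0.5), (1, "a", 2, 2.0)]
toy_pathsum = backward_pathsum(3, toy_arcs, initial={0: 1.0}, final={2: 1.0})
```

Each arc is visited once in the inner loop, which is where the $O(|E|)$ bound comes from. Expanding every failure transition into explicit arcs before running this loop is the preprocessing step that could greatly increase $|E|$.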