Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs, can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment.
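As a concrete illustration (a minimal sketch, not taken from the paper) of the kind of setting described above, the snippet below generates a piecewise stationary Bernoulli source with unobserved switching-points and runs an exact Bayesian reference predictor that marginalizes over run lengths under a Beta-Bernoulli model with a known switch probability. The function names, the uniform prior over the segment bias, and the hazard value are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (assumptions: uniform Beta(1,1) prior over each segment's bias,
# known constant switch probability). Not the paper's code.
import numpy as np

def sample_piecewise_bernoulli(T, switch_prob=0.05, rng=None):
    """Binary sequence whose Bernoulli bias is redrawn at random, unobserved switch points."""
    rng = np.random.default_rng(rng)
    bias = rng.uniform()
    xs = np.empty(T, dtype=np.int64)
    for t in range(T):
        if rng.uniform() < switch_prob:   # unobserved switch: redraw the latent bias
            bias = rng.uniform()
        xs[t] = rng.uniform() < bias
    return xs

def bayes_predictor(xs, switch_prob=0.05):
    """Exact predictive probabilities p(x_t = 1 | x_<t) under the switching prior.

    Maintains a posterior over run length (time since the last switch) together
    with Beta(1,1) sufficient statistics for the bias of the current segment.
    """
    T = len(xs)
    preds = np.empty(T)
    w = np.array([1.0])                   # run-length posterior weights
    a = np.array([1.0]); b = np.array([1.0])  # Beta counts per run length (ones, zeros)
    for t, x in enumerate(xs):
        p1 = a / (a + b)                  # per-run-length predictive prob of a 1
        preds[t] = np.dot(w, p1)
        like = p1 if x == 1 else 1.0 - p1
        w = w * like                      # condition on the observation
        w /= w.sum()
        a_new = a + (1 if x == 1 else 0)
        b_new = b + (0 if x == 1 else 1)
        # transition: with prob switch_prob a fresh segment (Beta(1,1)) starts next step
        w = np.concatenate(([switch_prob], (1.0 - switch_prob) * w))
        a = np.concatenate(([1.0], a_new))
        b = np.concatenate(([1.0], b_new))
    return preds

xs = sample_piecewise_bernoulli(1000, rng=0)
preds = bayes_predictor(xs)
log_loss = -np.mean(np.where(xs == 1, np.log(preds), np.log1p(-preds)))
print(f"mean log loss of the exact Bayesian predictor: {log_loss:.3f}")
```

A sequence model trained by minimizing the log loss on many such sequences could then be compared against this kind of exact reference, which is the spirit of the comparison the abstract describes.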