There is a long history of using meta-learning for representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting the step-size parameters of stochastic gradient descent, building on a variety of prior work in stochastic approximation, machine learning, and artificial neural networks. In particular, we focus on stochastic meta-descent as introduced in the Incremental Delta-Bar-Delta (IDBD) algorithm, which sets an individual step size for each feature of a linear function approximator. Under IDBD, a feature with a large step size has a large impact on generalization from training examples, while a feature with a small step size has little impact. As the main contribution of this work, we extend IDBD to temporal-difference (TD) learning, a form of learning that is effective in sequential, non-i.i.d. problems. We derive a variety of IDBD generalizations for TD learning and demonstrate that they can distinguish relevant features from irrelevant ones. Finally, we show that TD IDBD is effective at learning feature relevance in both an idealized gridworld and a real-world robotic prediction task.
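To make the per-feature step-size adaptation concrete, the following is a minimal Python sketch of the original supervised IDBD update (Sutton, 1992) that this work builds on; it is not the paper's TD variant. The function name idbd_update and the default meta step size theta are illustrative assumptions. The TD generalizations derived in the paper would replace the supervised error below with the TD error and incorporate eligibility traces.

```python
import numpy as np

def idbd_update(w, h, beta, x, y, theta=0.01):
    """One step of supervised IDBD (Sutton, 1992) for a linear predictor.

    w     -- weight vector
    h     -- per-weight trace of recent weight changes (meta-gradient memory)
    beta  -- log step sizes; alpha_i = exp(beta_i) keeps each step size positive
    x     -- feature vector for this example
    y     -- scalar target
    theta -- meta step size (illustrative default, not from the paper)
    """
    delta = y - w @ x                       # prediction error on this example
    beta = beta + theta * delta * x * h     # meta-gradient step on log step sizes
    alpha = np.exp(beta)                    # per-feature step sizes
    w = w + alpha * delta * x               # LMS update, one step size per feature
    # decay h where the weight just moved, then fold in the latest update
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, h, beta
```

A feature that consistently helps reduce the error accumulates correlated updates in h, which drives its beta, and hence its step size, upward; an irrelevant feature's updates decorrelate, so its step size shrinks toward zero. This is the sense in which step sizes serve as the relevance signal described above.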