An Implementation of Hebbian Learning: Fast Weights

July 7, 2018, CreateAMind

https://theneuralperspective.com/2016/12/04/implementation-of-using-fast-weights-to-attend-to-the-recent-past/


https://github.com/GokuMohandas/fast-weights


Implementation of Using Fast Weights to Attend to the Recent Past

Using Fast Weights to Attend to the Recent Past
Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu
NIPS 2016, https://arxiv.org/abs/1610.06258

More details @ https://theneuralperspective.com/2016/12/04/implementation-of-using-fast-weights-to-attend-to-the-recent-past/

Fast weights aid in learning associative tasks and store temporary memories of the recent past. In a traditional recurrent architecture, the slow weights determine the next hidden state and hold long-term memory. We introduce fast weights, used in conjunction with the slow weights, to account for short-term knowledge. These weights are quick to update, and they decay as new hidden states are introduced.

The overall task of the fast weights is to adjust quickly to recent hidden states while remembering the recent past. The fast weights are used to determine the final hidden state at each time step. BPTT does not directly change the fast weights (the fast weights A are in fact unique to each sample), but it does change the slow weights that produce the hidden states, and recall that the fast weights affect the hidden states (and vice versa). So training changes the slow weights (Wh and Wx), which in turn changes how the fast weights are determined. Eventually the fast weights come to track the recent past in a way that yields hidden states leading to the correct answer.

Physiological Motivations

How do we store memories? We don't store memories by keeping track of the exact neural activity that occurred at the time of the memory. Instead, we try to recreate the neural activity through a set of associative weights, which can map to many other memories as well. This allows for efficient storage of many memories without storing separate weights for each instance. This associative network also allows for associative learning, which is the ability to learn and recall the relationship between initially unrelated instances.(1)





Concept

  • In a traditional recurrent architecture we have our slow weights. These weights are used with the input and the previous hidden state to determine the next hidden state. These weights are responsible for the long-term knowledge of our systems. These weights are updated at the end of a batch, so they are quite slow to update and decay.

  • We introduce the concept of fast weights, in conjunction with the slow weights, in order to account for short-term knowledge. These weights are quick to update and decay as they change from the introduction of new hidden states.

  • For each connection in our network, the total weight is the sum of the results from both the slow and fast weights. The hidden state for each time step for each input is determined by the operations with the slow and fast weights. We use a fast memory weights matrix A to alter the hidden states to keep track of the features required for any associative learning tasks.

  • The fast weights memory matrix, A(t), starts at 0 at the beginning of the sequence. At each time step, the previous matrix is decayed by a scalar lambda, and the outer product of the current hidden state with itself, scaled by a learning rate eta, is added: A(t) = lambda * A(t-1) + eta * h(t) h(t)^T.
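The steps in the list above can be sketched in a few lines of plain Python. This is a minimal, dependency-free illustration for a single sequence and is not the repository's TensorFlow code. The function name `fast_weights_step`, the parameter `inner_steps`, and the default values of `lam` and `eta` are assumptions chosen for illustration. The ReLU non-linearity and the layer normalization without learned gain or bias are simplifications.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no gain or bias)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def relu(v):
    return [max(0.0, x) for x in v]


def fast_weights_step(x, h_prev, A_prev, W_h, W_x,
                      lam=0.9, eta=0.5, inner_steps=1):
    """One time step of a fast-weights RNN for one sequence.

    lam, eta and inner_steps are illustrative values, not tuned settings.
    """
    n = len(h_prev)
    # Fast weights update: A(t) = lam * A(t-1) + eta * h(t) h(t)^T
    A = [[lam * A_prev[i][j] + eta * h_prev[i] * h_prev[j] for j in range(n)]
         for i in range(n)]
    # Slow-weights contribution, shared by every inner-loop iteration.
    slow = [a + b for a, b in zip(matvec(W_h, h_prev), matvec(W_x, x))]
    # Preliminary hidden state from the slow weights alone.
    h_s = relu(slow)
    # Inner loop: add the fast-weights term, normalize, apply non-linearity.
    for _ in range(inner_steps):
        fast = matvec(A, h_s)
        h_s = relu(layer_norm([a + b for a, b in zip(slow, fast)]))
    return h_s, A
```

To run it over a sequence, start with `h` as a zero vector and `A` as a zero matrix, then call `fast_weights_step` once per input, feeding the returned `h` and `A` back in. Because A is rebuilt from zero for every sequence, it is per-sample state, not a trained parameter.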
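
In the inner loop, the fast weights term can be computed without forming A(t) explicitly. Assuming A starts at 0, unrolling the update of A(t) gives the following, where \lambda is the decay rate and \eta is the fast learning rate:

$$A(t)\, h_s(t+1) = \eta \sum_{\tau=1}^{t} \lambda^{t-\tau}\, h(\tau) \left[ h(\tau)^T h_s(t+1) \right]$$

Notice the last two terms when computing the inner loop's next hidden vector. Their product is just the scalar product of the earlier hidden state vector, h(\tau), and the current hidden state vector, h_s(t+1), in the inner loop. So you can think of each iteration as attending to the past hidden vectors in proportion to their similarity with the current inner-loop hidden vector.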

We do not use this method in the basic implementation: I wanted to show explicitly what the fast weights matrix looks like, and having this "memory augmented" view does not really inhibit using minibatches (as you can see). The problem an explicit fast weights matrix can create is space, so the efficient implementation really helps there.

Note that this 'efficient implementation' will be costly if the sequence length is greater than the hidden state dimensionality. The computation then scales quadratically with sequence length, because at every time step we need to attend to all previous hidden states with the current inner loop's hidden representation.
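The equivalence between the explicit matrix and the attention-style sum can be checked numerically. The sketch below is a plain-Python illustration, assuming the standard decayed outer-product update with A starting at 0. The function names `explicit_fast_term` and `attention_fast_term` are hypothetical and do not come from the repository.

```python
import random


def explicit_fast_term(hs_past, query, lam, eta):
    """Build A(t) step by step, then return A(t) @ query."""
    n = len(query)
    A = [[0.0] * n for _ in range(n)]
    for h in hs_past:
        # A <- lam * A + eta * h h^T
        A = [[lam * A[i][j] + eta * h[i] * h[j] for j in range(n)]
             for i in range(n)]
    return [sum(A[i][j] * query[j] for j in range(n)) for i in range(n)]


def attention_fast_term(hs_past, query, lam, eta):
    """Same quantity without ever forming the n x n matrix."""
    t = len(hs_past)
    out = [0.0] * len(query)
    for tau, h in enumerate(hs_past, start=1):
        # Similarity between a stored hidden state and the current query.
        score = sum(a * b for a, b in zip(h, query))
        # Older states are discounted by lam ** (t - tau).
        weight = eta * lam ** (t - tau) * score
        out = [o + weight * hi for o, hi in zip(out, h)]
    return out


rng = random.Random(0)
hidden_size, steps = 5, 8
past = [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
        for _ in range(steps)]
query = [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]

explicit = explicit_fast_term(past, query, lam=0.9, eta=0.5)
attended = attention_fast_term(past, query, lam=0.9, eta=0.5)
max_diff = max(abs(a - b) for a, b in zip(explicit, attended))
print("largest difference:", max_diff)
```

The attention form stores t hidden vectors of size n instead of an n x n matrix, and each query costs on the order of t * n operations. Summed over a sequence, that cost grows quadratically with its length, which is the trade-off described above.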

Requirements

  • tensorflow (>0.10)

Execution

  • To see the advantage of the fast weights, Ba et al. used a very simple toy task.

Given: g1o2k3??g we need to predict 1.

  • You can think of each letter-number pair as a key/value pair. We are given a key at the end and need to predict the matching value. The fast associative memory is required here to keep track of the key/value pairs just seen and to retrieve the proper value for a given key. After training with backpropagation, the hidden state vector produced after, for example, g and 1 has one part for g and another part for 1, and the fast memory learns to associate the two.

  • Create datasets:

    python data_utils.py

  • For training:

    python train.py train <model_name: RNN-LN-FW | RNN-LN | CONTROL | GRU-LN>

  • For sampling:

    python train.py test <model_name: RNN-LN-FW | RNN-LN | CONTROL | GRU-LN>

  • For plotting results:

    python train.py plot
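To make the toy task described above concrete, here is a small stand-alone sketch that generates examples in the same key/value format and looks up the expected answer. It is not the repository's `data_utils.py`. The helper names `make_example` and `solve`, and the choice of unique lowercase keys with single-digit values, are assumptions for illustration.

```python
import random
import string


def make_example(num_pairs=3, rng=random):
    """Return (sequence, target) such as ("g1o2k3??g", "1")."""
    # Keys are distinct letters so that each query has exactly one answer.
    keys = rng.sample(string.ascii_lowercase, num_pairs)
    values = [rng.choice(string.digits) for _ in keys]
    query = rng.choice(keys)
    sequence = "".join(k + v for k, v in zip(keys, values)) + "??" + query
    target = values[keys.index(query)]
    return sequence, target


def solve(sequence):
    """Reference answer: parse the key/value pairs and look up the query."""
    body, query = sequence.split("??")
    pairs = dict(zip(body[0::2], body[1::2]))
    return pairs[query]


print(solve("g1o2k3??g"))  # prints 1
```

A model has to do what `solve` does with an explicit dictionary, but using only its hidden state and fast weights, which is why the task isolates short-term associative memory.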

Results

  • Control: RNN without layer normalization (LN) or fast weights (FW) 

Bag of Tricks

  • Initialize slow hidden weights with an identity matrix in RNN to avoid gradient issues.(2)

  • Layer normalization is required for the RNN to converge.

  • Weights should be initialized so that the result of the dot product has unit variance before the non-linearity; otherwise things can blow up really quickly.

  • Keep track of the gradient norm and tune accordingly.

  • There is no need to add the extra input processing and the extra layer after the softmax that Jimmy Ba used. (He did that only to compare with another task, so following it blindly just adds computation.)
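Several of the tricks above can be illustrated in plain Python. This is a sketch under stated assumptions and is not the repository's code. The helper names are hypothetical, and a standard deviation of 1/sqrt(fan_in) is one common way to keep unit variance after the dot product. Gradient clipping is shown only as one possible response to a large gradient norm; the list above asks only that the norm be tracked.

```python
import math
import random


def identity_init(n, scale=1.0):
    """Slow hidden-to-hidden weights initialized as a scaled identity."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def scaled_normal_init(fan_in, fan_out, rng):
    """Gaussian weights with variance 1/fan_in.

    With unit-variance inputs, each pre-activation then has roughly
    unit variance.
    """
    std = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


def global_norm(grads):
    """L2 norm over a list of gradient matrices."""
    return math.sqrt(sum(g * g for grad in grads for row in grad for g in row))


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients if their global norm exceeds max_norm."""
    norm = global_norm(grads)
    scale = min(1.0, max_norm / (norm + 1e-12))
    clipped = [[[g * scale for g in row] for row in grad] for grad in grads]
    return clipped, norm


rng = random.Random(0)
W_x = scaled_normal_init(fan_in=256, fan_out=500, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(256)]
pre = [sum(w * xi for w, xi in zip(row, x)) for row in W_x]
mean = sum(pre) / len(pre)
variance = sum((p - mean) ** 2 for p in pre) / len(pre)
print("pre-activation variance:", variance)
```

Logging the value returned by `global_norm` at every training step is a cheap way to spot exploding or vanishing gradients before the loss diverges.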

Extensions

  • I will be releasing my code comparing fast weights with an attention interface for language-related sequence-to-sequence tasks (in a separate repo).

  • It will also be interesting to compare the computational demands of fast weights with those of LSTMs/GRUs and see which is better at test time.

Citations

  1. Suzuki, Wendy A. "Associative Learning and the Hippocampus." APA. American Psychological Association, Feb. 2005. Web. 04 Dec. 2016.

  2. Hinton, Geoffrey. "FieldsLive Video Archive." Fields Institute for Research in Mathematical Sciences. University of Toronto, 13 Oct. 2016. Web. 04 Dec. 2016.

Author

Goku Mohandas (gokumd@gmail.com)

