1 https://flyyufelix.github.io/2017/11/17/direct-future-prediction.html
Direct Future Prediction - Supervised Learning for Reinforcement Learning
2 原文https://www.oreilly.com/ideas/reinforcement-learning-for-complex-goals-using-tensorflow,
建议 这一段参考原文
This new formulation changes our neural network in several ways. Instead of just a state, we will also provide as input to the network the current measurements and goal. Instead of Q-values, our network will now output a prediction tensor of the form [Measurements X Actions X Offsets]. Taking the product of the summed predicted future changes and our goals, we can pick actions that best satisfy our goals over time:
量子位的中文翻译: