【1】World Models

【URL】http://arxiv.org/abs/1803.10122

【Time】2018-05-09

I. Research Area

Reinforcement learning, world models

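II. Research Motivation

The paper focuses on training large neural networks for reinforcement learning tasks. It does this by splitting the agent into a large world model and a small controller model. A generative model built of a reinforcement learning environment is called a world model.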

“The image of the world around us, which we carry in our head, is just a model. Nobody in his head imagines all the world, government or country. He has only selected concepts, and relationships between them, and uses those to represent the real system.” (Forrester, 1971)

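III. Methods and Techniques

Overall architecture: the agent has three modules. These are a visual perception module (V), a memory module (M), and a decision-making module, the controller (C). V and M together form the world model.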

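(1) Vision (V): implemented as a VAE (variational autoencoder). Its input is an image frame. Its goal is to learn a compressed representation, the latent vector z, of what the agent sees.

(2) Memory (M): implemented as an MDN-RNN (a recurrent network whose output is a mixture density network). Its input is the compressed visual information z together with the action. Its goal is to predict the future, i.e. to model the distribution P(z_{t+1} | a_t, z_t, h_t) of the next latent vector.

(3) Controller (C): deliberately kept as small as possible (a single linear layer in the paper) and trained separately from V and M. This keeps most of the agent's complexity inside the world model.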
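The sketch below makes the three roles concrete in pure Python. It is not the authors' implementation. The function names (`vae_sample_z`, `mdn_sample`, `controller`, `controller_param_count`) are hypothetical. The sizes (z: 32, h: 256, actions: 3) are assumed to match the paper's CarRacing setup. The temperature convention in `mdn_sample` is an illustrative assumption.

Each module is reduced to one step:
- V becomes the reparameterized sample z = μ + σ·ε.
- M becomes drawing one latent value from a Gaussian mixture.
- C becomes the single linear map a_t = W_c [z_t h_t] + b_c.

```python
import math
import random

# Assumed sizes, modeled on the paper's CarRacing setup (illustrative only).
Z_DIM, H_DIM, A_DIM = 32, 256, 3


def vae_sample_z(mu, log_var, rng):
    """V (sketch): reparameterized sample z = mu + sigma * eps, with sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def mdn_sample(logits, means, log_stds, rng, temperature=1.0):
    """M (sketch): draw one latent value from a 1-D Gaussian mixture.

    Dividing the logits by the temperature and scaling sigma by sqrt(temperature)
    is an assumed convention for controlling randomness.
    """
    scaled = [lg / temperature for lg in logits]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]  # unnormalized softmax, numerically stable
    k = rng.choices(range(len(weights)), weights=weights)[0]  # choose a mixture component
    sigma = math.exp(log_stds[k]) * math.sqrt(temperature)
    return means[k] + sigma * rng.gauss(0.0, 1.0)


def controller(z, h, W, b):
    """C (sketch): single linear layer a = W [z; h] + b."""
    x = list(z) + list(h)  # concatenate z_t and h_t
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def controller_param_count(z_dim, h_dim, a_dim):
    """Weights (z_dim + h_dim) x a_dim plus one bias per action."""
    return (z_dim + h_dim) * a_dim + a_dim


if __name__ == "__main__":
    rng = random.Random(0)
    z = vae_sample_z([0.0] * Z_DIM, [0.0] * Z_DIM, rng)
    h = [0.0] * H_DIM  # placeholder for M's hidden state
    W = [[0.01] * (Z_DIM + H_DIM) for _ in range(A_DIM)]
    b = [0.0] * A_DIM
    print(controller(z, h, W, b))
    print(controller_param_count(Z_DIM, H_DIM, A_DIM))  # 867
```

With these assumed sizes the controller has (32 + 256) × 3 + 3 = 867 parameters. This shows how little of the agent's capacity sits in C compared with V and M.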


How the V, M, and C modules interact with the environment:

Figure 8. Flow diagram of our Agent model. The raw observation is first processed by V at each time step t to produce z_t. The input into C is this latent vector z_t concatenated with M’s hidden state h_t at each time step. C will then output an action vector a_t for motor control, and will affect the environment. M will then take the current z_t and action a_t as an input to update its own hidden state to produce h_{t+1} to be used at time t + 1.
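The per-time-step flow in Figure 8 can be written as a short loop. This is a hedged sketch, not the paper's code. `rollout`, its callable signatures, and the toy `LineWorld` environment are all hypothetical. V, M, and C are passed in as plain functions, so the sketch only illustrates the order of the calls: V, then C, then the environment, then M.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def rollout(
    reset: Callable[[], Vector],
    step: Callable[[Vector], Tuple[Vector, float, bool]],
    encode: Callable[[Vector], Vector],                  # V: observation -> z_t
    act: Callable[[Vector, Vector], Vector],             # C: (z_t, h_t) -> a_t
    update: Callable[[Vector, Vector, Vector], Vector],  # M: (z_t, a_t, h_t) -> h_{t+1}
    h0: Vector,
    max_steps: int = 1000,
) -> float:
    """Run one episode in the V -> C -> environment -> M order of Figure 8; return total reward."""
    obs, h, total = reset(), h0, 0.0
    for _ in range(max_steps):
        z = encode(obs)              # V compresses the raw observation into z_t
        a = act(z, h)                # C reads [z_t, h_t] and outputs the action a_t
        obs, reward, done = step(a)  # a_t affects the environment
        total += reward
        h = update(z, a, h)          # M consumes z_t and a_t to produce h_{t+1}
        if done:
            break
    return total


class LineWorld:
    """Hypothetical toy environment: a point on a line, penalized by its distance from 0."""

    def __init__(self, start: float = 1.0, horizon: int = 3):
        self.start, self.horizon = start, horizon
        self.x, self.t = start, 0

    def reset(self) -> Vector:
        self.x, self.t = self.start, 0
        return [self.x]

    def step(self, a: Vector) -> Tuple[Vector, float, bool]:
        self.x += a[0]
        self.t += 1
        return [self.x], -abs(self.x), self.t >= self.horizon


if __name__ == "__main__":
    env = LineWorld()
    total = rollout(
        env.reset,
        env.step,
        encode=lambda obs: list(obs),                       # stand-in for V
        act=lambda z, h: [-0.5 * z[0]],                     # stand-in for C
        update=lambda z, a, h: [0.5 * h[0] + z[0] + a[0]],  # stand-in for M
        h0=[0.0],
    )
    print(total)  # -0.875
```

V, M, and C are kept as separate callables, which mirrors the note's point that the three modules are trained separately. Only C's output ever reaches the environment.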


IV. Summary

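A highly regarded foundational work on world models. This is not surprising, since co-author Jürgen Schmidhuber is often called the father of LSTM.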