Generalization, i.e., the ability to adapt to novel scenarios, is the hallmark of human intelligence. While we have systems that excel at recognizing objects, cleaning floors, and playing complex games, occasionally even beating humans, they are highly specialized: they perform only the tasks they were trained for and generalize poorly. Could optimizing towards fixed external goals be hindering generalization instead of aiding it? In this thesis, we present our initial efforts toward endowing artificial agents with a human-like ability to generalize in diverse scenarios. The main insight is to first allow the agent to learn general-purpose skills in a completely self-supervised manner, without optimizing for any external goal.
We claim that, to learn on its own, an artificial agent must be embodied in the world, develop an understanding of its sensory input (e.g., an image stream), and simultaneously learn to map this understanding to its motor outputs (e.g., torques) in an unsupervised manner. These considerations lead to two fundamental questions: how can an agent learn rich representations of the world similar to those humans learn, and how can it re-use such a representation of past knowledge to incrementally adapt and learn more about the world, as humans do? We believe prediction is the key to answering both. We propose generic mechanisms that employ prediction as a supervisory signal, allowing agents to learn sensory representations as well as motor control. These two abilities equip an embodied agent with a basic set of general-purpose skills, which are later repurposed to perform complex tasks.
We discuss how this framework can be instantiated to develop curiosity-driven agents (virtual as well as real) that can learn to play games, learn to walk, and learn to perform real-world object manipulation without any rewards or supervision. These self-supervised robotic agents, after exploring the environment, can generalize to find their way in office environments, tie knots using rope, rearrange object configurations, and compose their skills in a modular fashion.