Grasping by a robot in unstructured environments is deemed a critical challenge because of the requirement for effective adaptation to a wide variation in object geometries, material properties, and other environmental factors. In this paper, we propose a novel framework for robotic grasping based on the idea of compressing high-dimensional target and gripper features in a common latent space using a set of autoencoders. Our approach simplifies grasping by using three autoencoders dedicated to the target, the gripper, and a third one that fuses their latent representations. This allows the RL agent to achieve higher learning rates at the initial stages of exploration of a new environment, as well as at non-zero shot grasp attempts. The agent explores the latent space of the third autoencoder for better quality grasp without explicit reconstruction of objects. By implementing the PoWER algorithm into the RL training process, updates on the agent's policy will be made through the perturbation in the reward-weighted latent space. The successful exploration efficiently constrains both position and pose integrity for feasible executions of grasps. We evaluate our system on a diverse set of objects, demonstrating the high success rate in grasping with minimum computational overhead. We found that approach enhances the adaptation of the RL agent by more than 35 \% in simulation experiments.
 翻译:暂无翻译