The first years of an infant's life are known as the critical period, during which neural plasticity significantly affects the overall development of learning performance. In recent studies, an AI agent with a deep neural network mimicking the mechanisms of actual neurons exhibited a learning period similar to the human critical period. During this initial period in particular, appropriate stimuli play a vital role in developing learning ability. However, transforming human cognitive bias into an appropriate shaping reward is challenging, and prior work on the critical period does not focus on finding the appropriate stimulus. To take a step further, we propose multi-stage reinforcement learning that emphasizes finding the ``appropriate stimulus'' around the critical period. Inspired by humans' early cognitive-developmental stages, we apply multi-stage guidance near the critical period and demonstrate the appropriate shaping reward (stage-2 guidance) in terms of the AI agent's performance, efficiency, and stability.