Human needs motivate behavior according to their intensity and the context. However, we also form preferences based on the perceived pleasure of each action, and these preferences can change over time. This makes decision-making more complex, requiring the agent to learn to balance needs and preferences according to context. To understand how this process works and to enable the development of robots with a motivation-based learning model, we computationally model the motivation theory proposed by Hull. In this model, the agent (an abstraction of a mobile robot) is motivated to keep itself in a state of homeostasis. We added hedonic dimensions to observe how preferences affect decision-making, and we employed reinforcement learning to train our motivation-based agents. We ran three agents, with energy decay rates representing different metabolisms, in two different environments to assess the impact on their strategy, movement, and behavior. The results show that the agents learned better strategies in the environment that offered choices better suited to their metabolism. Including pleasure in the motivational mechanism significantly affected behavior learning, especially for slow-metabolism agents. When survival is at risk, the agent ignores pleasure and equilibrium, hinting at how it should behave in harsh scenarios.
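As a rough illustration of the kind of mechanism described above (not the paper's actual formulation), a homeostatic agent can derive its reward from drive reduction in Hull's sense, plus a weighted hedonic term for the pleasure of the chosen action. The sketch below is a minimal, hypothetical Python example; all names and constants (SETPOINT, decay_rate, hedonic_value, alpha) are illustrative assumptions.

```python
# Hypothetical sketch of a drive-reduction reward with a hedonic term.
# Identifiers and constants are illustrative, not taken from the paper.

SETPOINT = 1.0  # homeostatic target for the internal energy variable

def drive(energy: float) -> float:
    """Drive grows with the deviation from the homeostatic setpoint."""
    return abs(SETPOINT - energy)

def step_reward(energy: float, next_energy: float,
                hedonic_value: float, alpha: float = 0.5) -> float:
    """Reward = drive reduction (Hullian term) + weighted pleasure of the action."""
    return (drive(energy) - drive(next_energy)) + alpha * hedonic_value

# Example: an agent whose metabolism drains energy each step eats a
# food item it finds pleasurable, moving it back toward the setpoint.
decay_rate = 0.05  # energy lost per step ("metabolism")
energy = 0.6
next_energy = min(SETPOINT, energy - decay_rate + 0.3)  # eating restores energy
print(step_reward(energy, next_energy, hedonic_value=0.8))  # -> 0.65
```

Under such a scheme, setting alpha to zero recovers a purely homeostatic agent, while larger values let learned preferences compete with survival needs, which is the trade-off the experiments vary across metabolisms and environments.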