We consider the problem of synthetically generating data that closely resembles human decisions made in an interactive human-AI system such as a computer game. We propose a novel algorithm that generates synthetic, human-like decision-making data starting from a very small set of decision-making data collected from humans. The algorithm integrates reward shaping with an imitation learning algorithm to generate the synthetic data. We validate our technique by using the synthetically generated data as a surrogate for human interaction data to solve three sequential decision-making tasks of increasing complexity within a small, computer game-like setup. Empirical and statistical analyses of our results show that the synthetically generated data can substitute for the human data and yields game-playing behavior that is almost indistinguishable, with very low divergence, from that of a human performing the same tasks.