We introduce a data-driven, model-agnostic technique for generating a human-interpretable summary of the salient points of contrast within an evolving dynamical system, such as the learning process of a control agent. It involves the aggregation of transition data along both spatial and temporal dimensions according to an information-theoretic divergence measure. A practical algorithm is outlined for continuous state spaces, and deployed to summarise the learning histories of deep reinforcement learning agents with the aid of graphical and textual communication methods. We expect our method to be complementary to existing techniques in the realm of agent interpretability.
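As a rough illustration of what comparing transition data across time windows with an information-theoretic divergence might look like, the sketch below scores each region of a discretised state space by how much its next-state distribution differs between an early and a late window. It is a minimal sketch under stated assumptions, not the method itself: the abstract does not name the divergence, so Jensen-Shannon is an assumption here, as are the integer state bins and the hypothetical helper names `js_divergence`, `next_state_distribution`, and `window_contrast`.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions (dicts).

    Assumption: JS divergence stands in for the unspecified divergence measure.
    """
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(a || m), skipping zero-probability outcomes of a
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def next_state_distribution(transitions, region):
    """Empirical distribution over next-state bins for transitions leaving `region`."""
    counts = Counter(s_next for s, s_next in transitions if s == region)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def window_contrast(early, late):
    """Per-region divergence between two time windows of (state, next_state) pairs.

    Only regions visited in both windows are scored (an illustrative choice).
    """
    regions = {s for s, _ in early} & {s for s, _ in late}
    return {
        r: js_divergence(
            next_state_distribution(early, r),
            next_state_distribution(late, r),
        )
        for r in regions
    }


# Toy data: state bins are integers; each pair is (state_bin, next_state_bin).
early = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 1)]
late = [(0, 1), (0, 1), (0, 1), (0, 1), (1, 1)]
scores = window_contrast(early, late)
```

In this toy data, region 0 behaves differently in the two windows and receives a positive score, while region 1 behaves identically and scores zero. Ranking regions and window pairs by such a score is one plausible way to surface salient points of contrast.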