We focus on a class of real-world domains in which hierarchical knowledge must be gathered to accomplish a task. Many problems can be represented this way, such as network penetration testing, targeted advertising, or medical diagnosis. In our formalization, the task is to sequentially request pieces of information about a sample, thereby building the knowledge hierarchy, and to terminate when appropriate. Any piece of information learned so far can be analyzed further, which results in a complex and variable action space. We present a combination of techniques in which the knowledge hierarchy is explicitly represented and given as input to a deep reinforcement learning algorithm. To process the hierarchical input, we employ Hierarchical Multiple-Instance Learning, and to cope with the complex action space, we factor it with hierarchical softmax. Our end-to-end differentiable model is trained with A2C, a standard deep reinforcement learning algorithm. We demonstrate the method on a set of seven classification domains, where the task is to achieve the best accuracy within a fixed budget on the amount of information retrieved. Compared to baseline algorithms, our method achieves not only better results but also better generalization.
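To illustrate how a hierarchical softmax can factor a variable, tree-shaped action space, the following is a minimal sketch and not the model described above. It assumes the currently known hierarchy is a nested dictionary, that every node already carries a scalar `score` (hand-picked here; in a learned model such scores would come from the network), and that an action is a path from the root to a leaf, with `terminate` as one of the leaves. The names `leaf`, `action_probs`, `sample_action`, and the example keys are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaf(score):
    """Hypothetical helper: a node with no children, i.e. a selectable action."""
    return {"score": score, "children": {}}


def action_probs(node, prefix=(), p=1.0):
    """Return {path: probability} for every leaf reachable from `node`.

    The probability of a path is the product of the per-level softmax
    probabilities of the children chosen along that path.
    """
    children = node["children"]
    if not children:
        return {prefix: p}
    names = list(children)
    level = softmax([children[n]["score"] for n in names])
    out = {}
    for name, q in zip(names, level):
        out.update(action_probs(children[name], prefix + (name,), p * q))
    return out


def sample_action(node, rng):
    """Sample one path by drawing from a softmax at each level of the tree."""
    path = []
    while node["children"]:
        names = list(node["children"])
        level = softmax([node["children"][n]["score"] for n in names])
        name = rng.choices(names, weights=level, k=1)[0]
        path.append(name)
        node = node["children"][name]
    return tuple(path)


# Assumed example hierarchy; the scores are arbitrary illustrative values.
tree = {
    "score": 0.0,
    "children": {
        "terminate": leaf(0.0),
        "owner": leaf(0.2),
        "ip_address": {
            "score": 1.0,
            "children": {"open_ports": leaf(0.5), "hostname": leaf(-0.5)},
        },
    },
}

probs = action_probs(tree)
for path, prob in probs.items():
    print("/".join(path), round(prob, 3))
print("sampled:", sample_action(tree, random.Random(0)))
```

Because each level is normalized separately, the leaf probabilities sum to one regardless of how many children a node has, which is what lets the same procedure handle a hierarchy that grows as more information is requested.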