We motivate our research with the real-world problem of classifying malicious web domains using a remote service that provides various kinds of information. Crucially, some of this information can be analyzed further, up to a certain depth, and this process sequentially builds a tree of hierarchically structured multiple-instance data. Each request sent to the remote service incurs a cost (e.g., time or another per-request cost), and the objective is to maximize classification accuracy subject to a budget constraint. We present a generic framework that can handle a class of similar problems. Our method is based on Classification with Costly Features (CwCF), Hierarchical Multiple-Instance Learning (HMIL), and hierarchical decomposition of the action space. It works with samples described as partially observed trees of features of various types (similar to a JSON/XML file), which allows it to model data with complex structure. The process is modeled as a Markov Decision Process (MDP), where a state represents the features acquired so far and actions select features that are still unknown. The policy is trained with deep reinforcement learning, and we demonstrate our method on both real-world and synthetic data.
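To make the MDP formulation concrete, the following is a minimal sketch of the acquisition process only, not the authors' implementation. It assumes that a feature becomes requestable once its parent has been observed, that the root is known for free, and that each feature carries a fixed cost. The names `Node`, `AcquisitionEnv`, `available_actions`, and `acquire` are hypothetical, and the learned policy and the classifier are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Path = Tuple[str, ...]


@dataclass
class Node:
    """One feature in a sample tree (hypothetical structure for illustration)."""
    value: Any                                   # hidden until the feature is acquired
    cost: float                                  # assumed fixed price of requesting it
    children: Dict[str, "Node"] = field(default_factory=dict)
    observed: bool = False


class AcquisitionEnv:
    """Toy episode: the state is the observed part of the tree,
    an action is the path of a still-unknown feature."""

    def __init__(self, root: Node, budget: float):
        self.root = root
        self.budget = budget
        self.spent = 0.0
        root.observed = True  # assumption: the root is known at no cost

    def available_actions(self) -> List[Path]:
        """Paths of unobserved features whose parent is observed
        and whose cost still fits into the remaining budget."""
        actions: List[Path] = []

        def walk(node: Node, path: Path) -> None:
            for key, child in node.children.items():
                child_path = path + (key,)
                if child.observed:
                    walk(child, child_path)
                elif self.spent + child.cost <= self.budget:
                    actions.append(child_path)

        walk(self.root, ())
        return actions

    def acquire(self, path: Path) -> Any:
        """Pay for one feature and reveal its value."""
        if path not in self.available_actions():
            raise ValueError(f"feature {path} cannot be acquired now")
        node = self.root
        for key in path:
            node = node.children[key]
        node.observed = True
        self.spent += node.cost
        return node.value

    def state(self) -> Any:
        """Observed part of the tree as nested dicts; unknown features are None."""
        def view(node: Node) -> Any:
            if not node.observed:
                return None
            return {"value": node.value,
                    "children": {k: view(c) for k, c in node.children.items()}}
        return view(self.root)
```

In this sketch a policy would repeatedly pick one entry from `available_actions()` and call `acquire` on it, stopping to classify when it chooses to or when no affordable feature remains. Expressing actions as paths mirrors the idea of decomposing the action space along the tree, although the paper's actual decomposition may differ.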