Opaque machine-learning models are increasingly exploited across the most diverse application areas. Acting as black boxes (BB) from the human perspective, these models cannot be fully trusted in critical applications unless a method exists to extract symbolic, human-readable knowledge from them. In this paper we analyse a recurrent design adopted by symbolic knowledge extractors for BB regressors, namely the creation of rules associated with hypercubic regions of the input space. We argue that this kind of partitioning may lead to suboptimal solutions when the data set at hand is high-dimensional or does not satisfy symmetry constraints. We then propose a (deep) clustering-based step to be performed before symbolic knowledge extraction, so as to achieve better performance on data sets of any kind.
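The pipeline sketched in the abstract — partition the input space, then attach a human-readable rule to each partition — can be illustrated with a minimal example. The sketch below is an assumption-laden toy, not the paper's method: it uses a plain k-means clustering (standard library only, hypothetical helper names `kmeans` and `hypercube_rules`) and then summarises each cluster as an axis-aligned hypercubic rule, i.e. per-dimension `[min, max]` bounds plus the mean black-box output over the cluster.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means over a list of tuples; illustrative only."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centroid (squared Euclidean).
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster went empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

def hypercube_rules(clusters, black_box):
    """One rule per non-empty cluster: per-dimension bounds + mean output."""
    rules = []
    for cl in clusters:
        if not cl:
            continue
        bounds = [(min(p[d] for p in cl), max(p[d] for p in cl))
                  for d in range(len(cl[0]))]
        mean_out = sum(black_box(p) for p in cl) / len(cl)
        rules.append((bounds, mean_out))
    return rules

# Two well-separated blobs and a stand-in "black box" (here just a sum).
points = [(0.1, 0.1), (0.2, 0.0), (0.0, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = kmeans(points, k=2)
rules = hypercube_rules(clusters, black_box=lambda p: sum(p))
for bounds, out in rules:
    print(bounds, round(out, 2))
```

The point of the sketch is the contrast the abstract draws: rule regions here follow the data's clusters rather than a fixed symmetric grid, so hypercubes stay tight around where the data actually lives.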