Machine learning models built on behavioral and textual data can be highly accurate predictors, but such models are often difficult to interpret. Linear models require investigating thousands of coefficients, while the opaqueness of nonlinear models makes interpretation even harder. Rule-extraction techniques have been proposed to combine the desired predictive accuracy of complex "black-box" models with global explainability. However, rule-extraction in the context of high-dimensional, sparse data, where many features are relevant to the predictions, can be challenging: replacing the black-box model with many rules leaves the user, once again, with an incomprehensible explanation. To address this problem, we develop and test a rule-extraction methodology based on higher-level, less-sparse "metafeatures". We empirically validate the quality of the explanation rules in terms of fidelity, stability, and accuracy over a collection of data sets, and benchmark their performance against rules extracted using the fine-grained behavioral and textual features. A key finding of our analysis is that metafeatures-based explanations are better at mimicking the behavior of the black-box prediction model, as measured by the fidelity of the explanations.
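To make the described pipeline concrete, the minimal sketch below shows one way a metafeature-based surrogate could be set up: a black-box classifier is trained on fine-grained sparse features, the features are aggregated into a small number of dense metafeatures, and a shallow decision tree is fitted on the metafeatures to mimic the black-box predictions. The use of NMF for constructing metafeatures, a depth-3 tree as the rule set, and fidelity computed on the training data are illustrative assumptions, not necessarily the authors' exact configuration.

```python
# Illustrative sketch only: metafeature-based rule extraction from a black-box model.
# Assumptions (not from the paper): NMF-derived metafeatures, a depth-3 decision tree
# as the extracted rule set, fidelity measured on the data used to fit the surrogate.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import NMF
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)

# Synthetic high-dimensional, sparse "behavioral" data (rows = instances, cols = features).
X = sparse_random(1000, 5000, density=0.01, random_state=rng, format="csr")
y = rng.randint(0, 2, size=1000)  # binary target

# Black-box model trained on the fine-grained, sparse features.
black_box = LogisticRegression(max_iter=1000).fit(X, y)
y_bb = black_box.predict(X)  # black-box labels the surrogate should mimic

# Aggregate thousands of sparse features into a handful of dense metafeatures.
meta = NMF(n_components=10, init="nndsvda", random_state=0)
X_meta = meta.fit_transform(X)

# Shallow surrogate tree = a small, global set of explanation rules over metafeatures.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_meta, y_bb)

# Fidelity: agreement between the surrogate's and the black box's predicted labels.
fidelity = accuracy_score(y_bb, surrogate.predict(X_meta))
print(f"fidelity = {fidelity:.3f}")
print(export_text(surrogate, feature_names=[f"metafeature_{i}" for i in range(10)]))
```

The printed tree text is the global explanation: each root-to-leaf path is a rule over metafeatures, and the fidelity score quantifies how well these few rules reproduce the black-box model's behavior.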