Machine learning (ML) models may involve decision boundaries that change over time because of updates to rules and regulations, as in loan approvals or claims management. In such scenarios, however, it may take time for enough training data to accumulate to retrain the model so that it reflects the new decision boundaries. While work has been done to reinforce existing decision boundaries, very little has addressed scenarios in which the decision boundaries of ML models should change to reflect new rules. In this paper, we focus on user-provided feedback rules as a way to expedite the ML model update process, and we formally introduce the problem of pre-processing training data to edit an ML model in response to feedback rules such that, once the model is retrained on the pre-processed data, its decision boundaries align more closely with the rules. To solve this problem, we propose a novel data augmentation method, the Feedback Rule-Based Oversampling Technique. Extensive experiments using different ML models and real-world datasets demonstrate the effectiveness of the method, in particular the benefit of augmentation and the ability to handle many feedback rules.
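The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of rule-driven pre-processing, not the paper's method. It assumes a feedback rule can be written as a coverage predicate plus a target label, relabels the training rows the rule covers, and then adds jittered copies of those rows that still satisfy the rule. The names `FeedbackRule`, `apply_rule`, `n_synthetic`, and `jitter`, as well as the toy loan data, are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = Tuple[List[float], int]  # (feature vector, class label)


@dataclass
class FeedbackRule:
    """Hypothetical rule form: rows where `covers` is True should get `label`."""
    covers: Callable[[List[float]], bool]
    label: int


def apply_rule(data: List[Row], rule: FeedbackRule, n_synthetic: int,
               jitter: float = 0.05, seed: int = 0) -> List[Row]:
    """Relabel covered rows, then append synthetic rows inside the rule region."""
    rng = random.Random(seed)

    # Step 1: make existing rows agree with the rule.
    relabeled = [(x, rule.label if rule.covers(x) else y) for x, y in data]

    # Step 2: oversample the rule region by perturbing covered rows and
    # keeping only candidates that the rule still covers (rejection sampling).
    covered = [x for x, _ in relabeled if rule.covers(x)]
    synthetic: List[Row] = []
    attempts = 0
    max_attempts = 100 * n_synthetic  # guard against rules that are hard to hit
    while covered and len(synthetic) < n_synthetic and attempts < max_attempts:
        attempts += 1
        base = rng.choice(covered)
        candidate = [v + rng.gauss(0.0, jitter * (abs(v) or 1.0)) for v in base]
        if rule.covers(candidate):
            synthetic.append((candidate, rule.label))

    return relabeled + synthetic


# Toy loan data: features are [income, debt_ratio]; label 1 = approve, 0 = deny.
data = [([30000.0, 0.2], 1), ([80000.0, 0.5], 1),
        ([25000.0, 0.6], 1), ([90000.0, 0.1], 0)]

# Assumed new regulation: deny whenever the debt ratio exceeds 0.4.
rule = FeedbackRule(covers=lambda x: x[1] > 0.4, label=0)

augmented = apply_rule(data, rule, n_synthetic=5)
```

A model retrained on `augmented` sees more examples in the rule region, all carrying the rule's label, which is one plausible way to pull its decision boundary toward the rule before real post-change data accumulates. Handling several rules, conflicts between rules, or rules that cover no existing rows would need additional design that this sketch does not attempt.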