As AI and machine learning become central to human life, their potential harms are becoming more apparent. Given such drawbacks, a critical question to address before using these data-driven technologies to make a decision is whether to trust their outcomes. Aligned with recent efforts on data-centric AI, this paper proposes a novel approach to the reliability question through the lens of data: associating datasets with distrust quantifications that specify their scope of use for predicting future query points. The distrust values raise warning signals when a prediction based on a dataset is questionable, and are valuable alongside other techniques for trustworthy AI. We propose novel algorithms for computing distrust values efficiently and effectively. Learning the necessary components of the measures from the data itself, our sub-linear algorithms scale to very large, multi-dimensional settings. Furthermore, we design estimators that require no data access at query time. Besides demonstrating the efficiency of our algorithms, our extensive experiments reveal a consistent correlation between distrust values and model performance, highlighting the necessity of dismissing prediction outcomes in cases with high distrust values, at least for critical decisions.