An exciting recent development is the uptake of deep neural networks in many scientific fields, where the main objective is outcome prediction but the fitted models are black boxes. Significance testing is a promising way to address this black-box issue, yielding novel scientific insights and an interpretation of the decision-making process of a deep learning model. However, testing for a neural network is challenging because of its black-box nature and the unknown limiting distributions of its parameter estimates, while existing methods require strong assumptions or excessive computation. In this article, we derive one-split and two-split tests that relax the assumptions and reduce the computational complexity of existing black-box tests, and that extend to examining the significance of a collection of features of interest in a dataset of a possibly complex type, such as images. The one-split test estimates and evaluates a black-box model on estimation and inference subsets obtained through sample splitting and data perturbation. The two-split test further splits the inference subset in two but requires no perturbation. We also develop combined versions of these tests by aggregating the p-values obtained from repeated sample splitting. By deflating the bias-sd-ratio, we establish the asymptotic null distributions of the test statistics and their consistency in terms of Type II error. Numerically, we demonstrate the utility of the proposed tests on seven simulated examples and six real datasets. Accompanying this paper is our Python library dnn-inference (https://dnn-inference.readthedocs.io/en/latest/), which implements the proposed tests.
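To illustrate the sample-splitting idea behind the one-split test, the following is a minimal, hypothetical sketch: it fits a model with and without the tested features masked on an estimation subset, then compares per-sample prediction losses on a held-out inference subset with a one-sided z-test. A least-squares fit stands in for the black-box learner, and all data, feature indices, and the masking scheme are illustrative assumptions; this is not the dnn-inference API or the paper's exact procedure (in particular, it omits the data-perturbation and bias-sd-ratio adjustments).

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y depends only on features 0 and 1; we test S = {0, 1}.
n, p = 2000, 5
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
S = [0, 1]

# Sample splitting: one subset for estimation (fitting), one for inference.
n_est = n // 2
est, inf_ = np.arange(n_est), np.arange(n_est, n)

def fit_ls(X, y):
    # Least-squares stand-in for the black-box learner.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Reduced model: mask the tested features by zeroing them out.
X_mask = X.copy()
X_mask[:, S] = 0.0
beta_full = fit_ls(X[est], y[est])
beta_red = fit_ls(X_mask[est], y[est])

# Per-sample squared losses on the inference subset.
loss_full = (y[inf_] - X[inf_] @ beta_full) ** 2
loss_red = (y[inf_] - X_mask[inf_] @ beta_red) ** 2

# One-sided z-test: under H0, masking S does not increase the prediction loss.
diff = loss_red - loss_full
z = diff.mean() / (diff.std(ddof=1) / math.sqrt(len(diff)))
p_value = 0.5 * math.erfc(z / math.sqrt(2))  # standard normal upper tail
print(f"z = {z:.2f}, p = {p_value:.4g}")
```

Because the tested features carry real signal here, the masked model predicts markedly worse and the test rejects; swapping in a deep network for `fit_ls` gives the flavor of the black-box setting the paper addresses.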