The lack of interpretability and transparency is preventing economists from adopting advanced tools such as neural networks in their empirical work. In this paper, we propose a new class of interpretable neural network models that achieve both high prediction accuracy and interpretability in regression problems with time-series cross-sectional data. Our model can essentially be written as a simple function of a limited number of interpretable features. In particular, we incorporate a class of interpretable functions, named persistent change filters, into the neural network. We apply the model to predicting individuals' monthly employment status using high-dimensional administrative data from China. We achieve an accuracy of 94.5% on the out-of-sample test set, comparable to the most accurate conventional machine learning methods. Moreover, the model's interpretability lets us understand the mechanism that underlies the ability to predict employment status from administrative data: an individual's employment status is closely related to whether she pays various types of insurance. Our work is a useful step towards overcoming the "black box" problem of neural networks, and it provides a promising new tool for economists studying administrative and proprietary big data.
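The abstract names persistent change filters but does not define them. As a minimal sketch, assuming such a filter summarizes how long a monthly indicator (e.g., an insurance-payment flag) has remained in its current state since the last change, one interpretable implementation might look like the following; the function name `persistent_change_filter` and its `threshold` parameter are illustrative assumptions, not the paper's actual specification.

```python
import numpy as np

def persistent_change_filter(x, threshold=0.5):
    """Illustrative persistence feature for a monthly indicator series.

    For each month t, return the length of the current run in which x
    stays on the same side of `threshold`, signed + when above and -
    when below. This is a hypothetical sketch of a "persistent change
    filter"; the paper's exact definition is not given in the abstract.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    run = 0
    prev_state = None
    for t, v in enumerate(x):
        state = v > threshold          # which side of the threshold
        run = run + 1 if state == prev_state else 1  # extend or reset run
        prev_state = state
        out[t] = run if state else -run
    return out

# Example: a hypothetical individual's insurance-payment flag over 12 months.
payments = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(persistent_change_filter(payments))
# [ 1.  2.  3. -1. -2. -3. -4.  1.  2.  3.  4.  5.]
```

Because each output value reads directly as "months spent in the current payment state," a downstream model built on such features stays inspectable, which is the kind of interpretability the abstract claims for its feature construction.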