Employing a large dataset (up to the order of n = 10^6), this study attempts to enhance the literature comparing regression-based and machine learning (ML)-based rent price prediction models by adding new empirical evidence and by considering the spatial dependence of the observations. The regression-based approach incorporates the nearest neighbor Gaussian process (NNGP) model, which makes kriging applicable to large datasets. In contrast, the ML-based approach uses typical models: extreme gradient boosting (XGBoost), random forest (RF), and a deep neural network (DNN). The out-of-sample prediction accuracy of these models was compared using Japanese apartment rent data with sample sizes of different orders of magnitude (i.e., n = 10^4, 10^5, 10^6). The results showed that, as the sample size increased, XGBoost and RF outperformed NNGP in out-of-sample prediction accuracy. XGBoost achieved the highest prediction accuracy for all sample sizes and error measures, on both the logarithmic and real scales, and, when n = 10^5 and 10^6, for all price bands. A comparison of several methods for accounting for spatial dependence in RF showed that simply adding spatial coordinates to the explanatory variables may be sufficient.
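NNGP scales to large n because it replaces the full Gaussian-process likelihood with a product of conditional densities, each conditioning only on at most m nearest previously ordered neighbors, so only small m × m systems are ever solved. The following is a minimal pure-Python sketch of that idea, not the study's implementation. It assumes a zero-mean latent process with an exponential covariance sigma2 * exp(-phi * distance), no nugget or covariates, the input order as the ordering, and a brute-force neighbor search; the function names (`neighbor_sets`, `nngp_loglik`, `nngp_krige`) are hypothetical.

```python
import math


def exp_cov(a, b, sigma2, phi):
    """Assumed exponential covariance: sigma2 * exp(-phi * ||a - b||)."""
    return sigma2 * math.exp(-phi * math.dist(a, b))


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting (small m x m systems)."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def neighbor_sets(coords, m):
    """For each i, the indices of up to m nearest points among coords[0..i-1]."""
    return [
        sorted(range(i), key=lambda j: math.dist(s, coords[j]))[:m]
        for i, s in enumerate(coords)
    ]


def conditional(target, nbrs, w, coords, sigma2, phi):
    """Mean and variance of the process at `target` given the values w at `nbrs`."""
    C_nn = [[exp_cov(coords[p], coords[q], sigma2, phi) for q in nbrs] for p in nbrs]
    c_tn = [exp_cov(target, coords[p], sigma2, phi) for p in nbrs]
    wts = solve(C_nn, c_tn)  # kriging weights from an m x m system only
    mean = sum(bj * w[j] for bj, j in zip(wts, nbrs))
    var = sigma2 - sum(bj * cj for bj, cj in zip(wts, c_tn))
    return mean, var


def nngp_loglik(w, coords, m, sigma2, phi):
    """NNGP (Vecchia-type) log-likelihood: sum over i of log p(w_i | w_N(i))."""
    total = 0.0
    for i, nbrs in enumerate(neighbor_sets(coords, m)):
        mean, var = conditional(coords[i], nbrs, w, coords, sigma2, phi)
        total += -0.5 * (math.log(2 * math.pi * var) + (w[i] - mean) ** 2 / var)
    return total


def nngp_krige(s0, w, coords, m, sigma2, phi):
    """Kriging at a new site s0 using only its m nearest observed neighbors."""
    nbrs = sorted(range(len(coords)), key=lambda j: math.dist(s0, coords[j]))[:m]
    return conditional(s0, nbrs, w, coords, sigma2, phi)
```

With m at least n − 1 the product reproduces the exact Gaussian log-likelihood, while a small fixed m keeps the cost of the likelihood and of each prediction roughly linear in n, apart from the neighbor search; the quadratic scan used here would normally be replaced by a spatial index.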