Selecting the number of groups is a key question in group panel data modelling. In this work, we develop a cross-validation method to address it. Specifically, we split the panel data along the time dimension into a training dataset and a testing dataset. We first use the training dataset to estimate the parameters and group memberships. We then apply the fitted model to the testing dataset and estimate the number of groups by minimizing a loss function evaluated on the testing dataset. We design loss functions for panel data models both with and without fixed effects. The proposed method has two advantages. First, it is fully data-driven, so no additional tuning parameters are involved. Second, it can be applied flexibly to a wide range of panel data models. Theoretically, we establish estimation consistency by exploiting the optimization property of the estimation algorithm. Experiments on a variety of synthetic and empirical datasets further illustrate the advantages of the proposed method.
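The time-split procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple model without fixed effects, y_it = beta_{g_i} * x_it + noise, with a single regressor, a k-means-style alternating estimator, and mean squared prediction error as the test loss. The function names fit_groups and select_num_groups, the 50/50 time split, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def _rss(y, x, beta):
    """Residual sum of squares of every unit under every group slope, shape (N, G)."""
    resid = y[:, None, :] - beta[None, :, None] * x[:, None, :]
    return (resid ** 2).sum(axis=2)


def fit_groups(y, x, G, n_init=10, n_iter=50, seed=0):
    """Alternate between assigning units to groups and group-wise least squares.

    Assumed model (illustration only): y_it = beta_{g_i} * x_it + noise.
    Returns the group slopes and the estimated membership of each unit.
    """
    rng = np.random.default_rng(seed)
    # Unit-level slopes, used only to draw starting values.
    unit_beta = (x * y).sum(axis=1) / (x * x).sum(axis=1)
    best_total, best_beta, best_labels = np.inf, None, None
    for _ in range(n_init):
        beta = rng.choice(unit_beta, size=G, replace=False)
        for _ in range(n_iter):
            # Assignment step: each unit joins the group that fits it best.
            labels = _rss(y, x, beta).argmin(axis=1)
            # Update step: least squares within each non-empty group.
            new_beta = beta.copy()
            for g in range(G):
                members = labels == g
                if members.any():
                    new_beta[g] = (x[members] * y[members]).sum() / (x[members] ** 2).sum()
            converged = np.allclose(new_beta, beta)
            beta = new_beta
            if converged:
                break
        rss = _rss(y, x, beta)
        total = rss.min(axis=1).sum()
        # Keep the random start with the smallest training objective.
        if total < best_total:
            best_total, best_beta, best_labels = total, beta, rss.argmin(axis=1)
    return best_beta, best_labels


def select_num_groups(y, x, candidates, train_frac=0.5):
    """Choose the number of groups by minimizing test loss on later time periods."""
    t_train = int(y.shape[1] * train_frac)
    y_tr, x_tr = y[:, :t_train], x[:, :t_train]
    y_te, x_te = y[:, t_train:], x[:, t_train:]
    losses = {}
    for G in candidates:
        beta, labels = fit_groups(y_tr, x_tr, G)
        # Apply the fitted slopes and memberships to the held-out periods.
        pred = beta[labels][:, None] * x_te
        losses[G] = float(np.mean((y_te - pred) ** 2))
    best_G = min(losses, key=losses.get)
    return best_G, losses


# Synthetic panel with three well-separated groups (hypothetical values).
rng = np.random.default_rng(1)
N, T = 60, 40
true_beta = np.array([-2.0, 0.0, 2.0])
true_labels = np.arange(N) % 3
x = rng.normal(size=(N, T))
y = true_beta[true_labels][:, None] * x + 0.5 * rng.normal(size=(N, T))

best_G, losses = select_num_groups(y, x, candidates=[1, 2, 3, 4, 5])
print("selected number of groups:", best_G)
print("test losses:", losses)
```

In this sketch the candidate with the smallest held-out loss is returned, so the only inputs are the data and the candidate list. The split fraction is fixed here for simplicity; a model with fixed effects would need a different loss, which this example does not attempt to reproduce.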