In many complex applications, data heterogeneity and homogeneity exist simultaneously, and ignoring either one leads to incorrect statistical inference. In addition, complex non-Euclidean data are becoming increasingly common. To address these issues, we consider a distributional data response additive model in which the response is a density function, the individual effect curves are homogeneous within a group but heterogeneous across groups, and the covariates capturing the variation share common additive bivariate functions. A transformation approach is first used to map the density functions into a linear space. We then apply B-spline series approximation to estimate the unknown subject-specific and additive bivariate functions, and identify the latent group structure with a hierarchical agglomerative clustering (HAC) algorithm. We show that the method identifies the true latent group structure with probability approaching one. To improve efficiency, we further construct backfitted local linear estimators for the grouped structures and the additive bivariate functions in the post-grouping model. We establish the asymptotic properties of the resulting estimators, including convergence rates, asymptotic distributions, and post-grouping oracle efficiency. The performance of the proposed method is illustrated by simulation studies and an empirical analysis.
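As a rough illustration of the grouping step only, the sketch below applies hierarchical agglomerative clustering to a set of estimated subject-specific coefficient vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the linkage, the distance, or the stopping rule, so average linkage, Euclidean distance, and a fixed merge threshold are assumed here, and the names `coef_hat`, `hac_average_linkage`, and `threshold` are hypothetical. The coefficient vectors are simulated stand-ins for B-spline coefficients of the transformed densities.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac_average_linkage(points, threshold):
    """Agglomerative clustering with average linkage (an assumed choice).

    Starts with every subject in its own cluster and repeatedly merges the
    two closest clusters until the smallest between-cluster distance
    exceeds `threshold`. Returns clusters as sorted lists of indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(euclid(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Simulated data (hypothetical): two latent groups, each with its own
# coefficient vector, observed with small estimation noise.
random.seed(0)
group_centers = [[0.0, 1.0, 2.0, 1.0], [3.0, 2.0, 0.0, -1.0]]
true_labels = [0, 0, 0, 1, 1, 1, 0, 1]
coef_hat = [
    [c + random.gauss(0.0, 0.05) for c in group_centers[g]]
    for g in true_labels
]

groups = hac_average_linkage(coef_hat, threshold=1.0)
print(groups)
```

With well-separated group centers and small noise, the recovered clusters match the simulated labels. In practice the threshold (or the number of groups) would have to be chosen by a data-driven criterion, which this sketch does not attempt.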