Most data sets comprise measurements on both continuous and categorical variables. In the statistics literature on regression and classification, modeling high-dimensional mixed predictors has received limited attention. In this paper we study the general regression problem of drawing inference about a variable of interest from high-dimensional mixed continuous and binary predictors. The aim is to find a lower-dimensional function of the mixed predictor vector that retains all the information the predictors carry about the response, which can be either continuous or categorical. The approach we propose identifies sufficient reductions by reversing the regression and modeling the mixed predictors conditional on the response. We derive the maximum likelihood estimator of the sufficient reductions, asymptotic tests for dimension, and a regularized estimator that simultaneously achieves variable (feature) selection and dimension reduction (feature extraction). We study the performance of the proposed method and compare it with other approaches through simulations and real data examples.
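The notion of a sufficient reduction used above can be stated compactly. The display below is a sketch in notation assumed here, not taken from the abstract: $X$ is the mixed predictor vector, $Y$ the response, and $R(X)$ a candidate reduction. When $(Y, X)$ has a joint distribution, the forward statement (first line) is equivalent to the inverse statement (second line), which is why modeling $X$ given $Y$ can identify the reduction.

```latex
% Assumed notation: X = predictors, Y = response, R(X) = reduction.
\begin{align*}
  % Forward form: given R(X), X carries no further information about Y.
  Y \perp\!\!\!\perp X \mid R(X)
  \\
  % Inverse form: given R(X), the law of X does not depend on Y.
  X \mid \bigl(Y, R(X)\bigr) \;\sim\; X \mid R(X)
\end{align*}
```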