Linear Discriminant Analysis with High-dimensional Mixed Variables

Datasets containing both categorical and continuous variables are frequently encountered in many areas, and with the rapid development of modern measurement technologies, the dimensions of these variables can be very high. Despite the recent progress made in modelling high-dimensional data for continuous variables, there is a scarcity of methods that can deal with a mixed set of variables. To fill this gap, this paper develops a novel approach for classifying high-dimensional observations with mixed variables. Our framework builds on a location model, in which the distributions of the continuous variables conditional on categorical ones are assumed Gaussian. We overcome the challenge of having to split data into exponentially many cells, or combinations of the categorical variables, by kernel smoothing, and provide new perspectives for its bandwidth choice to ensure an analogue of Bochner's Lemma, which is different to the usual bias-variance tradeoff. We show that the two sets of parameters in our model can be separately estimated and provide penalized likelihood for their estimation. Results on the estimation accuracy and the misclassification rates are established, and the competitive performance of the proposed classifier is illustrated by extensive simulation and real data studies.
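As a rough illustration of the kernel-smoothing idea in the abstract, the Python sketch below estimates the mean of the continuous variables in a cell by borrowing observations from nearby cells. The function name `smoothed_cell_means`, the binary coding of the categorical variables, and the weight `lam ** hamming_distance` are all assumptions made for illustration. The abstract does not specify the kernel, the bandwidth rule, or the penalized estimators, so this should not be read as the paper's method.

```python
import numpy as np


def smoothed_cell_means(X, U, cells, lam=0.3):
    """Kernel-smoothed mean of the continuous variables in each target cell.

    X     : (n, p) array of continuous variables.
    U     : (n, d) array of binary categorical variables (assumed coding).
    cells : (m, d) array of cells at which the mean is estimated.
    lam   : bandwidth in [0, 1]; the weight lam ** hamming(u_i, cell) is an
            assumed kernel chosen for illustration, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U)
    cells = np.asarray(cells)
    # Hamming distance between every target cell and every observation: (m, n).
    dist = (cells[:, None, :] != U[None, :, :]).sum(axis=2)
    # lam = 0 keeps only observations in the cell itself (no smoothing);
    # lam = 1 weights all observations equally (global mean).
    weights = np.power(float(lam), dist.astype(float))
    # Note: with lam = 0 an empty cell has zero total weight and yields NaN.
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X
```

With `lam=0` the estimate reduces to the ordinary within-cell mean, which is undefined for empty cells; a positive `lam` lets sparsely populated cells borrow strength from cells that differ in only a few categorical variables. In a two-class setting one would call the function once per class and plug the resulting means into a linear discriminant rule.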

