The estimation of cumulative distribution functions (CDFs) is an important learning task with a great variety of downstream applications, e.g., risk assessment in prediction and decision-making. We study functional regression of contextual CDFs, where each data point is sampled from a linear combination of context-dependent CDF bases. We propose estimation methods that estimate CDFs accurately everywhere. In particular, given $n$ samples with $d$ bases, we show estimation error upper bounds of $\widetilde O(\sqrt{d/n})$ for the fixed-design, random-design, and adversarial-context settings. We also derive matching information-theoretic lower bounds, establishing minimax optimality for CDF functional regression. To complete our study, we consider agnostic settings where there is a mismatch in the data generation process. We characterize the error of the proposed estimator in terms of the mismatch error and show that the estimator is well-behaved under model mismatch.
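The abstract does not spell out the estimator, so the following is only a toy sketch of the regression setup under our own assumptions. It uses scalar contexts, three hypothetical logistic CDF bases whose locations scale with the context, and nonnegative weights that sum to one, so the combination is itself a CDF and can be sampled as a mixture. It then applies a plain least-squares fit that regresses the indicators $\mathbf{1}\{y_i \le t\}$ on the basis values over a grid of thresholds $t$. The helpers `basis_cdfs`, `sample`, and `fit_theta` are hypothetical, and the fit is not presented as the proposed method.

```python
import numpy as np

# Hypothetical setup (not from the paper): scalar contexts x_i, d = 3 logistic
# CDF bases whose locations depend on the context, and mixture weights theta
# shared across contexts, so F(t | x) = sum_j theta_j * F_j(t | x).

def basis_cdfs(t, x, shifts):
    """Return F_j(t_k | x_i) with shape (n, m, d); F_j is a logistic CDF located at shifts[j] * x_i."""
    loc = x[:, None, None] * shifts[None, None, :]            # (n, 1, d)
    return 1.0 / (1.0 + np.exp(-(t[None, :, None] - loc)))    # (n, m, d)

def sample(x, theta, shifts, rng):
    """Draw y_i from the mixture sum_j theta_j F_j(. | x_i): pick a basis, then invert its CDF."""
    comp = rng.choice(len(theta), size=len(x), p=theta)       # basis index for each sample
    u = rng.uniform(1e-12, 1.0 - 1e-12, size=len(x))          # keep u away from 0 and 1 to avoid log(0)
    return x * shifts[comp] + np.log(u / (1.0 - u))           # shifted inverse logistic CDF

def fit_theta(x, y, shifts, grid):
    """Least squares: regress indicators 1{y_i <= t_k} on the basis values F_j(t_k | x_i)."""
    d = len(shifts)
    feats = basis_cdfs(grid, x, shifts).reshape(-1, d)               # rows indexed by (i, k)
    target = (y[:, None] <= grid[None, :]).astype(float).reshape(-1)  # same (i, k) order
    theta_hat, *_ = np.linalg.lstsq(feats, target, rcond=None)
    return theta_hat

rng = np.random.default_rng(0)
n = 5000
shifts = np.array([-3.0, 0.0, 3.0])
theta = np.array([0.2, 0.5, 0.3])
x = rng.uniform(1.0, 2.0, size=n)
y = sample(x, theta, shifts, rng)
grid = np.linspace(-12.0, 12.0, 49)
theta_hat = fit_theta(x, y, shifts, grid)

# Sup-norm gap between fitted and true conditional CDFs over the grid and observed contexts
F = basis_cdfs(grid, x, shifts)
sup_err = np.max(np.abs(F @ theta_hat - F @ theta))
print(theta_hat, sup_err)
```

Regressing indicators is reasonable in this toy model because $\mathbb{E}[\mathbf{1}\{y \le t\} \mid x] = \sum_j \theta_j F_j(t \mid x)$ is linear in $\theta$. Increasing `n` should shrink `sup_err`, but this toy run is not a check of the $\widetilde O(\sqrt{d/n})$ rate.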