We consider estimation under model misspecification, that is, a mismatch between the underlying system that generates the data and the model used during estimation. We propose a misspecification framework that enables a joint treatment of two types of misspecification: fake features, and incorrect covariance assumptions on the unknowns and the noise. We present a decomposition of the output error into components that relate to different subsets of the model parameters, corresponding to underlying, fake, and missing features. Here, fake features are features that are included in the model but are not present in the underlying system. Under this framework, we characterize the estimation performance and reveal trade-offs between the number of samples, the number of fake features, and the possibly incorrect noise level assumption. In contrast to existing work, which focuses on incorrect covariance assumptions or missing features, fake features are a central component of our framework. Our results show that fake features can significantly improve the estimation performance, even though they are not correlated with the features in the underlying system. In particular, we show that the estimation error can be decreased by including more fake features in the model, even to the point where the model is overparametrized, i.e., the model contains more unknowns than observations.
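To make the setting concrete, one possible formalization is sketched below. The notation is an assumption introduced here for illustration and is not taken from the abstract: a linear system with $n$ observations, feature blocks $A_S$ (underlying features included in the model), $A_C$ (missing features), and $A_F$ (fake features), and a linear estimator of LMMSE form computed under the assumed covariances. The paper's actual model and estimator may differ.

```latex
% Hypothetical notation, introduced only for illustration.
% Underlying system: the data depend on A_S and on the missing features A_C.
\begin{align}
  y &= A_S x_S + A_C x_C + v,
      \qquad v \sim (0,\ \sigma_v^2 I_n), \\
% Assumed (misspecified) model: A_C is omitted and fake features A_F are included.
  y &= A_S x_S + A_F x_F + \hat{v}
     = \bar{A}\,\bar{x} + \hat{v},
      \qquad \bar{A} = [\,A_S \;\; A_F\,],\quad
      \bar{x} = \begin{bmatrix} x_S \\ x_F \end{bmatrix}, \\
% LMMSE-form estimator built from the assumed covariance \hat{K} of \bar{x}
% and the assumed noise level \hat{\sigma}^2, both possibly incorrect.
  \hat{\bar{x}} &= \hat{K}\,\bar{A}^{\mathsf{T}}
      \bigl(\bar{A}\,\hat{K}\,\bar{A}^{\mathsf{T}} + \hat{\sigma}^2 I_n\bigr)^{-1} y .
\end{align}
```

In this sketch, the model is overparametrized when the number of columns of $\bar{A}$ exceeds $n$, and the quantities being traded off are $n$, the number of columns of $A_F$, and the assumed noise level $\hat{\sigma}^2$.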