Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data

Machine learning (ML) is changing databases (DBs), as many DB components are being replaced by ML models. One open problem in this setting is how to update such models in the presence of data updates. We start this investigation by focusing on data insertions, which dominate updates in analytical DBs. We study how to update neural network (NN) models when new data follows a different distribution (i.e., it is "out-of-distribution", or OOD), rendering previously trained NNs inaccurate. A requirement in our problem setting is that learned DB components ensure high accuracy for tasks on both old and new data (e.g., approximate query processing (AQP), cardinality estimation (CE), and synthetic data generation (DG)). This paper proposes a novel updatability framework, DDUp. DDUp can provide updatability for different learned DB system components, even ones based on different NNs, without the high cost of retraining the NNs from scratch. DDUp entails two components: first, a novel, efficient, and principled statistical-testing approach to detect OOD data; second, a novel model-updating approach, grounded in the principles of transfer learning with knowledge distillation, to update learned models efficiently while still ensuring high accuracy. We develop and showcase DDUp's applicability for three different learned DB components (AQP, CE, and DG), each employing a different type of NN. A detailed experimental evaluation using real and benchmark datasets for AQP, CE, and DG demonstrates DDUp's performance advantages.
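
The abstract names the two components but does not give the test statistic or the update objective. As a rough illustration only, the sketch below assumes (a) a bootstrap test that compares the model's mean loss on a new batch against the spread of mean losses on resampled old data, and (b) a generic soft-target distillation objective that mixes a teacher-matching term on old data with a task term on new data. The function names, the `z`, `alpha`, and `temperature` parameters, and the choice of a classification-style loss are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def detect_ood(old_losses, new_losses, n_boot=1000, z=2.0, seed=0):
    """Hypothetical loss-based OOD check (an assumption, not the paper's test).

    Bootstraps the mean loss of batches drawn from the old data and flags the
    new batch when its mean loss is more than z standard deviations away.
    """
    rng = np.random.default_rng(seed)
    old_losses = np.asarray(old_losses, dtype=float)
    new_losses = np.asarray(new_losses, dtype=float)
    batch = len(new_losses)
    # Resample old losses into n_boot batches of the same size as the new batch.
    idx = rng.integers(0, len(old_losses), size=(n_boot, batch))
    boot_means = old_losses[idx].mean(axis=1)
    mu, sigma = boot_means.mean(), boot_means.std(ddof=1)
    return bool(abs(new_losses.mean() - mu) > z * sigma)


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


def distillation_loss(student_old, teacher_old, student_new, labels_new,
                      alpha=0.5, temperature=2.0):
    """Generic distillation objective (assumed form, for illustration).

    alpha weights the cross-entropy between teacher and student soft outputs
    on old data; (1 - alpha) weights the usual cross-entropy on new data.
    """
    eps = 1e-12  # guards against log(0)
    p_teacher = softmax(teacher_old, temperature)
    log_p_student = np.log(softmax(student_old, temperature) + eps)
    distill = -(p_teacher * log_p_student).sum(axis=-1).mean()

    labels_new = np.asarray(labels_new)
    log_p_new = np.log(softmax(student_new) + eps)
    task = -log_p_new[np.arange(len(labels_new)), labels_new].mean()
    return alpha * distill + (1.0 - alpha) * task
```

In this sketch an update would only be triggered when `detect_ood` returns `True`; the student would then be trained to minimize `distillation_loss`, with the previously trained model acting as the teacher so that accuracy on old data is not discarded.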
