Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random', a range of tools and techniques exists to deal with the issue. However, as machine learning studies become more ambitious and seek to learn from ever-larger volumes of heterogeneous data, a problem is increasingly encountered in which missing values exhibit an association or structure, either explicitly or implicitly. Such 'structured missingness' raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here, we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.
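As an illustration only, and not a method from this work, the minimal NumPy sketch below contrasts two missingness mechanisms. The first drops each entry independently (completely at random). The second drops a whole block of features together, as might happen when one hypothetical data source is absent for a sample. The feature layout, the probabilities 0.2 and 0.3, and the helper names `simulate_masks` and `missingness_correlation` are assumptions made for the example. The correlation between missingness indicators is one simple way to see the "association or structure" the abstract refers to.

```python
import numpy as np


def simulate_masks(n_samples: int = 1000, seed: int = 0):
    """Return (mcar_mask, structured_mask); True marks a missing entry.

    Hypothetical setup: 6 features, where features 3-5 come from a second
    data source that is either present or absent as a single block.
    """
    rng = np.random.default_rng(seed)
    n_features = 6
    # Independent masking: every entry is dropped with probability 0.2.
    mcar_mask = rng.random((n_samples, n_features)) < 0.2
    # Structured masking: for ~30% of samples the whole second source
    # (features 3-5) is missing, so missingness is shared across columns.
    structured_mask = np.zeros((n_samples, n_features), dtype=bool)
    block_missing = rng.random(n_samples) < 0.3
    structured_mask[block_missing, 3:] = True
    return mcar_mask, structured_mask


def missingness_correlation(mask: np.ndarray, i: int, j: int) -> float:
    """Pearson correlation between the missingness indicators of columns i and j."""
    return float(np.corrcoef(mask[:, i].astype(float), mask[:, j].astype(float))[0, 1])


if __name__ == "__main__":
    mcar, structured = simulate_masks()
    print(missingness_correlation(mcar, 3, 4))        # near 0: indicators are independent
    print(missingness_correlation(structured, 3, 4))  # 1.0 up to floating-point rounding
```

Under independent masking, knowing that one entry is missing says essentially nothing about another. Under block masking, the indicators are perfectly correlated, which is the kind of dependence that tools designed for independent missingness do not model.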