In this paper, we first situate the challenges of measuring data quality under Project Lighthouse in the broader academic context. We then discuss in detail the three core data quality metrics we use for measurement, two of which extend prior academic work. Using those data quality metrics as examples, we propose a framework, based on machine learning classification, for empirically justifying the choice of data quality metrics and their associated minimum thresholds. Finally, we outline how these methods enable us to rigorously meet the principle of data minimization when analyzing potential experience gaps under Project Lighthouse, an approach we term quantitative data minimization.