Whether future AI models make the world safer or less safe for humans rests in part on our ability to efficiently collect accurate data from people about what they want the models to do. However, collecting high quality data is difficult, and most AI/ML researchers are not trained in data collection methods. The growing emphasis on data-centric AI highlights the potential of data to enhance model performance. It also reveals an opportunity to gain insights from survey methodology, the science of collecting high-quality survey data. In this position paper, we summarize lessons from the survey methodology literature and discuss how they can improve the quality of training and feedback data, which in turn improve model performance. Based on the cognitive response process model, we formulate specific hypotheses about the aspects of label collection that may impact training data quality. We also suggest collaborative research ideas into how possible biases in data collection can be mitigated, making models more accurate and human-centric.
翻译:暂无翻译