Data scientists leverage common-sense reasoning and domain knowledge to understand and enrich data for building predictive models. Recent years have seen a surge in tools and techniques for {\em automated machine learning}. While data scientists can use such tools to help with model building, many other aspects, such as {\em feature engineering}, require semantic understanding of concepts and remain largely manual. In this paper, we discuss important shortcomings of current solutions for automated data science and machine learning. We then discuss how basic semantic reasoning on data, combined with novel tools for data science automation, can help with consistent and explainable data augmentation and transformation. Moreover, semantics can assist data scientists in a new way by helping with challenges related to {\em trust}, {\em bias}, and {\em explainability}.