In recent years, Large Language Models (LLMs) have emerged as transformative tools across numerous domains, changing how professionals approach complex analytical tasks. This systematic mapping study examines the application of LLMs throughout the Data Science lifecycle. By analyzing relevant papers retrieved from the Scopus and IEEE databases, we identify and categorize the types of LLMs being applied, the specific stages and tasks of the data science process they address, and the methodological approaches used to evaluate them. Our analysis includes a detailed examination of the evaluation metrics employed across studies and systematically documents both the benefits and the limitations of LLMs when applied to data science workflows. This mapping gives researchers and practitioners a structured understanding of the current landscape, highlighting trends, gaps, and opportunities for future research at this rapidly evolving intersection of LLMs and data science.