Software Engineering (SE) research involving Large Language Models (LLMs) has introduced new challenges related to rigour in benchmarking, data contamination, replicability, and sustainability. In this paper, we invite the research community to reflect on how these challenges are currently being addressed in SE. Our results provide a structured overview of LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings. We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE research.