Traditional auto-parallelizing compilers, reliant on rigid heuristics, struggle with the complexity of modern heterogeneous systems. This paper presents a comprehensive evaluation of compiler auto-parallelization driven by small (approximately 1B-parameter) language models. We evaluate three models — gemma3, llama3.2, and qwen2.5 — using six reasoning strategies across 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is benchmarked against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 total evaluations, the proposed approach achieves an average speedup of 6.81x, with a peak speedup of 43.25x on convolution operations. We analyze scalability, verify correctness using multiple sanitizers, and confirm robustness across diverse compilers and hardware platforms. Our results demonstrate that small, efficient language models can serve as powerful reasoning engines for complex compiler optimization tasks.