Traditional auto-parallelizing compilers, reliant on rigid heuristics, struggle with the complexity of modern heterogeneous systems. This paper presents a comprehensive evaluation of compiler auto-parallelization driven by small (approximately 1B-parameter) language models. We evaluate three models — gemma3, llama3.2, and qwen2.5 — using six reasoning strategies across 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is benchmarked against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 total evaluations, the proposed approach achieves an average speedup of 6.81x, with a peak speedup of 43.25x on convolution operations. We analyze scalability, verify correctness using multiple sanitizers, and confirm robustness across diverse compilers and hardware platforms. Our results demonstrate that small, efficient language models can serve as powerful reasoning engines for complex compiler optimization tasks.