The automated translation of C code to Java is a notoriously difficult task, fraught with challenges stemming from fundamental paradigm shifts (procedural vs. object-oriented), differing memory models (manual pointers vs. garbage collection), and incompatible data types. This paper investigates the efficacy of 19 small, quantized LLMs (under 20 billion parameters) on the C-to-Java translation task. We use a novel hybrid pipeline that leverages Abstract Syntax Trees (ASTs) for semantic decomposition and employs a highly constrained, rule-based prompting strategy. The results are stark: a clear multi-tiered performance divide emerged. The vast majority of models (Tier 3, e.g., llama3.1, gemma3, starcoder2) failed 100\% of the tests, proving incapable of generating even basic, runnable Java boilerplate. A small middle tier (Tier 2, e.g., mistral-nemo and mistral) produced runnable code but was plagued by dangerous semantic failures and incorrect translations. Only three models (Tier 1: phi4, deepseek-coder-v2, codeqwen) proved viable, passing over 50\% of the test suite. Even these top models failed on the most complex C concepts, such as function pointers, sizeof, and enum logic, revealing a hard ceiling for the reasoning capabilities of current quantized models.
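To illustrate one of the failure-inducing concepts named above, the following is a minimal sketch (not taken from the paper's test suite) of how a C function pointer might be idiomatically rendered in Java via a functional interface; the names `apply` and `FnPtrDemo` are illustrative assumptions.

```java
import java.util.function.IntBinaryOperator;

public class FnPtrDemo {
    // C original (for comparison):
    //   int add(int a, int b) { return a + b; }
    //   int apply(int (*op)(int, int), int x, int y) { return op(x, y); }
    //
    // In Java, the function pointer parameter becomes a functional
    // interface; here IntBinaryOperator matches the (int, int) -> int shape.
    static int apply(IntBinaryOperator op, int x, int y) {
        return op.applyAsInt(x, y);
    }

    public static void main(String[] args) {
        IntBinaryOperator add = (a, b) -> a + b;
        System.out.println(apply(add, 2, 3)); // prints 5
    }
}
```

A correct translation therefore requires mapping a raw pointer type to an appropriate interface and rewriting every call site, which is the kind of cross-cutting, semantic rewrite the weaker tiers consistently failed to produce.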