Emulating firmware of microcontrollers is challenging due to the lack of peripheral models. Existing work finds out how to respond to peripheral read operations by analyzing the target firmware. This is problematic because the firmware sometimes does not contain enough clues to support the emulation or even contains misleading information (e.g. buggy firmware). In this work, we propose a new approach that builds peripheral models from the peripheral specification. Using NLP, we translate peripheral behaviors in human language (documented in chip manuals) into a set of structured condition-action rules. By checking, executing, and chaining them at runtime, we can dynamically synthesize a peripheral model for each firmware execution. The extracted condition-action rules might not be complete or even be wrong. We, therefore, propose incorporating symbolic execution to quickly pinpoint the root cause. This assists us in the manual correction of the problematic rules. We have implemented our idea for five popular MCU boards spanning three different chip vendors. Using a new edit-distance-based algorithm to calculate trace differences, our evaluation against a large firmware corpus confirmed that our prototype achieves much higher fidelity compared with state-of-the-art solutions. Benefiting from the accurate emulation, our emulator effectively avoids false positives observed in existing fuzzing work. We also designed a new dynamic analysis method to perform driver code compliance checks against the specification. We found some non-compliance which we later confirmed to be bugs caused by race conditions.
翻译:模拟微型控制器的固态器件由于缺少外围模型而具有挑战性。 现有的工作发现如何通过分析目标固态软件来应对外围阅读操作。 这有问题, 因为固态软件有时没有足够线索来支持模拟或甚至包含误导性信息( 例如 buggy 固态软件 ) 。 在这项工作中, 我们提出一种新的方法, 从外围规格中构建外围模型。 使用 NLP, 我们将人类语言的外围行为( 记录在芯片手册中) 转化为一套结构化的条件行动规则。 通过在运行时检查、 执行和锁链, 我们可以动态地合成每个固态软件执行的外围模型。 提取的条件动作规则可能并不完整, 甚至可能错误。 因此, 我们提议采用象征性执行来快速定位根本原因。 这有助于我们手工修正问题规则。 我们实施了我们关于由三个不同的芯片供应商组成的五种流行的 MCU 董事会的想法。 我们使用新的编辑远程算法来计算差异。 我们用一个大型的固态软件来确认我们原型的不忠实性, 而不是州级的精确度分析。 我们用一种不精确的逻辑方法来进行我们所设计的逻辑分析, 也避免了一种不精确的逻辑化的方法。