The availability of debug information for optimized executables can greatly ease crucial tasks such as crash analysis. Source-level debuggers use this information to display program state in terms of source code, allowing users to reason about it even when optimizations alter program structure extensively. A few recent endeavors have proposed effective methodologies for identifying incorrect instances of debug information, which can mislead users by presenting them with an inconsistent program state. In this work, we identify and study a related important problem: the completeness of debug information. Unlike correctness issues, for which an unoptimized executable can serve as a reference, we find there is no analogous oracle to determine whether an unreported part of program state is an unavoidable effect of optimization or the result of a compiler implementation defect. In this scenario, we argue that empirically derived conjectures on the expected availability of debug information can serve as an effective means to expose classes of these defects. We propose three conjectures involving variable values and study how often synthetic programs compiled with different configurations of the popular gcc and LLVM compilers deviate from them. We then discuss techniques to pinpoint the optimizations behind such violations and minimize bug reports accordingly. Our experiments revealed, among others, 24 bugs already confirmed by the developers of the gcc-gdb and clang-lldb ecosystems.