Developers relax restrictions on a type to reuse methods with other types. While type casts are prevalent in weakly typed languages such as C++, they are also extremely permissive. Assignments in which a source expression is cast into a new type and assigned to a target variable of that type can lead to software bugs if performed without care. In this paper, we propose an information-theoretic approach to identify poor implementations of explicit cast operations. Our approach measures accord between the source expression and the target variable using conditional entropy. We collect casts from 34 components of the Chromium project, which collectively account for 27 MLOC, and sample this collection uniformly at random to create a manually labelled dataset of 271 casts. Information-theoretic vetting of these 271 casts achieves a peak precision of 81% and a recall of 90%. We additionally present the findings of an in-depth investigation of notable explicit casts, two of which were fixed in recent releases of the Chromium project.
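To make the conditional-entropy measure concrete, the following is a minimal sketch, not the implementation evaluated in the paper. The abstract does not say which features of the source expression and target variable enter the estimate, so the sketch assumes each cast has been reduced to a hypothetical pair of labels (source, target), and the function name `conditional_entropy` is likewise illustrative. It computes H(T | S) = sum over (s, t) of p(s, t) * log2(p(s) / p(s, t)) from observed pairs.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Return H(T | S) in bits for observed (source, target) label pairs.

    The labels are hypothetical stand-ins for whatever features describe
    the source expression and the target variable of a cast.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                   # counts of each (s, t) pair
    source = Counter(s for s, _ in pairs)    # counts of each source label s
    h = 0.0
    for (s, _t), c in joint.items():
        # p(s, t) * log2(p(s) / p(s, t)), written with raw counts
        h += (c / n) * log2(source[s] / c)
    return h
```

Under these assumptions, `conditional_entropy([("len", "size"), ("len", "size")])` evaluates to 0.0 because the source label fully determines the target label, whereas `conditional_entropy([("len", "size"), ("len", "flag")])` evaluates to 1.0 bit. A lower value indicates stronger accord between source and target; how such a score would be thresholded to flag a cast is not stated in the abstract and is left open here.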