Developers are increasingly using services such as Dependabot to automate dependency updates. However, recent research has shown that developers perceive such services as unreliable, as they rely heavily on test coverage to detect conflicts in updates. To understand how prevalent tests that exercise dependencies are, we calculate the test coverage of direct and indirect uses of dependencies in 521 well-tested Java projects. We find that tests cover only 58% of direct and 20% of transitive dependency calls. By creating 1,122,420 artificial updates with simple faults covering all dependency usages in 262 projects, we measure the effectiveness of test suites in detecting semantic faults in dependencies; we find that tests detect only 47% of direct and 35% of indirect artificial faults on average. To increase reliability, we investigate the use of change impact analysis as a means of reducing false negatives: on average, our tool uncovers 74% of injected faults in direct dependencies and 64% in transitive dependencies, nearly twice as many as test suites. We then apply our tool to 22 real-world dependency updates, where it identifies three semantically conflicting cases and five cases of unused dependencies. Our findings indicate that the combination of static and dynamic analysis should be a requirement for future dependency updating systems.