How can we make software analytics simpler and faster? One method is to match the complexity of the analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by "DODGE-ing"; i.e., simply steering away from settings that lead to similar conclusions. But when is it wise to use that simple approach, and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets covering bad smell detection, GitHub issue close time prediction, bug report analysis, and defect prediction, as well as dozens of non-SE problems. We find that the simple DODGE works best for data sets with low "intrinsic dimensionality" (u ~ 3) and very poorly for higher-dimensional data (u > 8). Nearly all the SE data seen here was intrinsically low-dimensional, indicating that DODGE is applicable to many SE analytics tasks.
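The idea of "steering away from settings that lead to similar conclusions" can be illustrated with a much-simplified sketch. This is not the authors' implementation: the function name `dodge_sketch`, the tolerance `epsilon` used to decide when two scores count as "similar", the integer weighting scheme, and the `toy_score` objective are all assumptions made here for illustration.

```python
import random


def dodge_sketch(options, evaluate, epsilon=0.05, budget=30, seed=1):
    """Hypothetical, simplified DODGE-style search over a finite option list.

    An option whose score lands within `epsilon` of any previously seen
    score is treated as redundant and down-weighted; an option that yields
    a novel score is up-weighted. Sampling favors the highest weights.
    """
    rng = random.Random(seed)
    weights = {opt: 0 for opt in options}
    seen_scores = []
    best_opt, best_score = None, float("-inf")

    for _ in range(budget):
        # Choose among the currently highest-weighted options, ties at random.
        top = max(weights.values())
        candidates = [o for o, w in weights.items() if w == top]
        opt = rng.choice(candidates)

        score = evaluate(opt)
        # "Similar conclusion": the score is within epsilon of an earlier one.
        redundant = any(abs(score - s) < epsilon for s in seen_scores)
        weights[opt] += -1 if redundant else 1
        seen_scores.append(score)

        if score > best_score:
            best_opt, best_score = opt, score

    return best_opt, best_score, weights


def toy_score(k):
    # Hypothetical stand-in for "train a learner with setting k and
    # measure its performance"; it peaks at k == 8.
    return 1.0 - abs(k - 8) / 32


if __name__ == "__main__":
    best, score, weights = dodge_sketch([1, 2, 4, 8, 16, 32], toy_score)
    print(best, score)
    print(weights)
```

In this sketch, settings that keep reproducing already-seen scores sink in weight, so the evaluation budget drifts toward settings that still reveal something new. A real optimizer would replace `toy_score` with an actual train-and-evaluate step and would typically search numeric ranges rather than a fixed list.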