Continuous deployment has enabled companies to reduce time-to-market by increasing the rate at which software can be deployed. However, deploying more frequently carries the risk that defective changes are occasionally released. For Internet companies, such defects can degrade the user experience and increase user abandonment. Quality control gates are therefore an important component of the software delivery process, used to build confidence in the reliability of a release or change. To this end, a common approach is to perform a canary test, which evaluates new software under production workloads. Detecting defects as early as possible is necessary to reduce exposure and to provide immediate feedback to the developer. We present a statistical framework for rapidly detecting regressions in software deployments. Our approach is based on sequential tests of stochastic order and of equality in distribution. This enables canary tests to be continuously monitored, permitting regressions to be rapidly detected while strictly controlling the false detection probability throughout. The utility of this approach is demonstrated through two case studies at Netflix.
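The abstract does not spell out how the sequential tests are constructed, so the following is only a minimal sketch of the general idea of monitoring a canary continuously while bounding the false detection probability. It is not the authors' method. It swaps in a simple betting-style sequential sign test: assuming paired, independent observations where a larger metric value is worse (for example, latency), the probability that the canary exceeds the baseline is at most 1/2 when the canary is no worse. A fixed-fraction bet on "canary is larger" then forms a nonnegative supermartingale under that null, and Ville's inequality bounds the chance that it ever reaches 1/alpha by alpha. The function name `sequential_sign_test` and the parameters `alpha` and `bet` are hypothetical.

```python
import math
from typing import Iterable, Optional


def sequential_sign_test(baseline: Iterable[float],
                         canary: Iterable[float],
                         alpha: float = 0.05,
                         bet: float = 0.5) -> Optional[int]:
    """Return the number of pairs seen when a regression is flagged, else None.

    Assumptions (illustrative, not from the paper): observations arrive as
    independent (baseline, canary) pairs and larger values are worse.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < bet < 1.0):
        raise ValueError("alpha and bet must lie strictly between 0 and 1")

    # Work in log space so a long stream cannot overflow or underflow.
    log_threshold = math.log(1.0 / alpha)
    log_wealth = 0.0

    for n, (x, y) in enumerate(zip(baseline, canary), start=1):
        # Win the bet when the canary is strictly worse; ties count as a loss,
        # which keeps the expected multiplier at or below 1 under the null.
        step = bet if y > x else -bet
        log_wealth += math.log1p(step)

        # Ville's inequality: under the null, the wealth reaches 1/alpha at
        # any point in time with probability at most alpha.
        if log_wealth >= log_threshold:
            return n
    return None
```

Because the threshold check is valid at every step, the stream can be inspected after each pair without inflating the false detection probability, which is the property that makes continuous monitoring possible. A fixed `bet` is the simplest choice; it trades some detection speed for simplicity compared with adaptive or mixture-based bets.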