Autonomous vehicles are continually increasing their presence on public roads. However, before any new autonomous driving software can be approved, it must first undergo a rigorous assessment of driving quality. These quality evaluations typically focus on estimating the frequency of (undesirable) behavioral events. While rate estimation would be straight-forward with complete data, in the autonomous driving setting this estimation is greatly complicated by the fact that \textit{detecting} these events within large driving logs is a non-trivial task that often involves human reviewers. In this paper we outline a \textit{streaming partial tiered event review} configuration that ensures both high recall and high precision on the events of interest. In addition, the framework allows for valid streaming estimates at any phase of the data collection process, even when labels are incomplete, for which we develop the maximum likelihood estimate and show it is unbiased. Constructing honest and effective confidence intervals (CI) for these rate estimates, particularly for rare safety-critical events, is a novel and challenging statistical problem due to the complexity of the data likelihood. We develop and compare several CI approximations, including a novel Gamma CI method that approximates the exact but intractable distribution with a weighted sum of independent Poisson random variables. There is a clear trade-off between statistical coverage and interval width across the different CI methods, and the extent of this trade-off varies depending on the specific application settings (e.g., rare vs. common events). In particular, we argue that our proposed CI method is the best-suited when estimating the rate of safety-critical events where guaranteed coverage of the true parameter value is a prerequisite to safely launching a new ADS on public roads.
翻译:暂无翻译