This paper addresses acoustic vehicle counting using one-channel audio. We predict the pass-by instants of vehicles from local minima of the clipped vehicle-to-microphone distance. This distance is predicted from audio using a two-stage (coarse-fine) regression, with both stages realised via neural networks (NNs). Experiments show that the NN-based distance regression far outperforms the previously proposed support vector regression. The $95\%$ confidence interval for the mean vehicle counting error is within $[-0.55\%, 0.28\%]$. Besides the minima-based counting, we propose a deep learning counting method that operates on the predicted distance without detecting local minima. Although it is less accurate than the former approach, deep counting has a significant advantage in that it does not depend on minima detection parameters. Results also show that removing low frequencies from the features improves counting performance.
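
The minima-based counting idea can be illustrated with a minimal sketch. It does not reproduce the paper's NN regression: the distance curve here is synthetic, it is assumed to be a clipped time distance to the nearest pass-by, and the names `clipped_distance`, `count_from_minima`, `clip`, and `max_value` are hypothetical choices made for illustration only.

```python
import numpy as np


def clipped_distance(t, pass_times, clip=0.75):
    """Synthetic stand-in for the predicted distance (an assumption):
    absolute time distance to the nearest pass-by, clipped at `clip`."""
    t = np.asarray(t, dtype=float)
    pass_times = np.asarray(pass_times, dtype=float)
    d = np.min(np.abs(t[:, None] - pass_times[None, :]), axis=1)
    return np.minimum(d, clip)


def count_from_minima(distance, max_value=0.3):
    """Return indices of local minima that lie below `max_value`.

    `max_value` is a hypothetical detection parameter; it rejects
    shallow dips that are unlikely to be real pass-bys.
    """
    d = np.asarray(distance, dtype=float)
    inner = d[1:-1]
    # Strictly lower than the left neighbour, not higher than the right one,
    # so the flat clipped plateaus are never reported as minima.
    is_min = (inner < d[:-2]) & (inner <= d[2:]) & (inner < max_value)
    return np.flatnonzero(is_min) + 1


# Toy example: three vehicles passing at 2 s, 5 s and 8 s, sampled at 10 Hz.
t = np.arange(100) / 10.0
distance = clipped_distance(t, pass_times=[2.0, 5.0, 8.0])
idx = count_from_minima(distance)
vehicle_count = len(idx)
pass_by_instants = t[idx]
```

The vehicle count is the number of detected minima, and `t[idx]` gives the estimated pass-by instants. The dependence on `max_value` is an example of the minima detection parameters that the deep counting approach avoids.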