Early risk diagnosis and driving anomaly detection from vehicle streams offer substantial benefits to a range of advanced solutions for Smart Road and crash prevention, but they face intrinsic challenges, notably the lack of ground truth and the difficulty of defining multiple risk exposures. This study proposes a domain-specific automatic clustering framework (termed Autocluster) that self-learns the optimal models for unsupervised risk assessment. It integrates the key steps of risk clustering, including feature selection, algorithm selection, and hyperparameter auto-tuning, into an auto-optimisable pipeline. First, based on surrogate conflict measures, indicator-guided feature extraction is conducted to construct temporal-spatial and kinematic risk features. We then develop an elimination-based model reliance importance (EMRI) method to select useful features in an unsupervised manner. Second, we propose a balanced Silhouette Index (bSI) to evaluate the internal quality of imbalanced clustering, and we design a loss function that accounts for clustering performance in terms of internal quality, inter-cluster variation, and model stability. Third, algorithm selection and hyperparameter auto-tuning are self-learned through Bayesian optimisation to generate the best clustering partitions. Various algorithms are investigated comprehensively, with NGSIM vehicle trajectory data used for test-bedding. Findings show that Autocluster is reliable and promising for diagnosing multiple distinct risk exposures inherent in generalised driving behaviour. We also examine further aspects of risk clustering, such as algorithm heterogeneity, Silhouette analysis, and hierarchical clustering flows. In addition, Autocluster serves as a method for unsupervised multi-risk data labelling and indicator threshold calibration. Furthermore, it helps tackle the challenges of imbalanced clustering without ground truth or a priori knowledge.
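The abstract does not give the definition of bSI, so the following is only an illustrative sketch of why a "balanced" internal index matters for imbalanced risk clustering. The standard Silhouette Index averages the per-sample width s(i) over all samples, so a large, well-separated "safe" cluster can mask a small, poorly separated risk cluster. The sketch *assumes* that balancing means averaging s(i) within each cluster first and then giving every cluster equal weight (a macro average); the paper's actual bSI may differ. The function names `silhouette_samples`, `silhouette_index`, and `balanced_silhouette_index` are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import defaultdict
from statistics import mean


def silhouette_samples(points, labels):
    """Per-sample silhouette width s(i) = (b - a) / max(a, b), Euclidean distance.

    a: mean distance from sample i to the other members of its own cluster.
    b: smallest mean distance from sample i to the members of any other cluster.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = mean(math.dist(points[i], points[j]) for j in own)
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_index(points, labels):
    """Standard Silhouette Index: plain mean over samples, so large clusters dominate."""
    return mean(silhouette_samples(points, labels))


def balanced_silhouette_index(points, labels):
    """Hypothetical balanced variant (an assumption, not the paper's bSI):
    average s(i) within each cluster, then weight every cluster equally."""
    per_cluster = defaultdict(list)
    for s, lab in zip(silhouette_samples(points, labels), labels):
        per_cluster[lab].append(s)
    return mean(mean(scores) for scores in per_cluster.values())


if __name__ == "__main__":
    # Toy 1-D risk feature: four identical "safe" samples, two spread-out "risky" ones.
    pts = [(0.0,), (0.0,), (0.0,), (0.0,), (10.0,), (12.0,)]
    labs = [0, 0, 0, 0, 1, 1]
    print(f"SI  = {silhouette_index(pts, labs):.4f}")           # SI  = 0.9389
    print(f"bSI = {balanced_silhouette_index(pts, labs):.4f}")  # bSI = 0.9083
```

In the toy case the minority cluster is less compact than the majority one, and the macro-averaged variant reports this more strongly than the sample-averaged index. When all clusters have the same size, the two indices coincide.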