Quantifying the complexity and irregularity of time series data is a central task across many data-scientific disciplines. Sample entropy (SampEn) is a widely adopted metric for this purpose, but its reliability is sensitive to the choice of its two hyperparameters, the embedding dimension $(m)$ and the similarity radius $(r)$, especially for short-duration signals. This paper presents a methodology that addresses this challenge. We introduce a Bayesian optimization framework, integrated with a bootstrap-based variance estimator tailored to short signals, that jointly selects $m$ and $r$ for reliable SampEn estimation. In experiments on synthetic signals, our approach outperformed existing benchmarks: it achieved a 60% to 90% reduction in relative error when estimating SampEn variance and a 22% to 45% decrease in relative mean squared error when estimating SampEn itself ($p \leq 0.043$). Applied to publicly available short-signal benchmarks, our approach was the only method that identified known entropy differences across all signal sets ($p \leq 0.042$). Additionally, we introduce "EristroPy," an open-source Python package that implements our proposed optimization framework for SampEn hyperparameter selection. This work has potential for applications in which accurate entropy estimation from short-duration signals is paramount.
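To make the roles of $m$ and $r$ concrete, the following is a minimal sketch of a textbook SampEn computation. It is not the EristroPy API and not the optimization framework described above. The function name `sample_entropy`, the convention of expressing `r` as a fraction of the signal's standard deviation, and the use of a non-strict inequality for matches are assumptions made for illustration; conventions vary between implementations.

```python
import numpy as np


def sample_entropy(x, m=2, r=0.2):
    """Illustrative SampEn estimate (hypothetical helper, not EristroPy).

    m : embedding dimension (template length)
    r : similarity radius, assumed here to be a fraction of the signal's std
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen m")

    tol = r * x.std()      # absolute tolerance derived from r
    n_templates = n - m    # same template count for lengths m and m + 1

    def count_matches(length):
        # Stack the overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(n_templates)])
        # Pairwise Chebyshev (max-norm) distances between templates.
        diff = np.abs(templates[:, None, :] - templates[None, :, :])
        dist = diff.max(axis=2)
        # Count each unordered pair once; self-matches are excluded.
        upper = np.triu_indices(n_templates, k=1)
        return np.count_nonzero(dist[upper] <= tol)

    b = count_matches(m)       # matches at length m
    a = count_matches(m + 1)   # matches at length m + 1
    if a == 0 or b == 0:
        return float("nan")    # estimate is undefined without matches
    return -np.log(a / b)


# Example: a regular signal should score lower than white noise.
rng = np.random.default_rng(0)
sine = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))
noise = rng.standard_normal(200)
print(sample_entropy(sine), sample_entropy(noise))
```

On short signals the match counts `a` and `b` become small, so the estimate is noisy or undefined for some choices of `m` and `r`. This is the sensitivity that motivates selecting the two hyperparameters jointly.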