Weak signal learning (WSL) is a common challenge in fields such as fault diagnosis, medical imaging, and autonomous driving, where critical information is often masked by noise and interference, making feature identification difficult. Even in tasks with abundant strong signals, improving model performance often depends on extracting weak signals effectively. However, the lack of dedicated datasets has long constrained research. To address this, we construct the first specialized dataset for weak signal feature learning, containing 13,158 spectral samples. The dataset is dominated by low-SNR samples (over 55% of samples have an SNR below 50) and exhibits extreme class imbalance (class ratios up to 29:1), providing a challenging benchmark for classification and regression in weak signal scenarios. We also propose a dual-view representation (vector + time-frequency map) and a model, PDVFN, tailored to low SNR, distribution skew, and dual imbalance. PDVFN extracts local sequential features and global frequency-domain structures in parallel, following the principles of local enhancement, sequential modeling, noise suppression, multi-scale capture, frequency extraction, and global perception. This multi-source complementarity strengthens the representation of low-SNR and imbalanced data, offering a novel solution for WSL tasks such as astronomical spectroscopy. Experiments show that our method achieves higher accuracy and robustness when handling weak signals, high noise, and extreme class imbalance, particularly in low-SNR and imbalanced scenarios. This study provides a dedicated dataset and a baseline model, and establishes a foundation for future WSL research.
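To make the dual-view representation (vector + time-frequency map) concrete, the following is a minimal sketch rather than the authors' pipeline. The abstract does not say how the time-frequency map is computed, so the short-time Fourier transform with a Hann window used here is an assumption, as are the function names (`hann`, `dft_magnitude`, `dual_view`), the frame length, the hop size, and the toy input.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dual_view(spectrum, frame_len=32, hop=16):
    """Return (vector_view, tf_map) for a 1-D spectrum.

    vector_view: the input standardized to zero mean and unit variance.
    tf_map: one row per frame, each row holding frame_len // 2 + 1 magnitudes.
    The STFT is an assumed choice; the source does not name the transform.
    """
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Fall back to 1.0 so a constant input does not divide by zero.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / n) or 1.0
    vector_view = [(x - mean) / std for x in spectrum]

    window = hann(frame_len)
    tf_map = []
    for start in range(0, n - frame_len + 1, hop):
        frame = [vector_view[start + i] * window[i] for i in range(frame_len)]
        tf_map.append(dft_magnitude(frame))
    return vector_view, tf_map


# Hypothetical toy input: a weak periodic feature on a flat continuum.
spectrum = [1.0 + 0.05 * math.sin(2 * math.pi * 4 * i / 32) for i in range(128)]
vector_view, tf_map = dual_view(spectrum)
print(len(vector_view), len(tf_map), len(tf_map[0]))
```

For the 128-point toy input, the vector view keeps all 128 values, while the map has 7 overlapping frames of 17 frequency bins each. A model in the spirit of the abstract could feed the first view to a sequential branch and the second to a frequency-domain branch, though the actual PDVFN architecture is not specified here.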