In the Machine Learning (ML) literature, a well-known problem is Dataset Shift: contrary to the standard ML assumption, the data in the training and test sets may follow different probability distributions, which leads ML systems to poor generalisation performance. This problem is particularly severe in the Brain-Computer Interface (BCI) context, where bio-signals such as Electroencephalography (EEG) are often used. In fact, EEG signals are highly non-stationary, both over time and across subjects. To overcome this problem, several proposed solutions rely on recent transfer learning approaches such as Domain Adaptation (DA). In several cases, however, the actual causes of the reported improvements remain unclear. This paper focuses on the impact of data normalisation, or standardisation, strategies applied together with DA methods. In particular, using the \textit{SEED}, \textit{DEAP}, and \textit{BCI Competition IV 2a} EEG datasets, we experimentally evaluated the impact of different normalisation strategies applied with and without several well-known DA methods, and compared the resulting performance. The results show that the choice of normalisation strategy plays a key role in classifier performance in DA scenarios and, interestingly, that in several cases an appropriate normalisation scheme alone outperforms the DA methods.
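To make the notion of a "normalisation strategy" concrete, the sketch below contrasts two common z-score variants for cross-subject EEG features. In the first, statistics are computed on the training (source) data only and reused on the test (target) data. In the second, each domain (e.g. subject or session) is standardised with its own statistics. The function names, the synthetic data, and the choice of these two particular strategies are illustrative assumptions. They are not necessarily the exact strategies evaluated in the paper.

```python
import numpy as np

def zscore_with_train_stats(X_train, X_test, eps=1e-8):
    """Global strategy (assumed): mean/std from the training set only, reused on the test set."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    return (X_train - mu) / (sd + eps), (X_test - mu) / (sd + eps)

def zscore_per_domain(X, domain_ids, eps=1e-8):
    """Per-domain strategy (assumed): each subject/session is standardised with its own mean/std."""
    X_out = np.empty_like(X, dtype=float)
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0)
        X_out[mask] = (X[mask] - mu) / (sd + eps)
    return X_out

# Synthetic example: the target subject's features are shifted and rescaled
# relative to the source subject, mimicking inter-subject non-stationarity.
rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(200, 4))
X_tgt = rng.normal(5.0, 2.0, size=(200, 4))

# Global strategy: the shift in the target domain survives normalisation.
_, tgt_global = zscore_with_train_stats(X_src, X_tgt)

# Per-domain strategy: each domain ends up with roughly zero mean and unit variance.
X_all = np.vstack([X_src, X_tgt])
ids = np.array([0] * len(X_src) + [1] * len(X_tgt))
out = zscore_per_domain(X_all, ids)

print("target mean, global stats:    ", tgt_global.mean(axis=0).round(2))
print("target mean, per-domain stats:", out[ids == 1].mean(axis=0).round(2))
```

With per-domain statistics, the first-order shift between subjects is removed before any classifier or DA method is applied. This illustrates, under the assumptions above, why the normalisation choice alone can account for part of the gains attributed to DA.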