In brain-computer interface (BCI) research, recording data is time-consuming and expensive, which limits access to large datasets. This may limit BCI system performance, as machine learning methods depend strongly on training dataset size. Important questions arise: taking into account neuronal signal characteristics (e.g., non-stationarity), can we achieve higher decoding performance by training decoders on more data? What are the prospects for further improvement over time in long-term BCI studies? In this study, we investigated the impact of long-term recordings on motor imagery decoding from two main perspectives: model requirements regarding dataset size and potential for patient adaptation. We evaluated a multilinear model and two deep learning (DL) models on a long-term "BCI and Tetraplegia" (NCT02550522) clinical trial dataset containing 43 sessions of ECoG recordings from a tetraplegic patient. In the experiment, the participant performed 3D virtual hand translation using motor imagery patterns. We designed multiple computational experiments in which training datasets were extended or translated in time to investigate the relationship between model performance and different factors influencing the recordings. Our analysis showed that adding more data to the training dataset may not immediately increase performance for datasets already containing 40 minutes of signal. DL decoders showed requirements regarding dataset size similar to those of the multilinear model while achieving higher decoding performance. Moreover, high decoding performance was obtained with relatively small datasets recorded later in the experiment, suggesting improved motor imagery patterns and patient adaptation. Finally, we proposed UMAP embeddings and local intrinsic dimensionality as a way to visualize the data and potentially evaluate data quality.
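The abstract mentions local intrinsic dimensionality as a means of assessing data quality. As an illustrative sketch only (not the authors' pipeline; the function name and parameters here are hypothetical), the widely used Levina-Bickel maximum-likelihood estimator computes a per-point dimensionality estimate from k-nearest-neighbor distances:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_intrinsic_dim(X, k=10):
    """Levina-Bickel MLE of local intrinsic dimensionality, one estimate per point.

    X : array of shape (n_samples, n_features)
    k : number of neighbors used by the estimator (assumption: k >= 2)
    """
    # k + 1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    dists = dists[:, 1:]  # drop the zero self-distance

    # MLE: inverse of the mean log-ratio of the k-th neighbor distance
    # to each closer neighbor distance (Levina & Bickel, 2004)
    log_ratios = np.log(dists[:, -1:] / dists[:, :-1])
    return (k - 1) / log_ratios.sum(axis=1)
```

Points sampled from a low-dimensional manifold embedded in a high-dimensional signal space should yield estimates near the manifold dimension, which is what makes the quantity useful as a rough data-quality indicator.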