Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Consequently, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets, which is likely detrimental to prediction accuracy. In this work, we simulate a setting in which data cannot be shared by splitting a publicly available data set into disjoint sets, each representing the data held by an individual institution. We propose to train a (local) detector in each institution and to aggregate their individual predictions into one final prediction. Four aggregation schemes are compared: the majority vote, the mean, the weighted mean, and the Dawid-Skene method. The approach was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to that of a single detector trained on all the data when a sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed the best performance; it was only marginally outperformed by the Dawid-Skene method when the local detectors approached the performance of a single detector trained on all available data.
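The first three aggregation schemes named above can be illustrated with a minimal sketch. All names and numbers below are hypothetical: `preds[k, i]` is assumed to be detector k's predicted probability that sample i belongs to the positive class, and the weights are assumed to come from, e.g., each detector's validation accuracy.

```python
import numpy as np

# Hypothetical predicted probabilities from 3 local detectors on 3 samples.
preds = np.array([
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.1, 0.8],
])

# Majority vote: threshold each detector, then take the most common label.
votes = (preds >= 0.5).astype(int)
majority = (votes.sum(axis=0) > preds.shape[0] / 2).astype(int)

# Mean: average the probabilities across detectors, then threshold once.
mean_agg = (preds.mean(axis=0) >= 0.5).astype(int)

# Weighted mean: weight each detector's probability (weights sum to 1),
# e.g. in proportion to its accuracy on held-out data (assumed values here).
w = np.array([0.5, 0.3, 0.2])
weighted = (w @ preds >= 0.5).astype(int)
```

The Dawid-Skene method is not shown: it additionally estimates a per-detector confusion matrix together with the latent true labels via an EM-style iteration, which does not reduce to a one-line aggregation rule.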