While advances in machine learning have produced impressive performance gains on big data, many human-subjects datasets are in fact small and sparsely labeled. Existing methods applied to such data often fail to generalize to out-of-sample subjects. In these settings, models must make predictions on test data that may be drawn from a different distribution than the training data, a problem known as \textit{zero-shot learning}. To address this challenge, we develop an end-to-end framework based on meta-learning, which enables the model to rapidly adapt to a new prediction task on out-of-sample test data given only limited training data. We use three real-world, small-scale human-subjects datasets (two randomized controlled studies and one observational study) and predict treatment outcomes for held-out treatment groups. Our model learns the latent treatment effect of each intervention and, by design, naturally handles multi-task prediction. We show that our model performs best overall on each held-out group, especially when the test group is distinctly different from the training group. Our model has implications for improving the generalization of small human-subjects studies to the wider population.