Modeling text-based time series to predict a future event or outcome is an important task with a wide range of applications. The standard approach is to train and test the model on the same input window, but this neglects the data collected in the longer window between the prediction time and the final outcome, which is often available during training. In this study, we propose treating this neglected text as privileged information that is available during training, and using it to enhance early prediction modeling through knowledge distillation, an approach we call Learning using Privileged tIme-sEries Text (LuPIET). We evaluate the method on clinical and social media text: four clinical prediction tasks based on clinical notes and two mental health prediction tasks based on social media posts. Our results show that LuPIET is effective at enhancing text-based early predictions, though one may need to choose the text representation and the privileged-text window carefully to achieve optimal performance. Compared with two alternative methods based on transfer learning and mixed training, LuPIET offers more stable improvements over the baseline of standard training. To the best of our knowledge, this is the first study to examine learning using privileged information for time series in an NLP context.
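To make the distillation setup concrete, the sketch below shows a generic knowledge-distillation loss of the kind LuPIET builds on: a teacher trained with the full (privileged) input window produces soft targets that guide a student restricted to the early window. This is a minimal illustration, not the paper's exact formulation; the temperature, the weighting `alpha`, and all function names are assumptions for exposition.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL.

    student_logits: predictions from the early-window (student) model.
    teacher_logits: predictions from the full-window (privileged) teacher.
    labels: true outcome labels (integer class indices).
    """
    # Hard loss: cross-entropy of the student against the true outcomes.
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    # Soft loss: KL(teacher || student) on temperature-scaled distributions,
    # transferring the teacher's privileged knowledge to the student.
    p_s = softmax(student_logits, temperature)
    p_t = softmax(teacher_logits, temperature)
    soft = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    # The T^2 factor keeps the soft-gradient scale comparable across temperatures.
    return alpha * hard + (1 - alpha) * (temperature ** 2) * soft
```

In training, the teacher is fit first on the longer privileged window; the student then minimizes this combined loss while seeing only the early window, so at test time no privileged text is required.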