This paper presents the 2nd place solution for the Riiid! Answer Correctness Prediction competition on Kaggle, the world's largest data science competition platform. The competition was held from October 16, 2020, to January 7, 2021, and attracted 3,395 teams and 4,387 competitors. The main insights and contributions of this paper are as follows. (i) We point out that existing Transformer-based models suffer from a limitation: the information their query/key/value projections can carry is restricted. To address this problem, we propose a method that uses LSTMs to obtain the query/key/value, and we verify its effectiveness. (ii) We identify the 'inter-container' leakage problem, which arises in datasets where several questions are sometimes served together. To address this problem, we present special indexing/masking techniques that are useful with both RNN variants and Transformers. (iii) We find that additional hand-crafted features are effective in overcoming a fundamental limit of the Transformer, which can never consider samples older than its sequence length.
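Contributions (i) and (ii) can be illustrated together with a minimal PyTorch sketch. The layer below replaces the usual linear query/key/value projections with LSTMs, so each projection carries sequential context, and the mask blocks attention between positions in the same container so that co-served questions cannot see each other's responses. All class/function names and layer sizes here are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

def inter_container_mask(container_ids: torch.Tensor) -> torch.Tensor:
    """Boolean mask (True = blocked): each position may attend only to
    positions from strictly earlier containers, plus itself (to keep the
    softmax well-defined). Illustrative sketch of the masking idea."""
    c = container_ids
    blocked = c.unsqueeze(0) >= c.unsqueeze(1)  # key's container not earlier
    blocked.fill_diagonal_(False)               # always allow self-attention
    return blocked

class LSTMAttention(nn.Module):
    """Self-attention whose query/key/value come from LSTMs instead of
    plain linear projections (hypothetical sketch; sizes are arbitrary)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.q_lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.k_lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.v_lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, attn_mask=None):
        q, _ = self.q_lstm(x)  # LSTM outputs summarize the prefix of x,
        k, _ = self.k_lstm(x)  # so Q/K/V are not limited to per-position
        v, _ = self.v_lstm(x)  # information as with linear projections
        out, _ = self.attn(q, k, v, attn_mask=attn_mask)
        return out

x = torch.randn(2, 6, 32)  # (batch, seq_len, d_model)
mask = inter_container_mask(torch.tensor([0, 0, 1, 1, 1, 2]))
y = LSTMAttention(d_model=32, n_heads=4)(x, attn_mask=mask)
print(y.shape)  # torch.Size([2, 6, 32])
```

With `container_ids = [0, 0, 1, 1, 1, 2]`, positions 0 and 1 were served together, so the mask forbids them from attending to each other while still allowing later containers to attend to both.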