This paper describes our winning solution for the ECML-PKDD ChAT Discovery Challenge 2020. We show that whether a Twitch user has subscribed to a channel can be predicted well by modeling user activity with boosting trees. We introduce the connection between target encodings and boosting trees in the context of high-cardinality categorical features, and find that modeling user activity, when encoded properly and combined with a suitable optimization approach, is more powerful than modeling content directly.
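To make the idea of a target encoding concrete, the following is a minimal sketch in Python of a smoothed mean target encoding for a high-cardinality categorical such as a user ID. It is an illustration under stated assumptions, not the encoding used in the winning solution: the function names, the `smoothing` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Learn a smoothed mean of the binary target for each category.

    Assumption: additive smoothing toward the global mean (the prior),
    so that rare categories are pulled toward the prior.
    """
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    mapping = {
        cat: (sums[cat] + smoothing * prior) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, prior


def apply_target_encoding(categories, mapping, prior):
    """Replace each category by its encoding; unseen categories get the prior."""
    return [mapping.get(cat, prior) for cat in categories]


# Hypothetical toy data: one row per (user, channel) interaction.
users = ["u1", "u1", "u1", "u2", "u3", "u3"]
subscribed = [1, 1, 0, 0, 1, 1]

mapping, prior = fit_target_encoding(users, subscribed, smoothing=2.0)
encoded = apply_target_encoding(["u1", "u2", "u9"], mapping, prior)
```

The resulting numeric column can be passed to a boosting-tree model in place of the raw category. In practice the encoding is usually computed out-of-fold, so that a row's own label does not leak into its feature.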