From an ad network's standpoint, a user's activity is a multi-type sequence of temporal events consisting of event types and time intervals. Understanding user patterns in ad networks has received increasing attention from the machine learning community. In particular, fraud detection, Conversion Rate (CVR) prediction, and Click-Through Rate (CTR) prediction are of interest. However, the imbalance between majority and minority classes in these tasks can bias a machine learning model and lead to poor performance. This study proposes two training approaches for GANs on multi-type (continuous and discrete) sequences, addressing the difficulty traditional GANs have in passing gradient updates through discrete tokens. The first is a Reinforcement Learning (RL)-based training approach; the second uses Gumbel-Softmax, an approximation of the multinomial distribution parameterized in terms of the softmax function. Extensive experiments on synthetic data show that the trained generator can produce sequences with the desired properties, as measured by multiple criteria.
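To illustrate the Gumbel-Softmax idea mentioned above, the following is a minimal sketch in plain Python, not the implementation used in this study. The function name `gumbel_softmax_sample`, the three example event-type probabilities, and the temperature value are all assumptions chosen for illustration. The sketch adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax, giving a relaxed (continuous) stand-in for a one-hot draw from the multinomial distribution over event types.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Return one relaxed one-hot sample (hypothetical helper, for illustration)."""
    noisy = []
    for logit in logits:
        # Uniform draw, clamped away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(u)).
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example: three event types with probabilities 0.7, 0.2, 0.1.
rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
soft = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
hard = soft.index(max(soft))  # discrete event type recovered by argmax
```

Lower temperatures push `soft` toward a one-hot vector, while higher temperatures make it smoother. In an automatic-differentiation framework the same computation is differentiable with respect to the logits, which is what lets gradients reach a generator that emits discrete tokens.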