Datasets are often generated sequentially, with earlier samples and intermediate decisions or interventions affecting subsequent samples. This is especially prominent where there is significant human-AI interaction, such as in recommender systems. To characterize the importance of this relationship across samples, we propose using adversarial attacks on popular evaluation processes. We present sequence-aware boosting attacks and provide a lower bound on the amount of extra information that can be exploited from a confidential test set based solely on the order of the observed data. We test our methods on real and synthetic data and show that the evaluation process on the MovieLens-100k dataset can be shifted by $\sim1\%$, which is significant given the close competition on this benchmark. Code is publicly available.