Inapplicability of the TVOR method to USHMM Data Outlier Identification

The recent paper "TVOR: Finding Discrete Total Variation Outliers Among Histograms" [arXiv:2012.11574] introduces the Total Variation Outlier Recognizer (TVOR) method for identifying outliers among a given set of histograms. The method compares the smoothness of each histogram, measured by its discrete total variation, to that of the other histograms in the dataset, under the assumption that most histograms in the dataset should be of similar smoothness. The paper concludes by applying the TVOR model to histograms of the ages of Holocaust victims, produced using United States Holocaust Memorial Museum (USHMM) data, and purports to identify the list of victims of the Jasenovac concentration camp as potentially suspicious. In this paper, we show that the TVOR model and its assumptions are grossly inapplicable to the considered dataset. Specifically, the dataset does not satisfy the model's critical assumption of shared smoothness between the distributions of victims' ages across lists; the model is biased toward assigning higher outlier scores to larger histograms; and the dataset has not been reviewed to remove obvious data processing errors, which leads to the duplication of hundreds of thousands of entries during the analysis.
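
To make the quantity at the center of this discussion concrete, the following minimal sketch computes the discrete total variation of a histogram as the sum of absolute differences between adjacent bin counts. It is an illustration only, not the TVOR scoring procedure from [arXiv:2012.11574]; the function name `discrete_total_variation` and the toy histograms are assumptions introduced here.

```python
from typing import Sequence


def discrete_total_variation(counts: Sequence[int]) -> int:
    """Sum of absolute differences between adjacent bin counts.

    Hypothetical helper for illustration; not taken from the TVOR paper.
    """
    return sum(abs(b - a) for a, b in zip(counts, counts[1:]))


# Two toy histograms with the same total count (16) but different smoothness.
smooth = [1, 2, 3, 4, 3, 2, 1]
jagged = [1, 4, 1, 4, 1, 4, 1]

tv_smooth = discrete_total_variation(smooth)
tv_jagged = discrete_total_variation(jagged)

# Multiplying every bin by 10 keeps the shape but enlarges the histogram.
scaled = [10 * c for c in smooth]
tv_scaled = discrete_total_variation(scaled)

print(tv_smooth, tv_jagged, tv_scaled)
```

In this toy case the raw value grows in proportion to the bin counts even though the shape is unchanged, so any comparison across histograms of different sizes has to account for size in some way. How TVOR does so, and whether that removes the dependence on size, is not shown by this sketch.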
