Prostate biopsy whole slide image dataset from an underrepresented Middle Eastern population

Peshawa J. Muhammad Ali, Navin Vincent, Saman S. Abdulla, Han N. Mohammed Fadhl, Anders Blilie, Kelvin Szolnoky, Julia Anna Mielcarz, Xiaoyi Ji, Kimmo Kartasalo, Abdulbasit K. Al-Talabani, Nita Mulliqi

From arXiv; 13 pages, 2 figures, and 1 table

Artificial intelligence (AI) is increasingly used in digital pathology. However, publicly available histopathology datasets remain scarce, and those that do exist predominantly represent Western populations. Consequently, the generalizability of AI models to populations from less digitized regions, such as the Middle East, is largely unknown. This motivates the public release of our dataset to support the development and validation of pathology AI models across globally diverse populations. We present 339 whole-slide images of prostate core needle biopsies from a consecutive series of 185 patients collected in Erbil, Iraq. The slides are associated with Gleason scores and International Society of Urological Pathology (ISUP) grades assigned independently by three pathologists. Scanning was performed using two high-throughput scanners (Leica and Hamamatsu) and one compact scanner (Grundium). All slides were de-identified and are provided in their native formats without further conversion. The dataset enables grading concordance analyses, color normalization, and cross-scanner robustness evaluations. Data will be deposited in the BioImage Archive (BIA) under an accession code to be announced (TBA) and released under a CC BY 4.0 license.
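As an illustration of the grading concordance analyses mentioned above, the following is a minimal sketch rather than the authors' method. It maps Gleason pattern pairs to ISUP grade groups using the standard grade-group definition and computes pairwise quadratic-weighted Cohen's kappa between raters. The abstract does not state which agreement statistic is used, so the choice of quadratic-weighted kappa is an assumption, as are the function names, the use of grade 0 for benign cores, and the toy ratings, which are made up and not taken from the dataset.

```python
from itertools import combinations


def isup_grade(primary, secondary):
    """Map a Gleason pattern pair to an ISUP grade group (1-5).

    (0, 0) is treated as benign and mapped to 0; this benign
    convention is an assumption, not something the abstract specifies.
    """
    if primary == 0 and secondary == 0:
        return 0
    if not (3 <= primary <= 5 and 3 <= secondary <= 5):
        raise ValueError("Gleason patterns must be 3, 4, or 5 (or 0+0 for benign)")
    total = primary + secondary
    if total == 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3  # 3+4 -> 2, 4+3 -> 3
    if total == 8:
        return 4
    return 5  # Gleason score 9 or 10


def quadratic_weighted_kappa(a, b, n_classes=6):
    """Quadratic-weighted Cohen's kappa for two equal-length lists of grades 0..n_classes-1."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    disagreement, expected = 0.0, 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            disagreement += weight * observed[i][j]
            expected += weight * hist_a[i] * hist_b[j] / n
    if expected == 0:
        return float("nan")  # undefined when both raters use one and the same grade
    return 1.0 - disagreement / expected


def pairwise_kappa(grades_by_rater):
    """Return {(rater_1, rater_2): kappa} for every pair of raters."""
    return {
        (r1, r2): quadratic_weighted_kappa(grades_by_rater[r1], grades_by_rater[r2])
        for r1, r2 in combinations(sorted(grades_by_rater), 2)
    }


if __name__ == "__main__":
    # Toy ratings (hypothetical): one ISUP grade per slide per pathologist.
    toy = {
        "pathologist_A": [isup_grade(3, 3), isup_grade(3, 4), isup_grade(4, 3), 0, isup_grade(4, 5)],
        "pathologist_B": [isup_grade(3, 3), isup_grade(4, 3), isup_grade(4, 3), 0, isup_grade(4, 4)],
        "pathologist_C": [isup_grade(3, 4), isup_grade(3, 4), isup_grade(4, 4), 0, isup_grade(5, 5)],
    }
    for pair, kappa in pairwise_kappa(toy).items():
        print(pair, round(kappa, 3))
```

Quadratic weighting penalizes a disagreement between grades 1 and 5 far more than one between grades 2 and 3, which suits an ordinal scale such as ISUP grade groups. An unweighted kappa or a multi-rater statistic could be substituted without changing the rest of the sketch.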
