Purpose: To assess the ability of Generative Pre-trained Transformer version 4 (GPT-4) to classify articles containing functional evidence relevant to assessments of variant pathogenicity. Methods: GPT-4 settings and prompts were tuned on a set of 45 articles and genetic variants. A final test set of 72 manually classified articles and genetic variants was then processed using two prompts. The first prompt asked GPT-4 to supply all functional evidence present in an article for a variant, or to indicate that no functional evidence is present. For articles with functional evidence, a second prompt asked GPT-4 to classify the evidence as pathogenic, benign, or intermediate/inconclusive. Results: The first prompt identified articles with variant-level functional evidence with 87% sensitivity and 89% positive predictive value (PPV). Five of 26 articles with no functional data were indicated as having functional evidence by GPT-4. For variants with functional assays present as determined by both manual review and GPT-4, the sensitivity and PPV of the second prompt's classifications relative to manual review were: pathogenic (92% sensitivity, 73% PPV), intermediate or inconclusive (67% sensitivity, 93% PPV), and benign (100% sensitivity, 73% PPV). Conclusion: The GPT-4 prompts detected the presence or absence of a functional assay with high sensitivity and PPV, and detected articles with unambiguous evidence supporting a benign or pathogenic classification with high sensitivity and reasonable PPV. The prompts detected papers with intermediate or inconclusive evidence with lower sensitivity but high PPV. These results suggest that GPT-4 may be useful in variant classification workflows by prioritizing for review the articles most likely to contain functional evidence supporting or refuting pathogenicity. They do not show that GPT-4 can fully automate the genetics literature review component of variant classification.
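As a rough illustration of how the first-prompt metrics relate to article counts, the following sketch computes sensitivity and PPV from a confusion matrix. Only the test set size (72), the number of articles without functional data (26), and the 5 false positives come from the abstract. The remaining counts (46 articles with functional evidence, 40 of them detected) are assumptions inferred here for illustration: they presume every test article falls into exactly one of the two groups, and they happen to reproduce the reported 87% and 89%. The function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive articles that were flagged as positive."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of flagged articles that are truly positive."""
    return tp / (tp + fp)


# Stated in the abstract.
total_articles = 72
without_evidence = 26
false_positives = 5

# ASSUMPTION: every test article either has or lacks functional evidence.
with_evidence = total_articles - without_evidence  # 46

# ASSUMPTION: inferred count, not reported in the abstract.
true_positives = 40
false_negatives = with_evidence - true_positives  # 6

print(f"sensitivity = {sensitivity(true_positives, false_negatives):.0%}")
print(f"PPV         = {ppv(true_positives, false_positives):.0%}")
```

Sensitivity is computed over articles that truly contain functional evidence, while PPV is computed over articles that GPT-4 flagged. This is why the five false positives lower PPV without affecting sensitivity.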