Tuberculosis (TB), an infectious bacterial disease, is a significant cause of death, especially in low-income countries, with an estimated ten million new cases globally in 2020. While TB is treatable, non-adherence to the medication regimen is a major cause of morbidity and mortality. Proactively identifying patients at risk of dropping off their medication regimen therefore makes it possible to take corrective measures that mitigate adverse outcomes. Using a proxy measure of extreme non-adherence and a dataset of nearly 700,000 patients from four states in India, we formulate and solve the machine learning (ML) problem of early prediction of non-adherence based on a custom rank-based metric. We train ML models and evaluate them against baselines, achieving a $\sim 100\%$ lift over rule-based baselines and a $\sim 214\%$ lift over a random classifier, while accounting for a future large-scale, country-wide deployment. Along the way, we address various issues, including data quality, high-cardinality categorical data, low target prevalence, distribution shift, variation across cohorts, algorithmic fairness, and the need for robustness and explainability. Our findings indicate that risk stratification of non-adherent patients is a viable ML solution that can be deployed at scale.
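The abstract does not define its custom rank-based metric, so the following is only a minimal sketch of how a "lift" figure can be read. It assumes, purely for illustration, that the metric behaves like precision among the k highest-risk patients (precision@k). That is a stand-in, not the paper's definition. A random ranking's expected precision@k equals the target prevalence, so lift over random is precision@k divided by prevalence, minus one. The functions `precision_at_k` and `relative_lift` and the toy scores are hypothetical.

```python
"""Hypothetical sketch: relative lift of a risk ranking at a top-k budget.

Assumption: precision@k stands in for the paper's unspecified rank-based metric.
"""
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of positives (non-adherent patients) among the k highest scores."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Rank patient indices by descending predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def relative_lift(model_metric: float, baseline_metric: float) -> float:
    """Relative improvement over a baseline; 1.0 corresponds to a 100% lift."""
    if baseline_metric <= 0:
        raise ValueError("baseline metric must be positive")
    return model_metric / baseline_metric - 1.0


# Toy data (hypothetical): 8 patients, 2 of them non-adherent.
scores = [0.9, 0.2, 0.8, 0.1, 0.3, 0.4, 0.05, 0.6]
labels = [1, 0, 0, 0, 0, 1, 0, 0]

model_p = precision_at_k(scores, labels, k=4)  # 2 positives in top 4 -> 0.5
prevalence = sum(labels) / len(labels)          # expected precision of a random ranking -> 0.25
print(relative_lift(model_p, prevalence))       # 1.0, i.e. a 100% lift over random
```

Under this reading, a $\sim 214\%$ lift over a random classifier means the model's top-ranked group contains roughly 3.14 times as many non-adherent patients as the prevalence alone would predict.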