In most of the world, causes of death are not recorded. Verbal autopsies are structured interviews with people close to the deceased, which are used to estimate the likelihood of various causes of death. Such estimates typically make use of a table of marginal probabilities, called a "probbase", describing the frequency of answers to each interview question conditional on each cause of death. Assembling probbase tables is challenging because data labelled with verified causes of death are typically unavailable, so these tables are generally built from expert opinion. We propose a method to verify or partially learn a probbase table given only a set of verbal autopsy questionnaires (i.e., unlabelled data). In essence, we assess how well a probbase can be used to impute questionnaire answers. Our method requires a mild conditional independence assumption on the joint distribution of questionnaire data and causes of death. More generally, our method serves as a means to assess verbal autopsy algorithms and parameters without the need for external cause-of-death labelling. We offer theoretical arguments supporting our method and brief evaluations on data simulated to resemble realistic verbal autopsy questionnaires. The approach shows moderate promise in this setting: using 1500 verbal autopsy questionnaires, we can distinguish probbase values that are too high from those that are too low with around 75% accuracy. This paper serves as an introduction to our approach and a statement of intent, in the spirit of preregistration. We identify a range of theoretical and practical open problems and outline planned work to evaluate the method. We invite comments and suggestions on our approach and on the open questions. We stress that our method has not yet been thoroughly tested, and we do not endorse its use in a real-world setting at this stage.
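To make the idea of scoring a probbase by how well it imputes answers more concrete, here is a minimal Python sketch. It is not the authors' procedure. It assumes binary yes/no answers, a known prior over causes, and full conditional independence of answers given the cause (a naive Bayes model), which is stronger than the mild assumption described above. The function names `impute_answer_probs` and `imputation_score` and the simulated table are hypothetical. Each answer is predicted from the remaining answers on the same questionnaire via the posterior over causes, and the probbase is scored by the mean log-likelihood of the answers actually given.

```python
import numpy as np


def impute_answer_probs(answers, probbase, prior, j):
    """Predict P(answer j == 1) for every record from its other answers.

    answers:  (n, Q) array of 0/1 responses (hypothetical binary coding).
    probbase: (Q, C) array with probbase[q, c] = P(answer q == 1 | cause c).
    prior:    (C,) array of prior cause-of-death probabilities.
    Assumes (naive Bayes) that answers are conditionally independent given cause.
    """
    others = np.arange(answers.shape[1]) != j
    X = answers[:, others]                      # (n, Q-1) observed other answers
    P = probbase[others]                        # (Q-1, C)
    # Log-likelihood of each record's other answers under each cause.
    loglik = X @ np.log(P) + (1 - X) @ np.log1p(-P)   # (n, C)
    logpost = loglik + np.log(prior)
    logpost -= logpost.max(axis=1, keepdims=True)      # stabilise before exp
    post = np.exp(logpost)
    post /= post.sum(axis=1, keepdims=True)            # P(cause | other answers)
    return post @ probbase[j]                          # (n,) imputed P(answer j == 1)


def imputation_score(answers, probbase, prior, eps=1e-12):
    """Mean log-likelihood of each answer when imputed from the rest (higher is better)."""
    n, Q = answers.shape
    total = 0.0
    for j in range(Q):
        p = np.clip(impute_answer_probs(answers, probbase, prior, j), eps, 1 - eps)
        y = answers[:, j]
        total += np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return total / (n * Q)


if __name__ == "__main__":
    # Simulate questionnaires from a made-up probbase, then compare it against
    # a deliberately wrong table in which every value is flipped (p -> 1 - p).
    rng = np.random.default_rng(0)
    C, Q, n = 3, 10, 1500
    true_pb = rng.uniform(0.05, 0.95, size=(Q, C))
    prior = np.full(C, 1.0 / C)
    causes = rng.choice(C, size=n, p=prior)
    answers = (rng.random((n, Q)) < true_pb[:, causes].T).astype(float)
    print("true probbase:   ", imputation_score(answers, true_pb, prior))
    print("flipped probbase:", imputation_score(answers, 1 - true_pb, prior))
```

Because the log score is a proper scoring rule, when data really are generated from a probbase under these assumptions, that table yields the exact conditional distribution of each answer given the others. It should therefore score at least as well, in expectation, as any altered table. This is one way to see why imputation quality can carry information about probbase values without cause-of-death labels.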