Multiple systems estimation is a standard approach to quantifying hidden populations when the data sources are lists of known cases. A typical modelling approach is to fit a Poisson loglinear model to the numbers of cases observed in each possible combination of the lists. It is necessary to decide which interaction parameters to include in the model, and information criterion approaches are often used for model selection. In multiple systems estimation, difficulties may arise from sparse or zero counts in the intersections of lists, and care must be taken when using information criteria for model selection because the parameter estimates may not exist and the model may not be identifiable. Confidence intervals are often reported conditional on the selected model, giving an over-optimistic impression of the accuracy of the estimation. A bootstrap approach is a natural way to account for the model selection procedure. However, because the model selection step has to be carried out for every bootstrap replication, the computational burden may be high or even prohibitive. We explore the merit of modifying the model selection procedure in the bootstrap to search only among a subset of models, chosen on the basis of their information criterion scores on the original data. This provides large computational gains with little apparent effect on inference. We also investigate another model selection approach: a downhill search among models, possibly with multiple starting points.
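The restricted bootstrap described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the original work: three lists, candidate models consisting of all main effects plus any subset of two-way interactions, AIC as the information criterion, nonparametric multinomial resampling of the observed cases, a percentile interval, and the hypothetical names `fit`, `select` and `bootstrap_ci`. A fit that fails to converge (as can happen with sparse or zero cells, where estimates may not exist) is simply skipped.

```python
import math
import random
from itertools import combinations, product

LISTS = 3
# Observed capture patterns: every 0/1 combination except "on no list".
CELLS = [c for c in product([0, 1], repeat=LISTS) if any(c)]
PAIRS = list(combinations(range(LISTS), 2))
# Candidate models (assumed): main effects plus any subset of two-way interactions.
MODELS = [frozenset(s) for r in range(len(PAIRS) + 1) for s in combinations(PAIRS, r)]

def design_row(cell, model):
    """Intercept, main effects, then the two-way interactions in the model."""
    row = [1.0] + [float(x) for x in cell]
    return row + [float(cell[i] * cell[j]) for i, j in sorted(model)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ZeroDivisionError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(counts, model, iters=50):
    """Newton-Raphson Poisson loglinear fit; returns (aic, n_hat) or None on failure."""
    X = [design_row(c, model) for c in CELLS]
    p, k = len(X[0]), len(X)
    beta = [math.log(max(sum(counts) / k, 0.5))] + [0.0] * (p - 1)
    eta = lambda row: min(30.0, sum(x * b for x, b in zip(row, beta)))
    try:
        for _ in range(iters):
            mu = [math.exp(eta(row)) for row in X]
            grad = [sum(X[i][j] * (counts[i] - mu[i]) for i in range(k)) for j in range(p)]
            info = [[sum(X[i][j] * X[i][l] * mu[i] for i in range(k)) for l in range(p)]
                    for j in range(p)]
            step = solve(info, grad)
            beta = [b + s for b, s in zip(beta, step)]
            if max(abs(s) for s in step) < 1e-8:
                break
        else:
            return None  # no convergence: the estimates may not exist
    except (ZeroDivisionError, OverflowError):
        return None
    loglik = sum(y * eta(row) - math.exp(eta(row)) - math.lgamma(y + 1)
                 for y, row in zip(counts, X))
    # exp(intercept) is the fitted count for the unobserved "on no list" cell.
    return 2 * p - 2 * loglik, sum(counts) + math.exp(beta[0])

def select(counts, candidates):
    """Lowest-AIC (aic, n_hat) among the candidates that could be fitted, or None."""
    fits = [f for f in (fit(counts, m) for m in candidates) if f is not None]
    return min(fits, key=lambda f: f[0]) if fits else None

def bootstrap_ci(counts, top_k=3, reps=200, level=0.95, seed=0):
    """Percentile interval with model selection restricted to the top_k models."""
    rng = random.Random(seed)
    scored = [(fit(counts, m), i) for i, m in enumerate(MODELS)]
    ranked = sorted((f[0], i) for f, i in scored if f is not None)
    shortlist = [MODELS[i] for _, i in ranked[:top_k]]
    n, estimates = sum(counts), []
    for _ in range(reps):
        draws = rng.choices(range(len(CELLS)), weights=counts, k=n)
        boot = [draws.count(c) for c in range(len(CELLS))]
        best = select(boot, shortlist)  # selection repeated in every replicate
        if best is not None:
            estimates.append(best[1])
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * (len(estimates) - 1))]
    hi = estimates[int((1 + level) / 2 * (len(estimates) - 1))]
    return shortlist, lo, hi

if __name__ == "__main__":
    # Hypothetical counts, in the order of CELLS: 001, 010, 011, 100, 101, 110, 111.
    print(bootstrap_ci([60, 50, 20, 70, 25, 30, 12]))
```

Setting `top_k=len(MODELS)` recovers the unrestricted bootstrap in this toy setting, so the two intervals can be compared directly. With more lists the number of candidate models grows rapidly, which is where restricting the search would be expected to save the most computation.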