Integrated population models (IPMs) combine multiple ecological data types, such as capture-mark-recapture histories, reproduction surveys, and population counts, into a single statistical framework. In such models, each data type is generated by a probabilistic submodel, and the different data types are usually assumed to be independent. The fact that the same biological individuals can contribute to multiple data types has been perceived as compromising this independence, and several studies have even investigated IPM robustness in this scenario. However, what matters from a statistical perspective is probabilistic independence: the joint probability of all the observed data equals the product of the likelihoods of the individual datasets. Contrary to a widespread perception, probabilistic non-independence does not automatically result from collecting data on the same physical individuals. Conversely, sharing individuals between data types can give good reasons for IPM submodels to be non-independent, but these dependencies do not seem to be included in the IPMs whose robustness is being investigated. Furthermore, conditional rather than true independence is sometimes assumed. In this conceptual paper, I survey the various independence concepts used in IPMs and try to make sense of them by returning to first principles in toy models. I then show that probabilistic independence (or near-independence) can be obtained even when two or three data types are collected on the same set of biological individuals. Finally, I revisit recommendations pertaining to component data collection and IPM robustness checks, and suggest ways to bridge the current gap between individual-level IPMs and their population-level approximations using composite likelihoods.
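As a minimal illustration (a hypothetical construction, not one of the toy models from this paper), the Python sketch considers a few individuals that each contribute a survival indicator (dataset 1) and an offspring count (dataset 2). It assumes the following, none of which comes from the source:

- perfect detection;
- Bernoulli survival with probability `phi`;
- Poisson fecundity with rate `rho * exp(beta * s)`, where `beta` is a hypothetical parameter linking survival and reproduction within an individual.

The sketch compares the individual-level joint log-likelihood with the sum of the two datasets' separate (marginal) log-likelihoods, which is what an independence assumption uses.

```python
import math

# Hypothetical toy data: the SAME five individuals contribute to both datasets.
SURV = [1, 0, 1, 1, 0]  # dataset 1: survival to next season (perfect detection assumed)
FEC = [2, 0, 3, 1, 1]   # dataset 2: offspring counted this season


def bernoulli_logpmf(s, p):
    """Log-probability of a 0/1 outcome s under Bernoulli(p)."""
    return math.log(p) if s == 1 else math.log(1.0 - p)


def poisson_logpmf(k, lam):
    """Log-probability of count k under Poisson(lam), with lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def joint_loglik(surv, fec, phi, rho, beta):
    """Individual-level joint log-likelihood of both datasets.

    Each individual's fecundity rate is rho * exp(beta * s), so beta is a
    hypothetical within-individual link between survival and reproduction.
    """
    total = 0.0
    for s, f in zip(surv, fec):
        total += bernoulli_logpmf(s, phi) + poisson_logpmf(f, rho * math.exp(beta * s))
    return total


def product_of_marginals_loglik(surv, fec, phi, rho, beta):
    """Sum of the two datasets' separate (marginal) log-likelihoods.

    A fecundity-only submodel does not see which individuals survived,
    so its likelihood averages over the unobserved survival state.
    """
    ll_surv = sum(bernoulli_logpmf(s, phi) for s in surv)
    ll_fec = 0.0
    for f in fec:
        p_if_survives = math.exp(poisson_logpmf(f, rho * math.exp(beta)))
        p_if_dies = math.exp(poisson_logpmf(f, rho))
        ll_fec += math.log(phi * p_if_survives + (1.0 - phi) * p_if_dies)
    return ll_surv + ll_fec


if __name__ == "__main__":
    for beta in (0.0, 0.8):
        diff = (joint_loglik(SURV, FEC, 0.6, 1.5, beta)
                - product_of_marginals_loglik(SURV, FEC, 0.6, 1.5, beta))
        print(f"beta={beta}: joint minus product of marginals = {diff:.6f}")
```

With `beta = 0`, the two quantities agree up to floating-point error, even though both datasets come from the same individuals. With a nonzero `beta`, they differ. In this sketch, the within-individual link between traits creates the dependence, not the sharing of individuals itself.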