We introduce a random recursive tree model with two communities, called the balanced community modulated random recursive tree, or BCMRT for short. In this model, pairs of nodes of different types arrive sequentially. Each node of a pair independently decides to attach to its own type with probability 1-q, or to the other type with probability q, and then chooses its parent uniformly among the existing nodes of the selected type. We find that the limiting degree distribution is the same for every q, so inference on q must rely on other statistics. We first consider the setting where the time-labels of the nodes, i.e., their times of arrival, are observed but their types are not. In this setting, we design a consistent estimator for q and provide bounds on the feasibility of testing between two different values of q. Moreover, we show that if q is small enough, then it is possible to cluster the nodes in a way that is correlated with the true partition, although the algorithm runs in exponential time (in passing, we show that our clustering procedure is intimately connected to the NP-hard problem of minimum fair bisection). In the unlabelled setting, i.e., when only the tree structure is observed, we show that it is possible to test between different values of q strictly better than by random guessing. This follows from a delicate analysis of the sum-of-distances statistic.
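The growth rule described above can be illustrated with a minimal simulation sketch. Several details are assumptions made only for illustration and are not stated in the abstract: the function name `simulate_bcmrt`, the initialisation (two seed nodes of opposite types, joined by an edge), and the convention that the two nodes of an arriving pair cannot choose each other as parent.

```python
import random


def simulate_bcmrt(n_pairs, q, seed=None):
    """Grow a two-community recursive tree; return (parent, node_type) lists."""
    rng = random.Random(seed)
    # Assumed initialisation (not specified in the abstract): two seed nodes,
    # one of each type, with node 1 attached to node 0.
    parent = [None, 0]
    node_type = [0, 1]
    members = {0: [0], 1: [1]}  # indices of existing nodes, by type

    for _ in range(n_pairs):
        choices = []
        for t in (0, 1):
            # Attach to the other type with probability q, else to own type.
            target = 1 - t if rng.random() < q else t
            # Parent is uniform among existing nodes of the selected type.
            choices.append((t, rng.choice(members[target])))
        # Assumption: both nodes choose before either is added, so the two
        # nodes of a pair never attach to each other.
        for t, p in choices:
            idx = len(parent)
            parent.append(p)
            node_type.append(t)
            members[t].append(idx)
    return parent, node_type


if __name__ == "__main__":
    parent, node_type = simulate_bcmrt(n_pairs=5000, q=0.2, seed=0)
    # Sanity check using the hidden types (skipping the two seed nodes).
    cross = sum(node_type[i] != node_type[parent[i]] for i in range(2, len(parent)))
    print(cross / (len(parent) - 2))
```

In this sketch each non-seed edge joins nodes of different types with probability q, so the printed fraction should be close to q for large trees. This check uses the node types, which are unobserved in the settings the abstract studies, so it is only a sanity check on the simulation and not the estimator referred to above.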