We generalize the problem of reconstructing strings from their substring compositions, first introduced by Acharya et al. in 2015 and motivated by polymer-based data storage systems that use mass spectrometry. Namely, we view strings as labeled path graphs and, accordingly, seek to reconstruct labeled graphs. For a given integer t, the subgraph compositions contain either the vector of labels of each connected subgraph of order t (the t-multiset-composition) or the sum of all labels over all connected subgraphs of order t (the t-sum-composition). We ask whether, given a graph whose structure is known and an oracle that can be queried for compositions, we can reconstruct the labeling of the graph. If so, the graph is reconstructable; otherwise, it is confusable, and two labeled graphs with the same compositions are called equicomposable. We prove that reconstruction by a brute-force algorithm is highly inefficient, and then give methods for reconstructing several graph classes using as few compositions as possible. We also give negative results, finding the smallest confusable graphs and trees, as well as families with a large number of equicomposable, non-isomorphic graphs. An interesting phenomenon occurs when twinning one leaf of a path: although some paths are confusable, adding a twin to a leaf makes the graph alternate between reconstructable and confusable depending on the parity of the path, whereas adding a false twin to a leaf makes the graph reconstructable using only sum-compositions in all cases.
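To make the definitions concrete, here is a minimal brute-force sketch under our own reading of them, which is an assumption rather than the authors' formalism: a connected subgraph of order t is a t-vertex subset that induces a connected subgraph, the "vector of labels" of a subset is the count of each label in it, and the t-sum-composition is the entrywise sum of those vectors. Two labelings are treated as the same when an automorphism of the graph maps one onto the other. All function names (`connected_subsets`, `find_confusion`, `path`, etc.) are hypothetical. The search enumerates all k^n labelings and all n! vertex permutations, which illustrates why brute-force reconstruction scales so poorly.

```python
from collections import Counter
from itertools import combinations, permutations, product


def connected_subsets(adj, t):
    """All t-vertex subsets inducing a connected subgraph (brute force)."""
    result = []
    for subset in combinations(sorted(adj), t):
        chosen = set(subset)
        # Depth-first search restricted to the chosen vertices.
        stack, seen = [subset[0]], {subset[0]}
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in chosen and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen == chosen:
            result.append(subset)
    return result


def multiset_composition(adj, labels, t):
    """Multiset of label-count vectors, one per connected t-subset."""
    vectors = Counter()
    for subset in connected_subsets(adj, t):
        counts = Counter(labels[v] for v in subset)
        vectors[tuple(sorted(counts.items()))] += 1
    return vectors


def sum_composition(adj, labels, t):
    """Entrywise sum of the label-count vectors of all connected t-subsets."""
    total = Counter()
    for subset in connected_subsets(adj, t):
        total.update(labels[v] for v in subset)
    return total


def automorphisms(adj):
    """Yield every vertex permutation that maps edges to edges (brute force)."""
    verts = sorted(adj)
    edges = {frozenset((v, w)) for v in adj for w in adj[v]}
    for perm in permutations(verts):
        sigma = dict(zip(verts, perm))
        if all(frozenset((sigma[v], sigma[w])) in edges for v, w in map(tuple, edges)):
            yield sigma


def signature(adj, labels, use_sum):
    """Hashable record of the compositions for every order t = 1..n."""
    comp = sum_composition if use_sum else multiset_composition
    return tuple(tuple(sorted(comp(adj, labels, t).items()))
                 for t in range(1, len(adj) + 1))


def find_confusion(adj, alphabet, use_sum=False):
    """Return two equicomposable labelings not related by an automorphism, or None."""
    verts = sorted(adj)
    autos = list(automorphisms(adj))
    groups = {}  # signature -> labelings seen so far with that signature
    for values in product(alphabet, repeat=len(verts)):
        lab = dict(zip(verts, values))
        group = groups.setdefault(signature(adj, lab, use_sum), [])
        for other in group:
            # Same labeled graph iff some automorphism carries `lab` onto `other`.
            if not any(all(other[s[v]] == lab[v] for v in verts) for s in autos):
                return other, lab
        group.append(lab)
    return None


def path(n):
    """Adjacency lists of the path on vertices 1..n."""
    return {v: [w for w in (v - 1, v + 1) if 1 <= w <= n] for v in range(1, n + 1)}
```

For instance, on the path with four vertices and binary labels, `find_confusion(path(4), (0, 1))` finds no confusion from multiset-compositions, whereas `find_confusion(path(4), (0, 1), use_sum=True)` returns the labelings 0011 and 0101: they share every sum-composition but are not reversals of each other, while their 2-multiset-compositions differ.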