Identifying essential proteins helps in understanding the minimum requirements for cell survival and development. Network-based centrality approaches are commonly used to identify essential proteins from protein-protein interaction networks (PINs). Unfortunately, these approaches are limited by the poor quality of the underlying PIN data. To overcome this problem, researchers have focused on predicting essential proteins by combining PINs with other biological data. In this paper, we propose a network refinement method based on module discovery and biological information to obtain a higher-quality PIN. First, we extract the maximal connected subgraph of the PIN and divide it into modules using the Fast-unfolding algorithm. Then, we detect critical modules based on the homology information, subcellular localization information and topology information within each module, and use them to construct a more refined network (CM-PIN). To evaluate the effectiveness of the proposed method, we used 10 typical network-based centrality methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR) to compare the overall performance of the CM-PIN with that of the refined dynamic protein network (RD-PIN). The experimental results showed that the CM-PIN performed best in terms of the precision-recall curve, the jackknife curve and other criteria, and that it can help to identify essential proteins more accurately.
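As a rough illustration of the first step and of one of the listed centrality measures, the following minimal sketch extracts the largest connected component of a toy interaction network and ranks its proteins by degree centrality (DC). The edge list, the protein names and both function names are hypothetical and chosen only for this example. The Fast-unfolding module division and the critical-module scoring based on homology and subcellular localization are not reproduced here.

```python
from collections import deque


def largest_connected_component(edges):
    """Return (node set of the largest connected component, adjacency map)."""
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best, adj


def degree_centrality_ranking(nodes, adj):
    """Rank nodes by number of neighbours inside `nodes`, highest first."""
    deg = {n: len(adj[n] & nodes) for n in nodes}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(deg, key=lambda n: (-deg[n], n))


# Hypothetical toy PIN: one four-protein component and one isolated pair.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P6")]
component, adj = largest_connected_component(edges)
ranking = degree_centrality_ranking(component, adj)
print(ranking)
```

In a setting like the one described, the top-ranked proteins from such a ranking would be compared against a list of known essential proteins, for example when drawing precision-recall or jackknife curves.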