Negative protein-protein interaction datasets are needed for training and evaluation of interaction prediction methods, as well as validation of high-throughput interaction discovery experiments. In large-scale two-hybrid assays, the direct interaction of a large number of protein pairs is systematically probed. We present a simple method to harness two-hybrid data to obtain negative protein-protein interaction datasets, which we validated using other available experimental data. The method identifies interactions that were likely tested but not observed in a two-hybrid screen. For each negative interaction, a confidence score is defined as the shortest-path length between the two proteins in the interaction network derived from the two-hybrid experiment. We show that these high-quality negative datasets are particularly important when a specific biological context is considered, such as in the study of protein interaction specificity. We also illustrate the use of a negative dataset in the evaluation of the InterPreTS interaction prediction method.
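The scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how "likely tested" is decided, so the sketch assumes a pair was tested if one protein appears as a bait and the other as a prey in at least one observed interaction. The function names (`build_network`, `shortest_path_lengths`, `negative_pairs`) and the toy protein labels are hypothetical, and returning `None` for pairs with no connecting path is likewise an assumption.

```python
from collections import deque
from itertools import product


def build_network(interactions):
    """Build an undirected adjacency map from observed (bait, prey) pairs."""
    adj = {}
    for bait, prey in interactions:
        adj.setdefault(bait, set()).add(prey)
        adj.setdefault(prey, set()).add(bait)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop count from source to each reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def negative_pairs(interactions):
    """Map each tested-but-unobserved pair to its shortest-path length.

    Assumption: a pair counts as "tested" when one member occurs as a bait
    and the other as a prey somewhere in the screen.
    """
    observed = {frozenset(pair) for pair in interactions}
    baits = {bait for bait, _ in interactions}
    preys = {prey for _, prey in interactions}
    adj = build_network(interactions)

    cache = {}      # BFS results, computed once per bait
    negatives = {}
    for bait, prey in product(sorted(baits), sorted(preys)):
        pair = frozenset((bait, prey))
        if bait == prey or pair in observed or pair in negatives:
            continue
        if bait not in cache:
            cache[bait] = shortest_path_lengths(adj, bait)
        # None marks pairs with no path in the network (assumed convention).
        negatives[pair] = cache[bait].get(prey)
    return negatives


# Toy screen with hypothetical proteins: a chain A-B-C-D plus a separate pair E-F.
example = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
scores = negative_pairs(example)
```

In this toy screen, the pair A and D is scored 3 because the shortest path between them runs A-B-C-D, while A and F lie in different components and receive `None`. A breadth-first search is sufficient here because the network is unweighted; how the resulting scores are thresholded is left open, since the abstract does not specify it.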
Copyright © 2012 Elsevier Inc. All rights reserved.