The ability to generate hypotheses from the contents of large-scale, heterogeneous data sets is critical to the design of translational clinical studies. In previous reports, we have described the application of a conceptual knowledge engineering technique known as constructive induction (CI) to satisfy such needs. However, a major limitation of this method is the need to engage multiple subject matter experts to verify the potential hypotheses generated using CI. In this manuscript, we describe an alternative verification technique that leverages published biomedical literature abstracts. Our report is framed in the context of an ongoing project to generate hypotheses related to the contents of a translational research data repository maintained by the CLL Research Consortium. Such hypotheses are intended to inform the design of prospective clinical studies that can elucidate the relationships that may exist between biomarkers and patient phenotypes.