We have developed novel molecular methods using a stool sample, which contains intact sloughed colon cells, to quantify colonic gene expression profiles. In this study, our goal was to identify diagnostic gene sets (combinations) for the noninvasive classification of different phenotypes. For this purpose, the effects of a legume-enriched, low glycemic index, high fermentable fiber diet was evaluated in subjects with four possible combinations of risk factors, including insulin resistance and a history of adenomatous polyps. In a randomized crossover design controlled feeding study, each participant (a total of 23; 5-12 per group) consumed the experimental diet (1.5 cups of cooked dry beans) and a control diet (isocaloric average American diet) for 4 weeks with a 3-week washout period between diets. Using prior biological knowledge, the complexity of feature selection was reduced to perform an exhaustive search on all allowable feature (gene) sets of size 3, and among these, 27 had (unbiased) error estimates of 0.15 or less. Linear discriminant analysis was successfully used to identify the best single genes and two- to three-gene combinations for distinguishing subjects with insulin resistance, a history of polyps, or exposure to a chemoprotective legume-rich diet. These results support our premise that gene products (RNA) isolated from stool have diagnostic value in terms of assessing colon cancer risk.