Objective: We conducted this systematic review to support the U.S. Preventive Services Task Force (USPSTF) in updating its recommendation on screening for colorectal cancer (CRC). Our review addresses three questions: 1) What is the effectiveness of screening programs in reducing incidence of and mortality from CRC? 2) What are the test performance characteristics of the different screening tests for detecting CRC, advanced adenomas, and/or adenomatous polyps based on size? and 3) What are the adverse effects of the different screening tests, and do adverse effects vary by important subpopulations?
Data Sources: We updated our prior systematic review and searched MEDLINE, PubMed, and the Cochrane Central Register of Controlled Trials to locate relevant studies for all key questions, from the end of our prior review through December 31, 2014.
Study Selection: We reviewed 8,492 abstracts and 696 articles against the specified inclusion criteria. We carried an additional 33 studies forward from our prior review. Eligible studies included English-language studies conducted in asymptomatic screening populations age 40 years and older at average risk or unselected for risk factors.
Data Analysis: We conducted dual independent critical appraisal of all included studies and extracted all important study details and outcomes from fair- or good-quality studies. We synthesized results by key question and type of screening test. We primarily used qualitative synthesis. We used random-effects meta-analyses when appropriate. We also summarized the overall strength of evidence for each key question.
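The random-effects pooling mentioned above can be sketched as follows. This is a minimal illustration using the DerSimonian-Laird estimator, one common choice; the review does not state which estimator or software was used, so the method, the function name `dersimonian_laird`, and all input values are assumptions for illustration and are not data from the review.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effects (e.g., log rate ratios) with a
    DerSimonian-Laird random-effects model.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    if k < 2:
        tau2 = 0.0                                # heterogeneity is not estimable from one study
    else:
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (k - 1)) / c)        # truncate negative estimates at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical log rate ratios and variances; NOT taken from the review.
log_rr = [math.log(0.70), math.log(0.75), math.log(0.80)]
var = [0.010, 0.008, 0.012]

pooled, se, tau2 = dersimonian_laird(log_rr, var)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled ratio {math.exp(pooled):.2f} "
      f"(95% CI, {low:.2f} to {high:.2f}), tau^2 = {tau2:.4f}")
```

Pooling is done on the log scale and back-transformed, which is the usual convention for ratio measures. When the between-study variance is estimated as zero, the result coincides with a fixed-effect inverse-variance estimate.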
Key question 1: We included 25 unique fair- to good-quality studies that assessed the effectiveness or comparative effectiveness of screening tests as a single application or in a screening program on CRC incidence and mortality. Based on four randomized, controlled trials (RCTs) (n=458,002), flexible sigmoidoscopy (FS) consistently decreased CRC-specific mortality compared to no screening at 11 to 12 years of followup (incidence rate ratio, 0.73 [95% CI, 0.66 to 0.82]). Based on five RCTs (n=404,396), biennial screening with the guaiac-based fecal occult blood test (Hemoccult II) compared to no screening resulted in a reduction of CRC-specific mortality at 11 to 30 years of followup, ranging from 9 to 22 percent after two to nine rounds of screening. One prospective cohort (n=88,902) found that the CRC-specific mortality rate was lower at 24 years in persons who self-reported screening with colonoscopy (adjusted hazard ratio, 0.32 [95% CI, 0.24 to 0.45]) compared to those who had never had screening endoscopy.
Key question 2: We included 33 unique studies evaluating the one-time diagnostic accuracy of various screening tests compared to an adequate reference standard. Only four fair- to good-quality studies (n=4,821) reported the diagnostic accuracy of colonoscopy generalizable to community practice. Based on three studies comparing colonoscopy to computed tomographic colonography (CTC) or CTC-enhanced colonoscopy (n=2,290), the per-person sensitivity for adenomas 10 mm or larger ranged from 89.1 percent (95% CI, 77.8 to 95.7) to 94.7 percent (95% CI, 74.0 to 99.9), and the per-person sensitivity for adenomas 6 mm or larger ranged from 74.6 percent (95% CI, 62.9 to 84.2) to 92.8 percent (95% CI, 88.1 to 96.0).
Based on studies of computed tomographic colonography (CTC) with bowel preparation (k=7), the per-person sensitivity and specificity to detect adenomas 10 mm or larger ranged from 66.7 percent (95% CI, 45.4 to 83.7) to 93.5 percent (95% CI, 83.6 to 98.1) and 86.0 percent (95% CI, 84.6 to 87.3) to 97.9 percent (95% CI, 95.7 to 99.1), respectively. The per-person sensitivity and specificity to detect adenomas 6 mm or larger ranged from 72.7 percent (95% CI, 58.4 to 84.1) to 98.0 percent (95% CI, 90.9 to 99.8) and 79.6 percent (95% CI, 77.1 to 82.0) to 93.1 percent (95% CI, 89.5 to 95.7), respectively.
Sensitivity varied considerably across the different qualitative and quantitative fecal immunochemical test (FIT) assays in the included diagnostic accuracy studies. Based on studies using colonoscopy as the reference standard (k=14), we focused on selected qualitative and quantitative tests cleared by the U.S. Food and Drug Administration (i.e., OC-Light and OC FIT-CHEK, respectively) and evaluated in more than one study. In studies using one stool specimen, the lowest sensitivity for CRC and its accompanying specificity were 73.3 percent (95% CI, 48.3 to 90.2) and 95.5 percent (95% CI, 94.6 to 96.3), respectively. Similarly, the highest sensitivity and its paired specificity were 87.5 percent (95% CI, 54.6 to 98.6) and 90.0 percent (95% CI, 89.2 to 92.4), respectively. In the largest studies, sensitivity ranged from 73.8 percent (95% CI, 62.3 to 83.3) for quantitative (n=9,989) to 78.6 percent (95% CI, 61.0 to 90.5) for qualitative (n=18,296) test categories. In one small study (n=770) that tested three stool specimens, sensitivity was 92.3 percent (95% CI, 69.3 to 99.2) and specificity was reduced to 87 percent (95% CI, 85 to 89). Results from studies using differential followup generally fell within these ranges. One fair-quality study (n=9,989) evaluated a multitarget stool DNA (mtsDNA) assay (FIT plus stool DNA) in comparison to an OC FIT-CHEK test and found that its sensitivity to detect CRC was higher than that of FIT (92.3 percent [95% CI, 84.0 to 97.0]), but with the tradeoff of a lower specificity to detect CRC (84.4 percent [95% CI, 83.6 to 85.1]).
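To make the sensitivity and specificity tradeoff concrete, the following minimal sketch converts a pair of point estimates into expected counts per 10,000 persons screened. The sensitivity and specificity are the mtsDNA point estimates quoted above; the prevalence value, the function name `expected_counts`, and the cohort size are hypothetical and chosen only for illustration, since the abstract does not report CRC prevalence.

```python
def expected_counts(sensitivity, specificity, prevalence, n_screened=10_000):
    """Expected outcomes of one round of testing in n_screened people,
    treating CRC as the target condition."""
    with_crc = prevalence * n_screened
    without_crc = n_screened - with_crc
    true_pos = sensitivity * with_crc
    false_pos = (1.0 - specificity) * without_crc
    positives = true_pos + false_pos
    return {
        "true_positives": true_pos,
        "false_negatives": with_crc - true_pos,
        "false_positives": false_pos,
        "true_negatives": without_crc - false_pos,
        # positive predictive value: share of positive tests that are true CRC
        "ppv": true_pos / positives if positives > 0 else float("nan"),
    }

ASSUMED_PREVALENCE = 0.005  # hypothetical CRC prevalence; not reported in this abstract

# Point estimates quoted above for the mtsDNA assay (CRC as target condition).
mtsdna = expected_counts(sensitivity=0.923, specificity=0.844,
                         prevalence=ASSUMED_PREVALENCE)
for name, value in mtsdna.items():
    print(f"{name}: {value:.3f}")
```

Because CRC is uncommon in a screening population, even a modest drop in specificity translates into many more positive tests that require followup. The sketch uses point estimates only and ignores the reported confidence intervals.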
Thus far, only one blood test, which detects circulating methylated SEPT9 DNA, has been prospectively evaluated in a screening population. This test had a sensitivity of only 48.2 percent (95% CI, 32.4 to 63.6) to detect CRC.
Key question 3: We included 98 fair- to good-quality studies for the harms of CRC screening. Serious adverse events from screening colonoscopy or colonoscopy in asymptomatic persons are relatively uncommon, with a pooled estimate of 4 perforations (k=26) (95% CI, 2 to 5) and 8 major bleeds (k=22) (95% CI, 5 to 14) per 10,000 procedures. Serious adverse events from screening FS are even less common, with a pooled estimate of 1 perforation (k=16) (95% CI, 0.4 to 1.4) and 2 major bleeds (k=10) (95% CI, 1 to 4) per 10,000 procedures. Complication rates are higher in diagnostic/therapeutic colonoscopy conducted as followup to positive stool tests or FS. Eighteen studies provided analyses of differential harms of colonoscopy by age group. These studies generally found increasing rates of serious adverse events, including perforation and bleeding, with increasing age. The risk of perforation for screening CTC (k=14) was fewer than 2 events per 10,000 examinations. CTC may also have harms resulting from exposure to low-dose ionizing radiation (range, 1 to 7 mSv per examination). Approximately 5 to 37 percent of examinations have extracolonic findings that necessitate actual diagnostic followup.
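The pooled rates above are expressed per 10,000 procedures. As a rough illustration of what they imply at program scale, the sketch below rescales them to a program of a given size and to "one event per N procedures." The rates and confidence limits are those quoted above; the program size and the function names are hypothetical.

```python
def expected_events(rate_per_10k, n_procedures):
    """Expected number of events for a rate given per 10,000 procedures."""
    return rate_per_10k * n_procedures / 10_000

def procedures_per_event(rate_per_10k):
    """Average number of procedures per single event."""
    return 10_000 / rate_per_10k

# (point estimate, lower 95% CI, upper 95% CI) per 10,000 procedures, as quoted above
POOLED = {
    "colonoscopy perforation": (4, 2, 5),
    "colonoscopy major bleed": (8, 5, 14),
    "FS perforation": (1, 0.4, 1.4),
    "FS major bleed": (2, 1, 4),
}

N_PROCEDURES = 250_000  # hypothetical program size, for illustration only

for label, (est, lower, upper) in POOLED.items():
    print(f"{label}: about {expected_events(est, N_PROCEDURES):.0f} events "
          f"(range {expected_events(lower, N_PROCEDURES):.0f} to "
          f"{expected_events(upper, N_PROCEDURES):.0f}); "
          f"roughly 1 per {procedures_per_event(est):,.0f} procedures")
```

Rescaling the confidence limits this way reflects only the uncertainty in the pooled rate. It is not a prediction interval for the number of events a particular program would observe.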
Limitations: Comparative effectiveness studies to date do not provide evidence of the relative benefit of different screening programs on CRC incidence or mortality. Variation of CTC test performance may be due to differences in bowel preparation, CTC imaging, or differences in reader experience or reading protocols. FITs do not represent a class of testing; therefore, evidence should be considered per family of FIT. Evidence for mtsDNA testing is limited to one study. Serious harms from endoscopy other than perforations and bleeding are subject to reporting bias, and few studies of endoscopy harms report rates of adverse events in nonendoscopy comparator arms. It is unclear if detecting extracolonic findings represents a net benefit or harm.
Conclusions: Since the 2008 USPSTF recommendation, we have more evidence on the effectiveness of FS in reducing CRC mortality, the test performance of screening CTC, and the decreasing radiation exposure from CTC, as well as the test performance of a number of promising FITs, including one FIT plus stool DNA test, that are available in the United States and approved by the U.S. Food and Drug Administration for screening. Currently used screening modalities, including colonoscopy, FS, CTC, and various high-sensitivity stool-based tests, each have different levels of evidence to support their use and different test performance to detect cancer and precursor lesions, as well as different risks of harms. Recommendations on which screening tests to use or a hierarchy of preferred screening tests will depend on the decisionmaker’s criteria for sufficiency of evidence and weighing the net benefit.