Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays

Konrad H Stopsack; Svitlana Tyekucheva; Molin Wang; Travis A Gerke; J Bailey Vaselkiv; Kathryn L Penney; Philip W Kantoff; Stephen P Finn; Michelangelo Fiorentino; Massimo Loda; Tamara L Lotan; Giovanni Parmigiani; Lorelei A Mucci

doi:10.7554/eLife.71265

eLife. 2021 Dec 23;10:e71265. doi: 10.7554/eLife.71265.

Authors

Konrad H Stopsack¹ ², Svitlana Tyekucheva³ ⁴, Molin Wang¹ ³ ⁵, Travis A Gerke⁶, J Bailey Vaselkiv¹, Kathryn L Penney¹ ⁵, Philip W Kantoff², Stephen P Finn⁷ ⁸, Michelangelo Fiorentino¹ ⁹, Massimo Loda¹⁰, Tamara L Lotan¹¹, Giovanni Parmigiani³, Lorelei A Mucci¹

Affiliations

¹ Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, United States.
² Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, United States.
³ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States.
⁴ Department of Data Science, Dana-Farber Cancer Institute, Boston, United States.
⁵ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, United States.
⁶ Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, United States.
⁷ Department of Pathology, St. James's Hospital, Dublin, Ireland.
⁸ Trinity College, Dublin, Ireland.
⁹ Pathology Unit, Addarii Institute, S. Orsola-Malpighi Hospital, Bologna, Italy.
¹⁰ Department of Pathology, Weill Cornell Medical College, New York, United States.
¹¹ Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, United States.

Abstract

Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. To what extent batch effects, that is, measurement error in biomarker levels between slides, affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1448 primary prostate cancers. In half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1–48%). We implemented different methods to mitigate batch effects (R package batchtma) and tested them in plasmode simulation. Biomarker levels from the different mitigation approaches were more similar to each other than to uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA.

Keywords: R package; batch effect; batchtma; cancer biology; human; measurement error; tissue microarray.
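The idea of "variance attributable to between-TMA differences" and of removing it can be illustrated with a minimal Python sketch. It is not the batchtma implementation and not necessarily the estimator used in the study: it assumes a plain sum-of-squares decomposition and simple mean-centering per TMA, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def _group_by_batch(values, batches):
    """Collect biomarker values per batch (here: per TMA)."""
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    return groups


def between_batch_fraction(values, batches):
    """Share of the total sum of squares explained by batch means.

    Assumption: a descriptive decomposition (eta-squared), used here
    only to illustrate the concept of between-TMA variance.
    """
    grand_mean = fmean(values)
    groups = _group_by_batch(values, batches)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


def center_by_batch(values, batches):
    """Shift each batch so that its mean equals the overall mean."""
    grand_mean = fmean(values)
    groups = _group_by_batch(values, batches)
    batch_means = {b: fmean(g) for b, g in groups.items()}
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical toy data: the second TMA stains uniformly stronger.
values = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
batches = ["TMA1", "TMA1", "TMA1", "TMA2", "TMA2", "TMA2"]

frac_before = between_batch_fraction(values, batches)
corrected = center_by_batch(values, batches)
frac_after = between_batch_fraction(corrected, batches)
```

Mean-centering of this kind treats every difference between TMA means as technical. If TMAs differ in tumor characteristics, it would also remove real biological differences, which is why approaches that account for such characteristics are needed in that setting.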

Plain language summary

To understand cancer, researchers need to know which molecules tumor cells use. These so-called ‘biomarkers’ tag cancer cells as being different from healthy cells, and can be used to predict how aggressive a tumor may be, or how well it might respond to treatment. A popular technique for assessing biomarkers across multiple tumors is to use tissue microarrays. This involves taking samples from different tumors and embedding them in a block of wax, which is then cut into micro-thin slices and stained with reagents that can detect specific biomarkers, such as proteins. Each block contains hundreds of samples, which all experience the same conditions. So, any patterns detected in the staining are likely to represent real variations in the biomarkers present. Many cancer studies, however, often compare samples from multiple tissue microarrays, which may increase the risk of technical artifacts: for example, staining may look stronger in one batch of tissue samples than another, even though the amount of biomarker present in these different arrays is roughly the same. These ‘batch effects’ could potentially bias the results of the experiment and lead to the identification of misleading patterns. To evaluate how batch effects impact tissue microarray studies, Stopsack et al. examined 14 wax blocks which contained tumor samples from 1,448 men with prostate cancer. This revealed that for some biomarkers, but not others, there were noticeable differences between tissue microarrays that were clearly the result of batch effects. Stopsack et al. then tested six different ways of fixing these discrepancies using statistical methods. All six approaches were successful, even if the arrays included tumors with different characteristics, such as tumors that had been diagnosed more or less recently. This work highlights the importance of considering batch effects when using tissue microarrays to study cancer. Stopsack et al. have used their statistical approaches to develop freely available software which can reduce the biases that sometimes arise from these technical artifacts. This could help researchers avoid misleading patterns in their data and make it easier to detect real variations in the biomarkers present between tumor samples.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Biomarkers, Tumor / metabolism*
Humans
Male
Prostatic Neoplasms / diagnosis*
Prostatic Neoplasms / etiology
Tissue Array Analysis / methods*

Substances

Biomarkers, Tumor
