Digital pathology has led to a demand for automated detection of regions of interest, such as cancerous tissue, from scanned whole slide images. Accurate methods based on image analysis and machine learning could yield a significant speed-up and cost savings through increased throughput in histological assessment. This article describes a machine learning approach for detection of cancerous tissue from scanned whole slide images. Our method is based on feature engineering and supervised learning with a random forest model. The features extracted from the whole slide images include several local descriptors related to image texture, spatial structure, and distribution of nuclei. The method was evaluated in breast cancer metastasis detection from lymph node samples. Our results show that the method detects metastatic areas with high accuracy (AUC = 0.97-0.98 for tumor detection within the whole image area, AUC = 0.84-0.91 for tumor vs. normal tissue detection) and that the method generalizes well to images from more than one laboratory. Further, the method outputs an interpretable classification model, enabling the linking of individual features to differences between tissue types. © 2017 International Society for Advancement of Cytometry.
Keywords: breast cancer; computer aided diagnosis; digital pathology; machine learning; metastasis detection; random forest; sentinel lymph nodes; whole slide images.