An annotated high-content fluorescence microscopy dataset with Hoechst 33342-stained nuclei and manually labelled outlines

Data Brief. 2022 Nov 21:46:108769. doi: 10.1016/j.dib.2022.108769. eCollection 2023 Feb.

Abstract

Automated detection of cell nuclei in fluorescence microscopy images is a key task in bioimage analysis. It is essential for most types of microscopy-based high-throughput drug and genomic screening and is often required in smaller-scale experiments as well. To develop and evaluate algorithms and neural networks that perform instance or semantic segmentation for detecting nuclei, high-quality annotated data is essential. Here we present a benchmarking dataset of fluorescence microscopy images with Hoechst 33342-stained nuclei together with annotations of nuclei, nuclear fragments and micronuclei. Images were randomly selected from an RNA interference screen with a modified U2OS osteosarcoma cell line, acquired on a Thermo Fisher CX7 high-content imaging system at 20x magnification. Labelling was performed by a single annotator and reviewed by a biomedical expert. The dataset, called Aitslab-bioimaging1, contains 50 images showing over 2000 labelled nuclear objects in total, which is sufficiently large to train well-performing neural networks for instance or semantic segmentation. The dataset is split into training, development and test sets for user convenience.
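
The distinction between instance and semantic segmentation mentioned above can be illustrated with a minimal sketch. It assumes annotations are available as an integer label image in which 0 is background and each nucleus has its own positive integer; this encoding, the toy array and the function names are illustrative assumptions, not the dataset's documented file format or tooling.

```python
def instance_to_semantic(label_image):
    """Collapse an instance label image to a binary foreground mask."""
    return [[1 if pixel > 0 else 0 for pixel in row] for row in label_image]


def count_instances(label_image):
    """Count distinct non-zero labels, i.e. the number of labelled objects."""
    return len({pixel for row in label_image for pixel in row if pixel > 0})


# Toy 4x6 label image (hypothetical, not taken from the dataset):
# 0 = background, 1 and 2 = two separate nuclei.
toy_labels = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

semantic_mask = instance_to_semantic(toy_labels)  # nucleus vs. background only
n_objects = count_instances(toy_labels)           # object identities preserved
```

An instance annotation can always be reduced to a semantic mask in this way, whereas the reverse direction requires separating touching objects, which is why instance-level labels are the more general form of ground truth.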

Keywords: Biomedical image analysis; Computer vision; Deep learning training and evaluation; Fluorescence microscopy; High-content screening; Instance segmentation.