RafanoSet: Dataset of raw, manually, and automatically annotated Raphanus raphanistrum weed images for object detection and segmentation

Data Brief. 2024 Apr 16;54:110430. doi: 10.1016/j.dib.2024.110430. eCollection 2024 Jun.

Abstract

The rationale for this data article is to provide resources that could facilitate studies focused on weed detection and segmentation in precision farming using computer vision. We have curated Multispectral (MS) images over crop fields of Triticum aestivum containing a heterogeneous mix of Raphanus raphanistrum, with both uniform and random crop spacing. This dataset is designed to facilitate weed detection and segmentation based on manually and automatically annotated Raphanus raphanistrum, commonly known as wild radish. The dataset is publicly available through the Zenodo data library and provides annotated pixel-level information that is crucial for registration and segmentation. It consists of 85 original MS images captured over 17 scenes in five spectral bands: Blue, Green, Red, Near-Infrared (NIR), and RedEdge. Each image measures 1280 × 960 pixels and serves as the basis for weed detection and segmentation. Manual annotations were performed using the Visual Geometry Group Image Annotator (VIA), and the results were saved in Common Objects in Context (COCO) segmentation format. To ease this resource-intensive annotation task, a Grounding DINO + Segment Anything Model (SAM) pipeline was trained with the manually annotated data to obtain automated PASCAL Visual Object Classes (VOC) annotations, stored in Extensible Markup Language (XML), for 80 MS images. The dataset emphasizes quality control, validating both the 'manual' and 'automated' repositories by extracting and evaluating binary masks. The code used for these processes is accessible to ensure transparency and reproducibility. This dataset is the first-of-its-kind public resource providing manually and automatically annotated weed information over close-range MS images in a heterogeneous agricultural environment.
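The quality-control step above, extracting binary masks and evaluating them, can be illustrated with a minimal sketch. This is not the authors' released code: it assumes a manual annotation is a COCO-style flat polygon list [x1, y1, x2, y2, ...], that the automated annotation has already been converted to a comparable polygon, and that agreement is measured with intersection-over-union (IoU) and Dice. The function names `polygon_to_mask` and `overlap_scores`, the pixel-centre rasterization rule, and the choice of metrics are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Binary mask stored row-major: mask[y][x] is 1 for weed pixels, 0 otherwise.
Mask = List[List[int]]


def polygon_to_mask(flat_xy: Sequence[float], width: int, height: int) -> Mask:
    """Rasterize one COCO-style polygon [x1, y1, x2, y2, ...] into a binary mask.

    A pixel is marked as foreground when its centre (x + 0.5, y + 0.5) lies
    inside the polygon according to the even-odd (ray-casting) rule.
    """
    pts = list(zip(flat_xy[0::2], flat_xy[1::2]))
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        cy = y + 0.5
        for x in range(width):
            cx = x + 0.5
            inside = False
            j = len(pts) - 1
            for i in range(len(pts)):
                xi, yi = pts[i]
                xj, yj = pts[j]
                # Toggle when edge (j -> i) crosses the rightward ray from (cx, cy).
                if (yi > cy) != (yj > cy):
                    x_cross = xj + (cy - yj) * (xi - xj) / (yi - yj)
                    if cx < x_cross:
                        inside = not inside
                j = i
            mask[y][x] = int(inside)
    return mask


def overlap_scores(a: Mask, b: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) between two equally sized binary masks."""
    inter = union = area_a = area_b = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
            area_a += pa
            area_b += pb
    # Two empty masks are treated as perfect agreement.
    iou = inter / union if union else 1.0
    dice = 2 * inter / (area_a + area_b) if (area_a + area_b) else 1.0
    return iou, dice


if __name__ == "__main__":
    # Hypothetical toy annotations on a 10 x 10 image (not taken from RafanoSet).
    manual = polygon_to_mask([2, 2, 8, 2, 8, 8, 2, 8], 10, 10)
    automated = polygon_to_mask([3, 2, 8, 2, 8, 8, 3, 8], 10, 10)
    iou, dice = overlap_scores(manual, automated)
    print(f"IoU={iou:.4f} Dice={dice:.4f}")  # prints IoU=0.8333 Dice=0.9091
```

The pure-Python loops keep the sketch dependency-free but are slow for full 1280 × 960 images; in practice a vectorized rasterizer such as `COCO.annToMask` in pycocotools or `cv2.fillPoly` in OpenCV would be more practical, although their handling of boundary pixels may differ slightly from the pixel-centre rule used here.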
Researchers and practitioners in precision agriculture and computer vision can use this dataset to improve MS image registration and segmentation in close-range photogrammetry, with a focus on wild radish. The dataset not only supports intra-subject registration to improve segmentation accuracy but also provides valuable spectral information for training and refining machine learning models.

Keywords: Automatic annotation; Grounding DINO; Multispectral; Segment anything model; Weed segmentation.