WeedCube: Proximal hyperspectral image dataset of crops and weeds for machine learning applications

Data Brief. 2024 Aug 13;56:110837. doi: 10.1016/j.dib.2024.110837. eCollection 2024 Oct.

Abstract

The WeedCube dataset consists of hyperspectral images of three crops (canola, soybean, and sugarbeet) and four invasive weed species (kochia, common waterhemp, redroot pigweed, and common ragweed). Plants were grown in two separate greenhouses, and plant canopies were imaged from a top-down camera angle. A push-broom hyperspectral sensor covering the visible and near-infrared region of 400-1000 nm was used for data collection. The dataset includes 160 calibrated images. The number of images can be further increased by selecting smaller regions of interest (ROIs). The dataset is supplemented by Jupyter Notebook scripts that support data augmentation, spectral pre-processing, ROI selection for points and images, and data visualization. The primary purpose of this dataset is to support weed classification or identification studies by enhancing existing training datasets and validating the generalization capabilities of existing models. Owing to the three-dimensional (3D) nature of hyperspectral images, this dataset can also be used by researchers and educators across various domains for the development and testing of deep learning algorithms, the creation of automated data processing pipelines effective for 3D data, the development of tools for 3D data visualization, the creation of innovative solutions for data compression, and the handling of system memory issues associated with high-dimensional data.

Keywords: Crop; Deep learning; Hyperspectral imaging; Machine learning; Precision agriculture; Weed.
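
The abstract notes that the number of images can be increased by selecting smaller ROIs. The following is a minimal sketch of that idea, not the Jupyter Notebook scripts distributed with the dataset. It assumes a calibrated image has already been loaded as a NumPy array of shape (rows, cols, bands); the synthetic cube, its dimensions, and the names extract_rois and mean_spectra are hypothetical and chosen only for illustration.

```python
import numpy as np


def extract_rois(cube, roi_size, stride):
    """Split a (rows, cols, bands) cube into square spatial ROIs.

    Each ROI keeps every spectral band. Returns an array of shape
    (n_rois, roi_size, roi_size, bands).
    """
    rows, cols, _ = cube.shape
    if roi_size > rows or roi_size > cols:
        raise ValueError("roi_size exceeds the spatial size of the cube")
    rois = []
    for r in range(0, rows - roi_size + 1, stride):
        for c in range(0, cols - roi_size + 1, stride):
            rois.append(cube[r:r + roi_size, c:c + roi_size, :])
    return np.stack(rois)


def mean_spectra(rois):
    """Average each ROI over its spatial axes, giving one spectrum per ROI."""
    return rois.mean(axis=(1, 2))


# Synthetic stand-in for one calibrated image (hypothetical dimensions).
rng = np.random.default_rng(0)
cube = rng.random((64, 96, 30), dtype=np.float32)

# Non-overlapping 32 x 32 pixel ROIs: 2 along rows, 3 along columns.
rois = extract_rois(cube, roi_size=32, stride=32)
spectra = mean_spectra(rois)

print(rois.shape)     # (6, 32, 32, 30)
print(spectra.shape)  # (6, 30)
```

Choosing a stride smaller than roi_size yields overlapping ROIs and therefore more samples per image, at the cost of correlated training examples.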