One to All: Toward a Unified Model for Counting Cereal Crop Heads Based on Few-Shot Learning

Qiang Wang; Xijian Fan; Ziqing Zhuang; Tardi Tjahjadi; Shichao Jin; Honghua Huan; Qiaolin Ye

doi:10.34133/plantphenomics.0271

Plant Phenomics. 2024 Nov 28:6:0271. doi: 10.34133/plantphenomics.0271. eCollection 2024.

Authors

Qiang Wang¹, Xijian Fan¹, Ziqing Zhuang¹, Tardi Tjahjadi², Shichao Jin³, Honghua Huan⁴, Qiaolin Ye¹

Affiliations

¹ Nanjing Forestry University, Nanjing 210037, China.
² University of Warwick, Coventry CV4 7AL, UK.
³ Crop Phenomics Research Centre, Academy for Advanced Interdisciplinary Studies, Collaborative Innovation Centre for Modern Crop Production cosponsored by Province and Ministry, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China.
⁴ Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China.

Abstract

Accurate counting of cereal crops, e.g., maize, rice, sorghum, and wheat, is crucial for estimating grain production and ensuring food security. However, existing methods for counting cereal crops focus predominantly on building models for a specific crop head; thus, they generalize poorly to other crop varieties. This paper presents Counting Heads of Cereal Crops Net (CHCNet), a unified model that counts multiple types of cereal crop heads by few-shot learning, which effectively reduces labeling costs. Specifically, a refined vision encoder is developed to enhance feature embedding, in which a foundation model, the Segment Anything Model (SAM), is employed to emphasize the marked crop heads while mitigating complex background effects. Furthermore, a multiscale feature interaction module that integrates a similarity metric is proposed to facilitate automatic learning of crop-specific features across varying scales, which enhances the ability to describe crop heads of various sizes and shapes. The CHCNet model adopts a 2-stage procedure. The first stage, training, focuses on latent feature mining to capture feature representations common to cereal crops. In the second stage, inference is performed without additional training by extracting domain-specific features of the target crop from selected exemplars to accomplish the counting task. In extensive experiments on 6 diverse crop datasets captured from ground cameras and drones, CHCNet substantially outperformed state-of-the-art counting methods in terms of cross-crop generalization ability, achieving mean absolute errors (MAEs) of 9.96 and 9.38 for maize, 13.94 for sorghum, 7.94 for rice, and 15.62 for mixed crops. A user-friendly interactive demo is available at http://cerealcropnet.com/, where researchers are invited to evaluate CHCNet themselves. The source code for implementing CHCNet is available at https://github.com/Small-flyguy/CHCNet.
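For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean absolute error over per-image head counts is commonly computed. It is not taken from the CHCNet code base; the function name `mean_absolute_error` and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted count - true count| over all images."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one image is required")
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / len(predicted)


# Hypothetical per-image head counts (not results from the paper).
predicted_counts = [52.0, 47.0, 60.0]
true_counts = [50, 50, 58]

mae = mean_absolute_error(predicted_counts, true_counts)
print(f"MAE: {mae:.2f}")
```

A lower MAE means the predicted counts are, on average, closer to the annotated counts. Because the value is expressed in heads per image, it is not directly comparable across datasets whose images contain very different numbers of heads.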