Search Results (130)

Search Parameters:
Keywords = HRNet

19 pages, 14422 KiB  
Article
YOLO-SegNet: A Method for Individual Street Tree Segmentation Based on the Improved YOLOv8 and the SegFormer Network
by Tingting Yang, Suyin Zhou, Aijun Xu, Junhua Ye and Jianxin Yin
Agriculture 2024, 14(9), 1620; https://doi.org/10.3390/agriculture14091620 - 15 Sep 2024
Abstract
In urban forest management, individual street tree segmentation is a fundamental and especially critical method for obtaining tree phenotypes. Most existing tree image segmentation models have been evaluated on smaller datasets and lack experimental verification on larger, publicly available datasets. Therefore, this paper, based on a large, publicly available urban street tree dataset, proposes YOLO-SegNet for individual street tree segmentation. In the first-stage street tree object detection task, the BiFormer attention mechanism was introduced into the YOLOv8 network to increase contextual information extraction and improve the network's ability to detect multiscale and multishaped targets. In the second-stage street tree segmentation task, the SegFormer network was employed to obtain street tree edge information more efficiently. The experimental results indicate that our proposed YOLO-SegNet method, which combines YOLOv8+BiFormer and SegFormer, achieved a 92.0% mean intersection over union (mIoU), 95.9% mean pixel accuracy (mPA), and 97.4% accuracy on a large, publicly available urban street tree dataset. Compared with the fully convolutional neural network (FCN), lite-reduced atrous spatial pyramid pooling (LR-ASPP), pyramid scene parsing network (PSPNet), UNet, DeepLabv3+, and HRNet, the mIoU of YOLO-SegNet increased by 10.5, 9.7, 5.0, 6.8, 4.5, and 2.7 percentage points, respectively. The proposed method can effectively support smart agroforestry development. Full article
(This article belongs to the Section Digital Agriculture)
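Metrics like the mIoU and mPA reported above are computed from a per-class confusion matrix; a minimal plain-Python sketch (the two-class toy matrix is hypothetical, not the paper's data):

```python
def miou_mpa(conf):
    """Compute mean IoU and mean pixel accuracy from a square
    confusion matrix, where conf[i][j] counts pixels of true
    class i predicted as class j."""
    n = len(conf)
    ious, accs = [], []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                        # class i predicted as something else
        fp = sum(conf[j][i] for j in range(n)) - tp   # other classes predicted as i
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        accs.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(ious) / n, sum(accs) / n

# Toy 2-class example: tree vs. background.
conf = [[90, 10],
        [ 5, 95]]
miou, mpa = miou_mpa(conf)
```

For C classes the same loop runs over a C-by-C matrix; implementations differ only in how the matrix is accumulated over the test set.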

18 pages, 6073 KiB  
Article
Estimation of NPP in Huangshan District Based on Deep Learning and CASA Model
by Ziyu Wang, Youfeng Zhou, Xinyu Sun and Yannan Xu
Forests 2024, 15(8), 1467; https://doi.org/10.3390/f15081467 - 21 Aug 2024
Abstract
Net primary productivity (NPP) is a key indicator of the health of forest ecosystems that offers important information about the net carbon sequestration capacity of these systems. Precise assessment of NPP is crucial for measuring carbon fixation and assessing the general well-being of forest ecosystems. Due to the distinct ecological characteristics of various forest types, accurately understanding and delineating the distribution of these types is crucial for studying NPP. Therefore, an accurate forest-type classification is necessary prior to NPP calculation to ensure the accuracy and reliability of the research findings. This study introduced deep learning technology and constructed an HRNet-CASA framework that integrates the HRNet deep learning model and the CASA model to achieve accurate estimation of forest NPP in Huangshan District, Huangshan City, Anhui Province. Firstly, based on VHR remote sensing images, we utilized the HRNet to classify the study area into six forest types and obtained the forest type distribution map of the study area. Then, combined with climate data and forest type distribution data, the CASA model was used to estimate the NPP of forest types in the study area, and the comparison with the field data proved that the HRNet-CASA framework simulated the NPP of the study area well. The experimental findings show that the HRNet-CASA framework offers a novel approach to precise forest NPP estimation. Introducing deep learning technology not only enables precise classification of forest types but also allows for accurate estimation of NPP for different types of forests. This provides a more effective tool for forest ecological research and environmental protection. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
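The CASA model used above is a light-use-efficiency model; a minimal sketch of its standard form (NPP = APAR × ε), with all input values hypothetical:

```python
def casa_npp(sol, fpar, t1, t2, w, eps_max):
    """CASA light-use-efficiency model:
    NPP  = APAR * eps
    APAR = SOL * FPAR * 0.5          (0.5: PAR fraction of solar radiation)
    eps  = T1 * T2 * W * eps_max     (temperature and water stress scalars)."""
    apar = sol * fpar * 0.5
    eps = t1 * t2 * w * eps_max
    return apar * eps

# Hypothetical monthly values for one pixel (units omitted for brevity).
npp = casa_npp(sol=500.0, fpar=0.6, t1=0.9, t2=0.95, w=0.8, eps_max=0.389)
```

The forest-type map produced by HRNet would determine which per-type parameters (e.g., eps_max) are applied at each pixel.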

14 pages, 7566 KiB  
Article
Ship Segmentation via Combined Attention Mechanism and Efficient Channel Attention High-Resolution Representation Network
by Xiaoyi Li
J. Mar. Sci. Eng. 2024, 12(8), 1411; https://doi.org/10.3390/jmse12081411 - 16 Aug 2024
Abstract
Ship segmentation with small imaging size, which challenges ship detection and visual navigation model performance due to imaging noise interference, has attracted significant attention in the field. To address these issues, this study proposed a novel combined attention mechanism and efficient channel attention high-resolution representation network (CA2HRNET). More specifically, the proposed model fulfills accurate ship segmentation by introducing a channel attention mechanism, a multi-scale spatial attention mechanism, and a weight self-adjusted attention mechanism. Overall, the proposed CA2HRNET model enhances attention mechanism performance by focusing on the subtle yet important features and pixels of a ship against background-interference pixels. The proposed ship segmentation model can accurately focus on ship features by implementing both channel and spatial fusion attention mechanisms at each scale feature layer. Moreover, the channel attention mechanism helps the proposed framework allocate higher weights to ship-feature-related pixels. The experimental results show that the proposed CA2HRNET model outperforms its counterparts in terms of accuracy (Accs), precision (Pc), F1-score (F1s), intersection over union (IoU), and frequency-weighted IoU (FIoU). The average Accs, Pc, F1s, IoU, and FIoU for the proposed CA2HRNET model were 99.77%, 97.55%, 97%, 96.97%, and 99.55%, respectively. The research findings can promote intelligent ship visual navigation and maritime traffic management in the smart shipping era. Full article
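Efficient channel attention, one of the mechanisms combined in CA2HRNET, gates each channel from globally pooled descriptors via a small 1D convolution; a rough plain-Python sketch, with the learned kernel replaced by a uniform 1/k stand-in:

```python
import math

def eca_weights(channel_means, k=3):
    """ECA-style channel gating: a 1D convolution slides over the
    globally average-pooled channel descriptors, then a sigmoid maps
    each result to a per-channel gate in (0, 1). In the real module
    the kernel weights are learned; 1/k is a stand-in here."""
    c = len(channel_means)
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(c):
        conv = sum(padded[i + j] / k for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-conv)))  # sigmoid
    return gates

# Hypothetical pooled descriptors for four channels.
gates = eca_weights([0.2, 1.5, -0.7, 0.9])
```

Each feature map would then be multiplied by its gate, so channels with stronger pooled responses contribute more to the fused representation.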

24 pages, 7302 KiB  
Article
CTDUNet: A Multimodal CNN–Transformer Dual U-Shaped Network with Coordinate Space Attention for Camellia oleifera Pests and Diseases Segmentation in Complex Environments
by Ruitian Guo, Ruopeng Zhang, Hao Zhou, Tunjun Xie, Yuting Peng, Xili Chen, Guo Yu, Fangying Wan, Lin Li, Yongzhong Zhang and Ruifeng Liu
Plants 2024, 13(16), 2274; https://doi.org/10.3390/plants13162274 - 15 Aug 2024
Abstract
Camellia oleifera is a crop of high economic value, yet it is particularly susceptible to various diseases and pests that significantly reduce its yield and quality. Consequently, the precise segmentation and classification of diseased Camellia leaves are vital for managing pests and diseases effectively. Deep learning exhibits significant advantages in the segmentation of plant diseases and pests, particularly in complex image processing and automated feature extraction. However, when employing single-modal models to segment Camellia oleifera diseases, three critical challenges arise: (A) lesions may closely resemble the colors of the complex background; (B) small sections of diseased leaves overlap; (C) multiple diseases may be present on a single leaf. These factors considerably hinder segmentation accuracy. A novel multimodal model, the CNN–Transformer Dual U-shaped Network (CTDUNet), based on a CNN–Transformer architecture, has been proposed to integrate image and text information. This model first utilizes text data to address the shortcomings of single-modal image features, enhancing its ability to distinguish lesions from environmental characteristics, even under conditions where they closely resemble one another. Additionally, we introduce Coordinate Space Attention (CSA), which focuses on the positional relationships between targets, thereby improving the segmentation of overlapping leaf edges. Furthermore, cross-attention (CA) is employed to align image and text features effectively, preserving local information and enhancing the perception and differentiation of various diseases. The CTDUNet model was evaluated on a self-made multimodal dataset and compared against several models, including DeepLabV3+, UNet, PSPNet, SegFormer, HRNet, and Language meets Vision Transformer (LViT). The experimental results demonstrate that CTDUNet achieved a mean Intersection over Union (mIoU) of 86.14%, surpassing both multimodal models and the best single-modal model by 3.91% and 5.84%, respectively. Additionally, CTDUNet exhibits balanced performance in the multi-class segmentation of Camellia oleifera diseases and pests. These results indicate the successful application of fused image and text multimodal information in the segmentation of Camellia disease, achieving outstanding performance. Full article
(This article belongs to the Special Issue Sustainable Strategies for Tea Crops Protection)
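The cross-attention (CA) step that aligns image and text features follows the standard scaled dot-product form; a toy single-head sketch with hypothetical feature values:

```python
import math

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention: image-feature
    queries attend over text-feature keys/values (or vice versa).
    Vectors are plain lists; dimensions are illustrative."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                               # for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]           # softmax over keys
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Two image tokens attending over three text tokens (toy numbers).
img = [[1.0, 0.0], [0.0, 1.0]]
txt_k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
txt_v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = cross_attention(img, txt_k, txt_v)
```

Each fused vector is a convex combination of the text values, which is how textual cues can sharpen image features for ambiguous lesions.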

23 pages, 13090 KiB  
Article
Accurate UAV Small Object Detection Based on HRFPN and EfficientVMamba
by Shixiao Wu, Xingyuan Lu, Chengcheng Guo and Hong Guo
Sensors 2024, 24(15), 4966; https://doi.org/10.3390/s24154966 - 31 Jul 2024
Abstract
(1) Background: Small objects in Unmanned Aerial Vehicle (UAV) images are often scattered throughout various regions of the image, such as the corners, and may be blocked by larger objects, as well as susceptible to image noise. Moreover, due to their small size, these objects occupy a limited area in the image, resulting in a scarcity of effective features for detection. (2) Methods: To address the detection of small objects in UAV imagery, we introduce a novel algorithm called High-Resolution Feature Pyramid Network Mamba-Based YOLO (HRMamba-YOLO). This algorithm leverages the strengths of a High-Resolution Network (HRNet), EfficientVMamba, and YOLOv8, integrating a Double Spatial Pyramid Pooling (Double SPP) module, an Efficient Mamba Module (EMM), and a Fusion Mamba Module (FMM) to enhance feature extraction and capture contextual information. Additionally, a new multi-scale feature fusion network, the High-Resolution Feature Pyramid Network (HRFPN), together with the FMM, improved feature interactions and enhanced small object detection performance. (3) Results: On the VisDroneDET dataset, the proposed algorithm achieved a 4.4% higher Mean Average Precision (mAP) than YOLOv8-m. On the Dota1.5 dataset, HRMamba-YOLO achieved an mAP of 37.1%, surpassing YOLOv8-m by 3.8%. On the UCAS_AOD and DIOR datasets, our model's mAP was 1.5% and 0.3% higher than that of YOLOv8-m, respectively. For a fair comparison, all models were trained without pre-trained weights. (4) Conclusions: This study not only highlights the exceptional performance and efficiency of HRMamba-YOLO in small object detection tasks but also provides innovative solutions and valuable insights for future research. Full article
(This article belongs to the Section Sensing and Imaging)
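The mAP comparisons above rest on IoU matching between predicted and ground-truth boxes; a minimal sketch of the underlying box-IoU computation:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two overlapping 10x10 boxes: intersection 25, union 175.
iou = box_iou((0, 0, 10, 10), (5, 5, 15, 15))
```

A prediction counts as a true positive when its IoU with an unmatched ground-truth box exceeds the evaluation threshold (commonly 0.5), and AP is then averaged over classes.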

25 pages, 8213 KiB  
Article
Automatic Perception of Typical Abnormal Situations in Cage-Reared Ducks Using Computer Vision
by Shida Zhao, Zongchun Bai, Lianfei Huo, Guofeng Han, Enze Duan, Dongjun Gong and Liaoyuan Gao
Animals 2024, 14(15), 2192; https://doi.org/10.3390/ani14152192 - 27 Jul 2024
Abstract
Overturning and death are common abnormalities in cage-reared ducks. To achieve timely and accurate detection, this study focused on 10-day-old cage-reared ducks, which are prone to these conditions, and established prior data on such situations. Using the original YOLOv8 as the base network, multiple GAM attention mechanisms were embedded into the feature fusion part (neck) to enhance the network’s focus on the abnormal regions in images of cage-reared ducks. Additionally, the Wise-IoU loss function replaced the CIoU loss function by employing a dynamic non-monotonic focusing mechanism to balance the data samples and mitigate excessive penalties from geometric parameters in the model. The image brightness was adjusted by factors of 0.85 and 1.25, and mainstream object-detection algorithms were adopted to test and compare the generalization and performance of the proposed method. Based on six key points around the head, beak, chest, tail, left foot, and right foot of cage-reared ducks, the body structure of the abnormal ducks was refined. Accurate estimation of the overturning and dead postures was achieved using the HRNet-48. The results demonstrated that the proposed method accurately recognized these states, achieving a mean Average Precision (mAP) value of 0.924, which was 1.65% higher than that of the original YOLOv8. The method effectively addressed the recognition interference caused by lighting differences, and exhibited an excellent generalization ability and comprehensive detection performance. Furthermore, the proposed abnormal cage-reared duck pose-estimation model achieved an Object Keypoint Similarity (OKS) value of 0.921, with a single-frame processing time of 0.528 s, accurately detecting multiple key points of the abnormal cage-reared duck bodies and generating correct posture expressions. Full article
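The reported OKS follows the COCO-style definition; a simplified sketch that ignores visibility flags (the six keypoint coordinates and falloff constants below are hypothetical, not the paper's values):

```python
import math

def oks(pred, gt, scale, kappas):
    """COCO-style Object Keypoint Similarity: pred/gt are lists of
    (x, y) keypoints, scale is the object scale s (sqrt of area),
    kappas are per-keypoint falloff constants."""
    terms = []
    for (px, py), (gx, gy), k in zip(pred, gt, kappas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        terms.append(math.exp(-d2 / (2 * scale ** 2 * k ** 2)))
    return sum(terms) / len(terms)

# Six duck keypoints (head, beak, chest, tail, left/right foot) -- toy data.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (4.0, 1.0), (3.0, 3.0), (3.5, 3.0)]
kappas = [0.05] * 6
perfect = oks(gt, gt, scale=10.0, kappas=kappas)  # identical keypoints
```

OKS plays the role for keypoints that IoU plays for boxes: 1.0 for a perfect match, decaying with squared distance normalized by object scale.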

16 pages, 6299 KiB  
Article
Study on a Landslide Segmentation Algorithm Based on Improved High-Resolution Networks
by Hui Sun, Shuguang Yang, Rui Wang and Kaixin Yang
Appl. Sci. 2024, 14(15), 6459; https://doi.org/10.3390/app14156459 - 24 Jul 2024
Abstract
Landslides are a kind of geological hazard with great destructive potential. When a landslide event occurs, a reliable landslide segmentation method is important for assessing the extent of the disaster and preventing secondary disasters. Although deep learning methods have been applied to improve the efficiency of landslide segmentation, some problems remain, such as poor segmentation caused by the similarity between old landslide areas and background features, and missed detections of small-scale landslides. To tackle these challenges, this paper proposes a high-resolution semantic segmentation algorithm for landslide scenes that enhances segmentation accuracy and addresses missed detections of small-scale landslides. The network is based on the high-resolution network (HR-Net) and effectively integrates the efficient channel attention (ECA) mechanism to enhance the representation quality of the feature maps. Moreover, the primary backbone of the high-resolution network is further enhanced to extract more profound semantic information. To improve the network’s ability to perceive small-scale landslides, atrous spatial pyramid pooling (ASPP) with ECA modules is introduced. Furthermore, to address the issues arising from inadequate training and reduced accuracy due to the unequal distribution of positive and negative samples, the network employs a combined loss function that effectively supervises training. Finally, the paper enhances the Loess Plateau landslide dataset using a fractional-order-based image enhancement approach and conducts experimental comparisons on this enriched dataset to evaluate the enhanced network’s performance. The experimental findings show that the proposed methodology achieves higher segmentation accuracy than other networks. Full article
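A combined loss of the kind described, pairing pixel-wise binary cross-entropy with a soft Dice term to counter class imbalance, can be sketched as follows (the equal weighting and the toy predictions are illustrative, not the authors' exact formulation):

```python
import math

def combined_loss(probs, targets, w_bce=0.5, w_dice=0.5, eps=1e-7):
    """Combined loss for imbalanced segmentation: pixel-wise binary
    cross-entropy plus a soft Dice term. probs are predicted
    foreground probabilities, targets are 0/1 labels."""
    n = len(probs)
    bce = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
               for p, t in zip(probs, targets)) / n
    inter = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2 * inter + eps) / (sum(probs) + sum(targets) + eps)
    return w_bce * bce + w_dice * dice

# Four toy pixels: two landslide (1), two background (0).
loss = combined_loss([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

The Dice term is driven by overlap rather than pixel counts, so a few foreground pixels cannot be drowned out by a dominant background class.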

22 pages, 19269 KiB  
Article
Student Motivation Analysis Based on Raising-Hand Videos
by Jiejun Chen, Miao Wang, Liang Wang and Fuquan Huang
Sensors 2024, 24(14), 4632; https://doi.org/10.3390/s24144632 - 17 Jul 2024
Abstract
In current smart classroom research, numerous studies focus on recognizing hand-raising, but few analyze the movements to interpret students’ intentions. This limitation hinders teachers from utilizing this information to enhance the effectiveness of smart classroom teaching. Assistive teaching methods, including robotic and artificial intelligence teaching, require smart classroom systems to both recognize and thoroughly analyze hand-raising movements. This detailed analysis enables systems to provide targeted guidance based on students’ hand-raising behavior. This study proposes a morphology-based analysis method to innovatively convert students’ skeleton key point data into several one-dimensional time series. By analyzing these time series, this method offers a more detailed analysis of student hand-raising behavior, addressing the limitations of deep learning methods that cannot compare classroom hand-raising enthusiasm or establish a detailed database of such behavior. This method primarily utilizes a neural network to obtain students’ skeleton estimation results, which are then converted into time series of several variables using the morphology-based analysis method. The YOLOX and HRNet models were employed to obtain the skeleton estimation results; YOLOX is an object detection model, while HRNet is a skeleton estimation model. This method successfully recognizes hand-raising actions and provides a detailed analysis of their speed and amplitude, effectively supplementing the coarse recognition capabilities of neural networks. The effectiveness of this method has been validated through experiments. Full article
(This article belongs to the Section Intelligent Sensors)
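Converting skeleton keypoints into one-dimensional time series, as in the morphology-based method above, can be illustrated with a per-frame wrist-height series; this amplitude/peak-speed sketch uses hypothetical values and is not the authors' exact procedure:

```python
def hand_raise_features(wrist_y, fps=30.0):
    """Reduce a per-frame wrist-height series (skeleton keypoint y,
    larger = higher) to two scalar descriptors: movement amplitude
    and peak upward speed in height-units per second."""
    amplitude = max(wrist_y) - min(wrist_y)
    speeds = [(b - a) * fps for a, b in zip(wrist_y, wrist_y[1:])]
    peak_speed = max(speeds)  # fastest frame-to-frame rise
    return amplitude, peak_speed

# Hypothetical normalized wrist heights over six frames of a raise.
amp, speed = hand_raise_features([0.0, 0.1, 0.3, 0.6, 0.8, 0.8])
```

Descriptors like these make hand-raising events comparable across students, which per-frame classification alone cannot provide.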

18 pages, 7778 KiB  
Article
Convolutional Block Attention Module–Multimodal Feature-Fusion Action Recognition: Enabling Miner Unsafe Action Recognition
by Yu Wang, Xiaoqing Chen, Jiaoqun Li and Zengxiang Lu
Sensors 2024, 24(14), 4557; https://doi.org/10.3390/s24144557 - 14 Jul 2024
Abstract
The unsafe action of miners is one of the main causes of mine accidents. Research on underground miner unsafe action recognition based on computer vision enables relatively accurate real-time recognition of unsafe action among underground miners. A dataset called unsafe actions of underground miners (UAUM) was constructed and included ten categories of such actions. Underground images were enhanced using spatial- and frequency-domain enhancement algorithms. A combination of the YOLOX object detection algorithm and the Lite-HRNet human key-point detection algorithm was utilized to obtain skeleton modal data. The CBAM-PoseC3D model, a skeleton modal action-recognition model incorporating the CBAM attention module, was proposed and combined with the RGB modal feature-extraction model CBAM-SlowOnly. Ultimately, this formed the Convolutional Block Attention Module–Multimodal Feature-Fusion Action Recognition (CBAM-MFFAR) model for recognizing unsafe actions of underground miners. The improved CBAM-MFFAR model achieved a recognition accuracy of 95.8% on the NTU60 RGB+D public dataset under the X-Sub benchmark. Compared to the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models, the recognition accuracy was improved by 2%, 2.7%, 7.3%, and 14.3%, respectively. On the UAUM dataset, the CBAM-MFFAR model achieved a recognition accuracy of 94.6%, with improvements of 2.6%, 4%, 12%, and 17.3% compared to the CBAM-PoseC3D, PoseC3D, 2S-AGCN, and ST-GCN models, respectively. In field validation at mining sites, the CBAM-MFFAR model accurately recognized similar and multiple unsafe actions among underground miners. Full article
(This article belongs to the Section Intelligent Sensors)

16 pages, 7412 KiB  
Article
An Identification Method for Mixed Coal Vitrinite Components Based on an Improved DeepLabv3+ Network
by Fujie Wang, Fanfan Li, Wei Sun, Xiaozhong Song and Huishan Lu
Energies 2024, 17(14), 3453; https://doi.org/10.3390/en17143453 - 13 Jul 2024
Abstract
To address the high complexity and low accuracy issues of traditional methods in mixed coal vitrinite identification, this paper proposes a method based on an improved DeepLabv3+ network. First, MobileNetV2 is used as the backbone network to reduce the number of parameters. Second, an atrous convolution layer with a dilation rate of 24 is added to the ASPP (atrous spatial pyramid pooling) module to further increase the receptive field. Meanwhile, a CBAM (convolutional block attention module) attention mechanism with a channel multiplier of 8 is introduced at the output part of the ASPP module to better filter out important semantic features. Then, a corrective convolution module is added to the network’s output to ensure the consistency of each channel’s output feature map for each type of vitrinite. Finally, images of 14 single vitrinite components are used as training samples for network training, and a validation set is used for identification testing. The results show that the improved DeepLabv3+ achieves 6.14% and 3.68% improvements in MIOU (mean intersection over union) and MPA (mean pixel accuracy), respectively, compared to the original DeepLabv3+; 12% and 5.3% improvements compared to U-Net; 9.26% and 4.73% improvements compared to PSPNet with ResNet as the backbone; 5.4% and 9.34% improvements compared to PSPNet with MobileNetV2 as the backbone; and 6.46% and 9.05% improvements compared to HRNet. Additionally, the improved ASPP module increases MIOU and MPA by 3.23% and 1.93%, respectively, compared to the original module. The CBAM attention mechanism with a channel multiplier of 8 improves MIOU and MPA by 1.97% and 1.72%, respectively, compared to the original channel multiplier of 16. The data indicate that the proposed identification method significantly improves recognition accuracy and can be effectively applied to mixed coal vitrinite identification. Full article
(This article belongs to the Special Issue Factor Analysis and Mathematical Modeling of Coals)
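The dilation rate of 24 added to the ASPP module enlarges the kernel's effective extent according to the standard relation k_eff = k + (k - 1)(d - 1); a one-line check:

```python
def effective_kernel(k, dilation):
    """Effective spatial extent of a dilated (atrous) convolution:
    k_eff = k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel at dilation rate 24, as in the added ASPP branch.
span = effective_kernel(3, 24)
```

So the added branch samples a 49-pixel-wide context window while keeping only nine weights, which is how ASPP widens the receptive field cheaply.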

21 pages, 2548 KiB  
Article
Application of Advanced Deep Learning Models for Efficient Apple Defect Detection and Quality Grading in Agricultural Production
by Xiaotong Gao, Songwei Li, Xiaotong Su, Yan Li, Lingyun Huang, Weidong Tang, Yuanchen Zhang and Min Dong
Agriculture 2024, 14(7), 1098; https://doi.org/10.3390/agriculture14071098 - 9 Jul 2024
Abstract
In this study, a deep learning-based system for apple defect detection and quality grading was developed, integrating various advanced image-processing technologies and machine learning algorithms to enhance the automation and accuracy of apple quality monitoring. Experimental validation demonstrated the superior performance of the proposed model in handling complex image tasks. In the defect-segmentation experiments, the method achieved a precision of 93%, a recall of 90%, an accuracy of 91% and a mean Intersection over Union (mIoU) of 92%, significantly surpassing traditional deep learning models such as U-Net, SegNet, PSPNet, UNet++, DeepLabv3+ and HRNet. Similarly, in the quality-grading experiments, the method exhibited high efficiency with a precision of 91%, and both recall and accuracy reaching 90%. Additionally, ablation experiments with different loss functions confirmed the significant advantages of the Jump Loss in enhancing model performance, particularly in addressing class imbalance and improving feature learning. These results not only validate the effectiveness and reliability of the system in practical applications but also highlight its potential in automating the detection and grading processes in the apple industry. This integration of advanced technologies provides a new automated solution for quality control of agricultural products like apples, facilitating the modernization of agricultural production. Full article

8 pages, 1586 KiB  
Article
Automated Laryngeal Invasion Detector of Boluses in Videofluoroscopic Swallowing Study Videos Using Action Recognition-Based Networks
by Kihwan Nam, Changyeol Lee, Taeheon Lee, Munseop Shin, Bo Hae Kim and Jin-Woo Park
Diagnostics 2024, 14(13), 1444; https://doi.org/10.3390/diagnostics14131444 - 6 Jul 2024
Abstract
We aimed to develop an automated detector that determines laryngeal invasion during swallowing. Laryngeal invasion, which causes significant clinical problems, is defined as two or more points on the penetration–aspiration scale (PAS). We applied two three-dimensional (3D) stream networks for action recognition in videofluoroscopic swallowing study (VFSS) videos. To detect laryngeal invasion (PAS 2 or higher scores) in VFSS videos, we employed two 3D stream networks for action recognition. To establish the robustness of our model, we compared its performance with those of various current image classification-based architectures. The proposed model achieved an accuracy of 92.10%. Precision, recall, and F1 scores for detecting laryngeal invasion (≥PAS 2) in VFSS videos were 0.9470 each. The accuracy of our model in identifying laryngeal invasion surpassed that of other updated image classification models (60.58% for ResNet101, 60.19% for Swin-Transformer, 63.33% for EfficientNet-B2, and 31.17% for HRNet-W32). Our model is the first automated detector of laryngeal invasion in VFSS videos based on video action recognition networks. Considering its high and balanced performance, it may serve as an effective screening tool before clinicians review VFSS videos, ultimately reducing the burden on clinicians. Full article
(This article belongs to the Special Issue Advances in Diagnosis and Treatment in Otolaryngology)

21 pages, 5602 KiB  
Article
EMR-HRNet: A Multi-Scale Feature Fusion Network for Landslide Segmentation from Remote Sensing Images
by Yuanhang Jin, Xiaosheng Liu and Xiaobin Huang
Sensors 2024, 24(11), 3677; https://doi.org/10.3390/s24113677 - 6 Jun 2024
Abstract
Landslides constitute a significant hazard to human life, safety, and natural resources. Traditional landslide investigation methods demand considerable human effort and expertise. To address this issue, this study introduces an innovative landslide segmentation framework, EMR-HRNet, aimed at enhancing accuracy. Initially, a novel data augmentation technique, CenterRep, is proposed, not only augmenting the training dataset but also enabling the model to more effectively capture the intricate features of landslides. Furthermore, this paper integrates a RefConv and Multi-Dconv Head Transposed Attention (RMA) feature pyramid structure into the HRNet model, augmenting the model’s capacity for semantic recognition and expression at various levels. Finally, the incorporation of the Dilated Efficient Multi-Scale Attention (DEMA) block substantially widens the model’s receptive field, bolstering its capability to discern local features. Rigorous evaluations on the Bijie dataset and the Sichuan and surrounding area dataset demonstrate that EMR-HRNet outperforms other advanced semantic segmentation models, achieving mIoU scores of 81.70% and 71.68%, respectively. Additionally, ablation studies conducted across the comprehensive dataset further corroborate the enhancements’ efficacy. The results indicate that EMR-HRNet excels in processing satellite and UAV remote sensing imagery, showcasing its significant potential in multi-source optical remote sensing for landslide segmentation. Full article

20 pages, 4796 KiB  
Article
ABNet: An Aggregated Backbone Network Architecture for Fine Landcover Classification
by Bo Si, Zhennan Wang, Zhoulu Yu and Ke Wang
Remote Sens. 2024, 16(10), 1725; https://doi.org/10.3390/rs16101725 - 13 May 2024
Abstract
High-precision landcover classification is a fundamental prerequisite for resource and environmental monitoring and land-use status surveys. Imbued with intricate spatial information and texture features, very high spatial resolution remote sensing images accentuate the divergence between features within the same category, thereby amplifying the complexity of landcover classification. Consequently, semantic segmentation models leveraging deep backbone networks have emerged as stalwarts in landcover classification tasks owing to their adeptness in feature representation. However, the classification efficacy of a solitary backbone network model fluctuates across diverse scenarios and datasets, posing a persistent challenge in the construction or selection of an appropriate backbone network for distinct classification tasks. To elevate the classification performance and bolster the generalization of semantic segmentation models, we propose a novel semantic segmentation network architecture, named the aggregated backbone network (ABNet), for meticulous landcover classification. ABNet aggregates three prevailing backbone networks (ResNet, HRNet, and VoVNet), distinguished by significant structural disparities, via a same-stage fusion approach. Subsequently, it amalgamates these networks with the Deeplabv3+ head after integrating the convolutional block attention mechanism (CBAM). Notably, this amalgamation harmonizes distinct scale features extracted by the three backbone networks, thus enriching the model’s spatial contextual comprehension and expanding its receptive field, thereby facilitating more effective semantic feature extraction across different stages. The convolutional block attention mechanism primarily orchestrates channel adjustments and curtails redundant information within the aggregated feature layers.
Ablation experiments demonstrate an enhancement of no less than 3% in the mean intersection over union (mIoU) of ABNet on both the LoveDA and GID15 datasets when compared with a single backbone network model. Furthermore, in contrast to seven classical or state-of-the-art models (UNet, FPN, PSPNet, DANet, CBNet, CCNet, and UPerNet), ABNet evinces excellent segmentation performance across the aforementioned datasets, underscoring the efficiency and robust generalization capabilities of the proposed approach. Full article
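The channel-adjustment role that CBAM plays in ABNet can be illustrated with a dependency-free sketch of CBAM's channel-attention branch: per-channel average and max pooling feed a small shared MLP, whose summed outputs pass through a sigmoid to rescale each channel. The weights and feature values below are made up for illustration; the real module is learned end-to-end and also includes a spatial-attention branch not shown here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention on a list-of-2D-channels feature map.

    fmap: list of C channels, each an HxW list of lists.
    w1 (C -> hidden) and w2 (hidden -> C): weights of the tiny shared MLP.
    """
    C = len(fmap)
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]

    def mlp(v):
        hidden = [max(0.0, sum(w1[i][j] * v[j] for j in range(C)))  # ReLU
                  for i in range(len(w1))]
        return [sum(w2[i][j] * hidden[j] for j in range(len(w1))) for i in range(C)]

    # attention weight per channel, combining both pooled descriptors
    att = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[att[c] * v for v in row] for row in fmap[c]] for c in range(C)]

# toy 2-channel 2x2 feature map with hand-picked (non-learned) weights
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 1.0], [1.0, 2.0]]]
out = channel_attention(fmap, w1=[[0.5, 0.5]], w2=[[1.0], [0.2]])
```

Because the sigmoid output lies in (0, 1), every channel is scaled down in proportion to its estimated importance, which is how the module suppresses redundant channels in the aggregated feature layers.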
32 pages, 8146 KiB  
Article
SCRP-Radar: Space-Aware Coordinate Representation for Human Pose Estimation Based on SISO UWB Radar
by Xiaolong Zhou, Tian Jin, Yongpeng Dai, Yongping Song and Kemeng Li
Remote Sens. 2024, 16(9), 1572; https://doi.org/10.3390/rs16091572 - 28 Apr 2024
Cited by 1 | Viewed by 997
Abstract
Human pose estimation (HPE) is an integral component of numerous applications ranging from healthcare monitoring to human-computer interaction, traditionally relying on vision-based systems. These systems, however, face challenges such as privacy concerns and dependency on lighting conditions. As an alternative, short-range radar technology offers a non-invasive, lighting-insensitive solution that preserves user privacy. This paper presents a novel radar-based framework for HPE, SCRP-Radar (space-aware coordinate representation for human pose estimation using single-input single-output (SISO) ultra-wideband (UWB) radar). The methodology begins with clutter suppression and denoising techniques to enhance the quality of radar echo signals, followed by the construction of a micro-Doppler (MD) matrix from these refined signals. This matrix is segmented into bins to extract distinctive features that are critical for pose estimation. The SCRP-Radar leverages the HRNet and Lite-HRNet networks, incorporating space-aware coordinate representation to reconstruct 2D human poses with high precision. Our method redefines HPE as dual classification tasks for vertical and horizontal coordinates, which is a significant departure from existing methods such as RF-Pose, RF-Pose 3D, UWB-Pose, and RadarFormer. Extensive experimental evaluations demonstrate that SCRP-Radar significantly surpasses these methods in accuracy and robustness, consistently exhibiting lower average error rates, achieving less than 40 mm across 17 skeletal keypoints. This innovative approach not only enhances the precision of radar-based HPE but also sets a new benchmark for future research and application, particularly in sectors that benefit from accurate and privacy-preserving monitoring technologies. Full article
(This article belongs to the Special Issue State-of-the-Art and Future Developments: Short-Range Radar)
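The "dual classification tasks for vertical and horizontal coordinates" idea above can be sketched with a SimCC-style decomposition: instead of regressing a 2D keypoint location, the 2D response map is marginalized into two 1D score vectors, and each axis becomes an independent classification over coordinate bins. This is a simplified illustration of the general technique, not the paper's exact head; the heatmap values are invented.

```python
def heatmap_to_coords(heatmap):
    """Collapse a 2D keypoint response map into two 1D classifications.

    Each axis is treated as an independent argmax over coordinate bins,
    in the spirit of space-aware coordinate representation.
    """
    H, W = len(heatmap), len(heatmap[0])
    # marginal score vector per axis
    col_scores = [sum(heatmap[y][x] for y in range(H)) for x in range(W)]
    row_scores = [sum(heatmap[y][x] for x in range(W)) for y in range(H)]
    x = max(range(W), key=col_scores.__getitem__)
    y = max(range(H), key=row_scores.__getitem__)
    return x, y

# toy 3x4 response map peaking near (x=1, y=1)
hm = [[0.0, 0.1, 0.0, 0.0],
      [0.1, 0.8, 0.2, 0.0],
      [0.0, 0.2, 0.1, 0.0]]
print(heatmap_to_coords(hm))  # (1, 1)
```

One practical appeal of this decomposition is that the two 1D bin vectors can be made much finer than a dense 2D heatmap at the same memory cost, which helps sub-pixel (here, sub-bin) localization.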