Search Results (198)

Search Parameters:
Journal = Electronics
Section = Electronic Multimedia

16 pages, 531 KiB  
Article
A Robust Generalized Zero-Shot Learning Method with Attribute Prototype and Discriminative Attention Mechanism
by Xiaodong Liu, Weixing Luo, Jiale Du, Xinshuo Wang, Yuhao Dang and Yang Liu
Electronics 2024, 13(18), 3751; https://doi.org/10.3390/electronics13183751 - 21 Sep 2024
Abstract
In the field of Generalized Zero-Shot Learning (GZSL), the challenge lies in learning attribute-based information from seen classes and effectively conveying this knowledge to recognize both seen and unseen categories during the training process. This paper proposes an innovative approach to enhance the generalization ability and efficiency of GZSL models by integrating a Convolutional Block Attention Module (CBAM). The CBAM blends channel-wise and spatial-wise information to emphasize key features, thereby improving the model’s discriminative and localization capabilities. Additionally, the method employs a ResNet101 backbone for systematic image feature extraction, enhanced contrastive learning, and a similarity map generator with attribute prototypes. This comprehensive framework aims to achieve robust visual–semantic embedding for classification tasks. The proposed method demonstrates significant improvements in performance metrics on benchmark datasets, showcasing its potential in advancing GZSL applications.
(This article belongs to the Special Issue Deep/Machine Learning in Visual Recognition and Anomaly Detection)
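
As a rough illustration of the CBAM mechanism this abstract describes, the following minimal PyTorch sketch combines channel-wise and spatial-wise attention; the layer sizes and pooling choices are generic assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        def __init__(self, channels, reduction=16, kernel_size=7):
            super().__init__()
            # Channel attention: squeeze spatial dims, re-weight each channel.
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels))
            # Spatial attention: convolve pooled channel maps into one mask.
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            b, c, _, _ = x.shape
            gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                                 self.mlp(x.amax(dim=(2, 3))))
            x = x * gate.view(b, c, 1, 1)                       # channel-wise
            pooled = torch.cat([x.mean(1, keepdim=True),
                                x.amax(1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.conv(pooled))         # spatial-wise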

27 pages, 4394 KiB  
Article
Exploring and Visualizing Multilingual Cultural Heritage Data Using Multi-Layer Semantic Graphs and Transformers
by Isabella Gagliardi and Maria Teresa Artese
Electronics 2024, 13(18), 3741; https://doi.org/10.3390/electronics13183741 - 20 Sep 2024
Abstract
The effectiveness of archives, particularly those related to cultural heritage, depends on their accessibility and navigability. An intuitive interface is essential for improving accessibility and inclusivity, enabling users with diverse backgrounds and expertise to interact with archival content effortlessly. This paper introduces a new method for visualizing and navigating dataset information through the creation of semantic graphs. By leveraging pre-trained large language models, this approach groups data and generates semantic graphs. The development of multi-layer maps facilitates deep exploration of datasets, and the capability to handle multilingual datasets makes it ideal for archives containing documents in various languages. These features combine to create a user-friendly tool adaptable to various contexts, offering even non-expert users a new way to interact with and navigate the data. This enhances their overall experience, promoting a greater understanding and appreciation of the content. The paper presents experiments conducted on diverse datasets across different languages and topics, employing various algorithms and methods, and provides a thorough discussion of the results obtained from these experiments.
(This article belongs to the Special Issue Deep Learning in Multimedia and Computer Vision)
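
A hedged sketch of the kind of pipeline the abstract outlines: embed multilingual records with a pre-trained model and link semantically close records into a graph. The model name, toy records, and similarity threshold below are illustrative assumptions, not the authors' choices.

    import networkx as nx
    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    records = ["mosaico bizantino", "Byzantine mosaic", "weaving loom"]
    emb = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2").encode(records)
    sim = cosine_similarity(emb)

    g = nx.Graph()
    g.add_nodes_from(range(len(records)))
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if sim[i, j] > 0.6:  # keep only semantically close pairs
                g.add_edge(i, j, weight=float(sim[i, j]))
    # Connected components now group records across languages; multi-layer
    # maps can be built by re-clustering each component at higher thresholds.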

37 pages, 5927 KiB  
Article
Object and Pedestrian Detection on Road in Foggy Weather Conditions by Hyperparameterized YOLOv8 Model
by Ahmad Esmaeil Abbasi, Agostino Marcello Mangini and Maria Pia Fanti
Electronics 2024, 13(18), 3661; https://doi.org/10.3390/electronics13183661 - 14 Sep 2024
Abstract
Connected, cooperative and automated (CAM) vehicles and self-driving cars need to achieve robust and accurate environment understanding. With this aim, they are usually equipped with sensors and adopt multiple sensing strategies, also fused among them to exploit their complementary properties. In recent years, artificial intelligence techniques such as machine learning- and deep learning-based approaches have been applied to object and pedestrian detection and to prediction reliability quantification. This paper proposes a procedure based on the YOLOv8 (You Only Look Once) method to detect objects on the road, such as cars, traffic lights, pedestrians and street signs, in foggy weather conditions. In particular, YOLOv8 is a recent release of YOLO, a popular neural network model used for object detection and image classification. The obtained model is applied to a dataset including about 4000 foggy road images, and the object detection accuracy is improved by changing hyperparameters such as epochs, batch size and augmentation methods. To achieve good accuracy and few errors in detecting objects in the images, the hyperparameters are optimized by four different methods, and different metrics are considered, namely accuracy factor, precision, recall, precision–recall and loss.
(This article belongs to the Special Issue Applications and Challenges of Image Processing in Smart Environment)
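
The kind of hyperparameter tuning the abstract describes can be reproduced in outline with the ultralytics API; the dataset config name and the specific values below are placeholders, not the paper's optimized settings.

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")              # pretrained YOLOv8 checkpoint
    model.train(
        data="foggy_roads.yaml",            # hypothetical dataset config
        epochs=100,
        batch=16,
        imgsz=640,
        hsv_v=0.4, mosaic=1.0, fliplr=0.5,  # augmentation hyperparameters
    )
    metrics = model.val()                   # reports precision, recall, mAP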

17 pages, 2202 KiB  
Article
Maritime Object Detection by Exploiting Electro-Optical and Near-Infrared Sensors Using Ensemble Learning
by Muhammad Furqan Javed, Muhammad Osama Imam, Muhammad Adnan, Iqbal Murtza and Jin-Young Kim
Electronics 2024, 13(18), 3615; https://doi.org/10.3390/electronics13183615 - 11 Sep 2024
Abstract
Object detection in maritime environments is a challenging problem because of the continuously changing background and moving objects, resulting in shearing, occlusion, noise, etc. This problem is of critical importance, since detection failures may result in significant loss of human life and economic damage. The available object detection methods rely on radar and sonar sensors. Even with the advances in electro-optical sensors, their employment in maritime object detection is rarely considered. The proposed research aims to employ both electro-optical and near-infrared (NIR) sensors for effective maritime object detection. To this end, dedicated deep learning detection models (ResNet-50, ResNet-101, and SSD MobileNet) are trained on both electro-optical and NIR sensor datasets. Dedicated ensemble classifications are then constructed on each collection of base learners from the electro-optical and near-infrared spaces. After this, decisions about object detection from these spaces are combined using a logical-disjunction-based final ensemble classification. This strategy is utilized to reduce false negatives effectively. To evaluate the performance of the proposed methodology, the publicly available standard Singapore Maritime Dataset is used, and the results show that the proposed methodology outperforms contemporary maritime object detection techniques with a significantly improved mean average precision.
(This article belongs to the Special Issue Applied Machine Learning in Intelligent Systems)
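
The logical-disjunction fusion can be pictured with a toy sketch: an object counts as detected if either the electro-optical or the NIR ensemble reports it. Matching detections by IoU is simplified here to shared object ids.

    def fuse_detections(eo_dets, nir_dets):
        """Union of per-object decisions: reduces false negatives at the
        cost of inheriting false positives from both sensor branches."""
        return sorted({d["id"] for d in eo_dets} | {d["id"] for d in nir_dets})

    eo = [{"id": 3}, {"id": 7}]      # electro-optical ensemble output
    nir = [{"id": 7}, {"id": 9}]     # near-infrared ensemble output
    print(fuse_detections(eo, nir))  # [3, 7, 9]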

19 pages, 1547 KiB  
Review
Advancements in TinyML: Applications, Limitations, and Impact on IoT Devices
by Abdussalam Elhanashi, Pierpaolo Dini, Sergio Saponara and Qinghe Zheng
Electronics 2024, 13(17), 3562; https://doi.org/10.3390/electronics13173562 - 8 Sep 2024
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) have experienced rapid growth in both industry and academia. However, the current ML and AI models demand significant computing and processing power to achieve desired accuracy and results, often restricting their use to high-capability devices. With advancements in embedded system technology and the substantial development in the Internet of Things (IoT) industry, there is a growing desire to integrate ML techniques into resource-constrained embedded systems for ubiquitous intelligence. This aspiration has led to the emergence of TinyML, a specialized approach that enables the deployment of ML models on resource-constrained, power-efficient, and low-cost devices. Despite its potential, the implementation of ML on such devices presents challenges, including optimization, processing capacity, reliability, and maintenance. This article delves into the TinyML model, exploring its background, the tools that support it, and its applications in advanced technologies. By understanding these aspects, we can better appreciate how TinyML is transforming the landscape of AI and ML in embedded and IoT systems.
(This article belongs to the Special Issue Applied Machine Learning in Intelligent Systems)
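
A typical TinyML workflow of the kind this review surveys, sketched with TensorFlow Lite: post-training quantization shrinks a small Keras model for deployment on a resource-constrained device. The toy model is purely illustrative.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize weights
    with open("model.tflite", "wb") as f:
        f.write(converter.convert())   # deployable on a microcontroller runtime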

20 pages, 14560 KiB  
Article
PAL-YOLOv8: A Lightweight Algorithm for Insulator Defect Detection
by Du Zhang, Kerang Cao, Kai Han, Changsu Kim and Hoekyung Jung
Electronics 2024, 13(17), 3500; https://doi.org/10.3390/electronics13173500 - 3 Sep 2024
Abstract
To address the challenges of high model complexity and low accuracy in detecting small targets in insulator defect detection using UAV aerial imagery, we propose a lightweight algorithm, PAL-YOLOv8. Firstly, the baseline model, YOLOv8n, is enhanced by incorporating the PKI Block from PKINet to improve the C2f module, effectively reducing the model complexity and enhancing feature extraction capabilities. Secondly, the ADown module from YOLOv9 is employed in the backbone and neck for downsampling, which retains more feature information while reducing the feature map size, thus improving the detection accuracy. Additionally, Focaler-SIoU is used as the bounding-box regression loss function to improve model performance by focusing on different regression samples. Finally, pruning is applied to the improved model to further reduce its size. The experimental results show that PAL-YOLOv8 achieves an mAP50 of 95.0%, which represents increases of 5.5% and 2.6% over YOLOv8n and YOLOv9t, respectively. Furthermore, the model requires only 3.9 GFLOPs, has a size of just 2.7 MB, and contains only 1.24 × 10^6 parameters.
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
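
The Focaler idea mentioned above can be sketched as a linear rescaling of the IoU term so the regression loss focuses on a chosen difficulty band of samples; the interval bounds below are illustrative defaults, and the SIoU shape/angle terms are omitted.

    import torch

    def focaler_iou(iou, d=0.0, u=0.95):
        # IoU below d maps to 0, above u to 1, linear in between.
        return ((iou - d) / (u - d)).clamp(0.0, 1.0)

    iou = torch.tensor([0.20, 0.50, 0.97])
    reg_loss = 1.0 - focaler_iou(iou)  # plug into the SIoU-style box loss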

23 pages, 916 KiB  
Article
Fake Base Station Detection and Link Routing Defense
by Sourav Purification, Jinoh Kim, Jonghyun Kim and Sang-Yoon Chang
Electronics 2024, 13(17), 3474; https://doi.org/10.3390/electronics13173474 - 1 Sep 2024
Abstract
Fake base stations constitute a critical security issue in mobile networking. A fake base station exploits vulnerabilities in the broadcast message announcing a base station’s presence, which is called SIB1 in 4G LTE and 5G NR, to get user equipment to connect to the fake base station. Once connected, the fake base station can deprive the user of connectivity and access to the Internet/cloud. We discovered that a fake base station can disable the victim user equipment’s connectivity for an indefinite period of time, which we validated using our threat prototype against current 4G/5G practices. We designed and built a defense scheme which detects and blacklists a fake base station and then, informed by the detection, avoids it through link routing for connectivity availability. For detection and blacklisting, our scheme uses the real-time information of both the time duration and the number of request transmissions, features which are directly impacted by the fake base station’s threat and which have not been studied in previous research. Upon detection, our scheme takes an active measure called link routing, which is a novel concept in mobile/4G/5G networking, where the user equipment routes the connectivity request to another base station. To defend against a Sybil-capable fake base station, we use a history–reputation-based link routing scheme for routing and base station selection. We implemented both the base station and the user on software-defined radios using open-source 5G software (srsRAN v23.10 and Open5GS v2.6.6) for validation. We varied the base station implementation to simulate legitimate, faulty-but-legitimate, and fake, malicious base stations, where a faulty base station notifies the user of the connectivity disruption and releases the session, while a fake base station continues to hold the session. We empirically analyzed the detection and identification thresholds, which vary with the fake base station’s power and the channel condition. By strategically selecting the threshold parameters, our scheme provides zero errors, including zero false positives, to avoid blacklisting a temporarily faulty base station that cannot provide connectivity at the time. Furthermore, our link routing scheme enables the base station to switch in order to restore the connectivity availability and limit the threat impact. We also discuss future directions to facilitate and encourage R&D in securing telecommunications and base station security.
(This article belongs to the Special Issue Multimedia in Radio Communication and Teleinformatics)
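
A simplified sketch of the detection logic described: the user equipment tracks how long a connection attempt has been pending and how many requests it has sent, blacklists the base station once tuned thresholds are crossed, and falls back to link routing. The threshold values are illustrative; the paper derives them empirically.

    import time

    MAX_DURATION_S = 5.0   # assumed duration threshold
    MAX_REQUESTS = 3       # assumed request-count threshold
    blacklist = set()

    def attempt_connection(bs_id, send_request):
        start, tries = time.monotonic(), 0
        while time.monotonic() - start < MAX_DURATION_S and tries < MAX_REQUESTS:
            tries += 1
            if send_request(bs_id):   # True once a session is established
                return bs_id
        blacklist.add(bs_id)          # treated as fake (or faulty): avoid it
        return None                   # caller link-routes to another station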

12 pages, 297 KiB  
Article
Cross-Domain Document Summarization Model via Two-Stage Curriculum Learning
by Seungsoo Lee, Gyunyeop Kim and Sangwoo Kang
Electronics 2024, 13(17), 3425; https://doi.org/10.3390/electronics13173425 - 29 Aug 2024
Abstract
Generative document summarization is a natural language processing technique that generates short summary sentences while preserving the content of long texts. Various fine-tuned pre-trained document summarization models have been proposed using a specific single text-summarization dataset. However, each text-summarization dataset usually specializes in a particular downstream task. Therefore, it is difficult to treat all cases involving multiple domains using a single dataset. Accordingly, when a generative document summarization model is fine-tuned to a specific dataset, it performs well, whereas the performance is degraded by up to 45% for datasets that are not used during learning. In short, summarization models perform well in in-domain cases, where the dataset domain during training and evaluation is the same, but perform poorly on out-of-domain inputs. In this paper, we propose a new curriculum-learning method that uses mixed datasets while training a generative summarization model, making it more robust on out-of-domain datasets. Compared with the baseline model trained on XSum, our method showed 10%, 20%, and 10% lower performance degradation on CNN/DM, one of the two test datasets used.
(This article belongs to the Special Issue Natural Language Processing Method: Deep Learning and Deep Semantics)
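
A hedged sketch of two-stage curriculum training on mixed summarization datasets: a first stage sees mixed-domain batches, a second narrows to the target domain. The stage lengths and mixing scheme are assumptions for illustration, not the paper's schedule.

    from torch.utils.data import ConcatDataset, DataLoader

    def two_stage_curriculum(model, target_ds, other_ds, train_one_epoch):
        # Stage 1: mixed-domain batches build domain-robust representations.
        mixed = DataLoader(ConcatDataset([target_ds, other_ds]),
                           batch_size=8, shuffle=True)
        for _ in range(2):
            train_one_epoch(model, mixed)
        # Stage 2: fine-tune on the target domain only.
        target = DataLoader(target_ds, batch_size=8, shuffle=True)
        for _ in range(2):
            train_one_epoch(model, target)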

23 pages, 76553 KiB  
Article
3DRecNet: A 3D Reconstruction Network with Dual Attention and Human-Inspired Memory
by Muhammad Awais Shoukat, Allah Bux Sargano, Lihua You and Zulfiqar Habib
Electronics 2024, 13(17), 3391; https://doi.org/10.3390/electronics13173391 - 26 Aug 2024
Abstract
Humans inherently perceive 3D scenes using prior knowledge and visual perception, but 3D reconstruction in computer graphics is challenging due to complex object geometries, noisy backgrounds, and occlusions, leading to high time and space complexity. To address these challenges, this study introduces 3DRecNet, a compact 3D reconstruction architecture optimized for both efficiency and accuracy through five key modules. The first module, the Human-Inspired Memory Network (HIMNet), is designed for initial point cloud estimation, assisting in identifying and localizing objects in occluded and complex regions while preserving critical spatial information. Next, separate image and 3D encoders perform feature extraction from input images and initial point clouds. These features are combined using a dual attention-based feature fusion module, which emphasizes features from the image branch over those from the 3D encoding branch. This approach ensures independence from proposals at inference time and filters out irrelevant information, leading to more accurate and detailed reconstructions. Finally, a Decoder Branch transforms the fused features into a 3D representation. The integration of attention-based fusion with the memory network in 3DRecNet significantly enhances the overall reconstruction process. Experimental results on benchmark datasets such as ShapeNet, ObjectNet3D, and Pix3D demonstrate that 3DRecNet outperforms existing methods.
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)
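
A minimal sketch of attention-based feature fusion that can favor the image branch, as the abstract describes; the gating form and dimensions are illustrative assumptions, not 3DRecNet's actual module.

    import torch
    import torch.nn as nn

    class DualAttentionFusion(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

        def forward(self, img_feat, pc_feat):
            # The learned gate weights the image branch against the 3D
            # branch, filtering irrelevant point-cloud information.
            a = self.gate(torch.cat([img_feat, pc_feat], dim=-1))
            return a * img_feat + (1.0 - a) * pc_feat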

16 pages, 3072 KiB  
Article
A Learner-Centric Explainable Educational Metaverse for Cyber–Physical Systems Engineering
by Seong-Jin Yun, Jin-Woo Kwon, Young-Hoon Lee, Jae-Heon Kim and Won-Tae Kim
Electronics 2024, 13(17), 3359; https://doi.org/10.3390/electronics13173359 - 23 Aug 2024
Abstract
Cyber–physical systems have become critical across industries, driving investments in education services to develop well-trained engineers. Education services for cyber–physical systems require the hiring of expert tutors with multidisciplinary knowledge, as well as acquiring expensive facilities and equipment. In response to the equipment and facility challenges, metaverse-based education services that incorporate digital twins have been explored as a solution. However, the issue of recruiting expert tutors who can enhance students’ achievements remains unresolved, making it difficult to effectively cultivate talent. This paper proposes a reference architecture for a learner-centric educational metaverse with an intelligent tutoring framework as its core feature to address these issues. We develop a novel explainable artificial intelligence scheme for multi-class object detection models to assess learners’ achievements within the intelligent tutoring framework. Additionally, a genetic algorithm-based improvement search method is applied to the framework to derive personalized feedback. The proposed metaverse architecture and framework are evaluated through a case study on drone education. The experimental results show that the explainable AI scheme achieves an approximately 30% improvement in explanation accuracy compared to existing methods. The survey results indicate that over 70% of learners significantly improved their skills based on the provided feedback.
(This article belongs to the Special Issue Applied Machine Learning in Intelligent Systems)
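
The genetic-algorithm improvement search can be pictured with a generic sketch: candidate feedback plans evolve toward higher predicted achievement. The real-valued genome and the toy fitness function are stand-ins, not the paper's formulation.

    import random

    def evolve(fitness, genome_len=6, pop_size=20, generations=30):
        pop = [[random.random() for _ in range(genome_len)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]            # selection
            children = []
            for _ in range(pop_size - len(parents)):
                a, b = random.sample(parents, 2)
                child = [(x + y) / 2 for x, y in zip(a, b)]        # crossover
                child[random.randrange(genome_len)] += random.gauss(0, 0.1)  # mutation
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    best = evolve(lambda g: -sum((x - 0.7) ** 2 for x in g))  # toy fitness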

13 pages, 1061 KiB  
Article
Swin-Fake: A Consistency Learning Transformer-Based Deepfake Video Detector
by Liang Yu Gong, Xue Jun Li and Peter Han Joo Chong
Electronics 2024, 13(15), 3045; https://doi.org/10.3390/electronics13153045 - 1 Aug 2024
Abstract
Deepfake has become an emerging technology affecting cyber-security through its illegal applications in recent years. Most deepfake detectors utilize CNN-based models such as the Xception Network to distinguish real from fake media; however, their performance on cross-dataset evaluation is not ideal because they currently suffer from over-fitting. Therefore, this paper proposes a spatial consistency learning method that addresses this issue in three aspects. Firstly, we increase the number of data augmentation methods to five, more than in our previous study: we capture several frames of each video and randomly select five different data augmentations to obtain different data views and enrich the input variety. Secondly, we choose the Swin Transformer as the feature extractor instead of a CNN-based backbone, encoding the data end to end to learn the correlation between different image patches. Finally, we combine this with consistency learning, which can capture more data relationships than supervised classification alone. We explore the consistency of video frames’ features by calculating their cosine distance and apply a traditional cross-entropy loss to regulate the classification. Extensive in-dataset and cross-dataset experiments demonstrate that Swin-Fake produces relatively good results on several open-source deepfake datasets, including FaceForensics++, DFDC, Celeb-DF and FaceShifter. By comparing our model with several benchmark models, our approach shows relatively strong robustness in detecting deepfake media.
(This article belongs to the Special Issue Neural Networks and Deep Learning in Computer Vision)
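
The consistency objective described can be sketched as a cosine-distance term between features of two augmented views of the same video, regulated by the usual cross-entropy; the weighting factor is an illustrative assumption.

    import torch.nn.functional as F

    def swin_fake_loss(feat_a, feat_b, logits, labels, lam=0.5):
        # Features of frames from the same video should stay close.
        consistency = (1.0 - F.cosine_similarity(feat_a, feat_b, dim=-1)).mean()
        return F.cross_entropy(logits, labels) + lam * consistency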

21 pages, 8540 KiB  
Article
LBCNIN: Local Binary Convolution Network with Intra-Class Normalization for Texture Recognition with Applications in Tactile Internet
by Nikolay Neshov, Krasimir Tonchev and Agata Manolova
Electronics 2024, 13(15), 2942; https://doi.org/10.3390/electronics13152942 - 25 Jul 2024
Abstract
Texture recognition is a pivotal task in computer vision, crucial for applications in material sciences, medicine, and agriculture. Leveraging advancements in Deep Neural Networks (DNNs), researchers seek robust methods to discern intricate patterns in images. In the context of the burgeoning Tactile Internet (TI), efficient texture recognition algorithms are essential for real-time applications. This paper introduces a method named Local Binary Convolution Network with Intra-class Normalization (LBCNIN) for texture recognition. Incorporating features from the last layer of the backbone, LBCNIN employs a non-trainable Local Binary Convolution (LBC) layer, inspired by Local Binary Patterns (LBP), without fine-tuning the backbone. The encoded feature vector is fed into a linear Support Vector Machine (SVM) for classification, serving as the only trainable component. In the context of TI, the availability of images from multiple views, such as in 3D object semantic segmentation, allows for more data per object. Consequently, LBCNIN processes batches where each batch contains images from the same material class, with batch normalization employed as an intra-class normalization method, aiming to produce better results than single images. Comprehensive evaluations across texture benchmarks demonstrate LBCNIN’s ability to achieve very good results under different resource constraints, attributed to the variability in backbone architectures.
(This article belongs to the Section Electronic Multimedia)
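
A sketch of the non-trainable Local Binary Convolution idea: fixed sparse {-1, 0, +1} filters play the role of LBP-style pixel comparisons, so only a linear SVM on the encoded features would be trained. Filter size and sparsity below are illustrative.

    import torch
    import torch.nn as nn

    class LBCLayer(nn.Module):
        def __init__(self, in_ch, out_ch, sparsity=0.5):
            super().__init__()
            signs = torch.bernoulli(torch.full((out_ch, in_ch, 3, 3), 0.5)) * 2 - 1
            mask = (torch.rand(out_ch, in_ch, 3, 3) < sparsity).float()
            self.register_buffer("weight", signs * mask)  # frozen, never trained

        def forward(self, x):
            # Sigmoid stands in for the LBP thresholding nonlinearity; the
            # output features would be pooled and passed to a linear SVM.
            return torch.sigmoid(nn.functional.conv2d(x, self.weight, padding=1))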

13 pages, 2929 KiB  
Article
Increasing Offline Handwritten Chinese Character Recognition Using Separated Pre-Training Models: A Computer Vision Approach
by Xiaoli He, Bo Zhang and Yuan Long
Electronics 2024, 13(15), 2893; https://doi.org/10.3390/electronics13152893 - 23 Jul 2024
Abstract
Offline handwritten Chinese character recognition involves the application of computer vision techniques to recognize individual handwritten Chinese characters. This technology has significantly advanced the research in online handwriting recognition. Despite its widespread application across various fields, offline recognition faces numerous challenges. These challenges include the diversity of glyphs resulting from different writers’ styles and habits, the vast number of Chinese character labels, and the presence of morphological similarities among characters. To address these challenges, an optimization method based on a separated pre-training model was proposed. The method aims to enhance the accuracy and robustness of recognizing similar character images by exploring potential correlations among them. In experiments, the HWDB and Chinese Calligraphy Styles by Calligraphers datasets were employed, utilizing precision, recall, and the Macro-F1 value as evaluation metrics. We employ a convolutional auto-encoder model characterized by high recognition accuracy and robust performance. The experimental results demonstrated that the separated pre-training models improved the performance of the convolutional auto-encoder model, particularly in handling error-prone characters, resulting in an approximate 6% increase in precision.
(This article belongs to the Special Issue Recent Advances in Image Processing and Computer Vision)
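
A minimal convolutional auto-encoder of the kind the method builds on; under the separated pre-training idea, one such model would be pre-trained per group of easily confused glyphs. The architecture details here are illustrative, not the paper's.

    import torch.nn as nn

    class ConvAutoEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, 3, 2, 1, output_padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, 1, 3, 2, 1, output_padding=1), nn.Sigmoid())

        def forward(self, x):  # x: (B, 1, H, W) grayscale character image
            return self.decoder(self.encoder(x))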

17 pages, 39229 KiB  
Article
Thumbnail-Preserving Encryption Technology Based on Digital Processing
by Dan Li, Ziming Zhang, Xu Dai and Erfu Wang
Electronics 2024, 13(14), 2682; https://doi.org/10.3390/electronics13142682 - 9 Jul 2024
Abstract
In recent years, the security of cloud storage has become a topic attracting significant attention due to features such as large storage space, high availability, and low cost. Although traditional encryption of plain-text images can withstand external attacks, the usability of the images is completely lost. In order to balance the usability and privacy of images, some scholars have proposed the thumbnail-preserving encryption (TPE) scheme. An ideal TPE algorithm keeps the thumbnail identical before and after encryption, which reduces the time cost and strengthens resistance to attacks, but existing schemes cannot fulfill these criteria. In this paper, we propose a new TPE scheme that combines bit-transform encryption and improved hierarchical encryption. By constructing a chaotic system, both encryption and decryption times are shortened, while the randomness of the selected cells is enhanced. In addition, the Hamming distance is introduced to classify and scramble the binary encryption units. The experimental results show that when the number of thumbnail chunks is 16 × 16, the encryption and decryption time decreases to 4 s, and the SSIM value after encryption is close to 1, indicating that the thumbnails before and after encryption remain essentially the same; as the number of chunks is gradually increased, the success rate of face detection approaches 0. In addition, as the number of experimental iterations increases, the encryption effect improves, with an increasing ability to resist attacks.
(This article belongs to the Section Electronic Multimedia)
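
The property a TPE scheme must preserve can be checked with a toy sketch: a stand-in cipher that permutes pixels inside each block hides content while leaving every block mean, and hence the thumbnail, unchanged. This illustrates the invariant only, not the paper's bit-transform and hierarchical encryption.

    import numpy as np

    rng = np.random.default_rng(0)

    def thumbnail(img, block=16):      # assumes sides divisible by block
        h, w = img.shape
        return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

    def toy_tpe(img, block=16):
        out = img.copy()
        for i in range(0, img.shape[0], block):
            for j in range(0, img.shape[1], block):
                cell = out[i:i + block, j:j + block]
                cell[:] = rng.permutation(cell.ravel()).reshape(cell.shape)
        return out

    plain = rng.integers(0, 256, (256, 256)).astype(float)
    assert np.allclose(thumbnail(plain), thumbnail(toy_tpe(plain)))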

19 pages, 15069 KiB  
Article
Enhanced Remote Sensing Image Compression Method Using Large Network with Sparse Extracting Strategy
by Hui Li, Tianpeng Pan and Lili Zhang
Electronics 2024, 13(13), 2677; https://doi.org/10.3390/electronics13132677 - 8 Jul 2024
Abstract
Deep neural networks based on hyper-encoders play a critical role in estimating prior distributions in remote sensing image compression issues. However, most of the existing encoding methods suffer from a problem on the hyper-encoding side, namely the mismatch of extraction ability with the encoder. This ability bias results in likelihood features that fail to extract sufficient information from latent representations. To solve this problem, the feature extraction capabilities of the hyper-encoder are enhanced to better estimate the Gaussian likelihood of the latent representation in end-to-end network optimization. Specifically, residual blocks and a parameter estimation module are incorporated to balance the performance of the encoder and the hyper-encoder. Furthermore, it is observed that the well-trained compression model tends to generate a fixed pattern of latent representations. Therefore, we incorporate a nonlocal cross-channel graph (NCG) on the backside of the encoder. Specifically, it aggregates features between similar latent representations in a graphical manner to further enhance the side information extraction capability of the hyper-encoder. Considering the computational cost, a sparse graph strategy is further developed to dynamically select the most relevant latent representations for aggregation operations, which greatly reduces the computational effort. The proposed algorithm is named nonlocal cross-channel efficient graph (NCEG). A long-dependent residual network is selected as the backbone, and a sparse attention module is inserted into the encoder/decoder side to enhance the receptive field of the network. The experimental results on two evaluation datasets demonstrate that the proposed method achieves satisfactory results compared to other learning-based methods.
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
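
The sparse graph strategy can be sketched as a top-k nonlocal aggregation: each latent channel aggregates features only from its most similar channels rather than from all of them, cutting the cost of the graph operation. The cosine similarity measure and k below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def sparse_channel_aggregate(z, k=4):
        """z: (C, N) latent representations, flattened per channel; C > k."""
        zn = F.normalize(z, dim=1)
        sim = zn @ zn.t()                            # (C, C) similarities
        top = sim.topk(k + 1, dim=1)                 # +1 so we can skip self
        w = torch.softmax(top.values[:, 1:], dim=1)  # weights over neighbors
        neighbors = z[top.indices[:, 1:]]            # (C, k, N) gathered
        return (w.unsqueeze(-1) * neighbors).sum(dim=1)  # aggregated (C, N)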
