Big Data and Cognitive Computing

33 pages, 918 KiB

Open Access Article

The Relative Importance of Key Factors for Integrating Enterprise Resource Planning (ERP) Systems and Performance Management Practices in the UAE Healthcare Sector

by Karam Al-Assaf, Wadhah Alzahmi, Ryan Alshaikh, Zied Bahroun and Vian Ahmed

Big Data Cogn. Comput. 2024, 8(9), 122; https://doi.org/10.3390/bdcc8090122 - 13 Sep 2024

Abstract

This study examines the integration of Enterprise Resource Planning (ERP) systems with performance management (PM) practices in the UAE healthcare sector, identifying key factors for successful adoption. It addresses a critical gap by analyzing the interplay between ERP systems and PM to enhance operational efficiency, patient care, and administrative processes. A literature review identified thirty-six critical factors, which were refined through expert interviews that highlighted nine weak integration areas and added two new factors. An online survey of 81 experts, who rated the resulting 38 factors on a five-point Likert scale, provided the data used to calculate the Relative Importance Index (RII). The results reveal that employee involvement in performance metrics and effective organizational measures significantly impact system effectiveness and alignment. Mid-tier factors such as leadership and managerial support are essential for sustaining integration momentum, while foundational elements such as infrastructure, scalability, security, and compliance are crucial for long-term success. The study recommends a holistic approach to these factors to maximize ERP benefits, offering insights for healthcare administrators and policymakers. It also highlights the need to address the challenges, opportunities, and ethical considerations associated with using digital health technology in healthcare. Future research should explore ERP integration challenges in public and private healthcare settings and tailor systems to specific organizational needs.
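
The RII is commonly defined as RII = ΣW / (A × N), where W is each respondent's rating, A is the highest point on the scale (5 here) and N is the number of respondents. The abstract does not spell out the formula, so the Python sketch below assumes that standard definition; the factor names and ratings are invented for illustration and are not the study's data.

```python
def relative_importance_index(ratings, max_rating=5):
    """Return RII = sum(W) / (A * N) for one factor.

    ratings    -- Likert scores W given by the N respondents (1..max_rating)
    max_rating -- A, the highest point on the scale (5 for a five-point scale)
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > max_rating for r in ratings):
        raise ValueError("rating outside the Likert scale")
    return sum(ratings) / (max_rating * len(ratings))


def rank_factors(factor_ratings, max_rating=5):
    """Return (factor, RII) pairs sorted from most to least important."""
    scores = {
        name: relative_importance_index(ratings, max_rating)
        for name, ratings in factor_ratings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings from four respondents for two illustrative factors.
example = {
    "infrastructure": [4, 3, 4, 3],
    "employee involvement": [5, 5, 4, 5],
}
for name, rii in rank_factors(example):
    print(f"{name}: RII = {rii:.3f}")
```

Because every score is divided by A × N, RII values fall between 1/A and 1, so factors rated by the same panel can be ranked directly against each other.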

(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)

28 pages, 2936 KiB

Open Access Systematic Review

Medical IoT Record Security and Blockchain: Systematic Review of Milieu, Milestones, and Momentum

by Simeon Okechukwu Ajakwe, Igboanusi Ikechi Saviour, Vivian Ukamaka Ihekoronye, Odinachi U. Nwankwo, Mohamed Abubakar Dini, Izuazu Urslla Uchechi, Dong-Seong Kim and Jae Min Lee

Big Data Cogn. Comput. 2024, 8(9), 121; https://doi.org/10.3390/bdcc8090121 - 12 Sep 2024

Abstract

The sensitivity and exclusivity attached to personal health records make such records a prime target for cyber intruders, as unauthorized access causes unfathomable repudiation and public defamation. In reality, most medical records are micro-managed by different healthcare providers, exposing them to various security issues, especially unauthorized third-party access. Over time, substantial progress has been made in preventing unauthorized access to this critical and highly classified information. This review investigated the mainstream security challenges associated with the transmissibility of medical records, the evolutionary security strategies for maintaining confidentiality, and the existential enablers of trustworthy and transparent authorization and authentication before data transmission can be carried out. The review adopted the PRISMA-SPIDER methodology for a systematic review of 122 articles, comprising 9 surveys (7.38%) for qualitative analysis and 109 technical papers (89.34%) and 4 online reports (3.28%) for quantitative studies. The review outcome indicates that the sensitivity and confidentiality of a highly classified document, such as a medical record, demand unabridged authorization by the owner, unquestionable preservation by the host, untainted transparency in transmission, unbiased traceability, and ubiquitous security, which blockchain technology guarantees, although it is still at an early stage. Therefore, developing blockchain-assisted frameworks for digital medical record preservation and addressing inherent technological hitches in blockchain will further accelerate transparent and trustworthy preservation, user authorization, and authentication of medical records before they are transmitted by the host for third-party access.
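
The tamper evidence the review attributes to blockchain rests on hash-linking: each block stores a hash of its predecessor, so editing an earlier record breaks the link. The Python sketch below shows only that mechanism with the standard `hashlib` module. It is a single-node, in-memory illustration with hypothetical record fields, not a blockchain (there is no consensus, signing, or access control) and not a framework from the reviewed literature.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical (key-sorted) JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain, record):
    """Append a record, linking it to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "record": record})
    return chain


def verify_chain(chain):
    """Return True if every block still points at the current hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


# Hypothetical access-log entries for one patient record.
chain = []
append_record(chain, {"patient": "P-001", "event": "lab result uploaded"})
append_record(chain, {"patient": "P-001", "event": "access granted to clinic B"})
print(verify_chain(chain))  # True
chain[0]["record"]["event"] = "tampered"
print(verify_chain(chain))  # False
```

This check alone does not catch edits to the most recent block; practical systems also anchor the latest hash, for example in the copies held by other nodes.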

(This article belongs to the Special Issue Research on Privacy and Data Security)

14 pages, 871 KiB

Open Access Article

An Efficient Green AI Approach to Time Series Forecasting Based on Deep Learning

by Luis Balderas, Miguel Lastra and José M. Benítez

Big Data Cogn. Comput. 2024, 8(9), 120; https://doi.org/10.3390/bdcc8090120 - 11 Sep 2024

Abstract

Time series forecasting is a key area in machine learning because many fields need to estimate future data points of a sequence from a set of previously observed values. Deep learning has been successfully applied to this area. However, growing concerns about the steady increase in the resources required by deep learning-based tools have made Green AI gain traction as a move towards making machine learning more sustainable. In this paper, we present a deep learning-based time series forecasting methodology called GreeNNTSF, which aims to reduce the size of the resulting model, thereby diminishing the associated computational and energy costs without sacrificing forecasting performance. The methodology, based on the ODF2NNA algorithm, produces models that outperform state-of-the-art techniques not only in prediction accuracy but also in computational cost and memory footprint. To support this claim, after presenting the main state-of-the-art deep learning methods for time series forecasting and introducing our methodology, we test GreeNNTSF on a selection of real-world forecasting problems commonly used as benchmarks, such as SARS-CoV-2 and PhysioNet (medicine), Brazilian Weather (climate), WTI and Electricity (economics), and Traffic (smart cities). The experiments, which rigorously follow the experimental setups of the original papers that addressed these problems, demonstrate that our method is more competitive than other state-of-the-art approaches, producing more accurate and efficient models.

18 pages, 13182 KiB

Open Access Article

Hierarchical Progressive Image Forgery Detection and Localization Method Based on UNet

by Yang Liu, Xiaofei Li, Jun Zhang, Shuohao Li, Shengze Hu and Jun Lei

Big Data Cogn. Comput. 2024, 8(9), 119; https://doi.org/10.3390/bdcc8090119 - 10 Sep 2024

Abstract

The rapid development of generative technologies has made forged content easier to produce, and AI-generated forged images are increasingly difficult to detect accurately, posing serious privacy risks and cognitive obstacles to individuals and society. Therefore, constructing an effective method that can accurately detect and locate forged regions has become an important task. This paper proposes a hierarchical and progressive forged image detection and localization method called HPUNet. First, following cognitive laws, the method assigns more reasonable hierarchical multi-level labels to the dataset as supervisory information at different levels. Second, multiple types of features are extracted from AI-generated images for detection and localization, and the detection and localization results are combined to enhance task-relevant features. Subsequently, HPUNet expands the obtained image features into four different resolutions and performs detection and localization at different levels in a coarse-to-fine cognitive order. To address the limited feature field of view caused by inconsistent forgery sizes, we employ three sets of densely cross-connected hierarchical networks that allow feature maps at different resolutions to interact sufficiently. Finally, a UNet network with a soft-threshold-constrained feature enhancement module achieves detection and localization at different scales, and a progressive mechanism establishes relationships between the different branches. We use accuracy (ACC) and F1 as evaluation metrics, and extensive experiments comparing our method with baseline methods demonstrate the effectiveness of our approach.
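
HPUNet is evaluated with ACC and F1, but the abstract does not say at which level each metric is computed. A common reading, assumed here, is image-level accuracy for the real/forged decision and pixel-level F1 for the localization mask. The Python sketch below computes both for small binary masks stored as nested lists; the function names and the toy masks are hypothetical.

```python
def mask_f1(pred_mask, true_mask):
    """Pixel-level F1 = 2TP / (2TP + FP + FN) for binary masks (lists of 0/1 rows)."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # forged pixel correctly flagged
            elif p:
                fp += 1  # authentic pixel flagged as forged
            elif t:
                fn += 1  # forged pixel missed
    denom = 2 * tp + fp + fn
    # If neither mask marks any pixel, treat it as perfect agreement (one common convention).
    return 2 * tp / denom if denom else 1.0


def image_accuracy(pred_labels, true_labels):
    """Fraction of images whose real (0) / forged (1) label is predicted correctly."""
    if not true_labels or len(pred_labels) != len(true_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred_labels, true_labels)) / len(true_labels)


predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(mask_f1(predicted, ground_truth))            # 2 TP, 1 FP, 1 FN
print(image_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```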

18 pages, 1889 KiB

Open Access Article

DBSCAN SMOTE LSTM: Effective Strategies for Distributed Denial of Service Detection in Imbalanced Network Environments

by Rissal Efendi, Teguh Wahyono and Indrastanti Ratna Widiasari

Big Data Cogn. Comput. 2024, 8(9), 118; https://doi.org/10.3390/bdcc8090118 - 10 Sep 2024

Abstract

In detecting Distributed Denial of Service (DDoS) attacks, deep learning faces challenges such as high computational demands, long training times, and complex model interpretation. This research addresses these challenges by proposing an effective strategy for detecting DDoS attacks in imbalanced network environments. It employs DBSCAN and SMOTE to balance the class distribution of the dataset, allowing LSTM-based models to learn temporal anomalies effectively when DDoS attacks occur. The experiments revealed a significant improvement in the performance of the LSTM model when integrated with DBSCAN and SMOTE: the validation loss was 0.048 for LSTM with DBSCAN and SMOTE versus 0.1943 for LSTM without them, with accuracies of 99.50% and 97.50%, respectively. In addition, the F1 score increased from 93.4% to 98.3%. This research shows that DBSCAN and SMOTE can be used as an effective strategy to improve model performance in detecting DDoS attacks on heterogeneous networks, as well as to increase model robustness and reliability.
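
SMOTE balances a dataset by synthesizing minority-class points on the segment between a minority sample x and one of its k nearest minority neighbours: x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1). The pure-Python sketch below implements only that interpolation step; the DBSCAN stage and the LSTM are omitted, and how the authors chain DBSCAN with SMOTE is not specified in the abstract. The function name, parameters and toy data are assumptions; k = 5 mirrors the usual default in common SMOTE implementations.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples by SMOTE-style interpolation.

    Each new point is x + lam * (neighbour - x), where x is a random minority
    sample, neighbour is one of its k nearest minority neighbours (Euclidean),
    and lam is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest minority neighbours of x, excluding x itself.
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )
        neighbour = rng.choice(others[:k])
        lam = rng.random()
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (e.g. scaled flow statistics of attack traffic).
minority = [[0.10, 0.20], [0.30, 0.10], [0.20, 0.40], [0.25, 0.30]]
for point in smote(minority, n_synthetic=3, k=2):
    print([round(v, 3) for v in point])
```

Because every synthetic point lies between two existing minority samples, oversampling stays inside the region the minority class already occupies instead of duplicating points verbatim.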

(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)

40 pages, 4095 KiB

Open Access Article

An End-to-End Scene Text Recognition for Bilingual Text

by Bayan M. Albalawi, Amani T. Jamal, Lama A. Al Khuzayem and Olaa A. Alsaedi

Big Data Cogn. Comput. 2024, 8(9), 117; https://doi.org/10.3390/bdcc8090117 - 9 Sep 2024

Abstract

Text localization and recognition from natural scene images has attracted considerable attention recently due to its crucial role in various applications, such as autonomous driving and intelligent navigation. However, two significant gaps exist in this area: (1) prior research has primarily focused on recognizing English text, whereas Arabic text has been underrepresented, and (2) most prior research has adopted separate approaches for scene text localization and recognition, as opposed to one integrated framework. To address these gaps, we propose a novel bilingual end-to-end approach that localizes and recognizes both Arabic and English text within a single natural scene image. Specifically, our approach uses pre-trained CNN models (ResNet and EfficientNetV2) with kernel representation for text localization and RNN models (LSTM and BiLSTM) with an attention mechanism for text recognition. In addition, the AraElectra Arabic language model was incorporated to enhance Arabic text recognition. Experimental results on the EvArest, ICDAR2017, and ICDAR2019 datasets demonstrated that our model achieves superior performance not only in recognizing horizontally oriented text but also in recognizing multi-oriented and curved Arabic and English text in natural scene images.

(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)

23 pages, 3337 KiB

Open Access Article

Attention-Driven Transfer Learning Model for Improved IoT Intrusion Detection

by Salma Abdelhamid, Islam Hegazy, Mostafa Aref and Mohamed Roushdy

Big Data Cogn. Comput. 2024, 8(9), 116; https://doi.org/10.3390/bdcc8090116 - 9 Sep 2024

Abstract

The proliferation of Internet of Things (IoT) devices has become inevitable in contemporary life, significantly affecting myriad applications. Nevertheless, the pervasive use of heterogeneous IoT gadgets introduces vulnerabilities to malicious cyber-attacks, resulting in data breaches that jeopardize the network’s integrity and resilience. This study proposes an Intrusion Detection System (IDS) for IoT environments that leverages Transfer Learning (TL) and the Convolutional Block Attention Module (CBAM). We extensively evaluate four prominent pre-trained models, each integrated with an independent CBAM at the uppermost layer. Our methodology is validated using the BoT-IoT dataset, which is preprocessed to rectify the imbalanced data distribution, eliminate redundancy, and reduce dimensionality. The tabular dataset is then transformed into RGB images to enhance the interpretation of complex patterns. Our evaluation results demonstrate that integrating TL models with the CBAM significantly improves classification accuracy and reduces false-positive rates. Additionally, to further enhance system performance, we employ an Ensemble Learning (EL) technique to aggregate predictions from the two best-performing models. The final findings show that our TL-CBAM-EL model achieves superior performance, attaining an accuracy of 99.93% as well as high recall, precision, and F1-score. Hence, the proposed IDS is a robust and efficient solution for securing IoT networks.
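
The abstract says predictions from the two best models are aggregated but does not name the rule. Soft voting, which averages the models' class probabilities and picks the most probable class, is one common choice and is assumed in the Python sketch below; the probability values are hypothetical model outputs, not results from the paper.

```python
def soft_vote(model_probs, weights=None):
    """Combine class-probability outputs of several models by weighted averaging.

    model_probs -- one list per model; each holds, per sample, a list of
                   class probabilities (all models must share sample order).
    weights     -- optional per-model weights; equal weights by default.
    Returns the index of the highest averaged probability for each sample.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    predictions = []
    for sample_probs in zip(*model_probs):  # one tuple of per-model outputs per sample
        n_classes = len(sample_probs[0])
        averaged = [
            sum(w * probs[c] for w, probs in zip(weights, sample_probs)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


# Hypothetical outputs for two samples and two classes (0 = benign, 1 = attack).
model_a = [[0.9, 0.1], [0.2, 0.8]]
model_b = [[0.4, 0.6], [0.3, 0.7]]
print(soft_vote([model_a, model_b]))  # [0, 1]
```

On the first sample the two models disagree, and the more confident model_a decides the outcome; hard majority voting between two models would instead need a tie-breaking rule.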

(This article belongs to the Special Issue Advances in Intelligent Defense Systems for the Internet of Things)

15 pages, 3809 KiB

Open Access Article

QA-RAG: Exploring LLM Reliance on External Knowledge

by Aigerim Mansurova, Aiganym Mansurova and Aliya Nugumanova

Big Data Cogn. Comput. 2024, 8(9), 115; https://doi.org/10.3390/bdcc8090115 - 9 Sep 2024

Abstract

Large language models (LLMs) can store factual knowledge within their parameters and have achieved superior results in question-answering tasks. However, challenges persist in providing provenance for their decisions and keeping their knowledge up to date. Some approaches aim to address these challenges by combining external knowledge with parametric memory. In contrast, our proposed QA-RAG solution relies solely on the data stored within an external knowledge base, specifically a dense vector index database. In this paper, we compare RAG configurations using two LLMs—Llama 2b and 13b—systematically examining their performance in three key RAG capabilities: noise robustness, knowledge gap detection, and external truth integration. The evaluation reveals that while our approach achieves an accuracy of 83.3%, showcasing its effectiveness across all baselines, the model still struggles significantly in terms of external truth integration. These findings suggest that considerable work is still required to fully leverage RAG in question-answering tasks.
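
QA-RAG answers from a dense vector index. The core retrieval step (ranking stored passages by cosine similarity to the query embedding and passing the top k to the LLM) can be sketched in pure Python as below. The three-dimensional vectors and passage names are toy stand-ins for real embeddings, and the prompt wording is a hypothetical example; its instruction to admit missing information corresponds to the behaviour the paper probes as knowledge gap detection.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k (passage, score) pairs whose vectors are most similar to the query."""
    scored = [(passage, cosine(query_vec, vec)) for passage, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question, passages):
    """Hypothetical prompt that restricts the LLM to the retrieved context."""
    context = "\n".join(f"- {passage}" for passage, _ in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )


# Toy index: (passage, embedding) pairs standing in for a dense vector database.
index = [
    ("Passage A", [1.0, 0.0, 0.0]),
    ("Passage B", [0.7, 0.7, 0.0]),
    ("Passage C", [0.0, 0.0, 1.0]),
]
top = retrieve([1.0, 0.1, 0.0], index, k=2)
print(build_prompt("Example question?", top))
```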

(This article belongs to the Special Issue Generative AI and Large Language Models)

11 pages, 3275 KiB

Open Access Article

Analysis of Highway Vehicle Lane Change Duration Based on Survival Model

by Sheng Zhao, Shengwen Huang, Huiying Wen and Weiming Liu

Big Data Cogn. Comput. 2024, 8(9), 114; https://doi.org/10.3390/bdcc8090114 - 6 Sep 2024

Abstract

To investigate highway vehicle lane-changing behavior, we utilized the publicly available naturalistic driving dataset, HighD, to extract the movement data of vehicles involved in lane changes and their proximate counterparts. We employed univariate and multivariate Cox proportional hazards models alongside random survival forest models to analyze the influence of various factors on lane change duration, assess their statistical significance, and compare the performance of multiple random survival forest models. Our findings indicate that several variables significantly impact lane change duration, including the standard deviation of lane-changing vehicles, lane-changing vehicle speed, distance to the following vehicle in the target lane, lane-changing vehicle length, and distance to the following vehicle in the current lane. Notably, the standard deviation and vehicle length act as protective factors, with increases in these variables correlating with longer lane change durations. Conversely, higher lane-changing vehicle speeds and shorter distances to following vehicles in both the current and target lanes are associated with shorter lane change durations, indicating their role as risk factors. Feature variable selection did not substantially improve the training performance of the random survival forest model based on our findings. However, validation set evaluation showed that careful feature variable selection can enhance model accuracy, leading to improved AUC values. These insights lay the groundwork for advancing research in predicting lane-changing behaviors, understanding lane-changing intentions, and developing pre-emptive safety measures against hazardous lane changes.
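
In a Cox proportional hazards model the hazard of completing the lane change is h(t | x) = h0(t) exp(βx), so a hazard ratio exp(β) > 1 marks a risk factor (shorter durations) and exp(β) < 1 a protective factor (longer durations), which is the interpretation used above. The pure-Python sketch below evaluates the partial log-likelihood for a single covariate and picks β by a coarse grid search. It assumes no tied event times, and the durations and standardized speeds are invented rather than taken from HighD; real analyses would normally use a survival library such as lifelines.

```python
import math


def cox_partial_log_likelihood(beta, durations, events, covariate):
    """Cox partial log-likelihood for one covariate, assuming no tied event times.

    durations -- time until the lane change completed (or until censoring)
    events    -- 1 if completion was observed, 0 if the record is censored
    covariate -- one value per vehicle (e.g. standardized speed)
    """
    ll = 0.0
    for t_i, e_i, x_i in zip(durations, events, covariate):
        if not e_i:
            continue  # censored records contribute only through the risk sets below
        # Risk set: vehicles whose duration is at least t_i (still at risk just before t_i).
        risk = [x_j for t_j, x_j in zip(durations, covariate) if t_j >= t_i]
        ll += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk))
    return ll


def fit_beta(durations, events, covariate):
    """Coarse grid search for the maximizing beta on [-3, 3] in steps of 0.01."""
    grid = [b / 100 for b in range(-300, 301)]
    return max(grid, key=lambda b: cox_partial_log_likelihood(b, durations, events, covariate))


def hazard_ratio(beta):
    """exp(beta): above 1 = risk factor (shorter durations), below 1 = protective."""
    return math.exp(beta)


# Invented example: four completed lane changes (seconds) and standardized speeds.
durations = [2.0, 4.0, 6.0, 8.0]
completed = [1, 1, 1, 1]
speed_z = [2.0, 3.0, 0.0, 1.0]
beta = fit_beta(durations, completed, speed_z)
print(f"beta = {beta:.2f}, hazard ratio = {hazard_ratio(beta):.2f}")
```

In this toy data faster vehicles tend to finish sooner, so the fitted β is positive and the hazard ratio exceeds 1, the pattern the study reports for lane-changing speed.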

25 pages, 632 KiB

Open Access Article

Detection of Hate Speech, Racism and Misogyny in Digital Social Networks: Colombian Case Study

by Luis Gabriel Moreno-Sandoval, Alexandra Pomares-Quimbaya, Sergio Andres Barbosa-Sierra and Liliana Maria Pantoja-Rojas

Big Data Cogn. Comput. 2024, 8(9), 113; https://doi.org/10.3390/bdcc8090113 - 6 Sep 2024

Abstract

The growing popularity of social networking platforms worldwide has substantially increased the presence of offensive language on these platforms. To date, most of the systems developed to mitigate this challenge focus primarily on English content. However, this issue is a global concern, and other languages, such as Spanish, are also affected. This article addresses the task of identifying hate speech, racism, and misogyny in Spanish within the Colombian context on social networks, and introduces a gold standard dataset specifically developed for this purpose. The experiment compares the performance of deep learning transformer language models (TLMs), such as BERT, RoBERTa, XLM, and BETO, fine-tuned to the Colombian slang domain, and then compares the best TLM against a GPT model, which has a significant impact on achieving more accurate predictions in this task. Finally, this study provides a detailed understanding of the different components used in the system, including the architecture of the models and the selection of functions. The best results show that the BERT model achieves an accuracy of 83.6% for hate speech detection, while the GPT model achieves an accuracy of 90.8% for racist speech and 90.4% for misogyny detection.

16 pages, 840 KiB

Open Access Article

Sentiment Informed Sentence BERT-Ensemble Algorithm for Depression Detection

by Bayode Ogunleye, Hemlata Sharma and Olamilekan Shobayo

Big Data Cogn. Comput. 2024, 8(9), 112; https://doi.org/10.3390/bdcc8090112 - 5 Sep 2024

Abstract

The World Health Organisation (WHO) estimates that approximately 280 million people worldwide suffer from depression. Yet existing studies on early-stage depression detection using machine learning (ML) techniques are limited. Prior studies have applied single stand-alone algorithms, which cannot handle data complexities, are prone to overfitting, and generalize poorly. To this end, our paper examined the performance of several ML algorithms for early-stage depression detection using two benchmark social media datasets (D1 and D2). More specifically, we incorporated sentiment indicators to improve model performance. Our experimental results showed that sentence bidirectional encoder representations from transformers (SBERT) numerical vectors fitted into the stacking ensemble model achieved comparable F1 scores of 69% on dataset D1 and 76% on dataset D2. Our findings suggest that using sentiment indicators as an additional feature for depression detection improves model performance, and we therefore recommend the development of a depressive term corpus for future work.

19 pages, 7056 KiB

Open Access Article

A Data-Centric Approach to Understanding the 2020 U.S. Presidential Election

by Satish Mahadevan Srinivasan and Yok-Fong Paat

Big Data Cogn. Comput. 2024, 8(9), 111; https://doi.org/10.3390/bdcc8090111 - 4 Sep 2024

Abstract

The application of analytics to Twitter feeds is a very popular field of research. A tweet, limited to 280 characters, can reveal a wealth of information on how individuals express their sentiments and emotions within their network or community. By collecting, cleaning, and mining tweets from different individuals on a particular topic, we can capture not only the sentiments and emotions of an individual but also those expressed by a larger group. Using the well-known lexicon-based NRC classifier, we classified nearly seven million tweets from seven battleground states in the U.S. to understand the emotions and sentiments expressed by U.S. citizens toward the 2020 presidential candidates. We used the emotions and sentiments expressed in these tweets as proxies for votes and predicted the swing direction of each battleground state. When compared with the outcome of the 2020 presidential election, we accurately predicted the swing directions of four battleground states (Arizona, Michigan, Texas, and North Carolina), revealing the potential of this approach for predicting future election outcomes. The week-by-week analysis of the tweets using the NRC classifier corresponded well with the various political events that took place before the election, making it possible to understand the dynamics of the emotions and sentiments of the supporters in each camp. These research strategies and evidence-based insights may be translated into real-world settings and practical interventions to improve election outcomes.
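
A lexicon-based classifier such as the one described above tags each word with the emotions and sentiments a word-emotion lexicon associates with it, then aggregates the counts per tweet or per group of tweets. The Python sketch below illustrates that counting with a tiny lexicon whose entries are invented for illustration and are not taken from the NRC lexicon; how the authors turn the aggregated counts into a predicted swing direction is not specified in the abstract and is not shown.

```python
import re
from collections import Counter

# Hypothetical toy lexicon in the style of a word-emotion association lexicon;
# these entries are invented and are NOT taken from the NRC lexicon.
TOY_LEXICON = {
    "win": {"joy", "positive"},
    "hope": {"anticipation", "trust", "positive"},
    "fraud": {"anger", "disgust", "negative"},
    "fear": {"fear", "negative"},
}


def emotion_counts(tweet, lexicon=TOY_LEXICON):
    """Count the emotion/sentiment labels triggered by the words of one tweet."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", tweet.lower()):
        counts.update(lexicon.get(word, ()))
    return counts


def aggregate(tweets, lexicon=TOY_LEXICON):
    """Sum label counts over a collection of tweets (e.g. one state in one week)."""
    total = Counter()
    for tweet in tweets:
        total += emotion_counts(tweet, lexicon)
    return total


week = ["We will WIN, full of hope!", "This is fraud and I fear it"]
print(aggregate(week))
```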

(This article belongs to the Special Issue Machine Learning in Data Mining for Knowledge Discovery)

19 pages, 714 KiB

Open Access Article

Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews

by Goran Mitrov, Boris Stanoev, Sonja Gievska, Georgina Mirceva and Eftim Zdravevski

Big Data Cogn. Comput. 2024, 8(9), 110; https://doi.org/10.3390/bdcc8090110 - 4 Sep 2024

Abstract

The rapid increase in scientific publications has made it challenging to keep up with the latest advancements. Conducting systematic reviews using traditional methods is both time-consuming and difficult. To address this, new review formats such as rapid and scoping reviews have been introduced, reflecting an urgent need for efficient information retrieval. This challenge extends beyond academia to many organizations where numerous documents must be reviewed in relation to specific user queries. This paper focuses on improving document ranking to enhance the retrieval of relevant articles, thereby reducing the time and effort required by researchers. By applying a range of natural language processing (NLP) techniques, including rule-based matching, statistical text analysis, word embeddings, and transformer- and LLM-based approaches such as the Mistral LLM, we assess the articles’ similarity to user-specific inputs and prioritize them by relevance. We propose a novel methodology, Weighted Semantic Matching (WSM) + MiniLM, that combines the strengths of the different methods. For validation, we employ global metrics such as precision at K, recall at K, average rank, and median rank, as well as pairwise comparison metrics, including higher rank count, average rank difference, and median rank difference. Our proposed algorithm achieves optimal performance, with an average recall at 1000 of 95% and an average median rank of 185 for selected articles across the five datasets evaluated. These findings are promising for pinpointing relevant articles and reducing manual work.
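
The global metrics listed above can be computed directly from a ranked list of document IDs and the set of articles actually included in a review. The Python sketch below shows precision at K, recall at K, and the average and median rank of the relevant documents; the IDs are hypothetical, and the pairwise comparison metrics are not shown. Averaging the per-dataset median ranks would give a figure like the "average median rank" reported above.

```python
from statistics import median


def precision_at_k(ranked_ids, relevant, k):
    """Share of the top-k ranked documents that are relevant."""
    return sum(doc in relevant for doc in ranked_ids[:k]) / k


def recall_at_k(ranked_ids, relevant, k):
    """Share of all relevant documents that appear in the top k."""
    return sum(doc in relevant for doc in ranked_ids[:k]) / len(relevant)


def rank_summary(ranked_ids, relevant):
    """Average and median 1-based rank of the relevant documents found in the ranking."""
    ranks = [pos for pos, doc in enumerate(ranked_ids, start=1) if doc in relevant]
    if not ranks:
        raise ValueError("no relevant document appears in the ranking")
    return {"ranks": ranks, "average_rank": sum(ranks) / len(ranks), "median_rank": median(ranks)}


# Hypothetical ranking of five candidate articles, two of which were included in the review.
ranking = ["d3", "d1", "d7", "d2", "d9"]
included = {"d1", "d2"}
print(precision_at_k(ranking, included, 2))  # 0.5
print(recall_at_k(ranking, included, 5))     # 1.0
print(rank_summary(ranking, included))
```

Lower average and median ranks mean reviewers reach the relevant articles sooner, which is why they complement recall at a fixed cutoff such as K = 1000.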

(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

29 pages, 4437 KiB

Open Access Article

Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

by Atena Jalali Mojahed, Mohammad Hossein Moattar and Hamidreza Ghaffari

Big Data Cogn. Comput. 2024, 8(9), 109; https://doi.org/10.3390/bdcc8090109 - 4 Sep 2024

Abstract

Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and the differences in class density strongly affect the efficiency of classification algorithms. Therefore, the density of the classes is taken as the main basis for learning the new distance metric. The data of one class may be composed of several densities; that is, the class may be a combination of several normal distributions with different means and variances. Considering that classes may be multimodal, this paper models the distribution of each class as a mixture of multivariate Gaussian densities. A density-based clustering algorithm is used to determine the number of components, followed by estimation of the parameters of the Gaussian components using maximum a posteriori density estimation. Then, the Bhattacharyya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the efficiency of the classifier in imbalanced classification problems. Moreover, when the imbalance ratio is very high and minority class samples cannot be correctly identified, the proposed method still provides acceptable performance.
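
The building block of the approach above is the Bhattacharyya distance between two Gaussian components, D_B = (1/8)(μ1 − μ2)ᵀ Σ⁻¹ (μ1 − μ2) + (1/2) ln(det Σ / √(det Σ1 det Σ2)) with Σ = (Σ1 + Σ2)/2. The pure-Python sketch below evaluates it under the simplifying assumption of diagonal covariances, where both terms factor per dimension. It does not reproduce the paper's mixture-level objective (the distance between Gaussian mixtures generally has no closed form) or its iterative metric update.

```python
import math


def bhattacharyya_gaussian_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances.

    mu1, mu2   -- mean vectors
    var1, var2 -- per-dimension variances (the diagonals of Sigma1 and Sigma2)
    """
    d = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = (v1 + v2) / 2.0                          # diagonal of Sigma = (Sigma1 + Sigma2) / 2
        d += (m1 - m2) ** 2 / (8.0 * v)              # mean-separation term
        d += 0.5 * math.log(v / math.sqrt(v1 * v2))  # covariance-mismatch term
    return d


# Two hypothetical 2-D components: means differ in the first dimension only.
print(bhattacharyya_gaussian_diag([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0]))  # 0.5
```

The distance grows both when component means move apart and when their spreads differ, which is why pushing apart components of different classes enlarges the between-class margin.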

► Show Figures

Figure 1

24 pages, 7001 KiB

Open Access Article

Appendicitis Diagnosis: Ensemble Machine Learning and Explainable Artificial Intelligence-Based Comprehensive Approach

by Mohammed Gollapalli, Atta Rahman, Sheriff A. Kudos, Mohammed S. Foula, Abdullah Mahmoud Alkhalifa, Hassan Mohammed Albisher, Mohammed Taha Al-Hariri and Nazeeruddin Mohammad

Big Data Cogn. Comput. 2024, 8(9), 108; https://doi.org/10.3390/bdcc8090108 - 4 Sep 2024

Abstract

Appendicitis is a condition in which the appendix becomes inflamed, and it can be difficult to diagnose accurately. The type of appendicitis can also be hard to determine, leading to misdiagnosis and difficulty in managing the condition. To avoid complications and reduce mortality, early diagnosis and treatment are crucial. Alvarado’s clinical scoring system alone is not sufficient, while ultrasound and computed tomography (CT) imaging are effective but have downsides such as operator dependency and radiation exposure. This study proposes the use of machine learning methods and a reliable, locally collected dataset to enhance the identification of acute appendicitis while distinguishing between complicated and non-complicated appendicitis. Machine learning can help reduce diagnostic errors and improve treatment decisions. This study conducted four different experiments using various ML algorithms, including K-nearest neighbors (KNN), decision tree (DT), bagging, and stacking. The experimental results showed that the stacking model had the highest training accuracy, test set accuracy, precision, and F1 score, which were 97.51%, 92.63%, 95.29%, and 92.04%, respectively. Feature importance and explainable AI (XAI) identified neutrophils, WBC_Count, Total_LOS, P_O_LOS, and Symptoms_Days as the principal features that significantly affected the performance of the model. Based on the outcomes and feedback from medical health professionals, the scheme is promising in terms of its effectiveness in diagnosing acute appendicitis.

(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)

15 pages, 450 KiB

Open Access Article

A Comparative Study of Sentiment Classification Models for Greek Reviews

by Panagiotis D. Michailidis

Big Data Cogn. Comput. 2024, 8(9), 107; https://doi.org/10.3390/bdcc8090107 - 4 Sep 2024

Abstract

In recent years, people have expressed their opinions and sentiments about products, services, and other issues on social media platforms and review websites. These sentiments are typically classified as either positive or negative based on their text content. Research on sentiment analysis of text reviews written in Greek is limited compared with that for English. Existing studies on Greek have focused more on social media posts than on consumer reviews from e-commerce websites. They have also primarily used traditional machine learning (ML) methods, with little to no work using advanced methods such as neural networks, transfer learning, and large language models. This study addresses this gap by testing the hypothesis that modern sentiment classification methods outperform traditional ML models on a Greek consumer review dataset. These modern methods include artificial neural networks (ANNs), transfer learning (TL), and large language models (LLMs). Several classification methods, namely ML, ANNs, TL, and LLMs, were evaluated and compared using performance metrics on a large collection of Greek product reviews. The empirical findings show that the GreekBERT and GPT-4 models perform significantly better than traditional ML classifiers, with GreekBERT achieving an accuracy of 96% and GPT-4 reaching 95%, while ANNs performed similarly to ML models. This study confirms the hypothesis, with the GreekBERT model achieving the highest classification accuracy.

18 pages, 2336 KiB

Open Access Article

Performance and Board Diversity: A Practical AI Perspective

by Lee-Wen Yang, Thi Thanh Binh Nguyen and Wei-Ju Young

Big Data Cogn. Comput. 2024, 8(9), 106; https://doi.org/10.3390/bdcc8090106 - 4 Sep 2024

Abstract

The face of corporate governance is changing as new technologies in artificial intelligence and data analytics are used to make better future-oriented decisions on performance management. This study provides empirical results on when the impact of board diversity is most evident, using a multi-breakpoint model and artificial neural networks (ANNs). The input data for the simulation cover 853 electronics companies listed on the Taiwan Stock Exchange from 2000 to 2021. The empirical results show that the higher the percentage of female board members, the greater their influence on company performance, but this effect is evident only when the company is in good business condition. By integrating ANNs with multi-breakpoint regression, this study introduces a novel approach to management research. It provides a detailed perspective on how board diversity affects firm performance under different conditions. The ANN results show that the number of business board members predicts return on assets (ROA) with the highest accuracy, with female board members following closely in predictive effectiveness. The presence of women on the board contributes positively to ROA, particularly when the company is experiencing favorable business conditions and high profitability. Our analysis also reveals that a higher percentage of male board members improves company performance, but this benefit is observed only in highly favorable and unfavorable business conditions. Conversely, a higher percentage of business members tends to affect performance negatively during periods of high profitability. The power of the board of directors and of significant shareholders is positively correlated with performance, whereas CEO power positively affects performance only when it is not extremely low. Independent board members generally do not have a significant effect on profits. Additionally, the company’s asset value positively influences performance primarily when ROA is high, and increased financial leverage is associated with reduced profitability.
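The breakpoint idea can be illustrated with a simplified threshold regression, which proceeds in three steps:

1. Split the sample at a candidate value of a threshold variable (for example, ROA).
2. Fit a separate ordinary least-squares line in each regime.
3. Keep the split with the lowest total squared residual.

This is a single-breakpoint sketch under our own assumptions, not the authors' multi-break model. The variable roles (`q`, `x`, `y`) and the synthetic data are illustrative only.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class ThresholdFit(NamedTuple):
    ssr: float                # total sum of squared residuals across regimes
    threshold: float          # split value of the threshold variable q
    low: Tuple[float, float]  # (intercept, slope) for q <= threshold
    high: Tuple[float, float] # (intercept, slope) for q > threshold


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Simple OLS of y on x; returns (intercept, slope, SSR)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, ssr


def fit_threshold(q: Sequence[float], x: Sequence[float], y: Sequence[float],
                  min_size: int = 3) -> Optional[ThresholdFit]:
    """Grid-search one breakpoint in q, fitting y ~ x separately per regime."""
    best: Optional[ThresholdFit] = None
    for t in sorted(set(q)):
        low = [i for i, qi in enumerate(q) if qi <= t]
        high = [i for i, qi in enumerate(q) if qi > t]
        # Skip splits that leave too few observations in either regime.
        if len(low) < min_size or len(high) < min_size:
            continue
        a1, b1, s1 = ols([x[i] for i in low], [y[i] for i in low])
        a2, b2, s2 = ols([x[i] for i in high], [y[i] for i in high])
        if best is None or s1 + s2 < best.ssr:
            best = ThresholdFit(s1 + s2, t, (a1, b1), (a2, b2))
    return best


if __name__ == "__main__":
    # Synthetic data: x has no effect when q <= 5 and a slope of 2 above it.
    q = list(range(1, 11))
    x = [0.1, 0.3, 0.2, 0.5, 0.4, 0.3, 0.6, 0.2, 0.5, 0.4]
    y = [1.0] * 5 + [1 + 2 * xi for xi in x[5:]]
    print(fit_threshold(q, x, y))
```

On this synthetic data the search recovers the split at `q = 5`, with a flat slope below it and a slope of about 2 above it. Regime-dependent effects of this kind are what the abstract describes, such as female board representation mattering only in good business conditions.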

(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)

18 pages, 723 KiB

Open Access Article

Ethical AI in Financial Inclusion: The Role of Algorithmic Fairness on User Satisfaction and Recommendation

by Qin Yang and Young-Chan Lee

Big Data Cogn. Comput. 2024, 8(9), 105; https://doi.org/10.3390/bdcc8090105 - 3 Sep 2024

Abstract

This study investigates the impact of artificial intelligence (AI) on satisfaction with, and recommendation of, financial inclusion services, focusing on ethical dimensions and perceived algorithmic fairness. Drawing on organizational justice theory and the heuristic–systematic model, we examine how algorithm transparency, accountability, and legitimacy influence users’ perceptions of fairness. We then examine how those perceptions shape users’ satisfaction with AI-driven financial inclusion services and their likelihood of recommending them. Through a survey-based quantitative analysis of 675 users in China, our results reveal that perceived algorithmic fairness acts as a significant mediator between the ethical attributes of AI systems and user responses. Specifically, higher levels of transparency, accountability, and legitimacy enhance users’ perceptions of fairness. These perceptions, in turn, significantly increase both users’ satisfaction with AI-facilitated financial inclusion services and their likelihood of recommending them. This research contributes to the literature on AI ethics by empirically demonstrating the critical role of transparent, accountable, and legitimate AI practices in fostering positive user outcomes. Moreover, it addresses a significant gap in understanding the ethical implications of AI in financial inclusion contexts, offering valuable insights for both researchers and practitioners in this rapidly evolving field.
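The mediation claim (ethical attribute X → perceived fairness M → satisfaction Y) can be illustrated with the classic product-of-coefficients approach:

1. Regress M on X to obtain path a.
2. Regress Y on both X and M to obtain b (the effect of M) and c′ (the direct effect of X).
3. Take a·b as the indirect effect.

This is a regression-based sketch on synthetic data and is our own simplification. The study's survey analysis may use a different estimator, such as structural equation modeling. Significance testing (for example, bootstrapping the indirect effect) is omitted.

```python
from typing import List, NamedTuple, Sequence


class Mediation(NamedTuple):
    a: float         # path X -> M
    b: float         # path M -> Y, controlling for X
    direct: float    # c': X -> Y, controlling for M
    indirect: float  # a * b


def _centered(v: Sequence[float]) -> List[float]:
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


def _cross(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def simple_mediation(x: Sequence[float], m: Sequence[float],
                     y: Sequence[float]) -> Mediation:
    """Single-mediator model estimated with OLS on mean-centered data."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _cross(xc, xc), _cross(mc, mc), _cross(xc, mc)
    sxy, smy = _cross(xc, yc), _cross(mc, yc)
    # Path a: slope of M regressed on X.
    a = sxm / sxx
    # Paths b and c': two-predictor OLS of Y on X and M (normal equations).
    det = sxx * smm - sxm ** 2
    b = (sxx * smy - sxm * sxy) / det
    direct = (smm * sxy - sxm * smy) / det
    return Mediation(a, b, direct, a * b)


if __name__ == "__main__":
    # Synthetic data with known paths: a = 0.5, b = 0.8, c' = 0.3.
    x = [1, 2, 3, 4, 5, 6]
    noise = [0.1, -0.1, 0.0, 0.0, -0.1, 0.1]  # uncorrelated with x
    m = [0.5 * xi + di for xi, di in zip(x, noise)]
    y = [0.3 * xi + 0.8 * mi for xi, mi in zip(x, m)]
    print(simple_mediation(x, m, y))
```

The total effect of X on Y decomposes as c′ + a·b. On the synthetic data above this gives 0.3 + 0.4 = 0.7.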

23 pages, 4464 KiB

Open Access Article

A Hybrid Segmentation Algorithm for Rheumatoid Arthritis Diagnosis Using X-ray Images

by Govindan Rajesh, Nandagopal Malarvizhi and Man-Fai Leung

Big Data Cogn. Comput. 2024, 8(9), 104; https://doi.org/10.3390/bdcc8090104 - 2 Sep 2024

Abstract

Rheumatoid arthritis (RA) is a chronic autoimmune disease that affects the joints, causing inflammation, pain, and stiffness. X-ray examination is one of the most common diagnostic procedures for RA, but manual X-ray image analysis is time-consuming and prone to errors. The proposed algorithm aims to achieve stable and accurate segmentation of carpal bones from hand X-ray images, which is vital for identifying rheumatoid arthritis. The algorithm comprises several stages: carpal bone region of interest (CROI) specification, dynamic thresholding, and Gray Level Co-occurrence Matrix (GLCM) texture analysis. To obtain clear edges, the image is first converted to greyscale, and thresholding is applied to separate the hand from the background. The contours of the resulting region are then extracted, and the CROI is defined by the bounding box of the largest contour. Within the CROI, the threshold value is set dynamically so that the carpal bones can be separated from the surrounding tissue. GLCM texture analysis is then carried out, counting how often pairs of neighboring pixels with specific intensity values occur in a given spatial relationship. Features such as contrast and energy are extracted from the resulting matrix. These features are then used to classify carpal bone images as inflamed or normal. The proposed technique is tested on a rheumatoid arthritis image dataset, and the results show its contribution to diagnosing the disease. The algorithm efficiently segments carpal bones and extracts the signature parameters that are critical for correctly classifying inflammation in the cartilage images.
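The GLCM step described above can be made concrete. For a chosen pixel offset, the matrix counts how often intensity level i occurs next to level j, and contrast and energy are then computed from the normalized counts. The pure-Python sketch below makes three assumptions:

- It uses a single horizontal offset of one pixel.
- It uses the common definitions contrast = Σ P(i,j)(i−j)² and energy = √(Σ P(i,j)²). Some texts define energy without the square root.
- It uses a toy 4×4 image that is illustrative only and is not from the study's dataset.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def glcm(image: Sequence[Sequence[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = False, normed: bool = True) -> Matrix:
    """Grey-level co-occurrence matrix for one pixel offset (dx, dy)."""
    p = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                p[i][j] += 1
                if symmetric:  # also count the pair in the reverse direction
                    p[j][i] += 1
    if normed:  # convert counts to joint probabilities
        total = sum(map(sum, p))
        if total:
            p = [[v / total for v in row] for row in p]
    return p


def contrast(p: Matrix) -> float:
    """Weights co-occurrences by squared intensity difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p: Matrix) -> float:
    """Square root of the angular second moment (sum of squared entries)."""
    return math.sqrt(sum(v * v for row in p for v in row))


if __name__ == "__main__":
    img = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]
    print(glcm(img, levels=4, normed=False))
    p = glcm(img, levels=4)
    print(contrast(p), energy(p))
```

For the toy image, the raw horizontal counts are `[[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]`. Smooth, homogeneous textures give low contrast and high energy. In practice, scikit-image offers `graycomatrix` and `graycoprops` for the same computation over multiple distances and angles.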
