Search | arXiv e-print repository

Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency

Authors: Sakib Shahriar, Brady Lund, Nishith Reddy Mannuru, Muhammad Arbab Arshad, Kadhim Hayawi, Ravi Varma Kumar Bevara, Aashrith Mannuru, Laiba Batool

Abstract: As large language models (LLMs) continue to advance, evaluating their comprehensive capabilities becomes significant for their application in various fields. This research study comprehensively evaluates the language, vision, speech, and multimodal capabilities of GPT-4o. The study employs standardized exam questions, reasoning tasks, and translation assessments to assess the model's language capa… ▽ More As large language models (LLMs) continue to advance, evaluating their comprehensive capabilities becomes significant for their application in various fields. This research study comprehensively evaluates the language, vision, speech, and multimodal capabilities of GPT-4o. The study employs standardized exam questions, reasoning tasks, and translation assessments to assess the model's language capability. Additionally, GPT-4o's vision and speech capabilities are tested through image classification and object recognition tasks, as well as accent classification. The multimodal evaluation assesses the model's performance in integrating visual and linguistic data. Our findings reveal that GPT-4o demonstrates high accuracy and efficiency across multiple domains in language and reasoning capabilities, excelling in tasks that require few-shot learning. GPT-4o also provides notable improvements in multimodal tasks compared to its predecessors. However, the model shows variability and faces limitations in handling complex and ambiguous inputs, particularly in audio and vision capabilities. This paper highlights the need for more comprehensive benchmarks and robust evaluation frameworks, encompassing qualitative assessments involving human judgment as well as error analysis. Future work should focus on expanding datasets, investigating prompt-based assessment, and enhancing few-shot learning techniques to test the model's practical applicability and performance in real-world scenarios. △ Less

Submitted 19 June, 2024; originally announced July 2024.

arXiv:2402.18139 [pdf, other]

Cause and Effect: Can Large Language Models Truly Understand Causality?

Authors: Swagata Ashwani, Kshiteesh Hegde, Nishith Reddy Mannuru, Mayank Jindal, Dushyant Singh Sengar, Krishna Chaitanya Rao Kathala, Dishant Banga, Vinija Jain, Aman Chadha

Abstract: With the rise of Large Language Models(LLMs), it has become crucial to understand their capabilities and limitations in deciphering and explaining the complex web of causal relationships that language entails. Current methods use either explicit or implicit causal reasoning, yet there is a strong need for a unified approach combining both to tackle a wide array of causal relationships more effecti… ▽ More With the rise of Large Language Models(LLMs), it has become crucial to understand their capabilities and limitations in deciphering and explaining the complex web of causal relationships that language entails. Current methods use either explicit or implicit causal reasoning, yet there is a strong need for a unified approach combining both to tackle a wide array of causal relationships more effectively. This research proposes a novel architecture called Context Aware Reasoning Enhancement with Counterfactual Analysis(CARE CA) framework to enhance causal reasoning and explainability. The proposed framework incorporates an explicit causal detection module with ConceptNet and counterfactual statements, as well as implicit causal detection through LLMs. Our framework goes one step further with a layer of counterfactual explanations to accentuate LLMs understanding of causality. The knowledge from ConceptNet enhances the performance of multiple causal reasoning tasks such as causal discovery, causal identification and counterfactual reasoning. The counterfactual sentences add explicit knowledge of the not caused by scenarios. By combining these powerful modules, our model aims to provide a deeper understanding of causal relationships, enabling enhanced interpretability. Evaluation of benchmark datasets shows improved performance across all metrics, such as accuracy, precision, recall, and F1 scores. We also introduce CausalNet, a new dataset accompanied by our code, to facilitate further research in this domain. △ Less

Submitted 15 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2401.02358 [pdf]

A novel method to enhance pneumonia detection via a model-level ensembling of CNN and vision transformer

Authors: Sandeep Angara, Nishith Reddy Mannuru, Aashrith Mannuru, Sharath Thirunagaru

Abstract: Pneumonia remains a leading cause of morbidity and mortality worldwide. Chest X-ray (CXR) imaging is a fundamental diagnostic tool, but traditional analysis relies on time-intensive expert evaluation. Recently, deep learning has shown immense potential for automating pneumonia detection from CXRs. This paper explores applying neural networks to improve CXR-based pneumonia diagnosis. We developed a… ▽ More Pneumonia remains a leading cause of morbidity and mortality worldwide. Chest X-ray (CXR) imaging is a fundamental diagnostic tool, but traditional analysis relies on time-intensive expert evaluation. Recently, deep learning has shown immense potential for automating pneumonia detection from CXRs. This paper explores applying neural networks to improve CXR-based pneumonia diagnosis. We developed a novel model fusing Convolution Neural networks (CNN) and Vision Transformer networks via model-level ensembling. Our fusion architecture combines a ResNet34 variant and a Multi-Axis Vision Transformer small model. Both base models are initialized with ImageNet pre-trained weights. The output layers are removed, and features are combined using a flattening layer before final classification. Experiments used the Kaggle pediatric pneumonia dataset containing 1,341 normal and 3,875 pneumonia CXR images. We compared our model against standalone ResNet34, Vision Transformer, and Swin Transformer Tiny baseline models using identical training procedures. Extensive data augmentation, Adam optimization, learning rate warmup, and decay were employed. The fusion model achieved a state-of-the-art accuracy of 94.87%, surpassing the baselines. We also attained excellent sensitivity, specificity, kappa score, and positive predictive value. Confusion matrix analysis confirms fewer misclassifications. The ResNet34 and Vision Transformer combination enables jointly learning robust features from CNNs and Transformer paradigms. This model-level ensemble technique effectively integrates their complementary strengths for enhanced pneumonia classification. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: NA

arXiv:2308.02030 [pdf]

Perceptions of the Fourth Industrial Revolution and Artificial Intelligence Impact on Society

Authors: Daniel Agbaji, Brady Lund, Nishith Reddy Mannuru

Abstract: The Fourth Industrial Revolution, particularly Artificial Intelligence (AI), has had a profound impact on society, raising concerns about its implications and ethical considerations. The emergence of text generative AI tools like ChatGPT has further intensified concerns regarding ethics, security, privacy, and copyright. This study aims to examine the perceptions of individuals in different inform… ▽ More The Fourth Industrial Revolution, particularly Artificial Intelligence (AI), has had a profound impact on society, raising concerns about its implications and ethical considerations. The emergence of text generative AI tools like ChatGPT has further intensified concerns regarding ethics, security, privacy, and copyright. This study aims to examine the perceptions of individuals in different information flow categorizations toward AI. The results reveal key themes in participant-supplied definitions of AI and the fourth industrial revolution, emphasizing the replication of human intelligence, machine learning, automation, and the integration of digital technologies. Participants expressed concerns about job replacement, privacy invasion, and inaccurate information provided by AI. However, they also recognized the benefits of AI, such as solving complex problems and increasing convenience. Views on government involvement in shaping the fourth industrial revolution varied, with some advocating for strict regulations and others favoring support and development. The anticipated changes brought by the fourth industrial revolution include automation, potential job impacts, increased social disconnect, and reliance on technology. Understanding these perceptions is crucial for effectively managing the challenges and opportunities associated with AI in the evolving digital landscape. △ Less

Submitted 31 July, 2023; originally announced August 2023.

Comments: 17 pages, 7 figures

MSC Class: 68-XX(Primary) 68TXX; 68T42 (Secondary) ACM Class: I.2.1; H.4

arXiv:2303.13367 [pdf]

doi 10.1002/asi.24750

ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing

Authors: Brady Lund, Ting Wang, Nishith Reddy Mannuru, Bing Nie, Somipam Shimray, Ziang Wang

Abstract: This paper discusses OpenAIs ChatGPT, a generative pre-trained transformer, which uses natural language processing to fulfill text-based user requests (i.e., a chatbot). The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential mo… ▽ More This paper discusses OpenAIs ChatGPT, a generative pre-trained transformer, which uses natural language processing to fulfill text-based user requests (i.e., a chatbot). The history and principles behind ChatGPT and similar models are discussed. This technology is then discussed in relation to its potential impact on academia and scholarly research and publishing. ChatGPT is seen as a potential model for the automated preparation of essays and other types of scholarly manuscripts. Potential ethical issues that could arise with the emergence of large language models like GPT-3, the underlying technology behind ChatGPT, and its usage by academics and researchers, are discussed and situated within the context of broader advancements in artificial intelligence, machine learning, and natural language processing for research and scholarly publishing. △ Less

Submitted 31 March, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Journal ref: Journal of the Association for Information Science and Technology (2023)

Showing 1–5 of 5 results for author: Mannuru, N R