Showing 1–25 of 25 results for author: Raman, B

Searching in archive cs.
  1. arXiv:2408.12870

    cs.AI cs.CV

    Can AI Assistance Aid in the Grading of Handwritten Answer Sheets?

    Authors: Pritam Sil, Parag Chaudhuri, Bhaskaran Raman

    Abstract: With recent advancements in artificial intelligence (AI), there has been growing interest in using state of the art (SOTA) AI solutions to provide assistance in grading handwritten answer sheets. While a few commercial products exist, the question of whether AI-assistance can actually reduce grading effort and time has not yet been carefully considered in published literature. This work introduces…

    Submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:2408.04940

    cs.CV

    Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy

    Authors: Palak Handa, Amirreza Mahbod, Florian Schwarzhans, Ramona Woitek, Nidhi Goel, Deepti Chhabra, Shreshtha Jha, Manas Dhir, Deepak Gunjan, Jagadeesh Kakarla, Balasubramanian Raman

    Abstract: We present the Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy. It is being virtually organized by the Research Center for Medical Image Analysis and Artificial Intelligence (MIAAI), Department of Medicine, Danube Private University, Krems, Austria and Medical Imaging and Signal Analysis Hub (MISAHUB) in collaboration with the 9th International Con…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 6 pages

  3. arXiv:2407.12818

    cs.CL cs.AI cs.CY

    "I understand why I got this grade": Automatic Short Answer Grading with Feedback

    Authors: Dishank Aggarwal, Pushpak Bhattacharyya, Bhaskaran Raman

    Abstract: The demand for efficient and accurate assessment methods has intensified as education systems transition to digital platforms. Providing feedback is essential in educational settings and goes beyond simply conveying marks as it justifies the assigned marks. In this context, we present a significant advancement in automated grading by introducing Engineering Short Answer Feedback (EngSAF) -- a data…

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2405.10256

    cs.CV

    Biasing & Debiasing based Approach Towards Fair Knowledge Transfer for Equitable Skin Analysis

    Authors: Anshul Pundhir, Balasubramanian Raman, Pravendra Singh

    Abstract: Deep learning models, particularly Convolutional Neural Networks (CNNs), have demonstrated exceptional performance in diagnosing skin diseases, often outperforming dermatologists. However, they have also unveiled biases linked to specific demographic traits, notably concerning diverse skin tones or gender, prompting concerns regarding fairness and limiting their widespread deployment. Researchers…

    Submitted 16 May, 2024; originally announced May 2024.

  5. arXiv:2403.02833

    cs.LG cs.NE

    SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix

    Authors: Mrinmay Sen, A. K. Qin, Gayathri C, Raghu Kishore N, Yen-Wei Chen, Balasubramanian Raman

    Abstract: This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which can efficiently utilize the FIM to approximate the Hessian matrix for finding Newton's gradient update in large-scale stochastic optimization of machine learning models. It can be viewed as a variant of natural gradient descent, where the challenge of storing and…

    Submitted 1 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  6. arXiv:2402.07640

    cs.MM cs.AI

    CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis

    Authors: Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li

    Abstract: The Controllable Multimodal Feedback Synthesis (CMFeed) dataset enables the generation of sentiment-controlled feedback from multimodal inputs. It contains images, text, human comments, comments' metadata and sentiment labels. Existing datasets for related tasks such as multimodal summarization, visual question answering, visual dialogue, and sentiment-aware text generation do not incorporate trai…

    Submitted 5 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  7. arXiv:2307.00324

    cs.CV cs.LG

    DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum

    Authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman

    Abstract: In the rapidly evolving landscape of medical imaging diagnostics, achieving high accuracy while preserving computational efficiency remains a formidable challenge. This work presents DeepMediX, a groundbreaking, resource-efficient model that significantly addresses this challenge. Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: 23 pages, 3 figures, 4 tables, 1 algorithm

    ACM Class: I.2.1

  8. arXiv:2306.17794

    cs.LG cs.CR

    Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification

    Authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman

    Abstract: The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direc…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: 18 pages, 3 figures, 1 table, 1 algorithm

    MSC Class: 68U10 ACM Class: I.2.1

  9. arXiv:2306.15574

    cs.CV cs.LG

    See Through the Fog: Curriculum Learning with Progressive Occlusion in Medical Imaging

    Authors: Pradeep Singh, Kishore Babu Nampalle, Uppala Vivek Narayan, Balasubramanian Raman

    Abstract: In recent years, deep learning models have revolutionized medical image interpretation, offering substantial improvements in diagnostic accuracy. However, these models often struggle with challenging images where critical features are partially or fully occluded, which is a common scenario in clinical practice. In this paper, we propose a novel curriculum learning-based approach to train deep lear…

    Submitted 30 June, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: 25 pages, 3 figures, 1 table (supplementary section added)

    MSC Class: 68T05; 68T10; 92C55 ACM Class: I.5.1

  10. arXiv:2305.15426

    cs.CV cs.LG

    Transcending Grids: Point Clouds and Surface Representations Powering Neurological Processing

    Authors: Kishore Babu Nampalle, Pradeep Singh, Vivek Narayan Uppala, Sumit Gangwar, Rajesh Singh Negi, Balasubramanian Raman

    Abstract: In healthcare, accurately classifying medical images is vital, but conventional methods often hinge on medical data with a consistent grid structure, which may restrict their overall performance. Recent medical research has been focused on tweaking the architectures to attain better performance without giving due consideration to the representation of data. In this paper, we present a novel approa…

    Submitted 2 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  11. arXiv:2208.11868

    cs.CV cs.SD eess.AS

    Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data

    Authors: Puneet Kumar, Sarthak Malik, Balasubramanian Raman

    Abstract: This paper proposes a multimodal emotion recognition system based on hybrid fusion that classifies the emotions depicted by speech utterances and corresponding images into discrete classes. A new interpretability technique has been developed to identify the important speech & image features leading to the prediction of particular emotion classes. The proposed system's architecture has been determi…

    Submitted 7 January, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: text overlap with arXiv:2208.11450

  12. arXiv:2208.11450

    cs.CV

    VISTANet: VIsual Spoken Textual Additive Net for Interpretable Multimodal Emotion Recognition

    Authors: Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li

    Abstract: This paper proposes a multimodal emotion recognition system, VIsual Spoken Textual Additive Net (VISTANet), to classify emotions reflected by input containing image, speech, and text into discrete classes. A new interpretability technique, K-Average Additive exPlanation (KAAP), has been developed that identifies important visual, spoken, and textual features leading to predicting a particular emot…

    Submitted 26 May, 2024; v1 submitted 24 August, 2022; originally announced August 2022.

  13. arXiv:2203.12692

    cs.MM cs.CV

    Affective Feedback Synthesis Towards Multimodal Text and Image Data

    Authors: Puneet Kumar, Gaurav Bhat, Omkar Ingle, Daksh Goyal, Balasubramanian Raman

    Abstract: In this paper, we have defined a novel task of affective feedback synthesis that deals with generating feedback for input text & corresponding image in a similar way as humans respond towards the multimodal data. A feedback synthesis system has been proposed and trained using ground-truth human comments along with image-text input. We have also constructed a large-scale dataset consisting of image…

    Submitted 31 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Submitted to ACM Transactions on Multimedia Computing, Communications, and Applications

  14. arXiv:2105.14512

    cs.CR

    SHELBRS: Location Based Recommendation Services using Switchable Homomorphic Encryption

    Authors: Mishel Jain, Priyanka Singh, Balasubramanian Raman

    Abstract: Location-Based Recommendation Services (LBRS) has seen an unprecedented rise in its usage in recent years. LBRS facilitates a user by recommending services based on his location and past preferences. However, leveraging such services comes at a cost of compromising one's sensitive information like their shopping preferences, lodging places, food habits, recently visited places, etc. to the third-p…

    Submitted 30 May, 2021; originally announced May 2021.

    Comments: 9 pages, 6 figures, 3 tables

  15. arXiv:2103.12523

    cs.CV

    Region extraction based approach for cigarette usage classification using deep learning

    Authors: Anshul Pundhir, Deepak Verma, Puneet Kumar, Balasubramanian Raman

    Abstract: This paper proposes a novel approach to classify the subjects' smoking behavior by extracting relevant regions from a given image using deep learning. After the classification, we have proposed a conditional detection module based on Yolo-v3, which improves the model's performance and reduces its complexity. To the best of our knowledge, we are the first to work on this dataset. This dataset c…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 5 pages, 16 figures. To appear in the proceedings of the 28th IEEE International Conference on Image Processing (IEEE - ICIP), September 19-22, 2021, Anchorage, Alaska, USA

  16. arXiv:2011.08388

    cs.CV cs.LG

    Interpretable Image Emotion Recognition: A Domain Adaptation Approach Using Facial Expressions

    Authors: Puneet Kumar, Balasubramanian Raman

    Abstract: This paper proposes a feature-based domain adaptation technique for identifying emotions in generic images, encompassing both facial and non-facial objects, as well as non-human components. This approach addresses the challenge of the limited availability of pre-trained models and well-annotated datasets for Image Emotion Recognition (IER). Initially, a deep-learning-based Facial Expression Recogn…

    Submitted 28 August, 2024; v1 submitted 16 November, 2020; originally announced November 2020.

  17. arXiv:2010.06200

    cs.SD eess.AS

    End-to-end Triplet Loss based Emotion Embedding System for Speech Emotion Recognition

    Authors: Puneet Kumar, Sidharth Jain, Balasubramanian Raman, Partha Pratim Roy, Masakazu Iwamura

    Abstract: In this paper, an end-to-end neural embedding system based on triplet loss and residual learning has been proposed for speech emotion recognition. The proposed system learns the embeddings from the emotional information of the speech utterances. The learned embeddings are used to recognize the emotions portrayed by given speech samples of various lengths. The proposed system implements Residual Ne…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: Accepted in ICPR 2020

  18. arXiv:2009.02661

    cs.LG cs.CY stat.ML

    Computational Models for Academic Performance Estimation

    Authors: Vipul Bansal, Himanshu Buckchash, Balasubramanian Raman

    Abstract: Evaluation of students' performance for the completion of courses has been a major problem for both students and faculties during the work-from-home period in this COVID pandemic situation. To this end, this paper presents an in-depth analysis of deep learning and machine learning approaches for the formulation of an automated students' performance estimation system that works on partially availab…

    Submitted 6 September, 2020; originally announced September 2020.

    Comments: 10 pages, 3 figures

  19. arXiv:2007.05764

    eess.AS cs.MM

    Fast Griffin Lim based Waveform Generation Strategy for Text-to-Speech Synthesis

    Authors: Ankit Sharma, Puneet Kumar, Vikas Maddukuri, Nagasai Madamshettib, Kishore KG, Sahit Sai Sriram Kavurub, Balasubramanian Raman, Partha Pratim Roy

    Abstract: The performance of text-to-speech (TTS) systems heavily depends on spectrogram to waveform generation, also known as the speech reconstruction phase. The time required for the same is known as synthesis delay. In this paper, an approach to reduce speech synthesis delay has been proposed. It aims to enhance the TTS systems for real-time applications such as digital assistants, mobile phones, embedd…

    Submitted 11 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in Springer Multimedia Tools and Applications Journal

  20. arXiv:1909.12948

    cs.CV eess.IV

    Video Skimming: Taxonomy and Comprehensive Survey

    Authors: Vivekraj V. K., Debashis Sen, Balasubramanian Raman

    Abstract: Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video. Skimming can be achieved by identifying significant components either in uni-modal or multi-modal features extracted from the video. Being dynamic in nature, video skimming, through temporal connectivity, allows better understanding of the video from its summary. Having this obvious…

    Submitted 21 September, 2019; originally announced September 2019.

    Journal ref: ACM Computing Surveys (CSUR), Volume 52, Issue 5, 2019

  21. arXiv:1811.00936

    cs.SD eess.AS

    Acoustic Features Fusion using Attentive Multi-channel Deep Architecture

    Authors: Gaurav Bhatt, Akshita Gupta, Aditya Arora, Balasubramanian Raman

    Abstract: In this paper, we present a novel deep fusion architecture for audio classification tasks. The multi-channel model presented is formed using deep convolution layers where different acoustic features are passed through each channel. To enable dissemination of information across the channels, we introduce attention feature maps that aid in the alignment of frames. The output of each channel is merge…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted in CHiME'18 (Interspeech Workshop)

  22. arXiv:1801.06792

    cs.CL

    Attentive Recurrent Tensor Model for Community Question Answering

    Authors: Gaurav Bhatt, Shivam Sharma, Balasubramanian Raman

    Abstract: A major challenge to the problem of community question answering is the lexical and semantic gap between the sentence representations. Some solutions to minimize this gap include the introduction of extra parameters to deep models or augmenting the external handcrafted features. In this paper, we propose a novel attentive recurrent tensor network for solving the lexical and semantic gap in commun…

    Submitted 21 January, 2018; originally announced January 2018.

  23. arXiv:1712.03935

    cs.CL

    On the Benefit of Combining Neural, Statistical and External Features for Fake News Identification

    Authors: Gaurav Bhatt, Aman Sharma, Shivam Sharma, Ankush Nagpal, Balasubramanian Raman, Ankush Mittal

    Abstract: Identifying the veracity of a news article is an interesting problem while automating this process can be a challenging task. Detection of a news article as fake is still an open question as it is contingent on many factors which the current state-of-the-art models fail to incorporate. In this paper, we explore a subtask to fake news identification, and that is stance detection. Given a news artic…

    Submitted 11 December, 2017; originally announced December 2017.

    Comments: Source code available at - www.deeplearn-ai.com

  24. arXiv:1711.00003

    cs.CV

    Common Representation Learning Using Step-based Correlation Multi-Modal CNN

    Authors: Gaurav Bhatt, Piyush Jha, Balasubramanian Raman

    Abstract: Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. In a broader perspective, the techniques used to investigate common representation learning fall under the categories of canonical correlation-based approaches and autoencoder based approaches. In this paper, we invest…

    Submitted 31 October, 2017; originally announced November 2017.

    Comments: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)

  25. arXiv:0906.4415

    cs.CR cs.IT cs.MM

    Robust Watermarking in Multiresolution Walsh-Hadamard Transform

    Authors: Gaurav Bhatnagar, Balasubramanian Raman

    Abstract: In this paper, a newer version of Walsh-Hadamard Transform namely multiresolution Walsh-Hadamard Transform (MR-WHT) is proposed for images. Further, a robust watermarking scheme is proposed for copyright protection using MR-WHT and singular value decomposition. The core idea of the proposed scheme is to decompose an image using MR-WHT and then middle singular values of high frequency sub-band at…

    Submitted 24 June, 2009; originally announced June 2009.

    Comments: 6 Pages, 16 Figures, 2 Tables

    Journal ref: Proc. of IEEE International Advance Computing Conference (IACC 2009), Patiala, India, 6-7 March 2009, pp. 894-899