Search | arXiv e-print repository

CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities

Authors: Pranav Poudel, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Prashnna Gyawali, Binod Bhattarai

Abstract: Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, applying this effectiveness in healthcare is challenging due to the limited availability of public datasets. Federated learning presents an exciting solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet, research in multimodal federated learning, particularly in scenarios with missing modalities, a common issue in healthcare datasets, remains scarce, highlighting a critical area for future exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution lies in a novel cross-modal data augmentation by retrieval, leveraging the small publicly available dataset to fill the missing modalities in the clients. Our method learns the parameters in a federated manner, ensuring privacy protection and improving performance in multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines. Code available: https://github.com/bhattarailab/CAR-MFL

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted at MICCAI 2024

arXiv:2402.16734 [pdf, other]

Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Authors: Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte

Abstract: Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets, COVID-DU-Ex and NCT-CRC-HE-100K, both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.10035 [pdf, other]

Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity

Authors: Sanskar Amgain, Prashant Shrestha, Sophia Bano, Ignacio del Valle Torres, Michael Cunniffe, Victor Hernandez, Phil Beales, Binod Bhattarai

Abstract: Purpose: We apply federated learning to train an OCT image classifier, simulating a realistic scenario with multiple clients and a statistically heterogeneous data distribution in which the clients' data lack samples of some categories entirely. Methods: We investigate the effectiveness of FedAvg and FedProx to train an OCT image classification model in a decentralized fashion, addressing privacy concerns associated with centralizing data. We partitioned a publicly available OCT dataset across multiple clients under IID and Non-IID settings and conducted local training on the subsets for each client. We evaluated two federated learning methods, FedAvg and FedProx, for these settings. Results: Our experiments on the dataset suggest that under IID settings, both methods perform on par with training on a central data pool. However, the performance of both algorithms declines as we increase the statistical heterogeneity across the client data, while FedProx consistently performs better than FedAvg in the increased heterogeneity settings. Conclusion: Despite the effectiveness of federated learning in the utilization of private data across multiple medical institutions, the large number of clients and heterogeneous distribution of labels deteriorate the performance of both algorithms. Notably, FedProx appears to be more robust to the increased heterogeneity.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2312.07435 [pdf, other]

Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval

Authors: Love Panta, Prashant Shrestha, Brabeem Sapkota, Amrita Bhattarai, Suresh Manandhar, Anand Kumar Sah

Abstract: Video moment retrieval is a challenging task requiring fine-grained interactions between video and text modalities. Recent work in image-text pretraining has demonstrated that most existing pretrained models suffer from information asymmetry due to the difference in length between visual and textual sequences. We question whether the same problem also exists in the video-text domain with an auxiliary need to preserve both spatial and temporal information. Thus, we evaluate a recently proposed solution involving the addition of an asymmetric co-attention network for video grounding tasks. Additionally, we incorporate momentum contrastive loss for robust, discriminative representation learning in both modalities. We note that the integration of these supplementary modules yields better performance compared to state-of-the-art models on the TACoS dataset and comparable results on ActivityNet Captions, all while utilizing significantly fewer parameters with respect to baseline.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06224 [pdf, other]

Medical Vision Language Pretraining: A survey

Authors: Prashant Shrestha, Sanskar Amgain, Bidur Khanal, Cristian A. Linte, Binod Bhattarai

Abstract: Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and text datasets through self-supervised learning, models can be trained to acquire vast knowledge and learn robust feature representations. Such pretrained models have the potential to enhance multiple downstream medical tasks simultaneously, reducing the dependency on labeled data. However, despite recent progress and its potential, there is no such comprehensive survey paper that has explored the various aspects and advancements in medical VLP. In this paper, we specifically review existing works through the lens of different pretraining objectives, architectures, downstream evaluation tasks, and datasets utilized for pretraining and downstream tasks. Subsequently, we delve into current challenges in medical VLP, discussing existing and potential solutions, and conclude by highlighting future directions. To the best of our knowledge, this is the first survey focused on medical VLP.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.15087 [pdf, other]

doi 10.1007/978-3-031-43999-5_74

X-Ray to CT Rigid Registration Using Scene Coordinate Regression

Authors: Pragyan Shrestha, Chun Xie, Hidehiko Shishido, Yuichi Yoshii, Itary Kitahara

Abstract: Intraoperative fluoroscopy is a frequently used modality in minimally invasive orthopedic surgeries. Aligning the intraoperatively acquired X-ray image with the preoperatively acquired 3D model of a computed tomography (CT) scan reduces the mental burden on surgeons induced by the overlapping anatomical structures in the acquired images. This paper proposes a fully automatic registration method that is robust to extreme viewpoints and does not require manual annotation of landmark points during training. It is based on a fully convolutional neural network (CNN) that regresses the scene coordinates for a given X-ray image. The scene coordinates are defined as the intersection of the back-projected rays from a pixel toward the 3D model. Training data for a patient-specific model were generated through a realistic simulation of a C-arm device using preoperative CT scans. In contrast, intraoperative registration was achieved by solving the perspective-n-point (PnP) problem with a random sample and consensus (RANSAC) algorithm. Experiments were conducted using a pelvic CT dataset that included several real fluoroscopic (X-ray) images with ground truth annotations. The proposed method achieved an average mean target registration error (mTRE) of 3.79 mm in the 50th percentile of the simulated test dataset and projected mTRE of 9.65 mm in the 50th percentile of real fluoroscopic images for pelvis registration.

Submitted 25 November, 2023; originally announced November 2023.

Journal ref: Medical Image Computing and Computer Assisted Intervention MICCAI 2023. Lecture Notes in Computer Science, vol 14229

arXiv:2305.01503 [pdf, other]

NewsPanda: Media Monitoring for Timely Conservation Action

Authors: Sedrick Scott Keh, Zheyuan Ryan Shi, David J. Patterson, Nirmal Bhagabati, Karun Dewan, Areendran Gopala, Pablo Izquierdo, Debojyoti Mallick, Ambika Sharma, Pooja Shrestha, Fei Fang

Abstract: Non-governmental organizations for environmental conservation have a significant interest in monitoring conservation-related media and getting timely updates about infrastructure construction projects as they may cause massive impact to key conservation areas. Such monitoring, however, is difficult and time-consuming. We introduce NewsPanda, a toolkit which automatically detects and analyzes online articles related to environmental conservation and infrastructure construction. We fine-tune a BERT-based model using active learning methods and noise correction algorithms to identify articles that are relevant to conservation and infrastructure construction. For the identified articles, we perform further analysis, extracting keywords and finding potentially related sources. NewsPanda has been successfully deployed by the World Wide Fund for Nature teams in the UK, India, and Nepal since February 2022. It currently monitors over 80,000 websites and 1,074 conservation sites across India and Nepal, saving more than 30 hours of human efforts weekly. We have now scaled it up to cover 60,000 conservation sites globally.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted to IAAI-23: 35th Annual Conference on Innovative Applications of Artificial Intelligence. Winner of IAAI Deployed Application Award. Code at https://github.com/NewsPanda-WWF-CMU/weekly-pipeline

arXiv:2207.08338 [pdf, other]

MobileCodec: Neural Inter-frame Video Compression on Mobile Devices

Authors: Hoang Le, Liang Zhang, Amir Said, Guillaume Sautiere, Yang Yang, Pranav Shrestha, Fei Yin, Reza Pourreza, Auke Wiggers

Abstract: Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.

Submitted 17 July, 2022; originally announced July 2022.

Comments: ACM MMSys 2022

arXiv:2110.10129 [pdf, other]

Gummy Browsers: Targeted Browser Spoofing against State-of-the-Art Fingerprinting Techniques

Authors: Zengrui Liu, Prakash Shrestha, Nitesh Saxena

Abstract: We present a simple yet potentially devastating and hard-to-detect threat, called Gummy Browsers, whereby the browser fingerprinting information can be collected and spoofed without the victim's awareness, thereby compromising the privacy and security of any application that uses browser fingerprinting. The idea is that attacker A first makes the user U connect to his website (or to a well-known site the attacker controls) and transparently collects the information from U that is used for fingerprinting purposes. Then, A orchestrates a browser on his own machine to replicate and transmit the same fingerprinting information when connecting to W, fooling W to think that U is the one requesting the service rather than A. This will allow the attacker to profile U and compromise U's privacy. We design and implement the Gummy Browsers attack using three orchestration methods based on script injection, browser settings and debugging tools, and script modification, that can successfully spoof a wide variety of fingerprinting features to mimic many different browsers (including mobile browsers and the Tor browser). We then evaluate the attack against two state-of-the-art browser fingerprinting systems, FPStalker and Panopticlick. Our results show that A can accurately match his own manipulated browser fingerprint with that of any targeted victim user U's fingerprint for a long period of time, without significantly affecting the tracking of U and when collecting U's fingerprinting information only once. The TPR (true positive rate) for the tracking of the benign user in the presence of the attack is larger than 0.9 in most cases. The FPR (false positive rate) for the tracking of the attacker is also high, larger than 0.9 in all cases. We also argue that the attack can remain completely oblivious to the user and the website, thus making it extremely difficult to thwart in practice.

Submitted 19 October, 2021; originally announced October 2021.

arXiv:2012.02164 [pdf, other]

People Still Care About Facts: Twitter Users Engage More with Factual Discourse than Misinformation--A Comparison Between COVID and General Narratives on Twitter

Authors: Mirela Silva, Fabrício Ceschin, Prakash Shrestha, Christopher Brant, Shlok Gilda, Juliana Fernandes, Catia S. Silva, André Grégio, Daniela Oliveira, Luiz Giovanini

Abstract: Misinformation entails the dissemination of falsehoods that leads to the slow fracturing of society via decreased trust in democratic processes, institutions, and science. The public has grown aware of the role of social media as a superspreader of untrustworthy information, where even pandemics have not been immune. In this paper, we focus on COVID-19 misinformation and examine a subset of 2.1M tweets to understand misinformation as a function of engagement, tweet content (COVID-19- vs. non-COVID-19-related), and veracity (misleading or factual). Using correlation analysis, we show the most relevant feature subsets among over 126 features that most heavily correlate with misinformation or facts. We found that (i) factual tweets, regardless of whether COVID-related, were more engaging than misinformation tweets; and (ii) features that most heavily correlated with engagement varied depending on the veracity and content of the tweet.

Submitted 9 September, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 22 pages

arXiv:2004.07993 [pdf, other]

CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation

Authors: Dustin Arendt, Zhuanyi Huang, Prasha Shrestha, Ellyn Ayton, Maria Glenski, Svitlana Volkova

Abstract: Evaluation beyond aggregate performance metrics, e.g. F1-score, is crucial to both establish an appropriate level of trust in machine learning models and identify future model improvements. In this paper we demonstrate CrossCheck, an interactive visualization tool for rapid cross-model comparison and reproducible error analysis. We describe the tool and discuss design and implementation details. We then present three use cases (named entity recognition, reading comprehension, and clickbait detection) that show the benefits of using the tool for model evaluation. CrossCheck allows data scientists to make informed decisions to choose between multiple models, identify when the models are correct and for which examples, investigate whether the models are making the same mistakes as humans, evaluate models' generalizability and highlight models' limitations, strengths and weaknesses. Furthermore, CrossCheck is implemented as a Jupyter widget, which allows rapid and convenient integration into data scientists' model development workflows.

Submitted 16 April, 2020; originally announced April 2020.

arXiv:1811.07143 [pdf, other]

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures

Authors: Iddo Drori, Isht Dwivedi, Pranav Shrestha, Jeffrey Wan, Yueqi Wang, Yunchu He, Anthony Mazza, Hugh Krogh-Freeman, Dimitri Leggas, Kendal Sandridge, Linyong Nan, Kaveri Thakoor, Chinmay Joshi, Sonam Goenka, Chen Keasar, Itsik Pe'er

Abstract: We tackle the problem of protein secondary structure prediction using a common task framework. This led to the introduction of multiple ideas for neural architectures based on state-of-the-art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the application domain. We focus on the Q8 resolution of secondary structure, an active area for continuously improving methods. We use an ensemble of strong predictors to achieve accuracy of 70.7% (on the CB513 test set using the CB6133filtered training set). These results are statistically indistinguishable from those of the top existing predictors. In the spirit of reproducible research we make our data, models and code available, aiming to set a gold standard for purity of training and testing sets. Such good practices lower entry barriers to this domain and facilitate reproducible, extendable research.

Submitted 17 November, 2018; originally announced November 2018.

Comments: NIPS 2018 Workshop on Machine Learning for Molecules and Materials, 10 pages

arXiv:1506.02354

Secure Ad-hoc Routing Scheme

Authors: Anish Prasad Shrestha, Kyung Sup Kwak

Abstract: This paper investigates the problem of combining a routing scheme and physical layer security in multihop wireless networks with cooperative diversity. We propose an ad-hoc natured hop-by-hop best secure relay selection in a multihop network with several relays and an eavesdropper at each hop, which provides a safe routing scheme to transmit a confidential message from the transmitter to the legitimate receiver. The selection is based on the instantaneous channel conditions of relay and eavesdropper at each hop. A theoretical analysis is performed to derive new closed-form expressions for the probability of non-zero secrecy capacity along with the exact end-to-end secrecy outage probability at a normalized secrecy rate. Furthermore, we provide the asymptotic expression to gain insights on the diversity gain.

Submitted 19 July, 2018; v1 submitted 8 June, 2015; originally announced June 2015.

Comments: There are some errors that need to be fixed

arXiv:1505.05779 [pdf, ps, other]

Pitfalls in Designing Zero-Effort Deauthentication: Opportunistic Human Observation Attacks

Authors: O. Huhta, P. Shrestha, S. Udar, M. Juuti, N. Saxena, N. Asokan

Abstract: Deauthentication is an important component of any authentication system. The widespread use of computing devices in daily life has underscored the need for zero-effort deauthentication schemes. However, the quest for eliminating user effort may lead to hidden security flaws in the authentication schemes. As a case in point, we investigate a prominent zero-effort deauthentication scheme, called ZEBRA, which provides an interesting and a useful solution to a difficult problem as demonstrated in the original paper. We identify a subtle incorrect assumption in its adversary model that leads to a fundamental design flaw. We exploit this to break the scheme with a class of attacks that are much easier for a human to perform in a realistic adversary model, compared to the naïve attacks studied in the ZEBRA paper. For example, one of our main attacks, where the human attacker has to opportunistically mimic only the victim's keyboard typing activity at a nearby terminal, is significantly more successful compared to the naïve attack that requires mimicking keyboard and mouse activities as well as keyboard-mouse movements. Further, by understanding the design flaws in ZEBRA as cases of tainted input, we show that we can draw on well-understood design principles to improve ZEBRA's security.

Submitted 14 February, 2016; v1 submitted 21 May, 2015; originally announced May 2015.

ACM Class: K.6.5

arXiv:1311.1565

On Maximal Ratio Diversity with Weighting Errors for Physical Layer Security

Authors: Anish Prasad Shrestha, Kyung Sup Kwak

Abstract: In this letter, we introduce the performance of maximal ratio combining (MRC) with weighting errors for physical layer security. We assume that the legitimate user and the eavesdropper, each equipped with multiple antennas, employ non-ideal MRC. The non-ideal MRC is designed in terms of power correlation between the estimated and actual fadings. We derive new closed-form and generalized expressions for secrecy outage probability. Next, we investigate the asymptotic behavior of secrecy outage probability for high signal-to-noise ratio in the main channel between legitimate user and transmitter. The asymptotic analysis provides insights into the actual diversity provided by MRC with weighting errors. We substantiate our claims with the analytic results and numerical evaluations.

Submitted 9 November, 2013; v1 submitted 6 November, 2013; originally announced November 2013.

Comments: It requires some major corrections in equations and numerical results

Showing 1–15 of 15 results for author: Shrestha, P