-
CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities
Authors:
Pranav Poudel,
Prashant Shrestha,
Sanskar Amgain,
Yash Raj Shrestha,
Prashnna Gyawali,
Binod Bhattarai
Abstract:
Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, applying this effectiveness in healthcare is challenging due to the limited availability of public datasets. Federated learning presents an exciting solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet research in multimodal federated learning, particularly in scenarios with missing modalities, a common issue in healthcare datasets, remains scarce, highlighting a critical area for future exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution lies in a novel cross-modal data augmentation by retrieval, leveraging a small publicly available dataset to fill the missing modalities in the clients. Our method learns the parameters in a federated manner, ensuring privacy protection and improving performance in multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines. Code is available at https://github.com/bhattarailab/CAR-MFL
Submitted 11 July, 2024;
originally announced July 2024.
-
Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification
Authors:
Bidur Khanal,
Prashant Shrestha,
Sanskar Amgain,
Bishesh Khanal,
Binod Bhattarai,
Cristian A. Linte
Abstract:
Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets -- COVID-DU-Ex, and NCT-CRC-HE-100K -- both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.
Submitted 26 February, 2024;
originally announced February 2024.
-
Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity
Authors:
Sanskar Amgain,
Prashant Shrestha,
Sophia Bano,
Ignacio del Valle Torres,
Michael Cunniffe,
Victor Hernandez,
Phil Beales,
Binod Bhattarai
Abstract:
Purpose: We apply federated learning to train an OCT image classifier, simulating a realistic scenario with multiple clients and a statistically heterogeneous data distribution in which the clients' data lack samples of some categories entirely.
Methods: We investigate the effectiveness of FedAvg and FedProx to train an OCT image classification model in a decentralized fashion, addressing privacy concerns associated with centralizing data. We partitioned a publicly available OCT dataset across multiple clients under IID and Non-IID settings and conducted local training on the subsets for each client. We evaluated two federated learning methods, FedAvg and FedProx, for these settings.
Results: Our experiments on the dataset suggest that under IID settings, both methods perform on par with training on a central data pool. However, the performance of both algorithms declines as we increase the statistical heterogeneity across the client data, while FedProx consistently performs better than FedAvg in the increased heterogeneity settings.
Conclusion: Despite the effectiveness of federated learning in the utilization of private data across multiple medical institutions, the large number of clients and heterogeneous distribution of labels deteriorate the performance of both algorithms. Notably, FedProx appears to be more robust to the increased heterogeneity.
Submitted 15 February, 2024;
originally announced February 2024.
-
Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval
Authors:
Love Panta,
Prashant Shrestha,
Brabeem Sapkota,
Amrita Bhattarai,
Suresh Manandhar,
Anand Kumar Sah
Abstract:
Video moment retrieval is a challenging task requiring fine-grained interactions between video and text modalities. Recent work in image-text pretraining has demonstrated that most existing pretrained models suffer from information asymmetry due to the difference in length between visual and textual sequences. We question whether the same problem also exists in the video-text domain with an auxiliary need to preserve both spatial and temporal information. Thus, we evaluate a recently proposed solution involving the addition of an asymmetric co-attention network for video grounding tasks. Additionally, we incorporate momentum contrastive loss for robust, discriminative representation learning in both modalities. We note that the integration of these supplementary modules yields better performance compared to state-of-the-art models on the TACoS dataset and comparable results on ActivityNet Captions, all while utilizing significantly fewer parameters with respect to baseline.
Submitted 12 December, 2023;
originally announced December 2023.
-
Medical Vision Language Pretraining: A survey
Authors:
Prashant Shrestha,
Sanskar Amgain,
Bidur Khanal,
Cristian A. Linte,
Binod Bhattarai
Abstract:
Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and text datasets through self-supervised learning, models can be trained to acquire vast knowledge and learn robust feature representations. Such pretrained models have the potential to enhance multiple downstream medical tasks simultaneously, reducing the dependency on labeled data. However, despite recent progress and its potential, no comprehensive survey has yet explored the various aspects and advancements in medical VLP. In this paper, we specifically review existing works through the lens of different pretraining objectives, architectures, downstream evaluation tasks, and datasets utilized for pretraining and downstream tasks. Subsequently, we delve into current challenges in medical VLP, discussing existing and potential solutions, and conclude by highlighting future directions. To the best of our knowledge, this is the first survey focused on medical VLP.
Submitted 11 December, 2023;
originally announced December 2023.
-
X-Ray to CT Rigid Registration Using Scene Coordinate Regression
Authors:
Pragyan Shrestha,
Chun Xie,
Hidehiko Shishido,
Yuichi Yoshii,
Itaru Kitahara
Abstract:
Intraoperative fluoroscopy is a frequently used modality in minimally invasive orthopedic surgeries. Aligning the intraoperatively acquired X-ray image with the preoperatively acquired 3D model of a computed tomography (CT) scan reduces the mental burden on surgeons induced by the overlapping anatomical structures in the acquired images. This paper proposes a fully automatic registration method that is robust to extreme viewpoints and does not require manual annotation of landmark points during training. It is based on a fully convolutional neural network (CNN) that regresses the scene coordinates for a given X-ray image. The scene coordinates are defined as the intersection of the back-projected rays from a pixel toward the 3D model. Training data for a patient-specific model were generated through a realistic simulation of a C-arm device using preoperative CT scans. In contrast, intraoperative registration was achieved by solving the perspective-n-point (PnP) problem with a random sample and consensus (RANSAC) algorithm. Experiments were conducted using a pelvic CT dataset that included several real fluoroscopic (X-ray) images with ground truth annotations. The proposed method achieved an average mean target registration error (mTRE) of 3.79 mm in the 50th percentile of the simulated test dataset and projected mTRE of 9.65 mm in the 50th percentile of real fluoroscopic images for pelvis registration.
Submitted 25 November, 2023;
originally announced November 2023.
-
NewsPanda: Media Monitoring for Timely Conservation Action
Authors:
Sedrick Scott Keh,
Zheyuan Ryan Shi,
David J. Patterson,
Nirmal Bhagabati,
Karun Dewan,
Areendran Gopala,
Pablo Izquierdo,
Debojyoti Mallick,
Ambika Sharma,
Pooja Shrestha,
Fei Fang
Abstract:
Non-governmental organizations for environmental conservation have a significant interest in monitoring conservation-related media and getting timely updates about infrastructure construction projects as they may cause massive impact to key conservation areas. Such monitoring, however, is difficult and time-consuming. We introduce NewsPanda, a toolkit which automatically detects and analyzes online articles related to environmental conservation and infrastructure construction. We fine-tune a BERT-based model using active learning methods and noise correction algorithms to identify articles that are relevant to conservation and infrastructure construction. For the identified articles, we perform further analysis, extracting keywords and finding potentially related sources. NewsPanda has been successfully deployed by the World Wide Fund for Nature teams in the UK, India, and Nepal since February 2022. It currently monitors over 80,000 websites and 1,074 conservation sites across India and Nepal, saving more than 30 hours of human efforts weekly. We have now scaled it up to cover 60,000 conservation sites globally.
Submitted 30 April, 2023;
originally announced May 2023.
-
MobileCodec: Neural Inter-frame Video Compression on Mobile Devices
Authors:
Hoang Le,
Liang Zhang,
Amir Said,
Guillaume Sautiere,
Yang Yang,
Pranav Shrestha,
Fei Yin,
Reza Pourreza,
Auke Wiggers
Abstract:
Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.
Submitted 17 July, 2022;
originally announced July 2022.
-
Gummy Browsers: Targeted Browser Spoofing against State-of-the-Art Fingerprinting Techniques
Authors:
Zengrui Liu,
Prakash Shrestha,
Nitesh Saxena
Abstract:
We present a simple yet potentially devastating and hard-to-detect threat, called Gummy Browsers, whereby the browser fingerprinting information can be collected and spoofed without the victim's awareness, thereby compromising the privacy and security of any application that uses browser fingerprinting. The idea is that attacker A first makes the user U connect to his website (or to a well-known site the attacker controls) and transparently collects the information from U that is used for fingerprinting purposes. Then, A orchestrates a browser on his own machine to replicate and transmit the same fingerprinting information when connecting to a website W, fooling W into thinking that U is the one requesting the service rather than A. This will allow the attacker to profile U and compromise U's privacy. We design and implement the Gummy Browsers attack using three orchestration methods based on script injection, browser settings and debugging tools, and script modification, which can successfully spoof a wide variety of fingerprinting features to mimic many different browsers (including mobile browsers and the Tor browser). We then evaluate the attack against two state-of-the-art browser fingerprinting systems, FPStalker and Panopticlick. Our results show that A can accurately match his own manipulated browser fingerprint with that of any targeted victim user U for a long period of time, without significantly affecting the tracking of U and while collecting U's fingerprinting information only once. The TPR (true positive rate) for the tracking of the benign user in the presence of the attack is larger than 0.9 in most cases. The FPR (false positive rate) for the tracking of the attacker is also high, larger than 0.9 in all cases. We also argue that the attack can remain completely oblivious to the user and the website, thus making it extremely difficult to thwart in practice.
Submitted 19 October, 2021;
originally announced October 2021.
-
People Still Care About Facts: Twitter Users Engage More with Factual Discourse than Misinformation--A Comparison Between COVID and General Narratives on Twitter
Authors:
Mirela Silva,
Fabrício Ceschin,
Prakash Shrestha,
Christopher Brant,
Shlok Gilda,
Juliana Fernandes,
Catia S. Silva,
André Grégio,
Daniela Oliveira,
Luiz Giovanini
Abstract:
Misinformation entails the dissemination of falsehoods that leads to the slow fracturing of society via decreased trust in democratic processes, institutions, and science. The public has grown aware of the role of social media as a superspreader of untrustworthy information, where even pandemics have not been immune. In this paper, we focus on COVID-19 misinformation and examine a subset of 2.1M tweets to understand misinformation as a function of engagement, tweet content (COVID-19- vs. non-COVID-19-related), and veracity (misleading or factual). Using correlation analysis, we show the most relevant feature subsets among over 126 features that most heavily correlate with misinformation or facts. We found that (i) factual tweets, regardless of whether COVID-related, were more engaging than misinformation tweets; and (ii) features that most heavily correlated with engagement varied depending on the veracity and content of the tweet.
Submitted 9 September, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation
Authors:
Dustin Arendt,
Zhuanyi Huang,
Prasha Shrestha,
Ellyn Ayton,
Maria Glenski,
Svitlana Volkova
Abstract:
Evaluation beyond aggregate performance metrics, e.g. F1-score, is crucial to both establish an appropriate level of trust in machine learning models and identify future model improvements. In this paper we demonstrate CrossCheck, an interactive visualization tool for rapid cross-model comparison and reproducible error analysis. We describe the tool and discuss design and implementation details. We then present three use cases (named entity recognition, reading comprehension, and clickbait detection) that show the benefits of using the tool for model evaluation. CrossCheck allows data scientists to make informed decisions when choosing between multiple models, identify when the models are correct and for which examples, investigate whether the models are making the same mistakes as humans, evaluate models' generalizability, and highlight models' limitations, strengths and weaknesses. Furthermore, CrossCheck is implemented as a Jupyter widget, which allows rapid and convenient integration into data scientists' model development workflows.
Submitted 16 April, 2020;
originally announced April 2020.
-
High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures
Authors:
Iddo Drori,
Isht Dwivedi,
Pranav Shrestha,
Jeffrey Wan,
Yueqi Wang,
Yunchu He,
Anthony Mazza,
Hugh Krogh-Freeman,
Dimitri Leggas,
Kendal Sandridge,
Linyong Nan,
Kaveri Thakoor,
Chinmay Joshi,
Sonam Goenka,
Chen Keasar,
Itsik Pe'er
Abstract:
We tackle the problem of protein secondary structure prediction using a common task framework. This led to the introduction of multiple ideas for neural architectures based on state-of-the-art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the application domain. We focus on the Q8 resolution of secondary structure, an active area for continuously improving methods. We use an ensemble of strong predictors to achieve accuracy of 70.7% (on the CB513 test set using the CB6133filtered training set). These results are statistically indistinguishable from those of the top existing predictors. In the spirit of reproducible research we make our data, models and code available, aiming to set a gold standard for purity of training and testing sets. Such good practices lower entry barriers to this domain and facilitate reproducible, extendable research.
Submitted 17 November, 2018;
originally announced November 2018.
-
Secure Ad-hoc Routing Scheme
Authors:
Anish Prasad Shrestha,
Kyung Sup Kwak
Abstract:
This paper investigates the problem of combining a routing scheme with physical layer security in multihop wireless networks with cooperative diversity. We propose an ad-hoc, hop-by-hop best secure relay selection in a multihop network with several relays and an eavesdropper at each hop, which provides a safe routing scheme to transmit a confidential message from the transmitter to the legitimate receiver. The selection is based on the instantaneous channel conditions of the relay and the eavesdropper at each hop. A theoretical analysis is performed to derive new closed-form expressions for the probability of non-zero secrecy capacity along with the exact end-to-end secrecy outage probability at a normalized secrecy rate. Furthermore, we provide the asymptotic expression to gain insights into the diversity gain.
Submitted 19 July, 2018; v1 submitted 8 June, 2015;
originally announced June 2015.
-
Pitfalls in Designing Zero-Effort Deauthentication: Opportunistic Human Observation Attacks
Authors:
O. Huhta,
P. Shrestha,
S. Udar,
M. Juuti,
N. Saxena,
N. Asokan
Abstract:
Deauthentication is an important component of any authentication system. The widespread use of computing devices in daily life has underscored the need for zero-effort deauthentication schemes. However, the quest for eliminating user effort may lead to hidden security flaws in the authentication schemes. As a case in point, we investigate a prominent zero-effort deauthentication scheme, called ZEBRA, which provides an interesting and useful solution to a difficult problem, as demonstrated in the original paper. We identify a subtle incorrect assumption in its adversary model that leads to a fundamental design flaw. We exploit this to break the scheme with a class of attacks that are much easier for a human to perform in a realistic adversary model, compared to the naïve attacks studied in the ZEBRA paper. For example, one of our main attacks, where the human attacker has to opportunistically mimic only the victim's keyboard typing activity at a nearby terminal, is significantly more successful compared to the naïve attack that requires mimicking keyboard and mouse activities as well as keyboard-mouse movements. Further, by understanding the design flaws in ZEBRA as cases of tainted input, we show that we can draw on well-understood design principles to improve ZEBRA's security.
Submitted 14 February, 2016; v1 submitted 21 May, 2015;
originally announced May 2015.
-
On Maximal Ratio Diversity with Weighting Errors for Physical Layer Security
Authors:
Anish Prasad Shrestha,
Kyung Sup Kwak
Abstract:
In this letter, we investigate the performance of maximal ratio combining (MRC) with weighting errors for physical layer security. We assume that the legitimate user and the eavesdropper, each equipped with multiple antennas, employ non-ideal MRC. The non-ideal MRC is modeled in terms of the power correlation between the estimated and actual fadings. We derive new closed-form and generalized expressions for the secrecy outage probability. Next, we investigate the asymptotic behavior of the secrecy outage probability for high signal-to-noise ratio in the main channel between the legitimate user and the transmitter. The asymptotic analysis provides insights into the actual diversity provided by MRC with weighting errors. We substantiate our claims with analytic results and numerical evaluations.
Submitted 9 November, 2013; v1 submitted 6 November, 2013;
originally announced November 2013.