-
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Authors:
Sayna Ebrahimi,
Sercan O. Arik,
Tejas Nama,
Tomas Pfister
Abstract:
Multimodal Large Language Models (MLLMs) demonstrate remarkable image-language capabilities, but their widespread use faces challenges in cost-effective training and adaptation. Existing approaches often necessitate expensive language model retraining and limited adaptability. Additionally, the current focus on zero-shot performance improvements offers insufficient guidance for task-specific tunin…
▽ More
Multimodal Large Language Models (MLLMs) demonstrate remarkable image-language capabilities, but their widespread use faces challenges in cost-effective training and adaptation. Existing approaches often necessitate expensive language model retraining and limited adaptability. Additionally, the current focus on zero-shot performance improvements offers insufficient guidance for task-specific tuning. We propose CROME, an efficient vision-language instruction tuning framework. It features a novel gated cross-modal adapter that effectively combines visual and textual representations prior to input into a frozen LLM. This lightweight adapter, trained with minimal parameters, enables efficient cross-modal understanding. Notably, CROME demonstrates superior zero-shot performance on standard visual question answering and instruction-following benchmarks. Moreover, it yields fine-tuning with exceptional parameter efficiency, competing with task-specific specialist state-of-the-art methods. CROME demonstrates the potential of pre-LM alignment for building scalable, adaptable, and parameter-efficient multimodal models.
△ Less
Submitted 12 August, 2024;
originally announced August 2024.
-
Enhancing Diversity in Multi-objective Feature Selection
Authors:
Sevil Zanjani Miyandoab,
Shahryar Rahnamayan,
Azam Asilian Bidgoli,
Sevda Ebrahimi,
Masoud Makrehchi
Abstract:
Feature selection plays a pivotal role in the data preprocessing and model-building pipeline, significantly enhancing model performance, interpretability, and resource efficiency across diverse domains. In population-based optimization methods, the generation of diverse individuals holds utmost importance for adequately exploring the problem landscape, particularly in highly multi-modal multi-obje…
▽ More
Feature selection plays a pivotal role in the data preprocessing and model-building pipeline, significantly enhancing model performance, interpretability, and resource efficiency across diverse domains. In population-based optimization methods, the generation of diverse individuals holds utmost importance for adequately exploring the problem landscape, particularly in highly multi-modal multi-objective optimization problems. Our study reveals that, in line with findings from several prior research papers, commonly employed crossover and mutation operations lack the capability to generate high-quality diverse individuals and tend to become confined to limited areas around various local optima. This paper introduces an augmentation to the diversity of the population in the well-established multi-objective scheme of the genetic algorithm, NSGA-II. This enhancement is achieved through two key components: the genuine initialization method and the substitution of the worst individuals with new randomly generated individuals as a re-initialization approach in each generation. The proposed multi-objective feature selection method undergoes testing on twelve real-world classification problems, with the number of features ranging from 2,400 to nearly 50,000. The results demonstrate that replacing the last front of the population with an equivalent number of new random individuals generated using the genuine initialization method and featuring a limited number of features substantially improves the population's quality and, consequently, enhances the performance of the multi-objective algorithm.
△ Less
Submitted 18 August, 2024; v1 submitted 25 July, 2024;
originally announced July 2024.
-
Sharif-STR at SemEval-2024 Task 1: Transformer as a Regression Model for Fine-Grained Scoring of Textual Semantic Relations
Authors:
Seyedeh Fatemeh Ebrahimi,
Karim Akhavan Azari,
Amirmasoud Iravani,
Hadi Alizadeh,
Zeinab Sadat Taghavi,
Hossein Sameti
Abstract:
Semantic Textual Relatedness holds significant relevance in Natural Language Processing, finding applications across various domains. Traditionally, approaches to STR have relied on knowledge-based and statistical methods. However, with the emergence of Large Language Models, there has been a paradigm shift, ushering in new methodologies. In this paper, we delve into the investigation of sentence-…
▽ More
Semantic Textual Relatedness holds significant relevance in Natural Language Processing, finding applications across various domains. Traditionally, approaches to STR have relied on knowledge-based and statistical methods. However, with the emergence of Large Language Models, there has been a paradigm shift, ushering in new methodologies. In this paper, we delve into the investigation of sentence-level STR within Track A (Supervised) by leveraging fine-tuning techniques on the RoBERTa transformer. Our study focuses on assessing the efficacy of this approach across different languages. Notably, our findings indicate promising advancements in STR performance, particularly in Latin languages. Specifically, our results demonstrate notable improvements in English, achieving a correlation of 0.82 and securing a commendable 19th rank. Similarly, in Spanish, we achieved a correlation of 0.67, securing the 15th position. However, our approach encounters challenges in languages like Arabic, where we observed a correlation of only 0.38, resulting in a 20th rank.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text
Authors:
Seyedeh Fatemeh Ebrahimi,
Karim Akhavan Azari,
Amirmasoud Iravani,
Arian Qazvini,
Pouya Sadeghi,
Zeinab Sadat Taghavi,
Hossein Sameti
Abstract:
Detecting Machine-Generated Text (MGT) has emerged as a significant area of study within Natural Language Processing. While language models generate text, they often leave discernible traces, which can be scrutinized using either traditional feature-based methods or more advanced neural language models. In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a pow…
▽ More
Detecting Machine-Generated Text (MGT) has emerged as a significant area of study within Natural Language Processing. While language models generate text, they often leave discernible traces, which can be scrutinized using either traditional feature-based methods or more advanced neural language models. In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a powerful neural architecture, to address MGT detection as a binary classification task. Focusing specifically on Subtask A (Monolingual-English) within the SemEval-2024 competition framework, our proposed system achieves an accuracy of 78.9% on the test dataset, positioning us at 57th among participants. Our study addresses this challenge while considering the limited hardware resources, resulting in a system that excels at identifying human-written texts but encounters challenges in accurately discerning MGTs.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Mitigating Object Hallucination via Data Augmented Contrastive Tuning
Authors:
Pritam Sarkar,
Sayna Ebrahimi,
Ali Etemad,
Ahmad Beirami,
Sercan Ö. Arık,
Tomas Pfister
Abstract:
Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations wh…
▽ More
Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations while preserving its general vision-language capabilities. For a given factual token, we create a hallucinated token through generative data augmentation by selectively altering the ground-truth information. The proposed contrastive tuning is applied at the token level to improve the relative likelihood of the factual token compared to the hallucinated one. Our thorough evaluation confirms the effectiveness of contrastive tuning in mitigating hallucination. Moreover, the proposed contrastive tuning is simple, fast, and requires minimal training with no additional overhead at inference.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Continual Learning of Large Language Models: A Comprehensive Survey
Authors:
Haizhou Shi,
Zihao Xu,
Hengyi Wang,
Weiyi Qin,
Wenyuan Wang,
Yibin Wang,
Zifeng Wang,
Sayna Ebrahimi,
Hao Wang
Abstract:
The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant…
▽ More
The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.
△ Less
Submitted 29 June, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models
Authors:
Sana Ebrahimi,
Nima Shahbazi,
Abolfazl Asudeh
Abstract:
The extensive scope of large language models (LLMs) across various domains underscores the critical importance of responsibility in their application, beyond natural language processing. In particular, the randomized nature of LLMs, coupled with inherent biases and historical stereotypes in data, raises critical concerns regarding reliability and equity. Addressing these challenges are necessary b…
▽ More
The extensive scope of large language models (LLMs) across various domains underscores the critical importance of responsibility in their application, beyond natural language processing. In particular, the randomized nature of LLMs, coupled with inherent biases and historical stereotypes in data, raises critical concerns regarding reliability and equity. Addressing these challenges are necessary before using LLMs for applications with societal impact. Towards addressing this gap, we introduce REQUAL-LM, a novel method for finding reliable and equitable LLM outputs through aggregation. Specifically, we develop a Monte Carlo method based on repeated sampling to find a reliable output close to the mean of the underlying distribution of possible outputs. We formally define the terms such as reliability and bias, and design an equity-aware aggregation to minimize harmful bias while finding a highly reliable output. REQUAL-LM does not require specialized hardware, does not impose a significant computing load, and uses LLMs as a blackbox. This design choice enables seamless scalability alongside the rapid advancement of LLM technologies. Our system does not require retraining the LLMs, which makes it deployment ready and easy to adapt. Our comprehensive experiments using various tasks and datasets demonstrate that REQUAL- LM effectively mitigates bias and selects a more equitable response, specifically the outputs that properly represents minority groups.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
Authors:
Sana Ebrahimi,
Kaiwen Chen,
Abolfazl Asudeh,
Gautam Das,
Nick Koudas
Abstract:
Pre-trained Large Language Models (LLMs) have significantly advanced natural language processing capabilities but are susceptible to biases present in their training data, leading to unfair outcomes in various applications. While numerous strategies have been proposed to mitigate bias, they often require extensive computational resources and may compromise model performance. In this work, we intro…
▽ More
Pre-trained Large Language Models (LLMs) have significantly advanced natural language processing capabilities but are susceptible to biases present in their training data, leading to unfair outcomes in various applications. While numerous strategies have been proposed to mitigate bias, they often require extensive computational resources and may compromise model performance. In this work, we introduce AXOLOTL, a novel post-processing framework, which operates agnostically across tasks and models, leveraging public APIs to interact with LLMs without direct access to internal parameters. Through a three-step process resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions, and guides the model to self-debias its outputs. This approach minimizes computational costs and preserves model performance, making AXOLOTL a promising tool for debiasing LLM outputs with broad applicability and ease of use.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry
Authors:
Shiva Ebrahimi,
Xuan Guo
Abstract:
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor…
▽ More
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce DiaTrans, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing STOA methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our DiaTrans model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.
△ Less
Submitted 26 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Comparing Methods for Creating a National Random Sample of Twitter Users
Authors:
Meysam Alizadeh,
Darya Zare,
Zeynab Samei,
Mohammadamin Alizadeh,
Mael Kubli,
Mohammadhadi Aliahmadi,
Sarvenaz Ebrahimi,
Fabrizio Gilardi
Abstract:
Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to collect a…
▽ More
Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to collect a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. Then, we compare the methods according to their tweet- and user-level metrics as well as their accuracy in estimating US population with and without using inclusion probabilities of various demographics. Our results show that the 1% Stream method performs differently than others in tweet- and user-level metrics, and best for the construction of a population representative sample. We discuss the conditions under which the 1% Stream method may not be suitable and suggest the Bounding Box method as the second-best method to use.
△ Less
Submitted 11 March, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
Authors:
James Enouen,
Hootan Nakhost,
Sayna Ebrahimi,
Sercan O Arik,
Yan Liu,
Tomas Pfister
Abstract:
Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developm…
▽ More
Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developments in the explainability of neural network models over the past decade. Among them, post-hoc explainability methods, especially Shapley values, have proven effective for interpreting deep learning models. However, there are major challenges in scaling up Shapley values for LLMs, particularly when dealing with long input contexts containing thousands of tokens and autoregressively generated output sequences. Furthermore, it is often unclear how to effectively utilize generated explanations to improve the performance of LLMs. In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques. We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations, reducing processing times from hours to minutes for token-level explanations, and to just seconds for document-level explanations. In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios, providing better understanding of long-document question answering by localizing important words and sentences; and improving existing document retrieval systems through enhancing the accuracy of selected passages and ultimately the final responses.
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs
Authors:
Jiefeng Chen,
Jinsung Yoon,
Sayna Ebrahimi,
Sercan O Arik,
Tomas Pfister,
Somesh Jha
Abstract:
Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions wh…
▽ More
Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.
△ Less
Submitted 11 November, 2023; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications
Authors:
Azim Akhtarshenas,
Mohammad Ali Vahedifar,
Navid Ayoobi,
Behrouz Maham,
Tohid Alizadeh,
Sina Ebrahimi,
David López-Pérez
Abstract:
Robust machine learning (ML) models can be developed by leveraging large volumes of data and distributing the computational tasks across numerous devices or servers. Federated learning (FL) is a technique in the realm of ML that facilitates this goal by utilizing cloud infrastructure to enable collaborative model training among a network of decentralized devices. Beyond distributing the computatio…
▽ More
Robust machine learning (ML) models can be developed by leveraging large volumes of data and distributing the computational tasks across numerous devices or servers. Federated learning (FL) is a technique in the realm of ML that facilitates this goal by utilizing cloud infrastructure to enable collaborative model training among a network of decentralized devices. Beyond distributing the computational load, FL targets the resolution of privacy issues and the reduction of communication costs simultaneously. To protect user privacy, FL requires users to send model updates rather than transmitting large quantities of raw and potentially confidential data. Specifically, individuals train ML models locally using their own data and then upload the results in the form of weights and gradients to the cloud for aggregation into the global model. This strategy is also advantageous in environments with limited bandwidth or high communication costs, as it prevents the transmission of large data volumes. With the increasing volume of data and rising privacy concerns, alongside the emergence of large-scale ML models like Large Language Models (LLMs), FL presents itself as a timely and relevant solution. It is therefore essential to review current FL algorithms to guide future research that meets the rapidly evolving ML demands. This survey provides a comprehensive analysis and comparison of the most recent FL algorithms, evaluating them on various fronts including mathematical frameworks, privacy protection, resource allocation, and applications. Beyond summarizing existing FL methods, this survey identifies potential gaps, open areas, and future challenges based on the performance reports and algorithms used in recent studies. This survey enables researchers to readily identify existing limitations in the FL field for further exploration.
△ Less
Submitted 25 May, 2024; v1 submitted 8 October, 2023;
originally announced October 2023.
-
PAITS: Pretraining and Augmentation for Irregularly-Sampled Time Series
Authors:
Nicasia Beebe-Wang,
Sayna Ebrahimi,
Jinsung Yoon,
Sercan O. Arik,
Tomas Pfister
Abstract:
Real-world time series data that commonly reflect sequential human behavior are often uniquely irregularly sampled and sparse, with highly nonuniform sampling over time and entities. Yet, commonly-used pretraining and augmentation methods for time series are not specifically designed for such scenarios. In this paper, we present PAITS (Pretraining and Augmentation for Irregularly-sampled Time Seri…
▽ More
Real-world time series data that commonly reflect sequential human behavior are often uniquely irregularly sampled and sparse, with highly nonuniform sampling over time and entities. Yet, commonly-used pretraining and augmentation methods for time series are not specifically designed for such scenarios. In this paper, we present PAITS (Pretraining and Augmentation for Irregularly-sampled Time Series), a framework for identifying suitable pretraining strategies for sparse and irregularly sampled time series datasets. PAITS leverages a novel combination of NLP-inspired pretraining tasks and augmentations, and a random search to identify an effective strategy for a given dataset. We demonstrate that different datasets benefit from different pretraining choices. Compared with prior methods, our approach is better able to consistently improve pretraining across multiple datasets and domains. Our code is available at \url{https://github.com/google-research/google-research/tree/master/irregular_timeseries_pretraining}.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
[Experiments & Analysis] Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons
Authors:
Sana Ebrahimi,
Rishi Advani,
Abolfazl Asudeh
Abstract:
The training process of neural networks is known to be time-consuming, and having a deep architecture only aggravates the issue. This process consists mostly of matrix operations, among which matrix multiplication is the bottleneck. Several sampling-based techniques have been proposed for speeding up the training time of deep neural networks by approximating the matrix products. These techniques f…
▽ More
The training process of neural networks is known to be time-consuming, and having a deep architecture only aggravates the issue. This process consists mostly of matrix operations, among which matrix multiplication is the bottleneck. Several sampling-based techniques have been proposed for speeding up the training time of deep neural networks by approximating the matrix products. These techniques fall under two categories: (i) sampling a subset of nodes in every hidden layer as active at every iteration and (ii) sampling a subset of nodes from the previous layer to approximate the current layer's activations using the edges from the sampled nodes. In both cases, the matrix products are computed using only the selected samples. In this paper, we evaluate the feasibility of these approaches on CPU machines with limited computational resources. Making a connection between the two research directions as special cases of approximating matrix multiplications in the context of neural networks, we provide a negative theoretical analysis that shows feedforward approximation is an obstacle against scalability. We conduct comprehensive experimental evaluations that demonstrate the most pressing challenges and limitations associated with the studied approaches. We observe that the hashing-based node selection method is not scalable to a large number of layers, confirming our theoretical analysis. Finally, we identify directions for future research.
△ Less
Submitted 20 June, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Sensor Fault Detection and Compensation with Performance Prescription for Robotic Manipulators
Authors:
S. Mohammadreza Ebrahimi,
Farid Norouzi,
Hossein Dastres,
Reza Faieghi,
Mehdi Naderi,
Milad Malekzadeh
Abstract:
This paper focuses on sensor fault detection and compensation for robotic manipulators. The proposed method features a new adaptive observer and a new terminal sliding mode control law established on a second-order integral sliding surface. The method enables sensor fault detection without the need to know the bounds on fault value and/or its derivative. It also enables fast and fixed-time fault-t…
▽ More
This paper focuses on sensor fault detection and compensation for robotic manipulators. The proposed method features a new adaptive observer and a new terminal sliding mode control law established on a second-order integral sliding surface. The method enables sensor fault detection without the need to know the bounds on fault value and/or its derivative. It also enables fast and fixed-time fault-tolerant control whose performance can be prescribed beforehand by defining funnel bounds on the tracking error. The ultimate boundedness of the estimation errors for the proposed observer and the fixed-time stability of the control system are shown using Lyapunov stability analysis. The effectiveness of the proposed method is verified using numerical simulations on two different robotic manipulators, and the results are compared with existing methods. Our results demonstrate performance gains obtained by the proposed method compared to the existing results.
△ Less
Submitted 18 March, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
LANISTR: Multimodal Learning from Structured and Unstructured Data
Authors:
Sayna Ebrahimi,
Sercan O. Arik,
Yihe Dong,
Tomas Pfister
Abstract:
Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured…
▽ More
Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in \textit{masking-based} training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (from healthcare) and Amazon Product Review (from retail), LANISTR demonstrates remarkable improvements, 6.6\% (in AUROC) and 14\% (in accuracy) when fine-tuned with 0.1\% and 0.01\% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even with very high ratio of samples (35.7\% and 99.8\% respectively) not containing all modalities, underlining the robustness of LANISTR to practical missing modality challenge. Our code and models will be available at https://github.com/google-research/lanistr
△ Less
Submitted 24 April, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Authors:
Jiefeng Chen,
Jinsung Yoon,
Sayna Ebrahimi,
Sercan Arik,
Somesh Jha,
Tomas Pfister
Abstract:
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependenc…
▽ More
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST$\to$SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.
△ Less
Submitted 29 February, 2024; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations
Authors:
Qingyao Sun,
Kevin Murphy,
Sayna Ebrahimi,
Alexander D'Amour
Abstract:
Changes in the data distribution at test time can have deleterious effects on the performance of predictive models $p(y|x)$. We consider situations where there are additional meta-data labels (such as group labels), denoted by $z$, that can account for such changes in the distribution. In particular, we assume that the prior distribution $p(y, z)$, which models the dependence between the class lab…
▽ More
Changes in the data distribution at test time can have deleterious effects on the performance of predictive models $p(y|x)$. We consider situations where there are additional meta-data labels (such as group labels), denoted by $z$, that can account for such changes in the distribution. In particular, we assume that the prior distribution $p(y, z)$, which models the dependence between the class label $y$ and the "nuisance" factors $z$, may change across domains, either due to a change in the correlation between these terms, or a change in one of their marginals. However, we assume that the generative model for features $p(x|y,z)$ is invariant across domains. We note that this corresponds to an expanded version of the widely used "label shift" assumption, where the labels now also include the nuisance factors $z$. Based on this observation, we propose a test-time label shift correction that adapts to changes in the joint distribution $p(y, z)$ using EM applied to unlabeled samples from the target domain distribution, $p_t(x)$. Importantly, we are able to avoid fitting a generative model $p(x|y, z)$, and merely need to reweight the outputs of a discriminative model $p_s(y, z|x)$ trained on the source distribution. We evaluate our method, which we call "Test-Time Label-Shift Adaptation" (TTLSA), on several standard image and text datasets, as well as the CheXpert chest X-ray dataset, and show that it improves performance over methods that target invariance to changes in the distribution, as well as baseline empirical risk minimization methods. Code for reproducing experiments is available at https://github.com/nalzok/test-time-label-shift .
△ Less
Submitted 28 November, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Maximizing Fair Content Spread via Edge Suggestion in Social Networks
Authors:
Ian P. Swift,
Sana Ebrahimi,
Azade Nova,
Abolfazl Asudeh
Abstract:
Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently not existing in the network but likely to be acce…
▽ More
Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently not existing in the network but likely to be accepted) that maximizes content spread while achieving fairness. Instead of re-engineering the existing systems, our proposal builds a fairness wrapper on top of the existing friendship suggestion components.
We prove the problem is NP-hard and inapproximable in polynomial time unless P = NP. Therefore, allowing relaxation of the fairness constraint, we propose an algorithm based on LP-relaxation and randomized rounding with fixed approximation ratios on fairness and content spread. We provide multiple optimizations, further improving the performance of our algorithm in practice. Besides, we propose a scalable algorithm that dynamically adds subsets of nodes, chosen via iterative sampling, and solves smaller problems corresponding to these nodes. Besides theoretical analysis, we conduct comprehensive experiments on real and synthetic data sets. Across different settings, our algorithms found solutions with nearzero unfairness while significantly increasing the content spread. Our scalable algorithm could process a graph with half a million nodes on a single machine, reducing the unfairness to around 0.0004 while lifting content spread by 43%.
△ Less
Submitted 20 December, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Test-Time Adaptation for Visual Document Understanding
Authors:
Sayna Ebrahimi,
Sercan O. Arik,
Tomas Pfister
Abstract:
For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document…
▽ More
For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling to adapt models learned on a \textit{source} domain to an unlabeled \textit{target} domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, up to 1.89\% in (F1 score), 3.43\% (F1 score), and 17.68\% (ANLS score), respectively. Our benchmark datasets are available at \url{https://saynaebrahimi.github.io/DocTTA.html}.
△ Less
Submitted 23 August, 2023; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Contrastive Test-Time Adaptation
Authors:
Dian Chen,
Dequan Wang,
Trevor Darrell,
Sayna Ebrahimi
Abstract:
Test-time adaptation is a special setting of unsupervised domain adaptation where a trained model on the source domain has to adapt to the target domain without accessing source data. We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels. Th…
▽ More
Test-time adaptation is a special setting of unsupervised domain adaptation where a trained model on the source domain has to adapt to the target domain without accessing source data. We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels. The contrastive learning task is applied jointly with pseudo labeling, contrasting positive and negative pairs constructed similarly as MoCo but with source-initialized encoder, and excluding same-class negative pairs indicated by pseudo labels. Meanwhile, we produce pseudo labels online and refine them via soft voting among their nearest neighbors in the target feature space, enabled by maintaining a memory queue. Our method, AdaContrast, achieves state-of-the-art performance on major benchmarks while having several desirable properties compared to existing works, including memory efficiency, insensitivity to hyper-parameters, and better model calibration. Project page: sites.google.com/view/adacontrast.
△ Less
Submitted 21 April, 2022;
originally announced April 2022.
-
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning
Authors:
Zifeng Wang,
Zizhao Zhang,
Sayna Ebrahimi,
Ruoxi Sun,
Han Zhang,
Chen-Yu Lee,
Xiaoqi Ren,
Guolong Su,
Vincent Perot,
Jennifer Dy,
Tomas Pfister
Abstract:
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting. Top-performing methods usually require a rehearsal buffer to store past pristine examples for experience replay, which, however, limits their practical value due to privacy and memory constraints. In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny s…
▽ More
Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting. Top-performing methods usually require a rehearsal buffer to store past pristine examples for experience replay, which, however, limits their practical value due to privacy and memory constraints. In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny set of parameters, called prompts, to properly instruct a pre-trained model to learn tasks arriving sequentially without buffering past examples. DualPrompt presents a novel approach to attach complementary prompts to the pre-trained backbone, and then formulates the objective as learning task-invariant and task-specific "instructions". With extensive experimental validation, DualPrompt consistently sets state-of-the-art performance under the challenging class-incremental setting. In particular, DualPrompt outperforms recent advanced continual learning methods with relatively large buffer sizes. We also introduce a more challenging benchmark, Split ImageNet-R, to help generalize rehearsal-free continual learning research. Source code is available at https://github.com/google-research/l2p.
△ Less
Submitted 5 August, 2022; v1 submitted 10 April, 2022;
originally announced April 2022.
-
RC-RNN: Reconfigurable Cache Architecture for Storage Systems Using Recurrent Neural Networks
Authors:
Shahriar Ebrahimi,
Reza Salkhordeh,
Seyed Ali Osia,
Ali Taheri,
Hamid Reza Rabiee,
Hossein Asadi
Abstract:
Solid-State Drives (SSDs) have significant performance advantages over traditional Hard Disk Drives (HDDs) such as lower latency and higher throughput. Significantly higher price per capacity and limited lifetime, however, prevents designers to completely substitute HDDs by SSDs in enterprise storage systems. SSD-based caching has recently been suggested for storage systems to benefit from higher…
▽ More
Solid-State Drives (SSDs) have significant performance advantages over traditional Hard Disk Drives (HDDs) such as lower latency and higher throughput. Significantly higher price per capacity and limited lifetime, however, prevents designers to completely substitute HDDs by SSDs in enterprise storage systems. SSD-based caching has recently been suggested for storage systems to benefit from higher performance of SSDs while minimizing the overall cost. While conventional caching algorithms such as Least Recently Used (LRU) provide high hit ratio in processors, due to the highly random behavior of Input/Output (I/O) workloads, they hardly provide the required performance level for storage systems. In addition to poor performance, inefficient algorithms also shorten SSD lifetime with unnecessary cache replacements. Such shortcomings motivate us to benefit from more complex non-linear algorithms to achieve higher cache performance and extend SSD lifetime. In this paper, we propose RC-RNN, the first reconfigurable SSD-based cache architecture for storage systems that utilizes machine learning to identify performance-critical data pages for I/O caching. The proposed architecture uses Recurrent Neural Networks (RNN) to characterize ongoing workloads and optimize itself towards higher cache performance while improving SSD lifetime. RC-RNN attempts to learn characteristics of the running workload to predict its behavior and then uses the collected information to identify performance-critical data pages to fetch into the cache. Experimental results show that RC-RNN characterizes workloads with an accuracy up to 94.6% for SNIA I/O workloads. RC-RNN can perform similarly to the optimal cache algorithm by an accuracy of 95% on average, and outperforms previous SSD caching architectures by providing up to 7x higher hit ratio and decreasing cache replacements by up to 2x.
△ Less
Submitted 5 November, 2021;
originally announced November 2021.
-
Enhancing Cold Wallet Security with Native Multi-Signature schemes in Centralized Exchanges
Authors:
Shahriar Ebrahimi,
Parisa Hasanizadeh,
Seyed Mohammad Aghamirmohammadali,
Amirali Akbari
Abstract:
Currently, one of the most widely used protocols to secure cryptocurrency assets in centralized exchanges is categorizing wallets into cold and hot. While cold wallets hold user deposits, hot} wallets are responsible for addressing withdrawal requests. However, this method has some shortcomings such as: 1) availability of private keys in at least one cold device, and~2) exposure of all private key…
▽ More
Currently, one of the most widely used protocols to secure cryptocurrency assets in centralized exchanges is categorizing wallets into cold and hot. While cold wallets hold user deposits, hot} wallets are responsible for addressing withdrawal requests. However, this method has some shortcomings such as: 1) availability of private keys in at least one cold device, and~2) exposure of all private keys to one trusted cold wallet admin. To overcome such issues, we design a new protocol for managing cold wallet assets by employing native multi-signature schemes. The proposed cold wallet system, involves at least two distinct devices and their corresponding admins for both wallet creation and signature generation. The method ensures that no final private key is stored on any device. To this end, no individual authority can spend from exchange assets. Moreover, we provide details regarding practical implementation of the proposed method and compare it against state-of-the-art. Furthermore, we extend the application of the proposed method to an scalable scenario where users are directly involved in wallet generation and signing process of cold wallets in an MPC manner.
△ Less
Submitted 1 October, 2021;
originally announced October 2021.
-
On-target Adaptation
Authors:
Dequan Wang,
Shaoteng Liu,
Sayna Ebrahimi,
Evan Shelhamer,
Trevor Darrell
Abstract:
Domain adaptation seeks to mitigate the shift between training on the \emph{source} domain and testing on the \emph{target} domain. Most adaptation methods rely on the source data by joint optimization over source data and target data. Source-free methods replace the source data with a source model by fine-tuning it on target. Either way, the majority of the parameter updates for the model represe…
▽ More
Domain adaptation seeks to mitigate the shift between training on the \emph{source} domain and testing on the \emph{target} domain. Most adaptation methods rely on the source data by joint optimization over source data and target data. Source-free methods replace the source data with a source model by fine-tuning it on target. Either way, the majority of the parameter updates for the model representation and the classifier are derived from the source, and not the target. However, target accuracy is the goal, and so we argue for optimizing as much as possible on the target data. We show significant improvement by on-target adaptation, which learns the representation purely from target data while taking only the source predictions for supervision. In the long-tailed classification setting, we show further improvement by on-target class distribution learning, which learns the (im)balance of classes from target data.
△ Less
Submitted 2 September, 2021;
originally announced September 2021.
-
Region-level Active Detector Learning
Authors:
Michael Laielli,
Giscard Biamby,
Dian Chen,
Ritwik Gupta,
Adam Loeffler,
Phat Dat Nguyen,
Ross Luo,
Trevor Darrell,
Sayna Ebrahimi
Abstract:
Active learning for object detection is conventionally achieved by applying techniques developed for classification in a way that aggregates individual detections into image-level selection criteria. This is typically coupled with the costly assumption that every image selected for labelling must be exhaustively annotated. This yields incremental improvements on well-curated vision datasets and st…
▽ More
Active learning for object detection is conventionally achieved by applying techniques developed for classification in a way that aggregates individual detections into image-level selection criteria. This is typically coupled with the costly assumption that every image selected for labelling must be exhaustively annotated. This yields incremental improvements on well-curated vision datasets and struggles in the presence of data imbalance and visual clutter that occurs in real-world imagery. Alternatives to the image-level approach are surprisingly under-explored in the literature. In this work, we introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach that promotes spatial-diversity by avoiding nearby redundant queries from the same image and minimizes context-switching for the labeler. We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
△ Less
Submitted 17 January, 2022; v1 submitted 20 August, 2021;
originally announced August 2021.
-
Predicting with Confidence on Unseen Distributions
Authors:
Devin Guillory,
Vaishaal Shankar,
Sayna Ebrahimi,
Trevor Darrell,
Ludwig Schmidt
Abstract:
Recent work has shown that the performance of machine learning models can vary substantially when models are evaluated on data drawn from a distribution that is close to but different from the training distribution. As a result, predicting model performance on unseen distributions is an important challenge. Our work connects techniques from domain adaptation and predictive uncertainty literature,…
▽ More
Recent work has shown that the performance of machine learning models can vary substantially when models are evaluated on data drawn from a distribution that is close to but different from the training distribution. As a result, predicting model performance on unseen distributions is an important challenge. Our work connects techniques from domain adaptation and predictive uncertainty literature, and allows us to predict model accuracy on challenging unseen distributions without access to labeled data. In the context of distribution shift, distributional distances are often used to adapt models and improve their performance on new domains, however accuracy estimation, or other forms of predictive uncertainty, are often neglected in these investigations. Through investigating a wide range of established distributional distances, such as Frechet distance or Maximum Mean Discrepancy, we determine that they fail to induce reliable estimates of performance under distribution shift. On the other hand, we find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts. We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference. $DoC$ reduces predictive error by almost half ($46\%$) on several realistic and challenging distribution shifts, e.g., on the ImageNet-Vid-Robust and ImageNet-Rendition datasets.
△ Less
Submitted 19 August, 2021; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Self-Supervised Pretraining Improves Self-Supervised Pretraining
Authors:
Colorado J. Reed,
Xiangyu Yue,
Ani Nrusimha,
Sayna Ebrahimi,
Vivek Vijaykumar,
Richard Mao,
Bo Li,
Shanghang Zhang,
Devin Guillory,
Sean Metzger,
Kurt Keutzer,
Trevor Darrell
Abstract:
While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users that lack the r…
▽ More
While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users that lack the resources to pretrain must use existing models with lower performance. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data. Taken together, HPT provides a simple framework for obtaining better pretrained representations with less computational resources.
△ Less
Submitted 24 March, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Minimax Active Learning
Authors:
Sayna Ebrahimi,
William Gan,
Dian Chen,
Giscard Biamby,
Kamyar Salahi,
Michael Laielli,
Shizhan Zhu,
Trevor Darrell
Abstract:
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator. Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples. While uncertainty-based strategies are susceptible to outliers, so…
▽ More
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator. Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples. While uncertainty-based strategies are susceptible to outliers, solely relying on sample diversity does not capture the information available on the main task. In this work, we develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner. Our model consists of an entropy minimizing feature encoding network followed by an entropy maximizing classification layer. This minimax formulation reduces the distribution gap between the labeled/unlabeled data, while a discriminator is simultaneously trained to distinguish the labeled/unlabeled data. The highest entropy samples from the classifier that the discriminator predicts as unlabeled are selected for labeling. We evaluate our method on various image classification and semantic segmentation benchmark datasets and show superior performance over the state-of-the-art methods.
△ Less
Submitted 30 March, 2021; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting
Authors:
Sayna Ebrahimi,
Suzanne Petryk,
Akash Gokul,
William Gan,
Joseph E. Gonzalez,
Marcus Rohrbach,
Trevor Darrell
Abstract:
The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting. Previous work has shown that leveraging memory in the form of a replay buffer can reduce performance degradation on prior tasks. We hypothesize that forgetting can be further reduced when the model is encouraged to remember the \textit{evidence} for previously made…
▽ More
The goal of continual learning (CL) is to learn a sequence of tasks without suffering from the phenomenon of catastrophic forgetting. Previous work has shown that leveraging memory in the form of a replay buffer can reduce performance degradation on prior tasks. We hypothesize that forgetting can be further reduced when the model is encouraged to remember the \textit{evidence} for previously made decisions. As a first step towards exploring this hypothesis, we propose a simple novel training paradigm, called Remembering for the Right Reasons (RRR), that additionally stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions by encouraging its explanations to remain consistent with those used to make decisions at training time. Without this constraint, there is a drift in explanations and increase in forgetting as conventional continual learning algorithms learn new tasks. We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting, and more importantly, improved model explanations. We have evaluated our approach in the standard and few-shot settings and observed a consistent improvement across various CL approaches using different architectures and techniques to generate model explanations and demonstrated our approach showing a promising connection between explainability and continual learning. Our code is available at \url{https://github.com/SaynaEbrahimi/Remembering-for-the-Right-Reasons}.
△ Less
Submitted 2 May, 2021; v1 submitted 4 October, 2020;
originally announced October 2020.
-
Adversarial Continual Learning
Authors:
Sayna Ebrahimi,
Franziska Meier,
Roberto Calandra,
Trevor Darrell,
Marcus Rohrbach
Abstract:
Continual learning aims to learn new tasks without forgetting previously learned ones. We hypothesize that representations learned to solve each task in a sequence have a shared structure while containing some task-specific properties. We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representatio…
▽ More
Continual learning aims to learn new tasks without forgetting previously learned ones. We hypothesize that representations learned to solve each task in a sequence have a shared structure while containing some task-specific properties. We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks. Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills. We demonstrate our hybrid approach is effective in avoiding forgetting and show it is superior to both architecture-based and memory-based approaches on class incrementally learning of a single dataset as well as a sequence of multiple datasets in image classification. Our code is available at \url{https://github.com/facebookresearch/Adversarial-Continual-Learning}.
△ Less
Submitted 21 July, 2020; v1 submitted 20 March, 2020;
originally announced March 2020.
-
E2E Migration Strategies Towards 5G: Long-term Migration Plan and Evolution Roadmap
Authors:
Abulfazl Zakeri,
Narges Gholipoor,
Mohsen Tajallifar,
Sina Ebrahimi,
Mohammad Reza Javan,
Nader Mokari,
Ahmad Reza Sharafat
Abstract:
After freezing the first phase of the fifth generation of wireless networks (5G) standardization, it finally goes live now and the rollout of the commercial launch (most in fixed 5G broadband services) and migration has been started. However, some challenges are arising in the deployment, integration of each technology, and the interoperability in the network of the communication service providers…
▽ More
After freezing the first phase of the fifth generation of wireless networks (5G) standardization, it finally goes live now and the rollout of the commercial launch (most in fixed 5G broadband services) and migration has been started. However, some challenges are arising in the deployment, integration of each technology, and the interoperability in the network of the communication service providers (CSPs). At the same time, the evolution of 5G is not clear and many questions arise such as whether 5G has long-term evolution or when 5G will change to a next-generation one. This paper provides long-term migration options and paths towards 5G considering many key factors such as the cost, local/national data traffic, marketing, and the standardization trends in the radio access network (RAN), the transport network (TN), the core network (CN), and E2E network. Moreover, we outline some 5G evolution road maps emphasizing on the technologies, standards, and service time lines. The proposed migration paths can be the answer to some CSPs concerns about how to do long-term migration to 5G and beyond.
△ Less
Submitted 20 February, 2020;
originally announced February 2020.
-
Joint Resource and Admission Management for Slice-enabled Networks
Authors:
Sina Ebrahimi,
Abulfazl Zakeri,
Behzad Akbari,
Nader Mokari
Abstract:
Network slicing is a crucial part of the 5G networks that communication service providers (CSPs) seek to deploy. By exploiting three main enabling technologies, namely, software-defined networking (SDN), network function virtualization (NFV), and network slicing, communication services can be served to the end-users in an efficient, scalable, and flexible manner. To adopt these technologies, what…
▽ More
Network slicing is a crucial part of the 5G networks that communication service providers (CSPs) seek to deploy. By exploiting three main enabling technologies, namely, software-defined networking (SDN), network function virtualization (NFV), and network slicing, communication services can be served to the end-users in an efficient, scalable, and flexible manner. To adopt these technologies, what is highly important is how to allocate the resources and admit the customers of the CSPs based on the predefined criteria and available resources. In this regard, we propose a novel joint resource and admission management algorithm for slice-enabled networks. In the proposed algorithm, our target is to minimize the network cost of the CSP subject to the slice requests received from the tenants corresponding to the virtual machines and virtual links constraints. Our performance evaluation of the proposed method shows its efficiency in managing CSP's resources.
△ Less
Submitted 7 December, 2019; v1 submitted 30 November, 2019;
originally announced December 2019.
-
Energy-Efficient Task Offloading Under E2E Latency Constraints
Authors:
Mohsen Tajallifar,
Sina Ebrahimi,
Mohammad Reza Javan,
Nader Mokari,
Luca Chiaraviglio
Abstract:
In this paper, we propose a novel resource management scheme that jointly allocates the transmit power and computational resources in a centralized radio access network architecture. The network comprises a set of computing nodes to which the requested tasks of different users are offloaded. The optimization problem minimizes the energy consumption of task offloading while takes the end-to-end lat…
▽ More
In this paper, we propose a novel resource management scheme that jointly allocates the transmit power and computational resources in a centralized radio access network architecture. The network comprises a set of computing nodes to which the requested tasks of different users are offloaded. The optimization problem minimizes the energy consumption of task offloading while takes the end-to-end latency, i.e., the transmission, execution, and propagation latencies of each task, into account. We aim to allocate the transmit power and computational resources such that the maximum acceptable latency of each task is satisfied. Since the optimization problem is non-convex, we divide it into two sub-problems, one for transmit power allocation and another for task placement and computational resource allocation. Transmit power is allocated via the convex-concave procedure. In addition, a heuristic algorithm is proposed to jointly manage computational resources and task placement. We also propose a feasibility analysis that finds a feasible subset of tasks. Furthermore, a disjoint method that separately allocates the transmit power and the computational resources is proposed as the baseline of comparison. A lower bound on the optimal solution of the optimization problem is also derived based on exhaustive search over task placement decisions and utilizing Karush-Kuhn-Tucker conditions. Simulation results show that the joint method outperforms the disjoint method in terms of acceptance ratio. Simulations also show that the optimality gap of the joint method is less than 5%.
△ Less
Submitted 23 June, 2021; v1 submitted 30 November, 2019;
originally announced December 2019.
-
WiCV 2019: The Sixth Women In Computer Vision Workshop
Authors:
Irene Amerini,
Elena Balashova,
Sayna Ebrahimi,
Kathryn Leonard,
Arsha Nagrani,
Amaia Salvador
Abstract:
In this paper we present the Women in Computer Vision Workshop - WiCV 2019, organized in conjunction with CVPR 2019. This event is meant for increasing the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, but the number of female researchers is still low both in academia and in indust…
▽ More
In this paper we present the Women in Computer Vision Workshop - WiCV 2019, organized in conjunction with CVPR 2019. This event is meant for increasing the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, but the number of female researchers is still low both in academia and in industry. WiCV is organized especially for the following reason: to raise visibility of female researchers, to increase collaborations between them, and to provide mentorship to female junior researchers in the field. In this paper, we present a report of trends over the past years, along with a summary of statistics regarding presenters, attendees, and sponsorship for the current workshop.
△ Less
Submitted 23 September, 2019;
originally announced September 2019.
-
Uncertainty-guided Continual Learning with Bayesian Neural Networks
Authors:
Sayna Ebrahimi,
Mohamed Elhoseiny,
Trevor Darrell,
Marcus Rohrbach
Abstract:
Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters' \textit{importance}. In contrast, we propose Uncertai…
▽ More
Continual learning aims to learn new tasks without forgetting previously learned ones. This is especially challenging when one cannot access data from previous tasks and when the model has a fixed capacity. Current regularization-based continual learning algorithms need an external representation and extra computation to measure the parameters' \textit{importance}. In contrast, we propose Uncertainty-guided Continual Bayesian Neural Networks (UCB), where the learning rate adapts according to the uncertainty defined in the probability distribution of the weights in networks. Uncertainty is a natural way to identify \textit{what to remember} and \textit{what to change} as we continually learn, and thus mitigate catastrophic forgetting. We also show a variant of our model, which uses uncertainty for weight pruning and retains task performance after pruning by saving binary masks per tasks. We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e. it does not presume knowledge of which task a sample belongs to.
△ Less
Submitted 19 February, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Variational Adversarial Active Learning
Authors:
Samarth Sinha,
Sayna Ebrahimi,
Trevor Darrell
Abstract:
Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the…
▽ More
Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data. Our method learns a latent space using a variational autoencoder (VAE) and an adversarial network trained to discriminate between unlabeled and labeled data. The mini-max game between the VAE and the adversarial network is played such that while the VAE tries to trick the adversarial network into predicting that all data points are from the labeled pool, the adversarial network learns how to discriminate between dissimilarities in the latent space. We extensively evaluate our method on various image classification and semantic segmentation benchmark datasets and establish a new state of the art on $\text{CIFAR10/100}$, $\text{Caltech-256}$, $\text{ImageNet}$, $\text{Cityscapes}$, and $\text{BDD100K}$. Our results demonstrate that our adversarial approach learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method. Our code is available at https://github.com/sinhasam/vaal.
△ Less
Submitted 28 October, 2019; v1 submitted 31 March, 2019;
originally announced April 2019.
-
Large Multistream Data Analytics for Monitoring and Diagnostics in Manufacturing Systems
Authors:
Samaneh Ebrahimi,
Chitta Ranjan,
Kamran Paynabar
Abstract:
The high-dimensionality and volume of large scale multistream data has inhibited significant research progress in developing an integrated monitoring and diagnostics (M&D) approach. This data, also categorized as big data, is becoming common in manufacturing plants. In this paper, we propose an integrated M\&D approach for large scale streaming data. We developed a novel monitoring method named Ad…
▽ More
The high-dimensionality and volume of large scale multistream data has inhibited significant research progress in developing an integrated monitoring and diagnostics (M&D) approach. This data, also categorized as big data, is becoming common in manufacturing plants. In this paper, we propose an integrated M\&D approach for large scale streaming data. We developed a novel monitoring method named Adaptive Principal Component monitoring (APC) which adaptively chooses PCs that are most likely to vary due to the change for early detection. Importantly, we integrate a novel diagnostic approach, Principal Component Signal Recovery (PCSR), to enable a streamlined SPC. This diagnostics approach draws inspiration from Compressed Sensing and uses Adaptive Lasso for identifying the sparse change in the process. We theoretically motivate our approaches and do a performance evaluation of our integrated M&D method through simulations and case studies.
△ Less
Submitted 26 December, 2018;
originally announced December 2018.
-
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
Authors:
Edgar Schönfeld,
Sayna Ebrahimi,
Samarth Sinha,
Trevor Darrell,
Zeynep Akata
Abstract:
Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work,…
▽ More
Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. This leaves us with the required discriminative information about the image and classes in the latent features, on which we train a softmax classifier. The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes. We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.
△ Less
Submitted 5 April, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Compositional GAN: Learning Image-Conditional Binary Composition
Authors:
Samaneh Azadi,
Deepak Pathak,
Sayna Ebrahimi,
Trevor Darrell
Abstract:
Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion,…
▽ More
Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose a novel self-consistent Composition-by-Decomposition (CoDe) network to compose a pair of objects. Given object images from two distinct distributions, our model can generate a realistic composite image from their joint distribution following the texture and shape of the input objects. We evaluate our approach through qualitative experiments and user evaluations. Our results indicate that the learned model captures potential interactions between the two object domains, and generates realistic composed scenes at test time.
△ Less
Submitted 28 March, 2019; v1 submitted 19 July, 2018;
originally announced July 2018.
-
Resource-Efficient Neural Architect
Authors:
Yanqi Zhou,
Siavash Ebrahimi,
Sercan Ö. Arık,
Haonan Yu,
Hairong Liu,
Greg Diamos
Abstract:
Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate…
▽ More
Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate new configurations. We demonstrate RENA on image recognition and keyword spotting (KWS) problems. RENA can find novel architectures that achieve high performance even with tight resource constraints. For CIFAR10, it achieves 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size is less than 3M parameters. For Google Speech Commands Dataset, RENA achieves the state-of-the-art accuracy without resource constraints, and it outperforms the optimized architectures with tight resource constraints.
△ Less
Submitted 12 June, 2018;
originally announced June 2018.
-
ReCA: an Efficient Reconfigurable Cache Architecture for Storage Systems with Online Workload Characterization
Authors:
Reza Salkhordeh,
Shahriar Ebrahimi,
Hossein Asadi
Abstract:
In recent years, SSDs have gained tremendous attention in computing and storage systems due to significant performance improvement over HDDs. The cost per capacity of SSDs, however, prevents them from entirely replacing HDDs in such systems. One approach to effectively take advantage of SSDs is to use them as a caching layer to store performance critical data blocks to reduce the number of accesse…
▽ More
In recent years, SSDs have gained tremendous attention in computing and storage systems due to significant performance improvement over HDDs. The cost per capacity of SSDs, however, prevents them from entirely replacing HDDs in such systems. One approach to effectively take advantage of SSDs is to use them as a caching layer to store performance critical data blocks to reduce the number of accesses to disk subsystem. Due to characteristics of Flash-based SSDs such as limited write endurance and long latency on write operations, employing caching algorithms at the Operating System (OS) level necessitates to take such characteristics into consideration. Previous caching techniques are optimized towards only one type of application, which affects both generality and applicability. In addition, they are not adaptive when the workload pattern changes over time. This paper presents an efficient Reconfigurable Cache Architecture (ReCA) for storage systems using a comprehensive workload characterization to find an optimal cache configuration for I/O intensive applications. For this purpose, we first investigate various types of I/O workloads and classify them into five major classes. Based on this characterization, an optimal cache configuration is presented for each class of workloads. Then, using the main features of each class, we continuously monitor the characteristics of an application during system runtime and the cache organization is reconfigured if the application changes from one class to another class of workloads. The cache reconfiguration is done online and workload classes can be extended to emerging I/O workloads in order to maintain its efficiency with the characteristics of I/O requests. Experimental results obtained by implementing ReCA in a server running Linux show that the proposed architecture improves performance and lifetime up to 24\% and 33\%, respectively.
△ Less
Submitted 3 May, 2018;
originally announced May 2018.
-
Study of Residual Networks for Image Recognition
Authors:
Mohammad Sadegh Ebrahimi,
Hossein Karkeh Abadi
Abstract:
Deep neural networks demonstrate to have a high performance on image classification tasks while being more difficult to train. Due to the complexity and vanishing gradient problem, it normally takes a lot of time and more computational power to train deeper neural networks. Deep residual networks (ResNets) can make the training process faster and attain more accuracy compared to their equivalent n…
▽ More
Deep neural networks demonstrate to have a high performance on image classification tasks while being more difficult to train. Due to the complexity and vanishing gradient problem, it normally takes a lot of time and more computational power to train deeper neural networks. Deep residual networks (ResNets) can make the training process faster and attain more accuracy compared to their equivalent neural networks. ResNets achieve this improvement by adding a simple skip connection parallel to the layers of convolutional neural networks. In this project we first design a ResNet model that can perform the image classification task on the Tiny ImageNet dataset with a high accuracy, then we compare the performance of this ResNet model with its equivalent Convolutional Network (ConvNet). Our findings illustrate that ResNets are more prone to overfitting despite their higher accuracy. Several methods to prevent overfitting such as adding dropout layers and stochastic augmentation of the training dataset has been studied in this work.
△ Less
Submitted 21 April, 2018;
originally announced May 2018.
-
Predicting User Performance and Bitcoin Price Using Block Chain Transaction Network
Authors:
Mohammad Sadegh Ebrahimi,
Afshin Babveyh
Abstract:
This work is organized as follows. In the first section we review the prior work and we have obtained our data. Next, we will look at address reuse in the Bitcoin network. We show that a great portion of users reuse their addresses which could enable us to cluster the addresses and attribute them to single users. Next, we will categorize the nodes based on their role in the network as a customer o…
▽ More
This work is organized as follows. In the first section we review the prior work and we have obtained our data. Next, we will look at address reuse in the Bitcoin network. We show that a great portion of users reuse their addresses which could enable us to cluster the addresses and attribute them to single users. Next, we will categorize the nodes based on their role in the network as a customer or seller. Finally, we do a study of nodes and network performance.
△ Less
Submitted 21 April, 2018;
originally announced April 2018.
-
Predicting Audio Advertisement Quality
Authors:
Samaneh Ebrahimi,
Hossein Vahabi,
Matthew Prockup,
Oriol Nieto
Abstract:
Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads rankin…
▽ More
Online audio advertising is a particular form of advertising used abundantly in online music streaming services. In these platforms, which tend to host tens of thousands of unique audio advertisements (ads), providing high quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads ranking and better audio ads creation. In this paper we propose one way to measure the quality of the audio ads using a proxy metric called Long Click Rate (LCR), which is defined by the amount of time a user engages with the follow-up display ad (that is shown while the audio ad is playing) divided by the impressions. We later focus on predicting the audio ad quality using only acoustic features such as harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.
△ Less
Submitted 9 February, 2018;
originally announced February 2018.
-
Gradient-free Policy Architecture Search and Adaptation
Authors:
Sayna Ebrahimi,
Anna Rohrbach,
Trevor Darrell
Abstract:
We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to th…
▽ More
We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.
△ Less
Submitted 16 October, 2017;
originally announced October 2017.
-
Source-Sensitive Belief Change
Authors:
Shahab Ebrahimi
Abstract:
The AGM model is the most remarkable framework for modeling belief revision. However, it is not perfect in all aspects. Paraconsistent belief revision, multi-agent belief revision and non-prioritized belief revision are three different extensions to AGM to address three important criticisms applied to it. In this article, we propose a framework based on AGM that takes a position in each of these c…
▽ More
The AGM model is the most remarkable framework for modeling belief revision. However, it is not perfect in all aspects. Paraconsistent belief revision, multi-agent belief revision and non-prioritized belief revision are three different extensions to AGM to address three important criticisms applied to it. In this article, we propose a framework based on AGM that takes a position in each of these categories. Also, we discuss some features of our framework and study the satisfiability of AGM postulates in this new context.
△ Less
Submitted 5 May, 2017; v1 submitted 11 April, 2017;
originally announced April 2017.
-
Sequence Graph Transform (SGT): A Feature Embedding Function for Sequence Data Mining
Authors:
Chitta Ranjan,
Samaneh Ebrahimi,
Kamran Paynabar
Abstract:
Sequence feature embedding is a challenging task due to the unstructuredness of sequence, i.e., arbitrary strings of arbitrary length. Existing methods are efficient in extracting short-term dependencies but typically suffer from computation issues for the long-term. Sequence Graph Transform (SGT), a feature embedding function, that can extract a varying amount of short- to long-term dependencies…
▽ More
Sequence feature embedding is a challenging task due to the unstructuredness of sequence, i.e., arbitrary strings of arbitrary length. Existing methods are efficient in extracting short-term dependencies but typically suffer from computation issues for the long-term. Sequence Graph Transform (SGT), a feature embedding function, that can extract a varying amount of short- to long-term dependencies without increasing the computation is proposed. SGT's properties are analytically proved for interpretation under normal and uniform distribution assumptions. SGT features yield significantly superior results in sequence clustering and classification with higher accuracy and lower computation as compared to the existing methods, including the state-of-the-art sequence/string Kernels and LSTM.
△ Less
Submitted 4 October, 2021; v1 submitted 11 August, 2016;
originally announced August 2016.