-
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models
Authors:
Minh Nguyen,
Franck Dernoncourt,
Seunghyun Yoon,
Hanieh Deilamsalehy,
Hao Tan,
Ryan Rossi,
Quan Hung Tran,
Trung Bui,
Thien Huu Nguyen
Abstract:
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite the advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these ga…
▽ More
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite the advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a great precision of 80.3\%, setting a new benchmark for SpeakerID. The data and code are publicly available here: \url{https://github.com/adobe-research/speaker-identification}
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Federated PCA on Grassmann Manifold for IoT Anomaly Detection
Authors:
Tung-Anh Nguyen,
Long Tan Le,
Tuan Dung Nguyen,
Wei Bao,
Suranga Seneviratne,
Choong Seon Hong,
Nguyen H. Tran
Abstract:
With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they possess limitations such as the requirement for labeled data and challenges with hi…
▽ More
With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they possess limitations such as the requirement for labeled data and challenges with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GAN) offer alternative solutions but pose challenges in deployment onto resource-constrained IoT devices and in interpretability. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework, FedPCA, that leverages Principal Component Analysis (PCA) and the Alternating Directions Method Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FEDPE in Euclidean space and FEDPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms are accompanied by theoretical convergence rates even under a subsampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer performance in anomaly detection comparable to nonlinear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Quantum Curriculum Learning
Authors:
Quoc Hoan Tran,
Yasuhiro Endo,
Hirotaka Oshima
Abstract:
Quantum machine learning (QML) requires significant quantum resources to achieve quantum advantage. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learn…
▽ More
Quantum machine learning (QML) requires significant quantum resources to achieve quantum advantage. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learning model before progressing to more challenging ones. We define the curriculum criteria based on the data density ratio between tasks to determine the curriculum order. We also implement a dynamic learning schedule to emphasize the significance of quantum data in optimizing the loss function. Empirical evidence shows that Q-CurL significantly enhances the training convergence and the generalization for unitary learning tasks and improves the robustness of quantum phase recognition tasks. Our framework provides a general learning strategy, bringing QML closer to realizing practical advantages.
△ Less
Submitted 11 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Empirical Tests of Optimization Assumptions in Deep Learning
Authors:
Hoang Tran,
Qinzi Zhang,
Ashok Cutkosky
Abstract:
There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/prac…
▽ More
There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/practice gap may then arise because of the failure to prove a theorem under such assumptions, or because the assumptions do not reflect reality. In this paper, we carefully measure the degree to which these assumptions are capable of explaining modern optimization algorithms by developing new empirical metrics that closely track the key quantities that must be controlled in theoretical analysis. All of our tested assumptions (including typical modern assumptions based on bounds on the Hessian) fail to reliably capture optimization performance. This highlights a need for new empirical verification of analytical assumptions used in theoretical analysis.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Private Zeroth-Order Nonsmooth Nonconvex Optimization
Authors:
Qinzi Zhang,
Hoang Tran,
Ashok Cutkosky
Abstract:
We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives. Given a dataset of size $M$, our algorithm ensures $(α,αρ^2/2)$-Rényi differential privacy and finds a $(δ,ε)$-stationary point so long as $M=\tildeΩ\left(\frac{d}{δε^3} + \frac{d^{3/2}}{ρδε^2}\right)$. This matches the optimal complexity of its non-private zeroth-order analog. Nota…
▽ More
We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives. Given a dataset of size $M$, our algorithm ensures $(α,αρ^2/2)$-Rényi differential privacy and finds a $(δ,ε)$-stationary point so long as $M=\tildeΩ\left(\frac{d}{δε^3} + \frac{d^{3/2}}{ρδε^2}\right)$. This matches the optimal complexity of its non-private zeroth-order analog. Notably, although the objective is not smooth, we have privacy ``for free'' whenever $ρ\ge \sqrt{d}ε$.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Trade-off between Gradient Measurement Efficiency and Expressivity in Deep Quantum Neural Networks
Authors:
Koki Chinzei,
Shinichiro Yamano,
Quoc Hoan Tran,
Yasuhiro Endo,
Hirotaka Oshima
Abstract:
Quantum neural networks (QNNs) require an efficient training algorithm to achieve practical quantum advantages. A promising approach is the use of gradient-based optimization algorithms, where gradients are estimated through quantum measurements. However, it is generally difficult to efficiently measure gradients in QNNs because the quantum state collapses upon measurement. In this work, we prove…
▽ More
Quantum neural networks (QNNs) require an efficient training algorithm to achieve practical quantum advantages. A promising approach is the use of gradient-based optimization algorithms, where gradients are estimated through quantum measurements. However, it is generally difficult to efficiently measure gradients in QNNs because the quantum state collapses upon measurement. In this work, we prove a general trade-off between gradient measurement efficiency and expressivity in a wide class of deep QNNs, elucidating the theoretical limits and possibilities of efficient gradient estimation. This trade-off implies that a more expressive QNN requires a higher measurement cost in gradient estimation, whereas we can increase gradient measurement efficiency by reducing the QNN expressivity to suit a given task. We further propose a general QNN ansatz called the stabilizer-logical product ansatz (SLPA), which can reach the upper limit of the trade-off inequality by leveraging the symmetric structure of the quantum circuit. In learning an unknown symmetric function, the SLPA drastically reduces the quantum resources required for training while maintaining accuracy and trainability compared to a well-designed symmetric circuit based on the parameter-shift method. Our results not only reveal a theoretical understanding of efficient training in QNNs but also provide a standard and broadly applicable efficient QNN design.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems
Authors:
Hung Vinh Tran,
Tong Chen,
Quoc Viet Hung Nguyen,
Zi Huang,
Lizhen Cui,
Hongzhi Yin
Abstract:
Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism in information filtering. State-of-the-art RSs primarily depend on categorical features, which ecoded by embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have seen increasing efforts in c…
▽ More
Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism in information filtering. State-of-the-art RSs primarily depend on categorical features, which ecoded by embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have seen increasing efforts in compressing RS embeddings. However, despite the prosperity of lightweight embedding-based RSs (LERSs), a wide diversity is seen in evaluation protocols, resulting in obstacles when relating LERS performance to real-world usability. Moreover, despite the common goal of lightweight embeddings, LERSs are evaluated with a single choice between the two main recommendation tasks -- collaborative filtering and content-based recommendation. This lack of discussions on cross-task transferability hinders the development of unified, more scalable solutions. Motivated by these issues, this study investigates various LERSs' performance, efficiency, and cross-task transferability via a thorough benchmarking process. Additionally, we propose an efficient embedding compression method using magnitude pruning, which is an easy-to-deploy yet highly competitive baseline that outperforms various complex LERSs. Our study reveals the distinct performance of LERSs across the two tasks, shedding light on their effectiveness and generalizability. To support edge-based recommendations, we tested all LERSs on a Raspberry Pi 4, where the efficiency bottleneck is exposed. Finally, we conclude this paper with critical summaries of LERS performance, model selection suggestions, and underexplored challenges around LERSs for future research. To encourage future research, we publish source codes and artifacts at \href{this link}{https://github.com/chenxing1999/recsys-benchmark}.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
ReadCtrl: Personalizing text generation with readability-controlled instruction learning
Authors:
Hieu Tran,
Zonghai Yao,
Lingxi Li,
Hong Yu
Abstract:
Content generation conditioning on users's readability is an important application for personalization. In an era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor users' reada…
▽ More
Content generation conditioning on users's readability is an important application for personalization. In an era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor users' readability levels. Unlike the traditional methods, which primarily focused on categorical readability adjustments typically classified as high, medium, and low or expert and layperson levels with limited success, ReadCtrl introduces a dynamic framework that enables LLMs to generate content at various (near continuous level) complexity levels, thereby enhancing their versatility across different applications. Our results show that the ReadCtrl-Mistral-7B models significantly outperformed strong baseline models such as GPT-4 and Claude-3, with a win rate of 52.1%:35.7% against GPT-4 in human evaluations. Furthermore, Read-Ctrl has shown significant improvements in automatic evaluations, as evidenced by better readability metrics (e.g., FOG, FKGL) and generation quality metrics (e.g., BLEU, SARI, SummaC-Factuality, UniEval-Consistency and Coherence). These results underscore Read-Ctrl's effectiveness and tenacity in producing high-quality, contextually appropriate outputs that closely align with targeted readability levels, marking a significant advancement in personalized content generation using LLMs.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature
Authors:
Julien Delaunay,
Hanh Thi Hong Tran,
Carlos-Emiliano González-Gallardo,
Georgeta Bordea,
Mathilde Ducos,
Nicolas Sidere,
Antoine Doucet,
Senja Pollak,
Olivier De Viron
Abstract:
The growing impact of climate change on coastal areas, particularly active but fragile regions, necessitates collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus comprising 2,491 sentences from 410 scientific abstracts concerning coastal areas, for the Automatic Term Extraction (ATE) and Classific…
▽ More
The growing impact of climate change on coastal areas, particularly active but fragile regions, necessitates collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus comprising 2,491 sentences from 410 scientific abstracts concerning coastal areas, for the Automatic Term Extraction (ATE) and Classification (ATC) tasks. Inspired by the ARDI framework, focused on the identification of Actors, Resources, Dynamics and Interactions, we automatically extract domain terms and their distinct roles in the functioning of coastal systems by leveraging monolingual and multilingual transformer models. The evaluation demonstrates consistent results, achieving an F1 score of approximately 80\% for automated term extraction and F1 of 70\% for extracting terms and their labels. These findings are promising and signify an initial step towards the development of a specialized Knowledge Base dedicated to coastal areas.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Hypergraph Laplacian Eigenmaps and Face Recognition Problems
Authors:
Loc Hoang Tran
Abstract:
Face recognition is a very important topic in data science and biometric security research areas. It has multiple applications in military, finance, and retail, to name a few. In this paper, the novel hypergraph Laplacian Eigenmaps will be proposed and combine with the k nearest-neighbor method and/or with the kernel ridge regression method to solve the face recognition problem. Experimental resul…
▽ More
Face recognition is a very important topic in data science and biometric security research areas. It has multiple applications in military, finance, and retail, to name a few. In this paper, the novel hypergraph Laplacian Eigenmaps will be proposed and combine with the k nearest-neighbor method and/or with the kernel ridge regression method to solve the face recognition problem. Experimental results illustrate that the accuracy of the combination of the novel hypergraph Laplacian Eigenmaps and one specific classification system is similar to the accuracy of the combination of the old symmetric normalized hypergraph Laplacian Eigenmaps method and one specific classification system.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
Accelerating Transformers with Spectrum-Preserving Token Merging
Authors:
Hoai-Chau Tran,
Duy M. H. Nguyen,
Duy M. Nguyen,
Trung-Tin Nguyen,
Ngan Le,
Pengtao Xie,
Daniel Sonntag,
James Y. Zou,
Binh T. Nguyen,
Mathias Niepert
Abstract:
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Pr…
▽ More
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered as low-energy and preserved. Experimental findings demonstrate that PiToMe saved from 40-60\% FLOPs of the base models while exhibiting superior off-the-shelf performance on image classification (0.5\% average performance drop of ViT-MAE-H compared to 2.6\% as baselines), image-text retrieval (0.3\% average performance drop of CLIP on Flickr30k compared to 4.5\% as others), and analogously in visual questions answering with LLaVa-7B. Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
Authors:
Long Tan Le,
Han Shu,
Tung-Anh Nguyen,
Choong Seon Hong,
Nguyen H. Tran
Abstract:
While astonishingly capable, large Language Models (LLM) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with the identified instability, whereas preference optimization methods are limited…
▽ More
While astonishingly capable, large Language Models (LLM) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent disseminating untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with the identified instability, whereas preference optimization methods are limited by their overfitting to pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. Particularly, $i$REPO employs self-generated datasets labelled by empirical human (or AI annotator) preference to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by theoretical guarantees for achieving optimal results under ideal assumptions and providing a practical performance-gap result without such assumptions. Experimental results with Phi-2 and Mistral-7B demonstrate that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators. Furthermore, our approach surpasses preference optimization baselines in evaluations using the Language Model Evaluation Harness and Multi-turn benchmarks.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Biometrics-Based Authenticated Key Exchange with Multi-Factor Fuzzy Extractor
Authors:
Hong Yen Tran,
Jiankun Hu,
Wen Hu
Abstract:
Existing fuzzy extractors and similar methods provide an effective way for extracting a secret key from a user's biometric data, but are susceptible to impersonation attack: once a valid biometric sample is captured, the scheme is no longer secure. We propose a novel multi-factor fuzzy extractor that integrates both a user's secret (e.g., a password) and a user's biometrics in the generation and r…
▽ More
Existing fuzzy extractors and similar methods provide an effective way for extracting a secret key from a user's biometric data, but are susceptible to impersonation attack: once a valid biometric sample is captured, the scheme is no longer secure. We propose a novel multi-factor fuzzy extractor that integrates both a user's secret (e.g., a password) and a user's biometrics in the generation and reconstruction process of a cryptographic key. We then employ this multi-factor fuzzy extractor to construct personal identity credentials which can be used in a new multi-factor authenticated key exchange protocol that possesses multiple important features. First, the protocol provides mutual authentication. Second, the user and service provider can authenticate each other without the involvement of the identity authority. Third, the protocol can prevent user impersonation from a compromised identity authority. Finally, even when both a biometric sample and the secret are captured, the user can re-register to create a new credential using a new secret (reusable/reissued identity credentials). Most existing works on multi-factor authenticated key exchange only have a subset of these features. We formally prove that the proposed protocol is semantically secure. Our experiments carried out on the finger vein dataset SDUMLA achieved a low equal error rate (EER) of 0.04%, a reasonable averaged computation time of 0.93 seconds for the user and service provider to authenticate and establish a shared session key, and a small communication overhead of only 448 bytes.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Authors:
Kim Hoang Tran,
Phuc Vuong Do,
Ngoc Quoc Ly,
Ngan Le
Abstract:
Sports videos pose complex challenges, including cluttered backgrounds, camera angle changes, small action-representing objects, and imbalanced action class distribution. Existing methods for detecting actions in sports videos heavily rely on global features, utilizing a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuance…
▽ More
Sports videos pose complex challenges, including cluttered backgrounds, camera angle changes, small action-representing objects, and imbalanced action class distribution. Existing methods for detecting actions in sports videos heavily rely on global features, utilizing a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuances of the scene and struggle with detecting actions that occupy a small portion of the frame. In particular, they face difficulties when dealing with action classes involving small objects, such as balls or yellow/red cards in soccer, which only occupy a fraction of the screen space. To address these challenges, we introduce a novel approach that analyzes and models scene entities using an adaptive attention mechanism. Particularly, our model disentangles the scene content into the global environment feature and local relevant scene entities feature. To efficiently extract environmental features while considering temporal information with less computational cost, we propose the use of a 2D backbone network with a time-shift mechanism. To accurately capture relevant scene entities, we employ a Vision-Language model in conjunction with the adaptive attention mechanism. Our model has demonstrated outstanding performance, securing the 1st place in the SoccerNet-v2 Action Spotting, FineDiving, and FineGym challenge with a substantial performance improvement of 1.6, 2.0, and 1.3 points in avg-mAP compared to the runner-up methods. Furthermore, our approach offers interpretability capabilities in contrast to other deep learning models, which are often designed as black boxes. Our code and models are released at: https://github.com/Fsoft-AIC/unifying-global-local-feature.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning
Authors:
Quang Minh Dinh,
Minh Khoi Ho,
Anh Quan Dang,
Hung Phong Tran
Abstract:
Traffic video description and analysis have received much attention recently due to the growing demand for efficient and reliable urban surveillance systems. Most existing methods only focus on locating traffic event segments, which severely lack descriptive details related to the behaviour and context of all the subjects of interest in the events. In this paper, we present TrafficVLM, a novel mul…
▽ More
Traffic video description and analysis have received much attention recently due to the growing demand for efficient and reliable urban surveillance systems. Most existing methods only focus on locating traffic event segments, which severely lack descriptive details related to the behaviour and context of all the subjects of interest in the events. In this paper, we present TrafficVLM, a novel multi-modal dense video captioning model for vehicle ego camera view. TrafficVLM models traffic video events at different levels of analysis, both spatially and temporally, and generates long fine-grained descriptions for the vehicle and pedestrian at different phases of the event. We also propose a conditional component for TrafficVLM to control the generation outputs and a multi-task fine-tuning paradigm to enhance TrafficVLM's learning capability. Experiments show that TrafficVLM performs well on both vehicle and overhead camera views. Our solution achieved outstanding results in Track 2 of the AI City Challenge 2024, ranking us third in the challenge standings. Our code is publicly available at https://github.com/quangminhdinh/TrafficVLM.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
PAT: Pixel-wise Adaptive Training for Long-tailed Segmentation
Authors:
Khoi Do,
Duong Nguyen,
Nguyen H. Tran,
Viet Dung Nguyen
Abstract:
Beyond class frequency, we recognize the impact of class-wise relationships among various class-specific predictions and the imbalance in label masks on long-tailed segmentation learning. To address these challenges, we propose an innovative Pixel-wise Adaptive Training (PAT) technique tailored for long-tailed segmentation. PAT has two key features: 1) class-wise gradient magnitude homogenization,…
▽ More
Beyond class frequency, we recognize the impact of class-wise relationships among various class-specific predictions and the imbalance in label masks on long-tailed segmentation learning. To address these challenges, we propose an innovative Pixel-wise Adaptive Training (PAT) technique tailored for long-tailed segmentation. PAT has two key features: 1) class-wise gradient magnitude homogenization, and 2) pixel-wise class-specific loss adaptation (PCLA). First, the class-wise gradient magnitude homogenization helps alleviate the imbalance among label masks by ensuring equal consideration of the class-wise impact on model updates. Second, PCLA tackles the detrimental impact of both rare classes within the long-tailed distribution and inaccurate predictions from previous training stages by encouraging learning classes with low prediction confidence and guarding against forgetting classes with high confidence. This combined approach fosters robust learning while preventing the model from forgetting previously learned knowledge. PAT exhibits significant performance improvements, surpassing the current state-of-the-art by 2.2% in the NyU dataset. Moreover, it enhances overall pixel-wise accuracy by 2.85% and intersection over union value by 2.07%, with a particularly notable declination of 0.39% in detecting rare classes compared to Balance Logits Variation, as demonstrated on the three popular datasets, i.e., OxfordPetIII, CityScape, and NYU.
△ Less
Submitted 10 July, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Shuffling Momentum Gradient Algorithm for Convex Optimization
Authors:
Trang H. Tran,
Quoc Tran-Dinh,
Lam M. Nguyen
Abstract:
The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science thanks to their ability to handle large-scale applications and big datasets. In the last decades, researchers have made substantial effort to study the theoretical performance of SGD and its shuffling vari…
▽ More
The Stochastic Gradient Descent method (SGD) and its stochastic variants have become methods of choice for solving finite-sum optimization problems arising from machine learning and data science thanks to their ability to handle large-scale applications and big datasets. In the last decades, researchers have made substantial effort to study the theoretical performance of SGD and its shuffling variants. However, only limited work has investigated its shuffling momentum variants, including shuffling heavy-ball momentum schemes for non-convex problems and Nesterov's momentum for convex settings. In this work, we extend the analysis of the shuffling momentum gradient method developed in [Tran et al (2021)] to both finite-sum convex and strongly convex optimization problems. We provide the first analysis of shuffling momentum-based methods for the strongly convex setting, attaining a convergence rate of $O(1/nT^2)$, where $n$ is the number of samples and $T$ is the number of training epochs. Our analysis is a state-of-the-art, matching the best rates of existing shuffling stochastic gradient algorithms in the literature.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
SKT5SciSumm -- A Hybrid Generative Approach for Multi-Document Scientific Summarization
Authors:
Huy Quoc To,
Hung-Nghiep Tran,
Andr'e Greiner-Petter,
Felix Beierle,
Akiko Aizawa
Abstract:
Summarization for scientific text has shown significant benefits both for the research community and human society. Given the fact that the nature of scientific text is distinctive and the input of the multi-document summarization task is substantially long, the task requires sufficient embedding generation and text truncation without losing important information. To tackle these issues, in this p…
▽ More
Summarization for scientific text has shown significant benefits both for the research community and human society. Given the fact that the nature of scientific text is distinctive and the input of the multi-document summarization task is substantially long, the task requires sufficient embedding generation and text truncation without losing important information. To tackle these issues, in this paper, we propose SKT5SciSumm - a hybrid framework for multi-document scientific summarization (MDSS). We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode and represent textual sentences, allowing for efficient extractive summarization using k-means clustering. We employ the T5 family of models to generate abstractive summaries using extracted sentences. SKT5SciSumm achieves state-of-the-art performance on the Multi-XScience dataset. Through extensive experiments and evaluation, we showcase the benefits of our model by using less complicated models to achieve remarkable results, thereby highlighting its potential in advancing the field of multi-document summarization for scientific text.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation
Authors:
Phuc Phan,
Hieu Tran,
Long Phan
Abstract:
We propose a straightforward approach called Distillation Contrastive Decoding (DCD) to enhance the reasoning capabilities of Large Language Models (LLMs) during inference. In contrast to previous approaches that relied on smaller amateur models or analysis of hidden state differences, DCD employs Contrastive Chain-of-thought Prompting and advanced distillation techniques, including Dropout and Qu…
▽ More
We propose a straightforward approach called Distillation Contrastive Decoding (DCD) to enhance the reasoning capabilities of Large Language Models (LLMs) during inference. In contrast to previous approaches that relied on smaller amateur models or analysis of hidden state differences, DCD employs Contrastive Chain-of-thought Prompting and advanced distillation techniques, including Dropout and Quantization. This approach effectively addresses the limitations of Contrastive Decoding (CD), which typically requires both an expert and an amateur model, thus increasing computational resource demands. By integrating contrastive prompts with distillation, DCD obviates the need for an amateur model and reduces memory usage. Our evaluations demonstrate that DCD significantly enhances LLM performance across a range of reasoning benchmarks, surpassing both CD and existing methods in the GSM8K and StrategyQA datasets.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification
Authors:
Tue M. Cao,
Nhat H. Tran,
Hieu H. Pham,
Hung T. Nguyen,
Le P. Nguyen
Abstract:
Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design an…
▽ More
Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design and yet not being able to reach the optimal architecture due to the uniqueness of each dataset. We overcome these challenges by proposing a novel multi-scale search space and a framework for Neural architecture search (NAS), which addresses both the problem of frequency and time resolution, discovering the suitable scale for a specific dataset. We further show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights. Our search space reaches the state-of-the-art performance on four datasets on four different domains while introducing more than ten highly fine-tuned models for each data.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
A Transition System Abstraction Framework for Neural Network Dynamical System Models
Authors:
Yejiang Yang,
Zihao Mo,
Hoang-Dung Tran,
Weiming Xiang
Abstract:
This paper proposes a transition system abstraction framework for neural network dynamical system models to enhance the model interpretability, with applications to complex dynamical systems such as human behavior learning and verification. To begin with, the localized working zone will be segmented into multiple localized partitions under the data-driven Maximum Entropy (ME) partitioning method.…
▽ More
This paper proposes a transition system abstraction framework for neural network dynamical system models to enhance the model interpretability, with applications to complex dynamical systems such as human behavior learning and verification. To begin with, the localized working zone will be segmented into multiple localized partitions under the data-driven Maximum Entropy (ME) partitioning method. Then, the transition matrix will be obtained based on the set-valued reachability analysis of neural networks. Finally, applications to human handwriting dynamics learning and verification are given to validate our proposed abstraction framework, which demonstrates the advantages of enhancing the interpretability of the black-box model, i.e., our proposed framework is able to abstract a data-driven neural network model into a transition system, making the neural network model interpretable through verifying specifications described in Computational Tree Logic (CTL) languages.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
How good are my search strings? Reflections on using an existing review as a quasi-gold standard
Authors:
Huynh Khanh Vi Tran,
Jürgen Börstler,
Nauman Bin Ali,
Michael Unterkalmsteiner
Abstract:
Background: Systematic literature studies (SLS) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, ie, finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims at raising awareness on the issues related to search string construction and on se…
▽ More
Background: Systematic literature studies (SLS) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, ie, finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims at raising awareness on the issues related to search string construction and on search validation using a quasi-gold standard (QGS). Furthermore, we aim at providing guidelines for search string validation. Method: We use a recently completed tertiary study as a case and complement our findings with the observations from other researchers studying and advancing EBSE. Results: We found that the issue of assessing QGS quality has not seen much attention in the literature, and the validation of automated searches in SLSs could be improved. Hence, we propose to extend the current search validation approach by the additional analysis step of the automated search validation results and provide recommendations for the QGS construction. Conclusion: In this paper, we report on new issues which could affect search completeness in SLSs. Furthermore, the proposed guideline and recommendations could help researchers implement a more reliable search strategy in their SLSs.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Policy Learning for Off-Dynamics RL with Deficient Support
Authors:
Linh Le Pham Van,
Hung The Tran,
Sunil Gupta
Abstract:
Reinforcement Learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interactions with the environment. In many real-world scenarios, this approach is not practical due to the high costs of data collection and safety concerns. As a result, a common strategy is to transfer a policy trained in a low-cost, rapid source simulator…
▽ More
Reinforcement Learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interactions with the environment. In many real-world scenarios, this approach is not practical due to the high costs of data collection and safety concerns. As a result, a common strategy is to transfer a policy trained in a low-cost, rapid source simulator to a real-world target environment. However, this process poses challenges. Simulators, no matter how advanced, cannot perfectly replicate the intricacies of the real world, leading to dynamics discrepancies between the source and target environments. Past research posited that the source domain must encompass all possible target transitions, a condition we term full support. However, expecting full support is often unrealistic, especially in scenarios where significant dynamics discrepancies arise. In this paper, our emphasis shifts to addressing large dynamics mismatch adaptation. We move away from the stringent full support condition of earlier research, focusing instead on crafting an effective policy for the target domain. Our proposed approach is simple but effective. It is anchored in the central concepts of the skewing and extension of source support towards target support to mitigate support deficiencies. Through comprehensive testing on a varied set of benchmarks, our method's efficacy stands out, showcasing notable improvements over previous techniques.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Assessing test artifact quality -- A tertiary study
Authors:
Huynh Khanh Vi Tran,
Michael Unterkalmsteiner,
Jürgen Börstler,
Nauman bin Ali
Abstract:
Context: Modern software development increasingly relies on software testing for an ever more frequent delivery of high quality software. This puts high demands on the quality of the central artifacts in software testing, test suites and test cases. Objective: We aim to develop a comprehensive model for capturing the dimensions of test case/suite quality, which are relevant for a variety of perspe…
▽ More
Context: Modern software development increasingly relies on software testing for an ever more frequent delivery of high quality software. This puts high demands on the quality of the central artifacts in software testing, test suites and test cases. Objective: We aim to develop a comprehensive model for capturing the dimensions of test case/suite quality, which are relevant for a variety of perspectives. Method: We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts. Results: We identified 49 relevant secondary studies. Of these 49 studies, less than half did some form of quality appraisal of the included primary studies and only 3 took into account the quality of the primary study when synthesizing the results. We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated. We also provide a comprehensive model of test case/suite quality with definitions for the quality attributes and measurements based on findings in the literature and ISO/IEC 25010:2011. Conclusion: The test artifact quality model presented in the paper can be used to support test artifact quality assessment and improvement initiatives in practice. Furtherm Information and Software Technology 139 (2021): 106620ore, the model can also be used as a framework for documenting context characteristics to make research results more accessible for research and practice.
△ Less
Submitted 14 February, 2024;
originally announced February 2024.
-
PINN-BO: A Black-box Optimization Algorithm using Physics-Informed Neural Networks
Authors:
Dat Phan-Trong,
Hung The Tran,
Alistair Shilton,
Sunil Gupta
Abstract:
Black-box optimization is a powerful approach for discovering global optima in noisy and expensive black-box functions, a problem widely encountered in real-world scenarios. Recently, there has been a growing interest in leveraging domain knowledge to enhance the efficacy of machine learning methods. Partial Differential Equations (PDEs) often provide an effective means for elucidating the fundame…
▽ More
Black-box optimization is a powerful approach for discovering global optima in noisy and expensive black-box functions, a problem widely encountered in real-world scenarios. Recently, there has been a growing interest in leveraging domain knowledge to enhance the efficacy of machine learning methods. Partial Differential Equations (PDEs) often provide an effective means for elucidating the fundamental principles governing the black-box functions. In this paper, we propose PINN-BO, a black-box optimization algorithm employing Physics-Informed Neural Networks that integrates the knowledge from Partial Differential Equations (PDEs) to improve the sample efficiency of the optimization. We analyze the theoretical behavior of our algorithm in terms of regret bound using advances in NTK theory and prove that the use of the PDE alongside the black-box function evaluations, PINN-BO leads to a tighter regret bound. We perform several experiments on a variety of optimization tasks and show that our algorithm is more sample-efficient compared to existing methods.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Stereographic Spherical Sliced Wasserstein Distances
Authors:
Huy Tran,
Yikun Bai,
Abihith Kothapalli,
Ashkan Shahbazi,
Xinran Liu,
Rocio Diaz Martin,
Soheil Kolouri
Abstract:
Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spheri…
▽ More
Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spherical probability measures. This paper introduces a high-speed and highly parallelizable distance for comparing spherical measures using the stereographic projection and the generalized Radon transform, which we refer to as the Stereographic Spherical Sliced Wasserstein (S3W) distance. We carefully address the distance distortion caused by the stereographic projection and provide an extensive theoretical analysis of our proposed metric and its rotationally invariant variation. Finally, we evaluate the performance of the proposed metrics and compare them with recent baselines in terms of both speed and accuracy through a wide range of numerical studies, including gradient flows and self-supervised learning. Our code is available at https://github.com/mint-vu/s3wd.
△ Less
Submitted 9 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
PresAIse, A Prescriptive AI Solution for Enterprises
Authors:
Wei Sun,
Scott McFaddin,
Linh Ha Tran,
Shivaram Subramanian,
Kristjan Greenewald,
Yeshi Tenzin,
Zack Xue,
Youssef Drissi,
Markus Ettl
Abstract:
Prescriptive AI represents a transformative shift in decision-making, offering causal insights and actionable recommendations. Despite its huge potential, enterprise adoption often faces several challenges. The first challenge is caused by the limitations of observational data for accurate causal inference which is typically a prerequisite for good decision-making. The second pertains to the inter…
▽ More
Prescriptive AI represents a transformative shift in decision-making, offering causal insights and actionable recommendations. Despite its huge potential, enterprise adoption often faces several challenges. The first challenge is caused by the limitations of observational data for accurate causal inference which is typically a prerequisite for good decision-making. The second pertains to the interpretability of recommendations, which is crucial for enterprise decision-making settings. The third challenge is the silos between data scientists and business users, hindering effective collaboration. This paper outlines an initiative from IBM Research, aiming to address some of these challenges by offering a suite of prescriptive AI solutions. Leveraging insights from various research papers, the solution suite includes scalable causal inference methods, interpretable decision-making approaches, and the integration of large language models (LLMs) to bridge communication gaps via a conversation agent. A proof-of-concept, PresAIse, demonstrates the solutions' potential by enabling non-ML experts to interact with prescriptive AI models via a natural language interface, democratizing advanced analytics for strategic decision-making.
△ Less
Submitted 12 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
A Class-aware Optimal Transport Approach with Higher-Order Moment Matching for Unsupervised Domain Adaptation
Authors:
Tuan Nguyen,
Van Nguyen,
Trung Le,
He Zhao,
Quan Hung Tran,
Dinh Phung
Abstract:
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we introduce a novel approach called class-aware optimal transport (OT), which measures the OT distance between a distribution over the source class-conditional distributions and a mixture of source and target data distribution. Our class-aware OT leverages a c…
▽ More
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we introduce a novel approach called class-aware optimal transport (OT), which measures the OT distance between a distribution over the source class-conditional distributions and a mixture of source and target data distribution. Our class-aware OT leverages a cost function that determines the matching extent between a given data example and a source class-conditional distribution. By optimizing this cost function, we find the optimal matching between target examples and source class-conditional distributions, effectively addressing the data and label shifts that occur between the two domains. To handle the class-aware OT efficiently, we propose an amortization solution that employs deep neural networks to formulate the transportation probabilities and the cost function. Additionally, we propose minimizing class-aware Higher-order Moment Matching (HMM) to align the corresponding class regions on the source and target domains. The class-aware HMM component offers an economical computational approach for accurately evaluating the HMM distance between the two distributions. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art baselines.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
Context-Aware Stress Monitoring using Wearable and Mobile Technologies in Everyday Settings
Authors:
Seyed Amir Hossein Aqajari,
Sina Labbaf,
Phuc Hoang Tran,
Brenda Nguyen,
Milad Asgari Mehrabadi,
Marco Levorato,
Nikil Dutt,
Amir M. Rahmani
Abstract:
Daily monitoring of stress is a critical component of maintaining optimal physical and mental health. Physiological signals and contextual information have recently emerged as promising indicators for detecting instances of heightened stress. Nonetheless, developing a real-time monitoring system that utilizes both physiological and contextual data to anticipate stress levels in everyday settings w…
▽ More
Daily monitoring of stress is a critical component of maintaining optimal physical and mental health. Physiological signals and contextual information have recently emerged as promising indicators for detecting instances of heightened stress. Nonetheless, developing a real-time monitoring system that utilizes both physiological and contextual data to anticipate stress levels in everyday settings while also gathering stress labels from participants represents a significant challenge. We present a monitoring system that objectively tracks daily stress levels by utilizing both physiological and contextual data in a daily-life environment. Additionally, we have integrated a smart labeling approach to optimize the ecological momentary assessment (EMA) collection, which is required for building machine learning models for stress detection. We propose a three-tier Internet-of-Things-based system architecture to address the challenges. We utilized a cross-validation technique to accurately estimate the performance of our stress models. We achieved the F1-score of 70\% with a Random Forest classifier using both PPG and contextual data, which is considered an acceptable score in models built for everyday settings. Whereas using PPG data alone, the highest F1-score achieved is approximately 56\%, emphasizing the significance of incorporating both PPG and contextual data in stress detection tasks.
△ Less
Submitted 14 December, 2023;
originally announced January 2024.
-
README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP
Authors:
Zonghai Yao,
Nandyala Siddharth Kantu,
Guanghao Wei,
Hieu Tran,
Zhangqi Duan,
Sunjae Kwon,
Zhichao Yang,
README annotation team,
Hong Yu
Abstract:
The advancement in healthcare has shifted focus toward patient-centric approaches, particularly in self-care and patient education, facilitated by access to Electronic Health Records (EHR). However, medical jargon in EHRs poses significant challenges in patient comprehension. To address this, we introduce a new task of automatically generating lay definitions, aiming to simplify complex medical te…
▽ More
The advancement in healthcare has shifted focus toward patient-centric approaches, particularly in self-care and patient education, facilitated by access to Electronic Health Records (EHR). However, medical jargon in EHRs poses significant challenges in patient comprehension. To address this, we introduce a new task of automatically generating lay definitions, aiming to simplify complex medical terms into patient-friendly lay language. We first created the README dataset, an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions, each offering context-aware lay definitions manually annotated by domain experts. We have also engineered a data-centric Human-AI pipeline that synergizes data filtering, augmentation, and selection to improve data quality. We then used README as the training data for models and leveraged a Retrieval-Augmented Generation method to reduce hallucinations and improve the quality of model outputs. Our extensive automatic and human evaluations demonstrate that open-source mobile-friendly models, when fine-tuned with high-quality data, are capable of matching or even surpassing the performance of state-of-the-art closed-source large language models like ChatGPT. This research represents a significant stride in closing the knowledge gap in patient education and advancing patient-centric healthcare solutions.
△ Less
Submitted 16 June, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
-
IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis
Authors:
Tue Minh Cao,
Nhat Hong Tran,
Le Phi Nguyen,
Hieu Huy Pham,
Hung Thanh Nguyen
Abstract:
Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques tha…
▽ More
Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques that are aimed at tackling the formidable challenges of severe imbalance dataset PTB-XL and gradient corruption. By this means, we manage to set a new height for deep learning model in a supervised learning manner across the majority of tasks. Our model consistently surpasses InceptionTime by substantial margins compared to other state-of-the-arts in this domain, noticeably 0.013 AUROC score improvement in the "all" task, while also mitigating the inherent dataset fluctuations during training.
△ Less
Submitted 16 November, 2023;
originally announced December 2023.
-
Weakly-semi-supervised object detection in remotely sensed imagery
Authors:
Ji Hun Wang,
Jeremy Irvin,
Beri Kohen Behar,
Ha Tran,
Raghav Samavedam,
Quentin Hsu,
Andrew Y. Ng
Abstract:
Deep learning for detecting objects in remotely sensed imagery can enable new technologies for important applications including mitigating climate change. However, these models often require large datasets labeled with bounding box annotations which are expensive to curate, prohibiting the development of models for new tasks and geographies. To address this challenge, we develop weakly-semi-superv…
▽ More
Deep learning for detecting objects in remotely sensed imagery can enable new technologies for important applications including mitigating climate change. However, these models often require large datasets labeled with bounding box annotations which are expensive to curate, prohibiting the development of models for new tasks and geographies. To address this challenge, we develop weakly-semi-supervised object detection (WSSOD) models on remotely sensed imagery which can leverage a small amount of bounding boxes together with a large amount of point labels that are easy to acquire at scale in geospatial data. We train WSSOD models which use large amounts of point-labeled images with varying fractions of bounding box labeled images in FAIR1M and a wind turbine detection dataset, and demonstrate that they substantially outperform fully supervised models trained with the same amount of bounding box labeled images on both datasets. Furthermore, we find that the WSSOD models trained with 2-10x fewer bounding box labeled images can perform similarly to or outperform fully supervised models trained on the full set of bounding-box labeled images. We believe that the approach can be extended to other remote sensing tasks to reduce reliance on bounding box labels and increase development of models for impactful applications.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series
Authors:
Trang H. Tran,
Lam M. Nguyen,
Kyongmin Yeo,
Nam Nguyen,
Roman Vaculin
Abstract:
Foundation models have recently gained attention within the field of machine learning thanks to its efficiency in broad data processing. While researchers had attempted to extend this success to time series models, the main challenge is effectively extracting representations and transferring knowledge from pretraining datasets to the target finetuning dataset. To tackle this issue, we introduce a…
▽ More
Foundation models have recently gained attention within the field of machine learning thanks to its efficiency in broad data processing. While researchers had attempted to extend this success to time series models, the main challenge is effectively extracting representations and transferring knowledge from pretraining datasets to the target finetuning dataset. To tackle this issue, we introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset. This pretraining phase enables a probabilistic similarity metric, which assesses the likelihood of a univariate sample being closely related to one of the pretraining datasets. Subsequently, using this similarity metric as a guide, we propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets. Our experiments have shown promising results which demonstrate the efficacy of our approach.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Orthogonally weighted $\ell_{2,1}$ regularization for rank-aware joint sparse recovery: algorithm and analysis
Authors:
Armenak Petrosyan,
Konstantin Pieper,
Hoang Tran
Abstract:
We propose and analyze an efficient algorithm for solving the joint sparse recovery problem using a new regularization-based method, named orthogonally weighted $\ell_{2,1}$ ($\mathit{ow}\ell_{2,1}$), which is specifically designed to take into account the rank of the solution matrix. This method has applications in feature extraction, matrix column selection, and dictionary learning, and it is di…
▽ More
We propose and analyze an efficient algorithm for solving the joint sparse recovery problem using a new regularization-based method, named orthogonally weighted $\ell_{2,1}$ ($\mathit{ow}\ell_{2,1}$), which is specifically designed to take into account the rank of the solution matrix. This method has applications in feature extraction, matrix column selection, and dictionary learning, and it is distinct from commonly used $\ell_{2,1}$ regularization and other existing regularization-based approaches because it can exploit the full rank of the row-sparse solution matrix, a key feature in many applications. We provide a proof of the method's rank-awareness, establish the existence of solutions to the proposed optimization problem, and develop an efficient algorithm for solving it, whose convergence is analyzed. We also present numerical experiments to illustrate the theory and demonstrate the effectiveness of our method on real-life problems.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
SniffyArt: The Dataset of Smelling Persons
Authors:
Mathias Zinnen,
Azhar Hussian,
Hang Tran,
Prathmesh Madhu,
Andreas Maier,
Vincent Christlein
Abstract:
Smell gestures play a crucial role in the investigation of past smells in the visual arts yet their automated recognition poses significant challenges. This paper introduces the SniffyArt dataset, consisting of 1941 individuals represented in 441 historical artworks. Each person is annotated with a tightly fitting bounding box, 17 pose keypoints, and a gesture label. By integrating these annotatio…
▽ More
Smell gestures play a crucial role in the investigation of past smells in the visual arts yet their automated recognition poses significant challenges. This paper introduces the SniffyArt dataset, consisting of 1941 individuals represented in 441 historical artworks. Each person is annotated with a tightly fitting bounding box, 17 pose keypoints, and a gesture label. By integrating these annotations, the dataset enables the development of hybrid classification approaches for smell gesture recognition. The datasets high-quality human pose estimation keypoints are achieved through the merging of five separate sets of keypoint annotations per person. The paper also presents a baseline analysis, evaluating the performance of representative algorithms for detection, keypoint estimation, and classification tasks, showcasing the potential of combining keypoint estimation with smell gesture classification. The SniffyArt dataset lays a solid foundation for future research and the exploration of multi-task approaches leveraging pose keypoints and person boxes to advance human gesture and olfactory dimension analysis in historical artworks.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification
Authors:
Zhongfen Deng,
Seunghyun Yoon,
Trung Bui,
Franck Dernoncourt,
Quan Hung Tran,
Shuaiqi Liu,
Wenting Zhao,
Tao Zhang,
Yibo Wang,
Philip S. Yu
Abstract:
Aspect-based meeting transcript summarization aims to produce multiple summaries, each focusing on one aspect of content in a meeting transcript. It is challenging as sentences related to different aspects can mingle together, and those relevant to a specific aspect can be scattered throughout the long transcript of a meeting. The traditional summarization methods produce one summary mixing inform…
▽ More
Aspect-based meeting transcript summarization aims to produce multiple summaries, each focusing on one aspect of content in a meeting transcript. It is challenging as sentences related to different aspects can mingle together, and those relevant to a specific aspect can be scattered throughout the long transcript of a meeting. The traditional summarization methods produce one summary mixing information of all aspects, which cannot deal with the above challenges of aspect-based meeting transcript summarization. In this paper, we propose a two-stage method for aspect-based meeting transcript summarization. To select the input content related to specific aspects, we train a sentence classifier on a dataset constructed from the AMI corpus with pseudo-labeling. Then we merge the sentences selected for a specific aspect as the input for the summarizer to produce the aspect-based summary. Experimental results on the AMI corpus outperform many strong baselines, which verifies the effectiveness of our proposed method.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing
Authors:
Hieu Tran,
Zhichao Yang,
Zonghai Yao,
Hong Yu
Abstract:
To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles. We created the BioInstruct, comprising 25,005 instructions to instruction-tune LLMs(LLaMA 1 & 2, 7B & 13B version). The instructions were created by prompting th…
▽ More
To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles. We created the BioInstruct, comprising 25,005 instructions to instruction-tune LLMs(LLaMA 1 & 2, 7B & 13B version). The instructions were created by prompting the GPT-4 language model with three-seed samples randomly drawn from an 80 human curated instructions. We employed Low-Rank Adaptation(LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into three major categories: question answering(QA), information extraction(IE), and text generation(GEN). We also examined whether categories(e.g., QA, IE, and generation) of instructions impact model performance. Comparing with LLMs without instruction-tuned, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA, 5.7% in IE, and 96% in Generation tasks. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive or even surpassed other LLMs in the biomedical domain that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with the observations of multi-task learning, suggesting the synergies between two tasks. The BioInstruct dataset serves as a valuable resource and instruction tuned LLMs lead to the best performing BioNLP applications.
△ Less
Submitted 6 June, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images
Authors:
Huyen Tran,
Duc Thanh Nguyen,
John Yearwood
Abstract:
Medical image analysis using computer-based algorithms has attracted considerable attention from the research community and achieved tremendous progress in the last decade. With recent advances in computing resources and availability of large-scale medical image datasets, many deep learning models have been developed for disease diagnosis from medical images. However, existing techniques focus on…
▽ More
Medical image analysis using computer-based algorithms has attracted considerable attention from the research community and achieved tremendous progress in the last decade. With recent advances in computing resources and availability of large-scale medical image datasets, many deep learning models have been developed for disease diagnosis from medical images. However, existing techniques focus on sub-tasks, e.g., disease classification and identification, individually, while there is a lack of a unified framework enabling multi-task diagnosis. Inspired by the capability of Vision Transformers in both local and global representation learning, we propose in this paper a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data. Our method is built upon the Vision Transformer but extends its learning capability in a multi-task setting. We evaluated our proposed method and compared it with existing baselines on a benchmark dataset of COVID-19 chest X-ray images. Experimental results verified the superiority of the proposed method over the baselines on both the image classification and affected region identification tasks.
△ Less
Submitted 30 September, 2023;
originally announced October 2023.
-
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Authors:
Minh-Tuan Tran,
Trung Le,
Xuan-May Le,
Mehrtash Harandi,
Quan Hung Tran,
Dinh Phung
Abstract:
Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models s…
▽ More
Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to effectively map this noise to the ground-truth sample distribution, resulting in prolonging training times and low-quality outputs. In this paper, we propose a novel Noisy Layer Generation method (NAYER) which relocates the random source from the input to a noisy layer and utilizes the meaningful constant label-text embedding (LTE) as the input. LTE is generated by using the language model once, and then it is stored in memory for all subsequent training processes. The significance of LTE lies in its ability to contain substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Simultaneously, the noisy layer plays a key role in addressing the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that our NAYER not only outperforms the state-of-the-art methods but also achieves speeds 5 to 15 times faster than previous approaches. The code is available at https://github.com/tmtuan1307/nayer.
△ Less
Submitted 21 March, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
On Generating Explanations for Reinforcement Learning Policies: An Empirical Study
Authors:
Mikihisa Yuasa,
Huy T. Tran,
Ramavarapu S. Sreenivas
Abstract:
Understanding a \textit{reinforcement learning} policy, which guides state-to-action mappings to maximize rewards, necessitates an accompanying explanation for human comprehension. In this paper, we introduce a set of \textit{linear temporal logic} (LTL) formulae designed to provide explanations for policies, and an algorithm for searching through those formulae for the one that best explains a gi…
▽ More
Understanding a \textit{reinforcement learning} policy, which guides state-to-action mappings to maximize rewards, necessitates an accompanying explanation for human comprehension. In this paper, we introduce a set of \textit{linear temporal logic} (LTL) formulae designed to provide explanations for policies, and an algorithm for searching through those formulae for the one that best explains a given policy. Our focus is on crafting explanations that elucidate both the ultimate objectives accomplished by the policy and the prerequisite conditions it upholds throughout its execution. These LTL-based explanations feature a structured representation, which is particularly well-suited for local-search techniques. The effectiveness of our proposed approach is illustrated through a simulated game of capture the flag and a car-parking environment. The paper concludes with suggested directions for future
△ Less
Submitted 5 March, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Test-Case Quality -- Understanding Practitioners' Perspectives
Authors:
Huynh Khanh Vi Tran,
Nauman Bin Ali,
Jürgen Börstler,
Michael Unterkalmsteiner
Abstract:
Background: Test-case quality has always been one of the major concerns in software testing. To improve test-case quality, it is important to better understand how practitioners perceive the quality of test-cases. Objective: Motivated by that need, we investigated how practitioners define test-case quality and which aspects of test-cases are important for quality assessment. Method: We conducted s…
▽ More
Background: Test-case quality has always been one of the major concerns in software testing. To improve test-case quality, it is important to better understand how practitioners perceive the quality of test-cases. Objective: Motivated by that need, we investigated how practitioners define test-case quality and which aspects of test-cases are important for quality assessment. Method: We conducted semi-structured interviews with professional developers, testers and test architects from a multinational software company in Sweden. Before the interviews, we asked participants for actual test cases (written in natural language) that they perceive as good, normal, and bad respectively together with rationales for their assessment. We also compared their opinions on shared test cases and contrasted their views with the relevant literature. Results: We present a quality model which consists of 11 test-case quality attributes. We also identify a misalignment in defining test-case quality among practitioners and between academia and industry, along with suggestions for improving test-case quality in industry. Conclusion: The results show that practitioners' background, including roles and working experience, are critical dimensions of how test-case quality is defined and assessed.
△ Less
Submitted 28 September, 2023;
originally announced September 2023.
-
A Comprehensive Survey of Document-level Relation Extraction (2016-2023)
Authors:
Julien Delaunay,
Hanh Thi Hong Tran,
Carlos-Emiliano González-Gallardo,
Georgeta Bordea,
Nicolas Sidere,
Antoine Doucet
Abstract:
Document-level relation extraction (DocRE) is an active area of research in natural language processing (NLP) concerned with identifying and extracting relationships between entities beyond sentence boundaries. Compared to the more traditional sentence-level relation extraction, DocRE provides a broader context for analysis and is more challenging because it involves identifying relationships that…
▽ More
Document-level relation extraction (DocRE) is an active area of research in natural language processing (NLP) concerned with identifying and extracting relationships between entities beyond sentence boundaries. Compared to the more traditional sentence-level relation extraction, DocRE provides a broader context for analysis and is more challenging because it involves identifying relationships that may span multiple sentences or paragraphs. This task has gained increased interest as a viable solution to build and populate knowledge bases automatically from unstructured large-scale documents (e.g., scientific papers, legal contracts, or news articles), in order to have a better understanding of relationships between entities. This paper aims to provide a comprehensive overview of recent advances in this field, highlighting its different applications in comparison to sentence-level relation extraction.
△ Less
Submitted 12 October, 2023; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Partial Transport for Point-Cloud Registration
Authors:
Yikun Bai,
Huy Tran,
Steven B. Damelin,
Soheil Kolouri
Abstract:
Point cloud registration plays a crucial role in various fields, including robotics, computer graphics, and medical imaging. This process involves determining spatial relationships between different sets of points, typically within a 3D space. In real-world scenarios, complexities arise from non-rigid movements and partial visibility, such as occlusions or sensor noise, making non-rigid registrati…
▽ More
Point cloud registration plays a crucial role in various fields, including robotics, computer graphics, and medical imaging. This process involves determining spatial relationships between different sets of points, typically within a 3D space. In real-world scenarios, complexities arise from non-rigid movements and partial visibility, such as occlusions or sensor noise, making non-rigid registration a challenging problem. Classic non-rigid registration methods are often computationally demanding, suffer from unstable performance, and, importantly, have limited theoretical guarantees. The optimal transport problem and its unbalanced variations (e.g., the optimal partial transport problem) have emerged as powerful tools for point-cloud registration, establishing a strong benchmark in this field. These methods view point clouds as empirical measures and provide a mathematically rigorous way to quantify the `correspondence' between (the transformed) source and target points. In this paper, we approach the point-cloud registration problem through the lens of optimal transport theory and first propose a comprehensive set of non-rigid registration methods based on the optimal partial transportation problem. Subsequently, leveraging the emerging work on efficient solutions to the one-dimensional optimal partial transport problem, we extend our proposed algorithms via slicing to gain significant computational efficiency, resulting in fast and robust non-rigid registration algorithms. We demonstrate the effectiveness of our proposed methods and compare them against baselines on various 3D and 2D non-rigid registration problems where the source and target point clouds are corrupted by random noise.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency
Authors:
Long Tan Le,
Tuan Dung Nguyen,
Tung-Anh Nguyen,
Choong Seon Hong,
Nguyen H. Tran
Abstract:
Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhance privacy-preservation. However, deploying deep FL models within edge environments is often hindered by commun…
▽ More
Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhance privacy-preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that effectively employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in the alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models of up to 4 times smaller in communication size and 1.5 times lower memory footprint during training.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Authors:
Yevgen Chebotar,
Quan Vuong,
Alex Irpan,
Karol Hausman,
Fei Xia,
Yao Lu,
Aviral Kumar,
Tianhe Yu,
Alexander Herzog,
Karl Pertsch,
Keerthana Gopalakrishnan,
Julian Ibarz,
Ofir Nachum,
Sumedh Sontakke,
Grecia Salazar,
Huong T Tran,
Jodilyn Peralta,
Clayton Tan,
Deeksha Manjunath,
Jaspiar Singht,
Brianna Zitkovich,
Tomas Jackson,
Kanishka Rao,
Chelsea Finn,
Sergey Levine
Abstract:
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizi…
▽ More
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://qtransformer.github.io
△ Less
Submitted 17 October, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
-
PolyGET: Accelerating Polymer Simulations by Accurate and Generalizable Forcefield with Equivariant Transformer
Authors:
Rui Feng,
Huan Tran,
Aubrey Toland,
Binghong Chen,
Qi Zhu,
Rampi Ramprasad,
Chao Zhang
Abstract:
Polymer simulation with both accuracy and efficiency is a challenging task. Machine learning (ML) forcefields have been developed to achieve both the accuracy of ab initio methods and the efficiency of empirical force fields. However, existing ML force fields are usually limited to single-molecule settings, and their simulations are not robust enough. In this paper, we present PolyGET, a new frame…
▽ More
Polymer simulation with both accuracy and efficiency is a challenging task. Machine learning (ML) forcefields have been developed to achieve both the accuracy of ab initio methods and the efficiency of empirical force fields. However, existing ML force fields are usually limited to single-molecule settings, and their simulations are not robust enough. In this paper, we present PolyGET, a new framework for Polymer Forcefields with Generalizable Equivariant Transformers. PolyGET is designed to capture complex quantum interactions between atoms and generalize across various polymer families, using a deep learning model called Equivariant Transformers. We propose a new training paradigm that focuses exclusively on optimizing forces, which is different from existing methods that jointly optimize forces and energy. This simple force-centric objective function avoids competing objectives between energy and forces, thereby allowing for learning a unified forcefield ML model over different polymer families. We evaluated PolyGET on a large-scale dataset of 24 distinct polymer types and demonstrated state-of-the-art performance in force accuracy and robust MD simulations. Furthermore, PolyGET can simulate large polymers with high fidelity to the reference ab initio DFT method while being able to generalize to unseen polymers.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
A Text-based Approach For Link Prediction on Wikipedia Articles
Authors:
Anh Hoang Tran,
Tam Minh Nguyen,
Son T. Luu
Abstract:
This paper present our work in the DSAA 2023 Challenge about Link Prediction for Wikipedia Articles. We use traditional machine learning models with POS tags (part-of-speech tags) features extracted from text to train the classification model for predicting whether two nodes has the link. Then, we use these tags to test on various machine learning models. We obtained the results by F1 score at 0.9…
▽ More
This paper present our work in the DSAA 2023 Challenge about Link Prediction for Wikipedia Articles. We use traditional machine learning models with POS tags (part-of-speech tags) features extracted from text to train the classification model for predicting whether two nodes has the link. Then, we use these tags to test on various machine learning models. We obtained the results by F1 score at 0.99999 and got 7th place in the competition. Our source code is publicly available at this link: https://github.com/Tam1032/DSAA2023-Challenge-Link-prediction-DS-UIT_SAT
△ Less
Submitted 6 November, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
May the Force be with You: Unified Force-Centric Pre-Training for 3D Molecular Conformations
Authors:
Rui Feng,
Qi Zhu,
Huan Tran,
Binghong Chen,
Aubrey Toland,
Rampi Ramprasad,
Chao Zhang
Abstract:
Recent works have shown the promise of learning pre-trained models for 3D molecular representation. However, existing pre-training models focus predominantly on equilibrium data and largely overlook off-equilibrium conformations. It is challenging to extend these methods to off-equilibrium data because their training objective relies on assumptions of conformations being the local energy minima. W…
▽ More
Recent works have shown the promise of learning pre-trained models for 3D molecular representation. However, existing pre-training models focus predominantly on equilibrium data and largely overlook off-equilibrium conformations. It is challenging to extend these methods to off-equilibrium data because their training objective relies on assumptions of conformations being the local energy minima. We address this gap by proposing a force-centric pretraining model for 3D molecular conformations covering both equilibrium and off-equilibrium data. For off-equilibrium data, our model learns directly from their atomic forces. For equilibrium data, we introduce zero-force regularization and forced-based denoising techniques to approximate near-equilibrium forces. We obtain a unified pre-trained model for 3D molecular representation with over 15 million diverse conformations. Experiments show that, with our pre-training objective, we increase forces accuracy by around 3 times compared to the un-pre-trained Equivariant Transformer model. By incorporating regularizations on equilibrium data, we solved the problem of unstable MD simulations in vanilla Equivariant Transformers, achieving state-of-the-art simulation performance with 2.45 times faster inference time than NequIP. As a powerful molecular encoder, our pre-trained model achieves on-par performance with state-of-the-art property prediction tasks.
△ Less
Submitted 23 August, 2023;
originally announced August 2023.
-
Learning Networks from Gaussian Graphical Models and Gaussian Free Fields
Authors:
Subhro Ghosh,
Soumendu Sundar Mukherjee,
Hoang-Son Tran,
Ujan Gangopadhyay
Abstract:
We investigate the problem of estimating the structure of a weighted network from repeated measurements of a Gaussian Graphical Model (GGM) on the network. In this vein, we consider GGMs whose covariance structures align with the geometry of the weighted network on which they are based. Such GGMs have been of longstanding interest in statistical physics, and are referred to as the Gaussian Free Fi…
▽ More
We investigate the problem of estimating the structure of a weighted network from repeated measurements of a Gaussian Graphical Model (GGM) on the network. In this vein, we consider GGMs whose covariance structures align with the geometry of the weighted network on which they are based. Such GGMs have been of longstanding interest in statistical physics, and are referred to as the Gaussian Free Field (GFF). In recent years, they have attracted considerable interest in the machine learning and theoretical computer science. In this work, we propose a novel estimator for the weighted network (equivalently, its Laplacian) from repeated measurements of a GFF on the network, based on the Fourier analytic properties of the Gaussian distribution. In this pursuit, our approach exploits complex-valued statistics constructed from observed data, that are of interest on their own right. We demonstrate the effectiveness of our estimator with concrete recovery guarantees and bounds on the required sample complexity. In particular, we show that the proposed statistic achieves the parametric rate of estimation for fixed network size. In the setting of networks growing with sample size, our results show that for Erdos-Renyi random graphs $G(d,p)$ above the connectivity threshold, we demonstrate that network recovery takes place with high probability as soon as the sample size $n$ satisfies $n \gg d^4 \log d \cdot p^{-2}$.
△ Less
Submitted 4 August, 2023;
originally announced August 2023.
-
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Authors:
Anthony Brohan,
Noah Brown,
Justice Carbajal,
Yevgen Chebotar,
Xi Chen,
Krzysztof Choromanski,
Tianli Ding,
Danny Driess,
Avinava Dubey,
Chelsea Finn,
Pete Florence,
Chuyuan Fu,
Montse Gonzalez Arenas,
Keerthana Gopalakrishnan,
Kehang Han,
Karol Hausman,
Alexander Herzog,
Jasmine Hsu,
Brian Ichter,
Alex Irpan,
Nikhil Joshi,
Ryan Julian,
Dmitry Kalashnikov,
Yuheng Kuang,
Isabel Leal
, et al. (29 additional authors not shown)
Abstract:
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.…
▽ More
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
△ Less
Submitted 28 July, 2023;
originally announced July 2023.