-
Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Authors:
Jianhua Wu,
Bingzhao Gao,
Jincheng Gao,
Jianhao Yu,
Hongqing Chu,
Qiankun Yu,
Xun Gong,
Yi Chang,
H. Eric Tseng,
Hong Chen,
Jie Chen
Abstract:
With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Models can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.
Submitted 17 May, 2024; v1 submitted 8 December, 2023;
originally announced May 2024.
-
QNCD: Quantization Noise Correction for Diffusion Models
Authors:
Huanpeng Chu,
Wei Wu,
Chengjie Zang,
Kun Yuan
Abstract:
Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity. However, their widespread adoption is hindered by the intensive computation required during the iterative denoising process. Post-training quantization (PTQ) presents a solution to accelerate sampling, albeit at the expense of sample quality, especially in low-bit settings. Addressing this, our study introduces a unified Quantization Noise Correction Scheme (QNCD), aimed at diminishing quantization noise throughout the sampling process. We identify two primary quantization challenges: intra and inter quantization noise. Intra quantization noise, mainly exacerbated by embeddings in the resblock module, extends activation quantization ranges, increasing disturbances in each single denoising step. Besides, inter quantization noise stems from cumulative quantization deviations across the entire denoising process, altering data distributions step-by-step. QNCD combats these through embedding-derived feature smoothing for eliminating intra quantization noise and an effective runtime noise estimation module for dynamically filtering inter quantization noise. Extensive experiments demonstrate that our method outperforms previous quantization methods for diffusion models, achieving lossless results in W4A8 and W8A8 quantization settings on ImageNet (LDM-4). Code is available at: https://github.com/huanpengchu/QNCD
Submitted 28 March, 2024;
originally announced March 2024.
-
Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization
Authors:
Jae-hee So,
Joonhwan Chang,
Eunji Kim,
Junho Na,
JiYeon Choi,
Jy-yong Sohn,
Byung-Hoon Kim,
Sang Hui Chu
Abstract:
Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Because psychiatric interviews are goal-oriented, structured dialogues between a professional interviewer and an interviewee, they are among the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interviews by analyzing counseling data from North Korean defectors with traumatic events and mental health issues. Specifically, we investigate whether LLMs can (1) delineate the part of the conversation that suggests psychiatric symptoms and name the symptoms, and (2) summarize stressors and symptoms, based on the interview dialogue transcript. The transcript data was labeled by mental health experts for training and evaluation of LLMs. Our experimental results show that appropriately prompted LLMs can achieve high performance on both the symptom delineation task and the summarization task. This research contributes to the nascent field of applying LLMs to psychiatric interviews and demonstrates their potential effectiveness in aiding mental health practitioners.
Submitted 26 March, 2024;
originally announced March 2024.
-
RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification
Authors:
Hongbo Chu,
Qiehe Sun,
Jiawen Li,
Yuxuan Chen,
Lizhong Zhang,
Tian Guan,
Anjia Han,
Yonghong He
Abstract:
Histopathological whole slide image (WSI) analysis with deep learning has become a research focus in computational pathology. The current paradigm is mainly based on multiple instance learning (MIL), in which approaches with Transformer as the backbone are well discussed. These methods convert WSI tasks into sequence tasks by representing patches as tokens in the WSI sequence. However, the feature complexity brought by high heterogeneity and the ultra-long sequences brought by gigapixel size make Transformer-based MIL suffer from the challenges of high memory consumption, slow inference speed, and lack of performance. To this end, we propose a retentive MIL method called RetMIL, which processes WSI sequences through a hierarchical feature propagation structure. At the local level, the WSI sequence is divided into multiple subsequences. Tokens of each subsequence are updated through a parallel linear retention mechanism and aggregated utilizing an attention layer. At the global level, subsequences are fused into a global sequence, then updated through a serial retention mechanism, and finally the slide-level representation is obtained through a global attention pooling. We conduct experiments on the two public datasets CAMELYON and BRACS and a public-internal LUNG dataset, confirming that RetMIL not only achieves state-of-the-art performance but also significantly reduces computational overhead. Our code will be made available shortly.
Submitted 16 March, 2024;
originally announced March 2024.
-
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
Authors:
Jiawen Li,
Yuxuan Chen,
Hongbo Chu,
Qiehe Sun,
Tian Guan,
Anjia Han,
Yonghong He
Abstract:
Histopathological whole slide image (WSI) classification has become a foundational task in medical microscopic image processing. Prevailing approaches involve learning WSIs as instance-bag representations, emphasizing significant instances but struggling to capture the interactions between instances. Additionally, conventional graph representation methods utilize explicit spatial positions to construct topological structures but restrict the flexible interaction capabilities between instances at arbitrary locations, particularly when spatially distant. In response, we propose a novel dynamic graph representation algorithm that conceptualizes WSIs as a form of knowledge graph structure. Specifically, we dynamically construct neighbors and directed edge embeddings based on the head and tail relationships between instances. Then, we devise a knowledge-aware attention mechanism that can update the head node features by learning the joint attention score of each neighbor and edge. Finally, we obtain a graph-level embedding through the global pooling process of the updated head, serving as an implicit representation for the WSI classification. Our end-to-end graph representation learning approach has outperformed the state-of-the-art WSI analysis methods on three TCGA benchmark datasets and in-house test sets. Our code is available at https://github.com/WonderLandxD/WiKG.
Submitted 12 March, 2024;
originally announced March 2024.
-
Multiple Population Alternate Evolution Neural Architecture Search
Authors:
Juan Zou,
Han Chu,
Yizhang Xia,
Junwen Xu,
Yuan Liu,
Zhanglu Hou
Abstract:
The effectiveness of Evolutionary Neural Architecture Search (ENAS) is influenced by the design of the search space. Nevertheless, common methods including the global search space, scalable search space and hierarchical search space have certain limitations. Specifically, the global search space requires a significant amount of computational resources and time, the scalable search space sacrifices the diversity of network structures, and the hierarchical search space increases the search cost in exchange for network diversity. To address the above limitations, we propose a novel paradigm of searching neural network architectures and design the Multiple Population Alternate Evolution Neural Architecture Search (MPAE), which can achieve module diversity with a smaller search cost. MPAE converts the search space into L interconnected units and sequentially searches the units; the search of the entire network is then cycled several times to reduce the impact of previous units on subsequent units. To accelerate the population evolution process, we also propose a population migration mechanism that establishes an excellent migration archive and transfers the excellent knowledge and experience in the migration archive to new populations. The proposed method requires only 0.3 GPU days to search a neural network on the CIFAR dataset and achieves state-of-the-art results.
Submitted 11 March, 2024;
originally announced March 2024.
-
Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing
Authors:
Hong T. M. Chu,
Subhro Ghosh,
Chi Thanh Lam,
Soumendu Sundar Mukherjee
Abstract:
The phenomenon of implicit regularization has attracted interest in recent years as a fundamental aspect of the remarkable generalizing ability of neural networks. In a nutshell, it entails that gradient descent dynamics in many neural nets, even without any explicit regularizer in the loss function, converges to the solution of a regularized learning problem. However, known results attempting to theoretically explain this phenomenon focus overwhelmingly on the setting of linear neural nets, and the simplicity of the linear structure is particularly crucial to existing arguments. In this paper, we explore this problem in the context of more realistic neural networks with a general class of non-linear activation functions, and rigorously demonstrate the implicit regularization phenomenon for such networks in the setting of matrix sensing problems, together with rigorous rate guarantees that ensure exponentially fast convergence of gradient descent. In this vein, we contribute a network architecture called Spectral Neural Networks (abbrv. SNN) that is particularly suitable for matrix learning problems. Conceptually, this entails coordinatizing the space of matrices by their singular values and singular vectors, as opposed to by their entries, a potentially fruitful perspective for matrix learning. We demonstrate that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets and confirm its effectiveness in the context of matrix sensing, via both mathematical guarantees and empirical investigations. We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.
Submitted 27 February, 2024;
originally announced February 2024.
-
CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes
Authors:
Yi-Ting Pan,
Chai-Rong Lee,
Shu-Ho Fan,
Jheng-Wei Su,
Jia-Bin Huang,
Yung-Yu Chuang,
Hung-Kuo Chu
Abstract:
The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guided Conditional Texture Generator (CTGAN), producing high-quality textures for 3D shapes that are consistent with the viewing angle while respecting shape semantics. CTGAN utilizes the disentangled nature of StyleGAN to finely manipulate the input latent codes, enabling explicit control over both the style and structure of the generated textures. A coarse-to-fine encoder architecture is introduced to enhance control over the structure of the resulting textures via input segmentation. Experimental results show that CTGAN outperforms existing methods on multiple quality metrics and achieves state-of-the-art performance on texture generation in both conditional and unconditional settings.
Submitted 8 February, 2024;
originally announced February 2024.
-
Enabling Technologies for Web 3.0: A Comprehensive Survey
Authors:
Md Arif Hassan,
Mohammad Behdad Jamshidi,
Bui Duc Manh,
Nam H. Chu,
Chi-Hieu Nguyen,
Nguyen Quang Hieu,
Cong T. Nguyen,
Dinh Thai Hoang,
Diep N. Nguyen,
Nguyen Van Huynh,
Mohammad Abu Alsheikh,
Eryk Dutkiewicz
Abstract:
Web 3.0 represents the next stage of Internet evolution, aiming to empower users with increased autonomy, efficiency, quality, security, and privacy. This evolution can potentially democratize content access by utilizing the latest developments in enabling technologies. In this paper, we conduct an in-depth survey of enabling technologies in the context of Web 3.0, such as blockchain, semantic web, 3D interactive web, Metaverse, Virtual reality/Augmented reality, Internet of Things technology, and their roles in shaping Web 3.0. We commence by providing a comprehensive background of Web 3.0, including its concept, basic architecture, potential applications, and industry adoption. Subsequently, we examine recent breakthroughs in IoT, 5G, and blockchain technologies that are pivotal to Web 3.0 development. Following that, other enabling technologies, including AI, semantic web, and 3D interactive web, are discussed. Utilizing these technologies can effectively address the critical challenges in realizing Web 3.0, such as ensuring decentralized identity, platform interoperability, data transparency, reducing latency, and enhancing the system's scalability. Finally, we highlight significant challenges associated with Web 3.0 implementation, emphasizing potential solutions and providing insights into future research directions in this field.
Submitted 29 December, 2023;
originally announced January 2024.
-
An annotated grain kernel image database for visual quality inspection
Authors:
Lei Fan,
Yiwen Ding,
Dongdong Fan,
Yong Wu,
Hongxia Chu,
Maurice Pagnucco,
Yang Song
Abstract:
We present a machine vision-based database named GrainSet for the purpose of visual quality inspection of grain kernels. The database contains more than 350K single-kernel images with experts' annotations. The grain kernels used in the study consist of four types of cereal grains including wheat, maize, sorghum and rice, and were collected from over 20 regions in 5 countries. The surface information of each kernel is captured by our custom-built device equipped with high-resolution optic sensor units, and corresponding sampling information and annotations include collection location and time, morphology, physical size, weight, and Damage & Unsound grain categories provided by senior inspectors. In addition, we employed a commonly used deep learning model to provide classification results as a benchmark. We believe that our GrainSet will facilitate future research in fields such as assisting inspectors in grain quality inspections, providing guidance for grain storage and trade, and contributing to applications of smart agriculture.
Submitted 20 November, 2023;
originally announced January 2024.
-
Structure-Preserving Physics-Informed Neural Networks With Energy or Lyapunov Structure
Authors:
Haoyu Chu,
Yuto Miyatake,
Wenjun Cui,
Shikui Wei,
Daisuke Furihata
Abstract:
Recently, there has been growing interest in using physics-informed neural networks (PINNs) to solve differential equations. However, the preservation of structure, such as energy and stability, in a suitable manner has yet to be established. This limitation could be a potential reason why the learning process for PINNs is not always efficient and the numerical results may suggest nonphysical behavior. Besides, there is little research on their applications on downstream tasks. To address these issues, we propose structure-preserving PINNs to improve their performance and broaden their applications for downstream tasks. Firstly, by leveraging prior knowledge about the physical system, a structure-preserving loss function is designed to assist the PINN in learning the underlying structure. Secondly, a framework that utilizes structure-preserving PINN for robust image recognition is proposed. Here, preserving the Lyapunov structure of the underlying system ensures the stability of the system. Experimental results demonstrate that the proposed method improves the numerical accuracy of PINNs for partial differential equations. Furthermore, the robustness of the model against adversarial perturbations in image data is enhanced.
Submitted 10 January, 2024;
originally announced January 2024.
-
FPT Approximation using Treewidth: Capacitated Vertex Cover, Target Set Selection and Vector Dominating Set
Authors:
Huairui Chu,
Bingkai Lin
Abstract:
Treewidth is a useful tool in designing graph algorithms. Although many NP-hard graph problems can be solved in linear time when the input graphs have small treewidth, there are problems which remain hard on graphs of bounded treewidth. In this paper, we consider three vertex selection problems that are W[1]-hard when parameterized by the treewidth of the input graph, namely the capacitated vertex cover problem, the target set selection problem and the vector dominating set problem. We provide two new methods to obtain FPT approximation algorithms for these problems. For the capacitated vertex cover problem and the vector dominating set problem, we obtain $(1+o(1))$-approximation FPT algorithms. For the target set selection problem, we give an FPT algorithm providing a tradeoff between its running time and the approximation ratio.
Submitted 18 January, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Learning High-Order Relationships of Brain Regions
Authors:
Weikang Qiu,
Huangrui Chu,
Selena Wang,
Haolan Zuo,
Xiaoxiao Li,
Yize Zhao,
Rex Ying
Abstract:
Discovering reliable and informative relationships among brain regions from functional magnetic resonance imaging (fMRI) signals is essential in phenotypic predictions. Most of the current methods fail to accurately characterize those interactions because they only focus on pairwise connections and overlook the high-order relationships of brain regions. We propose that these high-order relationships should be maximally informative and minimally redundant (MIMR). However, identifying such high-order relationships is challenging and under-explored due to the exponential search space and the absence of a tractable objective. In response to this gap, we propose a novel method named HYBRID which aims to extract MIMR high-order relationships from fMRI data. HYBRID employs a CONSTRUCTOR to identify hyperedge structures, and a WEIGHTER to compute a weight for each hyperedge, which avoids searching in exponential space. HYBRID achieves the MIMR objective through an innovative information bottleneck framework named multi-head drop-bottleneck with theoretical guarantees. Our comprehensive experiments demonstrate the effectiveness of our model. Our model outperforms the state-of-the-art predictive model by an average of 11.2%, regarding the quality of hyperedges measured by CPM, a standard protocol for studying brain connections.
Submitted 8 June, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
A Modular Pneumatic Soft Gripper Design for Aerial Grasping and Landing
Authors:
Hiu Ching Cheung,
Ching-Wei Chang,
Bailun Jiang,
Chih-Yung Wen,
Henry K. Chu
Abstract:
Aerial robots have garnered significant attention due to their potential applications in various industries, such as inspection, search and rescue, and drone delivery. Successful missions often depend on the ability of these robots to grasp and land effectively. This paper presents a novel modular soft gripper design tailored explicitly for aerial grasping and landing operations. The proposed modular pneumatic soft gripper incorporates a feed-forward proportional controller to regulate pressure, enabling compliant gripping capabilities. The modular connectors of the soft fingers offer two configurations for the 4-tip soft gripper, H-base (cylindrical) and X-base (spherical), allowing adaptability to different target objects. Additionally, the gripper can serve as a soft landing gear when deflated, eliminating the need for an extra landing gear. This design reduces weight, simplifies aerial manipulation control, and enhances flight efficiency. We demonstrate the efficacy of indoor aerial grasping and achieve a maximum payload of 217 g using the proposed soft aerial vehicle and its H-base pneumatic soft gripper (808 g).
Submitted 25 March, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Authors:
Neel Jain,
Ping-yeh Chiang,
Yuxin Wen,
John Kirchenbauer,
Hong-Min Chu,
Gowthami Somepalli,
Brian R. Bartoldson,
Bhavya Kailkhura,
Avi Schwarzschild,
Aniruddha Saha,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune.
Submitted 10 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Robustness-Guided Image Synthesis for Data-Free Quantization
Authors:
Jianhong Bai,
Yuchen Yang,
Huanpeng Chu,
Hualiang Wang,
Zuozhu Liu,
Ruizhe Chen,
Xiaoxuan He,
Lianrui Mu,
Chengfei Cai,
Haoji Hu
Abstract:
Quantization has emerged as a promising direction for model compression. Recently, data-free quantization has been widely studied as a promising method to avoid privacy concerns, which synthesizes images as an alternative to real training data. Existing methods use classification loss to ensure the reliability of the synthesized images. Unfortunately, even if these images are well-classified by the pre-trained model, they still suffer from low semantics and homogenization issues. Intuitively, these low-semantic images are sensitive to perturbations, and the pre-trained model tends to have inconsistent output when the generator synthesizes an image with poor semantics. To this end, we propose Robustness-Guided Image Synthesis (RIS), a simple but effective method to enrich the semantics of synthetic images and improve image diversity, further boosting the performance of downstream data-free compression tasks. Concretely, we first introduce perturbations on input and model weight, then define the inconsistency metrics at feature and prediction levels before and after perturbations. On the basis of inconsistency on two levels, we design a robustness optimization objective to enhance the semantics of synthetic images. Moreover, we also make our approach diversity-aware by forcing the generator to synthesize images with small correlations in the label space. With RIS, we achieve state-of-the-art performance for various settings on data-free quantization, and the method can be extended to other data-free compression tasks.
Submitted 20 February, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Countering Eavesdroppers with Meta-learning-based Cooperative Ambient Backscatter Communications
Authors:
Nam H. Chu,
Nguyen Van Huynh,
Diep N. Nguyen,
Dinh Thai Hoang,
Shimin Gong,
Tao Shu,
Eryk Dutkiewicz,
Khoa T. Phan
Abstract:
This article introduces a novel lightweight framework using ambient backscatter communications to counter eavesdroppers. In particular, our framework divides an original message into two parts: (i) the active-transmit message, transmitted by the transmitter using conventional RF signals, and (ii) the backscatter message, transmitted by an ambient backscatter tag that backscatters the active signals emitted by the transmitter. Notably, the backscatter tag does not generate its own signal, making it difficult for an eavesdropper to detect the backscattered signals without prior knowledge of the system. Here, we assume that without decoding or knowing the backscatter message, the eavesdropper is unable to decode the original message. Even in scenarios where the eavesdropper can capture both messages, reconstructing the original message is difficult without understanding the message-splitting mechanism. A challenge in our proposed framework is to effectively decode the backscattered signals at the receiver, which is often accomplished using the maximum likelihood (MLK) approach. However, such a method may require a complex mathematical model together with perfect channel state information (CSI). To address this issue, we develop a novel deep meta-learning-based signal detector that can not only decode the weak backscattered signals without requiring perfect CSI but also quickly adapt to a new wireless environment with very little knowledge. Simulation results show that our proposed learning approach, which requires neither perfect CSI nor a complex mathematical model, can achieve a bit error ratio close to that of the MLK-based approach. They also show the efficiency of the proposed approach in dealing with eavesdropping attacks and with the lack of training data for deep learning models in practical scenarios.
Submitted 4 August, 2023;
originally announced August 2023.
-
On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning
Authors:
Jianhong Bai,
Zuozhu Liu,
Hualiang Wang,
Jin Hao,
Yang Feng,
Huanpeng Chu,
Haoji Hu
Abstract:
Though self-supervised learning (SSL) has been widely studied as a promising technique for representation learning, it does not generalize well on long-tailed datasets because the majority classes dominate the feature space. Recent work shows that long-tailed learning performance can be boosted by sampling extra in-domain (ID) data for self-supervised training; however, large-scale ID data that can rebalance the minority classes are expensive to collect. In this paper, we propose an alternative, easy-to-use, and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which exploits OOD data to dynamically re-balance the feature space. We empirically identify the counter-intuitive usefulness of OOD samples in SSL long-tailed learning and design a novel SSL method in a principled way. Concretely, we first localize the `head' and `tail' samples by assigning a tailness score to each OOD sample based on its neighborhood in the feature space. Then, we propose an online OOD sampling strategy to dynamically re-balance the feature space. Finally, we train the model to distinguish ID and OOD samples with a distribution-level supervised contrastive loss. Extensive experiments are conducted on various datasets and several state-of-the-art SSL frameworks to verify the effectiveness of the proposed method. The results show that our method improves the performance of SSL on long-tailed datasets by a large margin, and even outperforms previous work that uses external ID data. Our code is available at https://github.com/JianhongBai/COLT.
Submitted 12 July, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
W-procer: Weighted Prototypical Contrastive Learning for Medical Few-Shot Named Entity Recognition
Authors:
Mingchen Li,
Yang Ye,
Jeremy Yeung,
Huixue Zhou,
Huaiyuan Chu,
Rui Zhang
Abstract:
Contrastive learning has become a popular solution for few-shot Named Entity Recognition (NER). The conventional configuration strives to reduce the distance between tokens with the same labels and increase the distance between tokens with different labels. In the medical domain, however, many entities are annotated as OUTSIDE (O), and current contrastive learning methods undesirably push them apart from entities not labeled OUTSIDE (O), even though many OUTSIDE (O) entities are relevant to the labeled entities. The result is a noisy prototype for the semantic representation of each label. To address this challenge, we propose a novel method named Weighted Prototypical Contrastive Learning for Medical Few-Shot Named Entity Recognition (W-PROCER). Our approach centers on constructing a prototype-based contrastive loss and a weighting network. These components help the model differentiate negative samples from OUTSIDE (O) tokens and enhance the discrimination ability of contrastive learning. Experimental results show that the proposed W-PROCER framework significantly outperforms strong baselines on three medical benchmark datasets.
Submitted 31 July, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Covariate-distance Weighted Regression (CWR): A Case Study for Estimation of House Prices
Authors:
Hone-Jay Chu,
Po-Hung Chen,
Sheng-Mao Chang,
Muhammad Zeeshan Ali,
Sumriti Ranjan Patra
Abstract:
Geographically weighted regression (GWR) is a popular tool for modeling spatial heterogeneity in a regression model. However, the weighting function currently used in GWR considers only geographical distance and ignores attribute similarity entirely. In this study, we propose a covariate weighting function that combines geographical distance and attribute distance. Covariate-distance weighted regression (CWR) extends GWR to include both distances. House prices are affected by numerous factors, such as house age, floor area, and land use, and a prediction model helps in understanding the characteristics of regional house prices. CWR was used to examine the relationship between house price and its controlling factors. CWR accounts for both geographical and attribute distances and produces accurate house price estimates that preserve the weight matrix for the geographical and attribute distance functions. Results show that house attributes and conditions, such as floor area and house age, might affect the house price. After factor selection, in which only the house age and floor area of a building are considered, the RMSE of the CWR model improves by 2.9%-26.3% for skyscrapers compared to GWR. CWR can effectively reduce estimation errors relative to traditional spatial regression models and provides novel and feasible models for spatial estimation.
Submitted 14 May, 2023;
originally announced May 2023.
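The abstract above says the CWR weighting function combines geographical distance and attribute distance but does not give its form. The sketch below shows one plausible combination, a product of two Gaussian kernels. The kernel choice, the product rule, the bandwidth parameters, and the names `gaussian_kernel` and `cwr_weight` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Gaussian distance-decay kernel, a common choice in GWR (assumed here)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def cwr_weight(geo_dist, attr_dist, geo_bandwidth, attr_bandwidth):
    """Hypothetical combined weight for one observation.

    Multiplying the two kernels means an observation gets a large weight
    only when it is close in space AND similar in attributes. With
    attr_dist == 0 the weight reduces to the purely geographical one.
    """
    return gaussian_kernel(geo_dist, geo_bandwidth) * gaussian_kernel(
        attr_dist, attr_bandwidth
    )
```

Under this assumption, two houses at the same geographical distance from the target receive different weights if one is much closer to it in house age or floor area, which is the behaviour the abstract describes as missing from plain GWR.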
-
Lyapunov-Stable Deep Equilibrium Models
Authors:
Haoyu Chu,
Shikui Wei,
Ting Liu,
Yao Zhao,
Yuto Miyatake
Abstract:
Deep equilibrium (DEQ) models have emerged as a promising class of implicit layer models, which abandon traditional depth by solving for the fixed points of a single nonlinear layer. Despite their success, the stability of the fixed points of these models remains poorly understood. By treating DEQ models as nonlinear dynamic systems, we propose a robust DEQ model named LyaDEQ with provable stability guaranteed via Lyapunov theory. The crux of our method is ensuring the Lyapunov stability of the DEQ model's fixed points, which enables the proposed model to resist minor initial perturbations. To avoid poor adversarial defense caused by Lyapunov-stable fixed points lying near each other, we orthogonalize the layers after the Lyapunov stability module to separate different fixed points. We evaluate LyaDEQ models under well-known adversarial attacks, and experimental results demonstrate significant improvement in robustness. Furthermore, we show that the LyaDEQ model can be combined with other defense methods, such as adversarial training, to achieve even better adversarial robustness.
Submitted 10 January, 2024; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Improved constructions of secondary structure avoidance codes for DNA sequences
Authors:
Hui Chu,
Chen Wang,
Yiwei Zhang
Abstract:
In a DNA sequence, we have the celebrated Watson-Crick complement $\overline{T}=A$, $\overline{A}=T$, $\overline{C}=G$, and $\overline{G}=C$. Given an integer $m\ge 2$, a secondary structure in a DNA sequence refers to the existence of two non-overlapping reverse complement consecutive subsequences of length $m$, denoted as $\boldsymbol{x}=(x_1, \dots, x_m)$ and $\boldsymbol{y}=(y_1, \dots, y_m)$, such that $x_i=\overline{y_{m-i+1}}$ for $1\leq i \leq m$. The property of secondary structure avoidance (SSA) forbids a sequence from containing such reverse complement subsequences, and it is a key criterion in the design of single-stranded DNA sequences for DNA computing and storage. In this paper, we improve on a recent result of Nguyen et al. by introducing explicit constructions of secondary structure avoidance codes and analyzing the capacity for any given $m$. In particular, our constructions have optimal rates of 1.1679 bits/nt and 1.5515 bits/nt when $m=2$ and $m=3$, respectively.
Submitted 22 April, 2023;
originally announced April 2023.
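The secondary-structure condition defined in the abstract above can be checked directly on a string. The sketch below is a straightforward checker written from that definition only; it is not one of the paper's code constructions, and the function names are chosen here for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(segment):
    """Reverse the segment and complement each base (Watson-Crick)."""
    return "".join(COMPLEMENT[base] for base in reversed(segment))


def has_secondary_structure(seq, m):
    """Return True if seq contains two non-overlapping length-m substrings
    x and y with x_i = complement(y_{m-i+1}), i.e. x is the reverse
    complement of y.
    """
    # Earliest start index of every length-m substring.
    first_seen = {}
    for start in range(len(seq) - m + 1):
        first_seen.setdefault(seq[start:start + m], start)

    # For each window y, look for an earlier, non-overlapping window x.
    # Checking only the earliest occurrence of x is enough: if it overlaps
    # y, every later occurrence that starts before y overlaps it too.
    for start in range(len(seq) - m + 1):
        partner = reverse_complement(seq[start:start + m])
        earliest = first_seen.get(partner)
        if earliest is not None and earliest + m <= start:
            return True
    return False
```

A sequence satisfies the SSA property for a given $m$ exactly when `has_secondary_structure(seq, m)` is `False`. For example, `ACGT` violates it for $m=2$ because `AC` and `GT` are reverse complements, while `AAAA` satisfies it.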
-
Dynamic Resource Allocation for Metaverse Applications with Deep Reinforcement Learning
Authors:
Nam H. Chu,
Diep N. Nguyen,
Dinh Thai Hoang,
Khoa T. Phan,
Eryk Dutkiewicz,
Dusit Niyato,
Tao Shu
Abstract:
This work proposes a novel framework to dynamically and effectively manage and allocate different types of resources for Metaverse applications, which are forecasted to demand massive resources of various types that have never been seen before. Specifically, by studying functions of Metaverse applications, we first propose an effective solution to divide applications into groups, namely MetaInstances, where common functions can be shared among applications to enhance resource usage efficiency. Then, to capture the real-time, dynamic, and uncertain characteristics of request arrival and application departure processes, we develop a semi-Markov decision process-based framework and propose an intelligent algorithm that can gradually learn the optimal admission policy to maximize the revenue and resource usage efficiency for the Metaverse service provider and at the same time enhance the Quality-of-Service for Metaverse users. Extensive simulation results show that our proposed approach can achieve up to 120% greater revenue for the Metaverse service providers and up to 178.9% higher acceptance probability for Metaverse application requests than those of other baselines.
Submitted 26 February, 2023;
originally announced February 2023.
-
A Tight Lower Bound for Compact Set Packing
Authors:
Huairui Chu
Abstract:
This note presents a simple proof of a tight lower bound for the parameterized compact set packing problem, based on the Exponential Time Hypothesis (ETH).
Submitted 17 February, 2023;
originally announced February 2023.
-
Universal Guidance for Diffusion Models
Authors:
Arpit Bansal,
Hong-Min Chu,
Avi Schwarzschild,
Soumyadip Sengupta,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at https://github.com/arpitbansal297/Universal-Guided-Diffusion.
Submitted 14 February, 2023;
originally announced February 2023.
-
Layout-guided Indoor Panorama Inpainting with Plane-aware Normalization
Authors:
Chao-Chen Gao,
Cheng-Hsiu Chen,
Jheng-Wei Su,
Hung-Kuo Chu
Abstract:
We present an end-to-end deep learning framework for indoor panoramic image inpainting. Although previous inpainting methods have shown impressive performance on natural perspective images, most fail to handle panoramic images, particularly indoor scenes, which usually contain complex structure and texture content. To achieve better inpainting quality, we propose to exploit both the global and local context of indoor panorama during the inpainting process. Specifically, we take the low-level layout edges estimated from the input panorama as a prior to guide the inpainting model for recovering the global indoor structure. A plane-aware normalization module is employed to embed plane-wise style features derived from the layout into the generator, encouraging local texture restoration from adjacent room structures (i.e., ceiling, floor, and walls). Experimental results show that our work outperforms the current state-of-the-art methods on a public panoramic dataset in both qualitative and quantitative evaluations. Our code is available at https://ericsujw.github.io/LGPN-net/
Submitted 13 January, 2023;
originally announced January 2023.
-
Sampling Neural Radiance Fields for Refractive Objects
Authors:
Jen-I Pan,
Jheng-Wei Su,
Kai-Wen Hsiao,
Ting-Yu Yen,
Hung-Kuo Chu
Abstract:
Recently, differentiable volume rendering in neural radiance fields (NeRF) has gained a lot of popularity, and its variants have attained many impressive results. However, existing methods usually assume the scene is a homogeneous volume so that a ray is cast along the straight path. In this work, the scene is instead a heterogeneous volume with a piecewise-constant refractive index, where the path will be curved if it intersects the different refractive indices. For novel view synthesis of refractive objects, our NeRF-based framework aims to optimize the radiance fields of bounded volume and boundary from multi-view posed images with refractive object silhouettes. To tackle this challenging problem, the refractive index of a scene is reconstructed from silhouettes. Given the refractive index, we extend the stratified and hierarchical sampling techniques in NeRF to allow drawing samples along a curved path tracked by the Eikonal equation. The results indicate that our framework outperforms the state-of-the-art method both quantitatively and qualitatively, demonstrating better performance on the perceptual similarity metric and an apparent improvement in the rendering quality on several synthetic and real scenes.
Submitted 27 November, 2022;
originally announced November 2022.
-
Optimal Privacy Preserving for Federated Learning in Mobile Edge Computing
Authors:
Hai M. Nguyen,
Nam H. Chu,
Diep N. Nguyen,
Dinh Thai Hoang,
Van-Dinh Nguyen,
Minh Hoang Ha,
Eryk Dutkiewicz,
Marwan Krunz
Abstract:
Federated Learning (FL) with quantization and deliberately added noise over wireless networks is a promising approach to preserve user differential privacy (DP) while reducing wireless resources. Specifically, an FL process can be fused with quantized Binomial mechanism-based updates contributed by multiple users. However, optimizing quantization parameters, communication resources (e.g., transmit power, bandwidth, and quantization bits), and the added noise to guarantee the DP requirement and performance of the learned FL model remains an open and challenging problem. This article aims to jointly optimize the quantization and Binomial mechanism parameters and communication resources to maximize the convergence rate under the constraints of the wireless network and DP requirement. To that end, we first derive a novel DP budget estimation of the FL with quantization/noise that is tighter than the state-of-the-art bound. We then provide a theoretical bound on the convergence rate. This theoretical bound is decomposed into two components, including the variance of the global gradient and the quadratic bias that can be minimized by optimizing the communication resources, and quantization/noise parameters. The resulting optimization turns out to be a Mixed-Integer Non-linear Programming (MINLP) problem. To tackle it, we first transform this MINLP problem into a new problem whose solutions are proved to be the optimal solutions of the original one. We then propose an approximate algorithm to solve the transformed problem with an arbitrary relative error guarantee. Extensive simulations show that under the same wireless resource constraints and DP protection requirements, the proposed approximate algorithm achieves an accuracy close to the accuracy of the conventional FL without quantization/noise. The results can achieve a higher convergence rate while preserving users' privacy.
Submitted 20 May, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network
Authors:
Jheng-Wei Su,
Chi-Han Peng,
Peter Wonka,
Hung-Kuo Chu
Abstract:
Reconstructing 3D layouts from multiple $360^{\circ}$ panoramas has received increasing attention recently as estimating a complete layout of a large-scale and complex room from a single panorama is very difficult. The state-of-the-art method, called PSMNet, introduces the first learning-based framework that jointly estimates the room layout and registration given a pair of panoramas. However, PSMNet relies on an approximate (i.e., "noisy") registration as input. Obtaining this input requires a solution for wide baseline registration which is a challenging problem. In this work, we present a complete multi-view panoramic layout estimation framework that jointly learns panorama registration and layout estimation given a pair of panoramas without relying on a pose prior. The major improvement over PSMNet comes from a novel Geometry-aware Panorama Registration Network or GPR-Net that effectively tackles the wide baseline registration problem by exploiting the layout geometry and computing fine-grained correspondences on the layout boundaries, instead of the global pixel-space. Our architecture consists of two parts. First, given two panoramas, we adopt a vision transformer to learn a set of 1D horizon features sampled on the panorama. These 1D horizon features encode the depths of individual layout boundary samples and the correspondence and covisibility maps between layout boundaries. We then exploit a non-linear registration module to convert these 1D horizon features into a set of corresponding 2D boundary points on the layout. Finally, we estimate the final relative camera pose via RANSAC and obtain the complete layout simply by taking the union of registered layouts. Experimental results indicate that our method achieves state-of-the-art performance in both panorama registration and layout estimation on a large-scale indoor panorama dataset ZInD.
Submitted 21 October, 2022; v1 submitted 20 October, 2022;
originally announced October 2022.
-
UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language
Authors:
Triet Minh Thai,
Ngan Ha-Thao Chu,
Anh Tuan Vo,
Son T. Luu
Abstract:
Over the two years from 2020 to 2021, COVID-19 overwhelmed disease prevention measures in many countries, including Vietnam, and negatively impacted various aspects of human life and the social community. Misleading information in the community and fake news about the pandemic are also serious problems. Therefore, we present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing question answering systems for COVID-19. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of our dataset and establish benchmark results for further research using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effects of having multiple paraphrased answers in experiments on these models, especially on the Transformer, a dominant architecture in the field of study.
Submitted 14 September, 2022;
originally announced September 2022.
-
Text Simplification of College Admissions Instructions: A Professionally Simplified and Verified Corpus
Authors:
Zachary W. Taylor,
Maximus H. Chu,
Junyi Jessy Li
Abstract:
Access to higher education is critical for minority populations and emergent bilingual students. However, the language used by higher education institutions to communicate with prospective students is often too complex; concretely, many institutions in the US publish admissions application instructions far above the average reading level of a typical high school graduate, often near the 13th or 14th grade level. This leads to an unnecessary barrier between students and access to higher education. This work aims to tackle this challenge via text simplification. We present PSAT (Professionally Simplified Admissions Texts), a dataset with 112 admissions instructions randomly selected from higher education institutions across the US. These texts are then professionally simplified, and verified and accepted by subject-matter experts who are full-time employees in admissions offices at various institutions. Additionally, PSAT comes with manual alignments of 1,883 original-simplified sentence pairs. The result is a first-of-its-kind corpus for the evaluation and fine-tuning of text simplification systems in a high-stakes genre distinct from existing simplification resources.
Submitted 9 September, 2022;
originally announced September 2022.
-
Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction
Authors:
Jun Zhang,
Sirui Liu,
Mengyun Chen,
Haotian Chu,
Min Wang,
Zidong Wang,
Jialiang Yu,
Ningxi Ni,
Fan Yu,
Diqing Chen,
Yi Isaac Yang,
Boxin Xue,
Lijiang Yang,
Yuan Liu,
Yi Qin Gao
Abstract:
Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape using co-evolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised accuracy without performing explicit co-evolutionary analysis. Nevertheless, its performance still depends strongly on the available sequence homologs. By investigating the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on targets with poor multiple sequence alignments (MSAs). By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even achieves encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only generalizes AlphaFold2 better to orphan sequences, but also democratizes its use for high-throughput applications. In addition, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that can explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.
Submitted 8 October, 2023; v1 submitted 20 August, 2022;
originally announced August 2022.
-
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
Authors:
Arpit Bansal,
Eitan Borgnia,
Hong-Min Chu,
Jie S. Li,
Hamid Kazemi,
Furong Huang,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models
Submitted 19 August, 2022;
originally announced August 2022.
-
A Visual Analytics System for Improving Attention-based Traffic Forecasting Models
Authors:
Seungmin Jin,
Hyunwook Lee,
Cheonbok Park,
Hyeshin Chu,
Yunwon Tae,
Jaegul Choo,
Sungahn Ko
Abstract:
With deep learning (DL) outperforming conventional methods for different tasks, much effort has been devoted to utilizing DL in various domains. Researchers and developers in the traffic domain have also designed and improved DL models for forecasting tasks such as estimation of traffic speed and time of arrival. However, there exist many challenges in analyzing DL models due to the black-box property of DL models and complexity of traffic data (i.e., spatio-temporal dependencies). Collaborating with domain experts, we design a visual analytics system, AttnAnalyzer, that enables users to explore how DL models make predictions by allowing effective spatio-temporal dependency analysis. The system incorporates dynamic time warping (DTW) and Granger causality tests for computational spatio-temporal dependency analysis while providing map, table, line chart, and pixel views to assist user to perform dependency and model behavior analysis. For the evaluation, we present three case studies showing how AttnAnalyzer can effectively explore model behaviors and improve model performance in two different road networks. We also provide domain expert feedback.
△ Less
Submitted 11 August, 2022; v1 submitted 8 August, 2022;
originally announced August 2022.
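The abstract above names dynamic time warping (DTW) as one of the dependency measures. As a rough illustration only, the following is the textbook DTW recurrence with an absolute-difference cost; the function name `dtw_distance` and the choice of cost are assumptions made here, and nothing below is taken from AttnAnalyzer itself.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping between two numeric sequences.

    Hypothetical helper for illustration; the local cost |a_i - b_j| is an
    assumption, not something specified in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Two speed series with the same shape but a small time lag receive a low DTW distance, which is why the measure suits comparing sensors along a road.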
-
SimCURL: Simple Contrastive User Representation Learning from Command Sequences
Authors:
Hang Chu,
Amir Hosein Khasahmadi,
Karl D. D. Willis,
Fraser Anderson,
Yaoli Mao,
Linh Tran,
Justin Matejka,
Jo Vermeulen
Abstract:
User modeling is crucial to understanding user behavior and essential for improving user experience and personalized recommendations. When users interact with software, vast amounts of command sequences are generated through logging and analytics systems. These command sequences contain clues to the users' goals and intents. However, these data modalities are highly unstructured and unlabeled, making them difficult for standard predictive systems to learn from. We propose SimCURL, a simple yet effective contrastive self-supervised deep learning framework that learns user representation from unlabeled command sequences. Our method introduces a user-session network architecture, as well as session dropout as a novel way of data augmentation. We train and evaluate our method on a real-world command sequence dataset of more than half a billion commands. Our method shows significant improvement over existing methods when the learned representation is transferred to downstream tasks such as experience and expertise classification.
Submitted 29 July, 2022;
originally announced July 2022.
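The abstract above mentions session dropout as a data augmentation for contrastive learning. The sketch below shows one plausible reading: each session of a user is dropped independently with some probability, and two such random subsets form a positive pair. The function names, the drop probability, and the "keep at least one session" rule are all assumptions made for illustration, not details from the SimCURL paper.

```python
import random


def session_dropout(sessions, drop_prob=0.3, rng=None):
    """Return a random subset of a user's sessions (order preserved).

    Hypothetical helper: drop_prob and the fallback rule are assumptions.
    """
    if rng is None:
        rng = random.Random()
    kept = [s for s in sessions if rng.random() >= drop_prob]
    if not kept:
        # Never return an empty view; fall back to one random session.
        kept = [rng.choice(sessions)]
    return kept


def two_views(sessions, drop_prob=0.3, seed=0):
    """Build two augmented views of the same user for a contrastive loss."""
    rng = random.Random(seed)
    return (
        session_dropout(sessions, drop_prob, rng),
        session_dropout(sessions, drop_prob, rng),
    )
```

A contrastive objective would then pull the encodings of the two views of one user together and push apart views of different users.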
-
PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction
Authors:
Sirui Liu,
Jun Zhang,
Haotian Chu,
Min Wang,
Boxin Xue,
Ningxi Ni,
Jialiang Yu,
Yuhao Xie,
Zhenyu Chen,
Mengyun Chen,
Yuan Liu,
Piya Patra,
Fan Xu,
Jie Chen,
Zidong Wang,
Lijiang Yang,
Fan Yu,
Lei Chen,
Yi Qin Gao
Abstract:
Proteins are essential components of human life and their structures are important for function and mechanism analysis. Recent work has shown the potential of AI-driven methods for protein structure prediction. However, the development of new models is restricted by the lack of datasets and benchmark training procedures. To the best of our knowledge, the existing open source datasets fall far short of the needs of modern protein sequence-structure related research. To solve this problem, we present the first million-level protein structure prediction dataset with high coverage and diversity, named PSP. This dataset consists of 570k true structure sequences (10TB) and 745k complementary distillation sequences (15TB). We provide in addition the benchmark training procedure for a SOTA protein structure prediction model on this dataset. We validate the utility of this dataset for training by participating in the CAMEO contest, in which our model won first place. We hope our PSP dataset together with the training benchmark can enable a broader community of AI/biology researchers for AI-driven protein related research.
Submitted 24 June, 2022;
originally announced June 2022.
-
MetaSlicing: A Novel Resource Allocation Framework for Metaverse
Authors:
Nam H. Chu,
Dinh Thai Hoang,
Diep N. Nguyen,
Khoa T. Phan,
Eryk Dutkiewicz,
Dusit Niyato,
Tao Shu
Abstract:
Creating and maintaining the Metaverse requires enormous resources that have never been seen before, especially computing resources for intensive data processing to support Extended Reality, enormous storage resources, and massive networking resources for maintaining ultra high-speed and low-latency connections. Therefore, this work aims to propose a novel framework, namely MetaSlicing, that can provide a highly effective and comprehensive solution in managing and allocating different types of resources for Metaverse applications. In particular, by observing that Metaverse applications may have common functions, we first propose grouping applications into clusters, called MetaInstances. In a MetaInstance, common functions can be shared among applications. As such, the same resources can be used by multiple applications simultaneously, thereby enhancing resource utilization dramatically. To address the real-time characteristics and the dynamics and uncertainty of resource demand in the Metaverse, we develop an effective framework based on the semi-Markov decision process and propose an intelligent admission control algorithm that can maximize resource utilization and enhance the Quality-of-Service for end-users. Extensive simulation results show that our proposed solution outperforms the Greedy-based policies by up to 80% and 47% in terms of long-term revenue for Metaverse providers and request acceptance probability, respectively.
Submitted 26 February, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Improving Neural ODEs via Knowledge Distillation
Authors:
Haoyu Chu,
Shikui Wei,
Qiming Lu,
Yao Zhao
Abstract:
Neural Ordinary Differential Equations (Neural ODEs) construct the continuous dynamics of hidden units using ordinary differential equations specified by a neural network, demonstrating promising results on many tasks. However, Neural ODEs still do not perform well on image recognition tasks. A possible reason is that the one-hot encoding vector commonly used in Neural ODEs cannot provide enough supervised information. We propose a new training method based on knowledge distillation to construct more powerful and robust Neural ODEs fitting image recognition tasks. Specifically, we model the training of Neural ODEs as a teacher-student learning process, in which we propose ResNets as the teacher model to provide richer supervised information. The experimental results show that the new training method can improve the classification accuracy of Neural ODEs by 24% on CIFAR10 and 5% on SVHN. In addition, we also quantitatively discuss the effect of both knowledge distillation and time horizon in Neural ODEs on robustness against adversarial examples. The experimental analysis concludes that introducing knowledge distillation and increasing the time horizon can improve the robustness of Neural ODEs against adversarial examples.
Submitted 9 March, 2022;
originally announced March 2022.
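The teacher-student idea in the abstract above can be illustrated with the widely used temperature-scaled distillation loss: a weighted sum of the KL divergence between softened teacher and student distributions and the usual cross-entropy on the one-hot label. This is a generic sketch; the function names, the temperature, and the weight `alpha` are assumptions, and the paper's exact loss may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic distillation loss (assumed form, not taken from the paper).

    alpha weights the soft-target term; (1 - alpha) weights the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Standard cross-entropy against the one-hot label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

The soft-target term is what supplies the "richer supervised information": it tells the student how the teacher ranks the wrong classes, which a one-hot vector cannot.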
-
AI-enabled mm-Waveform Configuration for Autonomous Vehicles with Integrated Communication and Sensing
Authors:
Nam H. Chu,
Diep N. Nguyen,
Dinh Thai Hoang,
Quoc-Viet Pham,
Khoa T. Phan,
Won-Joo Hwang,
Eryk Dutkiewicz
Abstract:
Integrated Communications and Sensing (ICS) has recently emerged as an enabling technology for ubiquitous sensing and IoT applications. For ICS application to Autonomous Vehicles (AVs), optimizing the waveform structure is one of the most challenging tasks due to strong influences between sensing and data communication functions. Specifically, the preamble of a data communication frame is typically leveraged for the sensing function. As such, the higher the number of preambles in a Coherent Processing Interval (CPI), the better the sensing performance. In contrast, communication efficiency is inversely proportional to the number of preambles. Moreover, surrounding radio environments are usually dynamic with high uncertainties due to their high mobility, making the ICS's waveform optimization problem even more challenging. To that end, this paper develops a novel ICS framework established on the Markov decision process and recent advanced techniques in deep reinforcement learning. By doing so, without requiring complete knowledge of the surrounding environment in advance, the ICS-AV can adaptively optimize its waveform structure (i.e., number of frames in the CPI) to maximize sensing and data communication performance under the dynamics and uncertainty of the surrounding environment. Extensive simulations show that our proposed approach can improve the joint communication and sensing performance by up to 46.26% compared with other baseline methods.
Submitted 31 October, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
A DFS Algorithm for Maximum Matchings in General Graphs
Authors:
Tony T. Lee,
Bojun Lu,
Hanli Chu
Abstract:
In this paper, we propose a depth-first search (DFS) algorithm for searching maximum matchings in general graphs. Unlike blossom shrinking algorithms, which store all possible alternative alternating paths in the super-vertices shrunk from blossoms, the newly proposed algorithm does not involve blossom shrinking. The basic idea is to deflect the alternating path when facing blossoms. The algorithm maintains detour information in an auxiliary stack to minimize the redundant data structures. A benefit of our technique is to avoid spending time on shrinking and expanding blossoms. This DFS algorithm can determine a maximum matching of a general graph with $m$ edges and $n$ vertices in $O(mn)$ time with space complexity $O(n)$.
Submitted 19 April, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
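To make the notion of a DFS over alternating paths concrete, here is the classic augmenting-path algorithm for bipartite graphs (often attributed to Kuhn). It is explicitly not the algorithm of the paper above: bipartite graphs contain no odd cycles, so blossoms never arise and no deflection or shrinking is needed. The function name and the adjacency-list input format are assumptions made for this sketch.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum matching in a bipartite graph via DFS augmenting paths.

    adj[u] lists the right-side neighbours of left vertex u.
    Illustrative stand-in only; handles bipartite graphs, not general ones.
    """
    # match_right[v] = left vertex currently matched to v, or -1 if free
    match_right = [-1] * n_right

    def try_augment(u, visited):
        # Search for an alternating path from u that ends at a free right vertex.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u  # flip the edges along the path
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_right
```

Each of the `len(adj)` searches visits every edge at most once, so this bipartite version also runs in $O(mn)$ time; the difficulty the paper addresses is keeping that bound in general graphs, where odd cycles can mislead a plain DFS.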
-
Defeating Eavesdroppers with Ambient Backscatter Communications
Authors:
Nguyen Van Huynh,
Nguyen Quang Hieu,
Nam H. Chu,
Diep N. Nguyen,
Dinh Thai Hoang,
Eryk Dutkiewicz
Abstract:
Unlike conventional anti-eavesdropping methods that always require additional energy or computing resources (e.g., in friendly jamming and cryptography-based solutions), this work proposes a novel anti-eavesdropping solution that requires almost no extra power or computing resources. This is achieved by leveraging the ambient backscatter communications in which secret information can be transmitted by backscattering it over ambient radio signals. Specifically, the original message at the transmitter is first encoded into two parts: (i) active transmit message and (ii) backscatter message. The active transmit message is then transmitted by using the conventional wireless transmission method while the backscatter message is transmitted by backscattering it on the active transmit signals via an ambient backscatter tag. As the backscatter tag does not generate any active RF signals, it is intractable for the eavesdropper to detect the backscatter message. Therefore, secret information, e.g., secret key for decryption, can be carried by the backscattered message, making the adversary unable to decode the original message. Simulation results demonstrate that our proposed solution can significantly enhance security protection for communication systems.
Submitted 1 June, 2023; v1 submitted 16 January, 2022;
originally announced January 2022.
-
Active Learning at the ImageNet Scale
Authors:
Zeyad Ali Sami Emam,
Hong-Min Chu,
Ping-Yeh Chiang,
Wojciech Czaja,
Richard Leapman,
Micah Goldblum,
Tom Goldstein
Abstract:
Active learning (AL) algorithms aim to identify an optimal subset of data for annotation, such that deep neural networks (DNN) can achieve better performance when trained on this labeled subset. AL is especially impactful in industrial scale settings where data labeling costs are high and practitioners use every tool at their disposal to improve model performance. The recent success of self-supervised pretraining (SSP) highlights the importance of harnessing abundant unlabeled data to boost model performance. By combining AL with SSP, we can make use of unlabeled data while simultaneously labeling and training on particularly informative samples.
In this work, we study a combination of AL and SSP on ImageNet. We find that performance on small toy datasets -- the typical benchmark setting in the literature -- is not representative of performance on ImageNet due to the class imbalanced samples selected by an active learner. Among the existing baselines we test, popular AL algorithms across a variety of small and large scale settings fail to outperform random sampling. To remedy the class-imbalance problem, we propose Balanced Selection (BASE), a simple, scalable AL algorithm that outperforms random sampling consistently by selecting more balanced samples for annotation than existing methods. Our code is available at: https://github.com/zeyademam/active_learning .
Submitted 24 November, 2021;
originally announced November 2021.
-
JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints
Authors:
Karl D. D. Willis,
Pradeep Kumar Jayaraman,
Hang Chu,
Yunsheng Tian,
Yifei Li,
Daniele Grandi,
Aditya Sanghi,
Linh Tran,
Joseph G. Lambourne,
Armando Solar-Lezama,
Wojciech Matusik
Abstract:
Physical products are often complex assemblies combining a multitude of 3D parts modeled in computer-aided design (CAD) software. CAD designers build up these assemblies by aligning individual parts to one another using constraints called joints. In this paper we introduce JoinABLe, a learning-based method that assembles parts together to form joints. JoinABLe uses the weak supervision available in standard parametric CAD files without the help of object class labels or human guidance. Our results show that by making network predictions over a graph representation of solid models we can outperform multiple baseline methods with an accuracy (79.53%) that approaches human performance (80%). Finally, to support future research we release the Fusion 360 Gallery assembly dataset, containing assemblies with rich information on joints, contact surfaces, holes, and the underlying assembly graph structure.
Submitted 22 April, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
Meta-Auto-Decoder for Solving Parametric Partial Differential Equations
Authors:
Xiang Huang,
Zhanhong Ye,
Hongsheng Liu,
Beiji Shi,
Zidong Wang,
Kang Yang,
Yang Li,
Bingya Weng,
Min Wang,
Haotian Chu,
Fan Yu,
Bei Hua,
Lei Chen,
Bin Dong
Abstract:
Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging new field. One category of methods, such as the Deep Galerkin Method (DGM) and Physics-Informed Neural Networks (PINNs), aims to approximate the solution of the PDEs. They are typically unsupervised and mesh-free, but require going through the time-consuming network training process from scratch for each set of parameters of the PDE. Another category of methods, such as Fourier Neural Operator (FNO) and Deep Operator Network (DeepONet), tries to approximate the solution mapping directly. Being fast with only one forward inference for each PDE parameter without retraining, they often require a large corpus of paired input-output observations drawn from numerical simulations, and most of them need a predefined mesh as well. In this paper, we propose Meta-Auto-Decoder (MAD), a mesh-free and unsupervised deep learning method that enables the pre-trained model to be quickly adapted to equation instances by implicitly encoding (possibly heterogeneous) PDE parameters as latent vectors. The proposed method MAD can be interpreted by manifold learning in infinite-dimensional spaces, granting it a geometric insight. Extensive numerical experiments show that the MAD method converges faster than other deep learning-based methods without losing accuracy. The project page with code is available at: https://gitee.com/mindspore/mindscience/tree/master/MindElec/.
Submitted 18 November, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
Solving Partial Differential Equations with Point Source Based on Physics-Informed Neural Networks
Authors:
Xiang Huang,
Hongsheng Liu,
Beiji Shi,
Zidong Wang,
Kang Yang,
Yang Li,
Bingya Weng,
Min Wang,
Haotian Chu,
Jing Zhou,
Fan Yu,
Bei Hua,
Lei Chen,
Bin Dong
Abstract:
In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which physics-informed neural networks (PINNs) have emerged as a promising method for solving both forward and inverse PDE problems. PDEs with a point source that is expressed as a Dirac delta function in the governing equations are mathematical models of many physical processes. However, they cannot be solved directly by the conventional PINNs method due to the singularity brought by the Dirac delta function. We propose a universal solution to tackle this problem with three novel techniques. First, the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity; second, a lower bound constrained uncertainty weighting algorithm is proposed to balance the PINNs losses between the point source area and other areas; and third, a multi-scale deep neural network with periodic activation function is used to improve the accuracy and convergence speed of the PINNs method. We evaluate the proposed method with three representative PDEs, and the experimental results show that our method outperforms existing deep learning-based methods with respect to accuracy, efficiency and versatility.
Submitted 2 November, 2021;
originally announced November 2021.
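The first technique in the abstract above, replacing the Dirac delta with a continuous probability density, can be checked numerically. The sketch below uses a narrow Gaussian as the density, which is an assumption made here (the abstract does not say which density is used), and verifies with a midpoint rule that it integrates to one and approximately reproduces the sifting property. All names and the width `sigma` are hypothetical.

```python
import math


def gaussian_delta(x, x0=0.0, sigma=0.01):
    """Smooth stand-in for delta(x - x0): a Gaussian density of width sigma.

    The Gaussian choice and sigma=0.01 are illustrative assumptions.
    """
    return math.exp(-((x - x0) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi)
    )


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule quadrature on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Unit mass: the surrogate keeps the total source strength.
mass = integrate(gaussian_delta, -1.0, 1.0)

# Sifting property: integrating g(x) * delta(x - x0) approximately gives g(x0).
sifted = integrate(lambda x: math.cos(x) * gaussian_delta(x, x0=0.3), -1.0, 1.0)
```

Because the surrogate is finite and smooth everywhere, the PDE residual can be evaluated at collocation points near the source without encountering a singularity; a smaller `sigma` is closer to the true delta but gives a sharper peak for the network to fit.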
-
Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic Forecasting
Authors:
Hyunwook Lee,
Seungmin Jin,
Hyeshin Chu,
Hongkyu Lim,
Sungahn Ko
Abstract:
Traffic forecasting is a challenging problem due to complex road networks and sudden speed changes caused by various events on roads. A number of models have been proposed to solve this challenging problem with a focus on learning spatio-temporal dependencies of roads. In this work, we propose a new perspective of converting the forecasting problem into a pattern matching task, assuming that large data can be represented by a set of patterns. To evaluate the validity of the new perspective, we design a novel traffic forecasting model, called Pattern-Matching Memory Networks (PM-MemNet), which learns to match input data to the representative patterns with a key-value memory structure. We first extract and cluster representative traffic patterns, which serve as keys in the memory. Then via matching the extracted keys and inputs, PM-MemNet acquires necessary information of existing traffic patterns from the memory and uses it for forecasting. To model spatio-temporal correlation of traffic, we propose a novel memory architecture, GCMem, which integrates attention and graph convolution for memory enhancement. The experiment results indicate that PM-MemNet is more accurate than state-of-the-art models, such as Graph WaveNet, with higher responsiveness. We also present a qualitative analysis result, describing how PM-MemNet works and achieves its higher accuracy when road speed rapidly changes.
Submitted 8 March, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
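The key-value memory lookup described in the abstract above can be pictured as a soft attention read: the input is compared with every stored pattern key, the similarities are normalised, and the corresponding values are averaged with those weights. The sketch below uses dot-product similarity and a softmax; both choices, and the name `memory_read`, are assumptions for illustration and do not reproduce PM-MemNet's matching function or GCMem.

```python
import math


def memory_read(query, keys, values):
    """Soft key-value memory lookup (generic attention read).

    Returns the weighted value vector and the attention weights.
    Dot-product scoring is an assumed, illustrative choice.
    """
    # Similarity between the input and every stored pattern key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the value vectors.
    dim = len(values[0])
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return read, weights
```

When the input closely matches one key, the read is dominated by that pattern's value, so the downstream forecaster effectively reuses what was stored for the most similar historical pattern.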
-
Towards Better Plasticity-Stability Trade-off in Incremental Learning: A Simple Linear Connector
Authors:
Guoliang Lin,
Hanlu Chu,
Hanjiang Lai
Abstract:
The plasticity-stability dilemma is a main problem for incremental learning, where plasticity refers to the ability to learn new knowledge, and stability refers to retaining the knowledge of previous tasks. Many methods tackle this problem by storing previous samples, while in some applications, training data from previous tasks cannot be legally stored. In this work, we propose to employ mode connectivity in loss landscapes to achieve a better plasticity-stability trade-off without any previous samples. We give an analysis of why and how connecting two independently optimized optima of networks, null-space projection for previous tasks and simple SGD for the current task, can attain a meaningful balance between preserving already learned knowledge and granting sufficient flexibility for learning a new task. This analysis of mode connectivity also provides us a new perspective and technology to control the trade-off between plasticity and stability. We evaluate the proposed method on several benchmark datasets. The results indicate our simple method can achieve notable improvement, and perform well on both the past and current tasks. On the 10-split-CIFAR-100 task, our method achieves 79.79% accuracy, which is 6.02% higher. Our method also achieves 6.33% higher accuracy on TinyImageNet. Code is available at https://github.com/lingl1024/Connector.
Submitted 14 March, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
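The "simple linear connector" in the title above suggests combining two trained weight vectors along the straight line between them. The sketch below shows only that interpolation step on flat parameter lists; the function names and the default mixing coefficient are assumptions, and how the paper obtains the two endpoints or picks the coefficient is not reproduced here.

```python
def interpolate(theta_a, theta_b, lam=0.5):
    """Point on the straight line between two parameter vectors.

    lam = 0 returns theta_a (e.g. the stability-oriented solution) and
    lam = 1 returns theta_b (e.g. the plasticity-oriented solution).
    The default lam = 0.5 is an illustrative assumption.
    """
    return [(1 - lam) * a + lam * b for a, b in zip(theta_a, theta_b)]


def linear_path(theta_a, theta_b, steps=5):
    """Evenly spaced points along the connector, endpoints included."""
    return [interpolate(theta_a, theta_b, i / (steps - 1)) for i in range(steps)]
```

Evaluating the old-task and new-task losses at each point of `linear_path` is one way to see the trade-off the abstract describes: if both losses stay low along the line, any intermediate point is a usable compromise.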
-
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Authors:
Aditya Sanghi,
Hang Chu,
Joseph G. Lambourne,
Ye Wang,
Chin-Yi Cheng,
Marco Fumero,
Kamal Rahimi Malekshan
Abstract:
Generating shapes using natural language can enable new ways of imagining and creating the things around us. While significant recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation that circumvents such data scarcity. Our proposed method, named CLIP-Forge, is based on a two-stage training process, which only depends on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method has the benefits of avoiding expensive inference time optimization, as well as the ability to generate multiple shapes for a given text. We not only demonstrate promising zero-shot generalization of the CLIP-Forge model qualitatively and quantitatively, but also provide extensive comparative evaluations to better understand its behavior.
Submitted 28 April, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Learning from Language Description: Low-shot Named Entity Recognition via Decomposed Framework
Authors:
Yaqing Wang,
Haoda Chu,
Chao Zhang,
Jing Gao
Abstract:
In this work, we study the problem of named entity recognition (NER) in a low resource scenario, focusing on few-shot and zero-shot settings. Built upon large-scale pre-trained language models, we propose a novel NER framework, namely SpanNER, which learns from natural language supervision and enables the identification of never-seen entity classes without using in-domain labeled data. We perform extensive experiments on 5 benchmark datasets and evaluate the proposed method in the few-shot learning, domain transfer and zero-shot learning settings. The experimental results show that the proposed method can bring 10%, 23% and 26% improvements on average over the best baselines in few-shot learning, domain transfer and zero-shot learning settings, respectively.
Submitted 11 September, 2021;
originally announced September 2021.
-
LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
Authors:
Dominic Roberts,
Ara Danielyan,
Hang Chu,
Mani Golparvar-Fard,
David Forsyth
Abstract:
Generative models for 3D shapes represented by hierarchies of parts can generate realistic and diverse sets of outputs. However, existing models suffer from the key practical limitation of modelling shapes holistically and thus cannot perform conditional sampling, i.e. they are not able to generate variants on individual parts of generated shapes without modifying the rest of the shape. This is limiting for applications such as 3D CAD design that involve adjusting created shapes at multiple levels of detail. To address this, we introduce LSD-StructureNet, an augmentation to the StructureNet architecture that enables re-generation of parts situated at arbitrary positions in the hierarchies of its outputs. We achieve this by learning individual, probabilistic conditional decoders for each hierarchy depth. We evaluate LSD-StructureNet on the PartNet dataset, the largest dataset of 3D shapes represented by hierarchies of parts. Our results show that, in contrast to existing methods, LSD-StructureNet can perform conditional sampling without impacting inference speed or the realism and diversity of its outputs.
Submitted 7 September, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.