Search | arXiv e-print repository

Efficient Continual Learning with Low Memory Footprint For Edge Device

Authors: Zeqing Wang, Fei Cheng, Kangye Ji, Bohu Huang

Abstract: Continual learning (CL) is a useful technique to acquire dynamic knowledge continually. Although powerful cloud platforms can fully exert the ability of CL, e.g., customized recommendation systems, similar personalized requirements for edge devices are almost disregarded. This phenomenon stems from the huge resource overhead involved in training neural networks and overcoming the forgetting problem of CL. This paper focuses on these scenarios and proposes a compact algorithm called LightCL. Different from other CL methods bringing huge resource consumption to acquire generalizability among all tasks for delaying forgetting, LightCL compresses the resource consumption of already generalized components in neural networks and uses a few extra resources to improve memory in other parts. We first propose two new metrics of learning plasticity and memory stability to seek generalizability during CL. Based on the discovery that lower and middle layers have more generalizability and deeper layers are opposite, we $\textit{Maintain Generalizability}$ by freezing the lower and middle layers. Then, we $\textit{Memorize Feature Patterns}$ to stabilize the feature extracting patterns of previous tasks to improve generalizability in deeper layers. In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces the memory footprint by up to $\textbf{6.16$\times$}$, proving the excellent performance of LightCL in efficiency. We also evaluate the efficiency of our method on an edge device, the Jetson Nano, which further proves our method's practical effectiveness.

Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.03963 [pdf, other]

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Authors: LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03314 [pdf, other]

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimum elements and presents them in a graph structure. Element-wise style enables easy understanding, and structural composition liberates difficult locating. Careful prompt design births the BACON captions with the help of publicly available VLMs and segmentation methods. In this way, we gather a dataset with 100K annotated images, which endow VLMs with remarkable capabilities, such as accurately generating BACON, transforming prompts into BACON format, envisioning scenarios in the style of BACON, and dynamically modifying elements within BACON through interactive dialogue and more. Wide representative experiments, including detection, VQA, and image generation tasks, tell BACON as a lifeline to achieve previous out-of-reach tasks or excel in their current cutting-edge solutions.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.10432 [pdf, other]

Enhancing In-Context Learning with Semantic Representations for Relation Extraction

Authors: Peitao Han, Lis Kanashiro Pereira, Fei Cheng, Wan Jou She, Eiji Aramaki

Abstract: In this work, we employ two AMR-enhanced semantic representations for ICL on RE: one that explores the AMR structure generated for a sentence at the subgraph level (shortest AMR path), and another that explores the full AMR structure generated for a sentence. In both cases, we demonstrate that all settings benefit from the fine-grained AMR's semantic structure. We evaluate our model on four RE datasets. Our results show that our model can outperform the GPT-based baselines, and achieve SOTA performance on two of the datasets, and competitive performance on the other two.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.06847 [pdf, other]

Generalized W-Net: Arbitrary-style Chinese Character Synthesization

Authors: Haochuan Jiang, Guanyu Yang, Fei Cheng, Kaizhu Huang

Abstract: Synthesizing Chinese characters with consistent style using few stylized examples is challenging. Existing models struggle to generate arbitrary style characters with limited examples. In this paper, we propose the Generalized W-Net, a novel class of W-shaped architectures that addresses this. By incorporating Adaptive Instance Normalization and introducing multi-content, our approach can synthesize Chinese characters in any desired style, even with limited examples. It handles seen and unseen styles during training and can generate new character contents. Experimental results demonstrate the effectiveness of our approach.

Submitted 10 June, 2024; originally announced June 2024.

Journal ref: International Conference on Brain Inspired Cognitive Systems 2023

arXiv:2405.19209 [pdf, other]

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Authors: Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

Abstract: Video-language understanding tasks have focused on short video clips, often struggling with long-form video understanding tasks. Recently, many long video-language understanding approaches have leveraged the reasoning capabilities of Large Language Models (LLMs) to perform long video QA, transforming videos into densely sampled frame captions, and asking LLMs to respond to text queries over captions. However, the frames used for captioning are often redundant and contain irrelevant information, making dense sampling inefficient, and ignoring the fact that video QA requires varying levels of granularity, with some video segments being highly relevant to the question (needing more fine-grained detail) while others being less relevant. Thus, these LLM-based approaches are prone to missing information and operate on large numbers of irrelevant captions, lowering both performance and efficiency. To address these issues, we introduce VideoTree, a query-adaptive and hierarchical framework for long-video understanding with LLMs. VideoTree dynamically extracts query-related information from a video and builds a tree-based representation for LLM reasoning. First, VideoTree adaptively selects frames for captioning by iteratively clustering frames based on their visual features and scoring clusters using their relevance to the query. Second, it organizes visual clusters into a query-adaptive and hierarchical tree structure; the tree encodes varying levels of granularity, with higher resolution on relevant segments. Finally, VideoTree produces an answer by traversing the tree's keyframes and passing their captions to an LLM answerer. Our method improves both reasoning accuracy and efficiency compared to existing methods: VideoTree achieves a 7.0%, 2.2%, and 2.7% accuracy gain over baselines on the EgoSchema, NExT-QA, and IntentQA benchmarks, respectively, while reducing inference time by 40%.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 20 pages, first three authors contributed equally; Project page: https://videotree2024.github.io/

arXiv:2405.17137 [pdf, other]

Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label

Authors: Kangye Ji, Fei Cheng, Zeqing Wang, Bohu Huang

Abstract: Sample selection is the most straightforward technique to combat label noise, aiming to distinguish mislabeled samples during training and avoid the degradation of the robustness of the model. In the workflow, $\textit{selecting possibly clean data}$ and $\textit{model update}$ are iterative. However, their interplay and intrinsic characteristics hinder the robustness and efficiency of learning with noisy labels: 1) The model chooses clean data with selection bias, leading to accumulated error in the model update. 2) Most selection strategies leverage partner networks or supplementary information to mitigate label corruption, albeit with increased computation resources and lower throughput speed. Therefore, we employ only one network with the jump manner update to decouple the interplay and mine more semantic information from the loss for a more precise selection. Specifically, the selection of clean data for each model update is based on one of the prior models, excluding the last iteration. The strategy of model update exhibits a jump behavior in the form. Moreover, we map the outputs of the network and labels into the same semantic feature space, respectively. In this space, a detailed and simple loss distribution is generated to distinguish clean samples more effectively. Our proposed approach achieves up to almost $2.53\times$ speedup, $0.46\times$ peak memory footprint, and superior robustness over state-of-the-art works with various noise settings.

Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.11921 [pdf, other]

MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections

Authors: Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan

Abstract: 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting. The key insight is grounded on the mirror symmetry between the real-world space and the virtual mirror space. We introduce an intuitive dual-rendering strategy that enables differentiable rasterization of both the real-world 3D Gaussians and the mirrored counterpart obtained by reflecting the former about the mirror plane. All 3D Gaussians are jointly optimized with the mirror plane in an end-to-end framework. MirrorGaussian achieves high-quality and real-time rendering in scenes with mirrors, empowering scene editing like adding new mirrors and objects. Comprehensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art results. Project page: https://mirror-gaussian.github.io/.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.03913 [pdf, other]

Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process

Authors: Fuqiang Cheng, Wei Xie, Hua Zheng

Abstract: Biomanufacturing innovation relies on an efficient Design of Experiments (DoEs) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach for digital twin model calibration. In this study, we consider the cell culture process multi-scale mechanistic model, also known as Biological System-of-Systems (Bio-SoS). This model with a modular design, composed of sub-models, allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of model prediction and develop a computational approach to quantify the impact of parameter estimation error of individual sub-models on the prediction accuracy of digital twin, which can guide sample-efficient and interpretable DoEs.

Submitted 28 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.00708 [pdf, other]

Interactive Analysis of LLMs using Meaningful Counterfactuals

Authors: Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Fürst, Hendrik Strobelt, Mennatallah El-Assady

Abstract: Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users and thus can be mentally compared to draw conclusions. Second, to make the solution scalable to long-form text, users should be equipped with tools to create batches of counterfactuals from perturbations at various granularity levels and interactively analyze the results. In this paper, we tackle the above challenges and contribute 1) a novel algorithm for generating batches of complete and meaningful textual counterfactuals by removing and replacing text segments in different granularities, and 2) LLM Analyzer, an interactive visualization tool to help users understand an LLM's behaviors by interactively inspecting and aggregating meaningful counterfactuals. We evaluate the proposed algorithm by the grammatical correctness of its generated counterfactuals using 1,000 samples from medical, legal, finance, education, and news datasets. In our experiments, 97.2% of the counterfactuals are grammatically correct. Through a use case, user studies, and feedback from experts, we demonstrate the usefulness and usability of the proposed interactive visualization tool.

Submitted 23 April, 2024; originally announced May 2024.

ACM Class: I.2.7; H.5.2

arXiv:2404.10209 [pdf, other]

Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interaction tasks to enhance user experience and accessibility. DB-GPT is designed to understand data interaction tasks described by natural language and provide context-aware responses powered by LLMs, making it an indispensable tool for users ranging from novice to expert. Its system design supports deployment across local, distributed, and cloud environments. Beyond handling basic data interaction tasks like Text-to-SQL with LLMs, it can handle complex tasks like generative data analysis through a Multi-Agents framework and the Agentic Workflow Expression Language (AWEL). The Service-oriented Multi-model Management Framework (SMMF) ensures data privacy and security, enabling users to employ DB-GPT with private LLMs. Additionally, DB-GPT offers a series of product-ready features designed to enable users to integrate DB-GPT within their product environments easily. The code of DB-GPT is available on GitHub (https://github.com/eosphoros-ai/DB-GPT), which already has over 10.7k stars. Please install DB-GPT for your own usage with the instructions (https://github.com/eosphoros-ai/DB-GPT#install) and watch a 5-minute introduction video on YouTube (https://youtu.be/n_8RI1ENyl4) to further investigate DB-GPT.

Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.18504 [pdf]

AcTED: Automatic Acquisition of Typical Event Duration for Semi-supervised Temporal Commonsense QA

Authors: Felix Virgo, Fei Cheng, Lis Kanashiro Pereira, Masayuki Asahara, Ichiro Kobayashi, Sadao Kurohashi

Abstract: We propose a voting-driven semi-supervised approach to automatically acquire the typical duration of an event and use it as pseudo-labeled data. The human evaluation demonstrates that our pseudo labels exhibit surprisingly high accuracy and balanced coverage. In the temporal commonsense QA task, experimental results show that using only pseudo examples of 400 events, we achieve performance comparable to the existing BERT-based weakly supervised approaches that require a significant amount of training examples. When compared to the RoBERTa baselines, our best approach establishes state-of-the-art performance with a 7% improvement in Exact Match.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.14970 [pdf]

doi 10.1016/j.scib.2024.03.052

Quantum spin driven Yu-Shiba-Rusinov multiplets and fermion-parity-preserving phase transition in K$_3$C$_{60}$

Authors: Shu-Ze Wang, Xue-Qing Yu, Li-Xuan Wei, Li Wang, Qiang-Jun Cheng, Kun Peng, Fang-Jun Cheng, Yu Liu, Fang-Sen Li, Xu-Cun Ma, Qi-Kun Xue, Can-Li Song

Abstract: Magnetic impurities in superconductors are of increasing interest due to emergent Yu-Shiba-Rusinov (YSR) states and Majorana zero modes for fault-tolerant quantum computation. However, a direct relationship between the YSR multiple states and magnetic anisotropy splitting of quantum impurity spins remains poorly characterized. By using scanning tunneling microscopy, we systematically resolve the YSR multiplets induced by individual transition-metal (Fe, Cr and Ni) impurities, as well as their Zeeman effects, in the K$_3$C$_{60}$ superconductor. The YSR multiplets show identical $d$ orbital-like wave functions that are symmetry-mismatched to the threefold K$_3$C$_{60}$(111) host surface, breaking point-group symmetries of the spatial distribution of YSR bound states in real space. Remarkably, we identify an unprecedented fermion-parity-preserving quantum phase transition between ground states with opposite signs of the uniaxial magnetic anisotropy that can be manipulated by an external magnetic field. These findings can be readily understood in terms of anisotropy splitting of quantum impurity spins, and thus elucidate the intricate interplay between the magnetic anisotropy and YSR multiplets.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 38 pages, 4 figures in the main text

Journal ref: Science Bulletin 69, 1392 (2024)

arXiv:2403.11517 [pdf, other]

Inter-individual and inter-site neural code conversion and image reconstruction without shared stimuli

Authors: Haibao Wang, Jun Kai Ho, Fan L. Cheng, Shuntaro C. Aoki, Yusuke Muraki, Misato Tanaka, Yukiyasu Kamitani

Abstract: The human brain demonstrates substantial inter-individual variability in fine-grained functional topography, posing challenges in identifying common neural representations across individuals. Functional alignment has the potential to harmonize these individual differences. However, it typically requires an identical set of stimuli presented to different individuals, which is often unavailable. To address this, we propose a content loss-based neural code converter, designed to convert brain activity from one subject to another representing the same content. The converter is optimized so that the source subject's converted brain activity is decoded into a latent image representation that closely resembles that of the stimulus given to the source subject. We show that converters optimized using hierarchical image representations achieve conversion accuracy comparable to those optimized by paired brain activity as in conventional methods. The brain activity converted from a different individual and even from a different site sharing no stimuli produced reconstructions that approached the quality of within-individual reconstructions. The converted brain activity had a generalizable representation that can be read out by different decoding schemes. The converter required far fewer training samples than are typically required for decoder training to produce recognizable reconstructions. These results demonstrate that our method can effectively combine image representations to convert brain activity across individuals without the need for shared stimuli, providing a promising tool for flexibly aligning data from complex cognitive tasks and a basis for brain-to-brain communication.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08755 [pdf, other]

DAM: Dynamic Adapter Merging for Continual Video QA Learning

Authors: Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

Abstract: We present a parameter-efficient method for continual video question-answering (VidQA) learning. Our method, named DAM, uses the proposed Dynamic Adapter Merging to (i) mitigate catastrophic forgetting, (ii) enable efficient adaptation to continually arriving datasets, (iii) handle inputs from unknown datasets during inference, and (iv) enable knowledge sharing across similar dataset domains. Given a set of continually streaming VidQA datasets, we sequentially train dataset-specific adapters for each dataset while freezing the parameters of a large pretrained video-language backbone. During inference, given a video-question sample from an unknown domain, our method first uses the proposed non-parametric router function to compute a probability for each adapter, reflecting how relevant that adapter is to the current video-question input instance. Subsequently, the proposed dynamic adapter merging scheme aggregates all the adapter weights into a new adapter instance tailored for that particular test sample to compute the final VidQA prediction, mitigating the impact of inaccurate router predictions and facilitating knowledge sharing across domains. Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains. We further extend DAM to continual image classification and image QA and outperform prior methods by a large margin. The code is publicly available at: https://github.com/klauscc/DAM

Submitted 22 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: The first two authors contribute equally

arXiv:2403.03690 [pdf]

Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

Abstract: The creation of instruction data and evaluation benchmarks for serving large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propose an efficient self-instruct method based on GPT-4. We first translate a small amount of English instructions into Japanese and post-edit them to obtain native-level quality. GPT-4 then utilizes them as demonstrations to automatically generate Japanese instruction data. We also construct an evaluation benchmark containing 80 questions across 8 categories, using GPT-4 to automatically assess the response quality of LLMs without human references. The empirical results suggest that the models fine-tuned on our GPT-4 self-instruct data significantly outperformed the Japanese-Alpaca across all three base pre-trained models. Our GPT-4 self-instruct data allowed the LLaMA 13B model to defeat GPT-3.5 (Davinci-003) with a 54.37% win rate. The human evaluation exhibits the consistency between GPT-4's assessments and human preference. Our high-quality instruction data and evaluation benchmark have been released here.

Submitted 6 March, 2024; originally announced March 2024.

Comments: COLING 2024. Our code is available here: self-instruct data (https://github.com/hitoshizuku7/awesome-Ja-self-instruct) and evaluation benchmark (https://github.com/ku-nlp/ja-vicuna-qa-benchmark)

arXiv:2403.00667 [pdf, other]

doi 10.1038/s41550-024-02200-3

Physical properties of asteroid Dimorphos as derived from the DART impact

Authors: S. D. Raducan, M. Jutzi, A. F. Cheng, Y. Zhang, O. Barnouin, G. S. Collins, R. T. Daly, T. M. Davison, C. M. Ernst, T. L. Farnham, F. Ferrari, M. Hirabayashi, K. M. Kumamoto, P. Michel, N. Murdoch, R. Nakano, M. Pajola, A. Rossi, H. F. Agrusa, B. W. Barbee, M. Bruck Syal, N. L. Chabot, E. Dotto, E. G. Fahnestock, P. H. Hasselmann, et al. (17 additional authors not shown)

Abstract: On September 26, 2022, NASA's Double Asteroid Redirection Test (DART) mission successfully impacted Dimorphos, the natural satellite of the binary near-Earth asteroid (65803) Didymos. Numerical simulations of the impact provide a means to explore target surface material properties and structures, consistent with the observed momentum deflection efficiency, ejecta cone geometry, and ejected mass. Our simulation, which best matches observations, indicates that Dimorphos is weak, with a cohesive strength of less than a few pascals (Pa), similar to asteroids (162173) Ryugu and (101955) Bennu. We find that a bulk density of Dimorphos, rhoB, lower than 2400 kg/m³, and a low volume fraction of boulders (<40 vol%) on the surface and in the shallow subsurface, are consistent with measured data from the DART experiment. These findings suggest Dimorphos is a rubble pile that might have formed through rotational mass shedding and re-accumulation from Didymos. Our simulations indicate that the DART impact caused global deformation and resurfacing of Dimorphos. ESA's upcoming Hera mission may find a re-shaped asteroid, rather than a well-defined crater.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.00891 [pdf, other]

Large Language Models in Cybersecurity: State-of-the-Art

Authors: Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi, Feng Cheng, Christoph Meinel

Abstract: The rise of Large Language Models (LLMs) has revolutionized our comprehension of intelligence bringing us closer to Artificial Intelligence. Since their introduction, researchers have actively explored the applications of LLMs across diverse fields, significantly elevating capabilities. Cybersecurity, traditionally resistant to data-driven solutions and slow to embrace machine learning, stands out as a domain. This study examines the existing literature, providing a thorough characterization of both defensive and adversarial applications of LLMs within the realm of cybersecurity. Our review not only surveys and categorizes the current landscape but also identifies critical research gaps. By evaluating both offensive and defensive applications, we aim to provide a holistic understanding of the potential risks and opportunities associated with LLM-driven cybersecurity.

Submitted 30 January, 2024; originally announced February 2024.

arXiv:2401.00954 [pdf, other]

Radiation Pressure Induced Oscillations of an Optically Levitating Mirror

Authors: Satyam Shekhar Jha, Tal Carmon, Fan Cheng, Lev Deych

Abstract: An optical Fabry-Perot cavity with a movable mirror is a paradigmatic optomechanical system. While usually the mirror is supported by a mechanical spring, it has been shown that it is possible to keep one of the mirrors in a stable equilibrium purely by optical levitation without any mechanical support. In this work we expand previous studies of nonlinear dynamics of such a system by demonstrating a possibility for mechanical parametric instability and emergence of the "phonon laser" phenomenon.

Submitted 10 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 11 pages, 7 figures

arXiv:2312.17449 [pdf, other]

DB-GPT: Empowering Database Interactions with Private Large Language Models

Authors: Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, Faqiang Chen

Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. Database technologies particularly have an important entanglement with LLMs as efficient and intuitive database interactions are paramount. In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user experience and accessibility. DB-GPT is designed to understand natural language queries, provide context-aware responses, and generate complex SQL queries with high accuracy, making it an indispensable tool for users ranging from novice to expert. The core innovation in DB-GPT lies in its private LLM technology, which is fine-tuned on domain-specific corpora to maintain user privacy and ensure data security while offering the benefits of state-of-the-art LLMs. We detail the architecture of DB-GPT, which includes a novel retrieval augmented generation (RAG) knowledge system, an adaptive learning mechanism to continuously improve performance based on user feedback and a service-oriented multi-model framework (SMMF) with powerful data-driven agents. Our extensive experiments and user studies confirm that DB-GPT represents a paradigm shift in database interactions, offering a more natural, efficient, and secure way to engage with data repositories. The paper concludes with a discussion of the implications of DB-GPT framework on the future of human-database interaction and outlines potential avenues for further enhancements and applications in the field.
The project code is available at https://github.com/eosphoros-ai/DB-GPT. Experience DB-GPT for yourself by installing it with the instructions https://github.com/eosphoros-ai/DB-GPT#install and view a concise 10-minute video at https://www.youtube.com/watch?v=KYs4nTDzEhk.

Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.12632 [pdf]

doi 10.1364/PRJ.505164

Cavity Continuum

Authors: Fan Cheng, Vladimir Shuvayev, Mark Douvidzon, Lev Deych, Tal Carmon

Abstract: We experimentally demonstrate and numerically analyze large arrays of whispering gallery resonators. Using fluorescent mapping, we measure the spatial distribution of the cavity-ensemble's resonances, revealing that light reaches distant resonators in various ways, including while passing through dark gaps, resonator groups, or resonator lines. Energy spatially decays exponentially in the cavities. Our practically infinite periodic array of resonators, with a quality factor [Q] exceeding 10^7, might impact a new type of photonic ensembles for nonlinear optics and lasers using our cavity continuum that is distributed, while having high-Q resonators as unit cells.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.12414 [pdf, ps, other]

Translating Natural Language Queries to SQL Using the T5 Model

Authors: Albert Wong, Lien Pham, Young Lee, Shek Chan, Razel Sadaya, Youry Khmelevsky, Mathias Clement, Florence Wing Yau Cheng, Joe Mahony, Michael Ferri

Abstract: This paper presents the development process of a natural language to SQL model using the T5 model as the basis. The models, developed in August 2022 for an online transaction processing system and a data warehouse, have a 73% and 84% exact match accuracy respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used successfully on a daily basis. The approach used in the model development could be implemented in a similar fashion for other database environments and with a more powerful pre-trained language model.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.03814 [pdf, other]

Pearl: A Production-ready Reinforcement Learning Agent

Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

Abstract: Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety constraints are met. Despite considerable progress made by the RL research community in addressing these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a Production-ready RL agent software package explicitly designed to embrace these challenges in a modular fashion. In addition to presenting preliminary benchmark results, this paper highlights Pearl's industry adoptions to demonstrate its readiness for production usage. Pearl is open sourced on Github at github.com/facebookresearch/pearl and its official website is located at pearlagent.github.io.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.18259 [pdf, other]

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/

Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

arXiv:2311.16842 [pdf, other]

doi 10.1145/3613904.3641904

RELIC: Investigating Large Language Model Responses using Self-Consistency

Authors: Furui Cheng, Vilém Zouhar, Simran Arora, Mrinmaya Sachan, Hendrik Strobelt, Mennatallah El-Assady

Abstract: Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence in individual claims in the generated texts. Using this idea, we design RELIC, an interactive system that enables users to investigate and verify semantic-level variations in multiple long-form responses. This allows users to recognize potentially inaccurate information in the generated text and make necessary corrections. From a user study with ten participants, we demonstrate that our approach helps users better verify the reliability of the generated text. We further summarize the design implications and lessons learned from this research for future studies of reliable human-LLM interactions.

Submitted 4 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.14381 [pdf]

Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review

Authors: Ming Li, Ariunaa Enkhtur, Beverley Anne Yamamoto, Fei Cheng, Lilan Chen

Abstract: Purpose: Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods: This scoping review aims to elucidate how biases related to GAI in HEIs have been researched and discussed in recent academic publications. We categorized the potential societal biases that GAI might cause in the field of higher education. Our review includes articles written in English, Chinese, and Japanese across four main databases, focusing on GAI usage in higher education and bias. Findings: Our findings reveal that while there is meaningful scholarly discussion around bias and discrimination concerning LLMs in the AI field, most articles addressing higher education approach the issue superficially. Few articles identify specific types of bias under different circumstances, and there is a notable lack of empirical research. Most papers in our review focus primarily on educational and research fields related to medicine and engineering, with some addressing English education. However, there is almost no discussion regarding the humanities and social sciences. Additionally, a significant portion of the current discourse is in English and primarily addresses English-speaking contexts.
Originality/Value: To the best of our knowledge, our study is the first to summarize the potential societal biases in higher education. This review highlights the need for more in-depth studies and empirical work to understand the specific biases that GAI might introduce or amplify in educational settings, guiding the development of more ethical AI applications in higher education.

Submitted 11 July, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: Work in progress

arXiv:2311.14378 [pdf]

Ethical Implications of ChatGPT in Higher Education: A Scoping Review

Authors: Ming Li, Ariunaa Enkhtur, Fei Cheng, Beverley Anne Yamamoto

Abstract: This scoping review explores the ethical challenges of using ChatGPT in higher education. By reviewing recent academic articles in English, Chinese, and Japanese, we aimed to provide a deep dive review and identify gaps in the literature. Drawing on Arksey and O'Malley's (2005) scoping review framework, we defined search terms and identified relevant publications from four databases in the three target languages. The research results showed that the majority of the papers were discussion papers, but there was some early empirical work. The ethical issues highlighted in these works mainly concern academic integrity, assessment issues, and data protection. Given the rapid deployment of generative artificial intelligence, it is imperative for educators to conduct more empirical studies to develop sound ethical policies for its use.

Submitted 5 June, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: Accepted by Journal of Interdisciplinary Studies in Education

Journal ref: Volume 13, Issue 1, 2024, pp. 55-68

arXiv:2310.20373 [pdf]

doi 10.1021/acs.nanolett.3c03692

Chiral charge density wave and backscattering-immune orbital texture in monolayer 1T-TiTe2

Authors: Mingqiang Ren, Fangjun Cheng, Yufei Zhao, Mingqiang Gu, Qiangjun Cheng, Binghai Yan, Qihang Liu, Xucun Ma, Qikun Xue, Can-Li Song

Abstract: Non-trivial electronic states are attracting intense attention in low-dimensional physics. Though chirality has been identified in charge states with a scalar order parameter, its intertwining with charge density waves (CDW), film thickness and the impact on the electronic behaviors remain less well understood. Here, using scanning tunneling microscopy, we report a 2 x 2 chiral CDW as well as a strong suppression of the Te-5p hole-band backscattering in monolayer 1T-TiTe2. These exotic characters vanish in bilayer TiTe2 with a non-CDW state. Theoretical calculations confirm that chirality comes from a helical stacking of the triple-q CDW components and therefore can persist at the two-dimensional limit. Furthermore, the chirality renders the Te-5p bands an unconventional orbital texture that prohibits electron backscattering. Our study establishes TiTe2 as a promising playground for manipulating the chiral ground states at the monolayer limit and provides a novel path to engineer electronic properties from an orbital degree.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 21 pages, 5 figures

Journal ref: Nano Letters (2023)

arXiv:2310.20236 [pdf, other]

Dynamically Updating Event Representations for Temporal Relation Classification with Multi-category Learning

Authors: Fei Cheng, Masayuki Asahara, Ichiro Kobayashi, Sadao Kurohashi

Abstract: Temporal relation classification is a pair-wise task for identifying the relation of a temporal link (TLINK) between two mentions, i.e. event, time, and document creation time (DCT). It leads to two crucial limits: 1) Two TLINKs involving a common mention do not share information. 2) Existing models with independent classifiers for each TLINK category (E2E, E2T, and E2D) hinder from using the whole data. This paper presents an event centric model that allows to manage dynamic event representations across multiple TLINKs. Our model deals with three TLINK categories with multi-task learning to leverage the full size of data. The experimental results show that our proposal outperforms state-of-the-art models and two transfer learning baselines on both the English and Japanese data.

Submitted 31 October, 2023; originally announced October 2023.

Comments: EMNLP 2020 Findings

arXiv:2310.12182 [pdf, other]

Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

Authors: Xueying Wu, Edward Hanson, Nansu Wang, Qilin Zheng, Xiaoxuan Yang, Huanrui Yang, Shiyu Li, Feng Cheng, Partha Pratim Pande, Janardhan Rao Doppa, Krishnendu Chakrabarty, Hai Li

Abstract: Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures have demonstrated great potential to accelerate Deep Neural Network (DNN) training/inference. However, the computational accuracy of analog PIM is compromised due to the non-idealities, such as the conductance variation of ReRAM cells. The impact of these non-idealities worsens as the number of concurrently activated wordlines and bitlines increases. To guarantee computational accuracy, only a limited number of wordlines and bitlines of the crossbar array can be turned on concurrently, significantly reducing the achievable parallelism of the architecture. While the constraints on parallelism limit the efficiency of the accelerators, they also provide a new opportunity for fine-grained mixed-precision quantization. To enable efficient DNN inference on practical ReRAM-based accelerators, we propose an algorithm-architecture co-design framework called Block-Wise mixed-precision Quantization (BWQ). At the algorithm level, BWQ-A introduces a mixed-precision quantization scheme at the block level, which achieves a high weight and activation compression ratio with negligible accuracy degradation. We also present the hardware architecture design BWQ-H, which leverages the low-bit-width models achieved by BWQ-A to perform high-efficiency DNN inference on ReRAM devices. BWQ-H also adopts a novel precision-aware weight mapping method to increase the ReRAM crossbar's throughput.
Our evaluation demonstrates the effectiveness of BWQ, which achieves a 6.08x speedup and a 17.47x energy saving on average compared to existing ReRAM-based architectures.

Submitted 27 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 12 pages, 13 figures

arXiv:2310.09426 [pdf, other]

Offline Reinforcement Learning for Optimizing Production Bidding Policies

Authors: Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu

Abstract: The online advertising market, with its thousands of auctions run per second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Thus, advertising platforms typically provide automated agents to their customers, which act on their behalf to bid for impression opportunities in real time at scale. Because these proxy agents are owned by the platform but use advertiser funds to operate, there is a strong practical need to balance reliability and explainability of the agent with optimizing power. We propose a generalizable approach to optimizing bidding policies in production environments by learning from real data using offline reinforcement learning. This approach can be used to optimize any differentiable base policy (practically, a heuristic policy based on principles which the advertiser can easily understand), and only requires data generated by the base policy itself. We use a hybrid agent architecture that combines arbitrary base policies with deep neural networks, where only the optimized base policy parameters are eventually deployed, and the neural network part is discarded after training. We demonstrate that such an architecture achieves statistically significant performance gains in both simulated and at-scale production bidding environments. Our approach does not incur additional infrastructure, safety, or explainability costs, as it directly optimizes parameters of existing production routines without replacing them with black box-style models like neural networks.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.09393 [pdf]

All-dielectric hybrid VIS-NIR dual-function metasurface

Authors: Pei Xiong, Daniel K. Nikolov, Fei Cheng, Jannick P. Rolland, A. N. Vamivakas

Abstract: Metasurfaces are a promising technology that can serve as a compact alternative to conventional optics while providing multiple functions depending on the properties of the incident light, such as the wavelength, polarization, and incident angle. Here, we demonstrate a hybrid VIS-NIR dielectric metasurface that can reflect 940 nm light into a specified direction while transmitting visible light (450-750 nm). The dual functionality is realized by combining an aperiodic distributed Bragg reflector with dielectric meta-tokens. Experimental demonstration is also reported, showing an anomalous reflection of near-infrared (NIR) light within a 20° full field-of-view (FOV) and the transmission of wavelengths from 450 nm to 750 nm.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 10 pages, 8 figures

arXiv:2310.03328 [pdf, other]

Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise

Authors: Zhen Wan, Yating Zhang, Yexiang Wang, Fei Cheng, Sadao Kurohashi

Abstract: While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capabilities in general domain tasks, they often generate content with hallucinations in specific domains such as Chinese law, hindering their application in these areas. This is typically due to the absence of training data that encompasses such a specific domain, preventing GPT-4 from acquiring in-domain knowledge. A pressing challenge is that it's not plausible to continue training LLMs of such scale on in-domain data. This paper introduces a simple and effective domain adaptation framework for GPT-4 by reformulating generation as an adapt-retrieve-revise process. The initial step is to adapt an affordable 7B LLM to the target domain by continuing learning on in-domain data. When solving a task, we leverage the adapted LLM to generate a draft answer given a task query. Then, the draft answer will be used to retrieve supporting evidence candidates from an external in-domain knowledge base. Finally, the draft answer and retrieved evidence are concatenated into a whole prompt to let GPT-4 assess the evidence and revise the draft answer to generate the final answer. Our proposal combines the advantages of the efficiency of adapting a smaller 7B model with the evidence-assessing capability of GPT-4 and effectively prevents GPT-4 from generating hallucinatory content. In the zero-shot setting of four Chinese legal tasks, our method improves accuracy by 33.3% compared to the direct generation by GPT-4.
When compared to two stronger retrieval-based baselines, our method outperforms them by 15.4% and 23.9%. Our code will be released.

Submitted 12 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Under submission to ICLR 2024

arXiv:2309.10091 [pdf, other]

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

Authors: Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal

Abstract: The canonical approach to video-text retrieval leverages a coarse-grained or fine-grained alignment between visual and textual information. However, retrieving the correct video according to the text query is often challenging as it requires the ability to reason about both high-level (scene) and low-level (object) visual clues and how they relate to the text query. To this end, we propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA. Specifically, our model captures the cross-modal similarity information at different granularity levels. To alleviate the effect of irrelevant visual clues, we also apply an Interactive Similarity Aggregation module (ISA) to consider the importance of different visual features while aggregating the cross-modal similarity to obtain a similarity score for each granularity. Finally, we apply the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them, alleviating over- and under-representation issues at different levels. By jointly considering the cross-modal similarity of different granularity, UCoFiA allows the effective unification of multi-grained alignments. Empirically, UCoFiA outperforms previous state-of-the-art CLIP-based methods on multiple video-text retrieval benchmarks, achieving 2.4%, 1.4% and 1.3% improvements in text-to-video retrieval R@1 on MSR-VTT, Activity-Net, and DiDeMo, respectively. Our code is publicly available at https://github.com/Ziyang412/UCoFiA.

Submitted 18 September, 2023; originally announced September 2023.

Comments: ICCV 2023
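
The normalization step mentioned in the abstract above can be illustrated with a minimal sketch of Sinkhorn-Knopp scaling. This is not the authors' implementation: the function name `sinkhorn_knopp`, the exponentiation of raw similarities, the fixed iteration count, and the toy 2x2 matrix are all assumptions made for illustration only.

```python
import math


def sinkhorn_knopp(sim, n_iters=50):
    """Alternately rescale rows and columns of exp(sim) so that each sums to 1.

    `sim` is a square list-of-lists of similarity scores (hypothetical input).
    """
    # Assumption of this sketch: exponentiate so every entry is strictly positive.
    m = [[math.exp(v) for v in row] for row in sim]
    for _ in range(n_iters):
        # Row step: divide each row by its sum.
        row_sums = [sum(row) for row in m]
        m = [[v / s for v in row] for row, s in zip(m, row_sums)]
        # Column step: divide each column by its sum.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / c for v, c in zip(row, col_sums)] for row in m]
    return m


# Toy text-to-video similarity matrix (made-up numbers).
normalized = sinkhorn_knopp([[0.9, 0.1], [0.2, 0.8]])
```

After enough iterations every row and every column of the result sums to approximately 1, so no single query or video dominates when matrices from several granularity levels are added together.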

arXiv:2309.04212 [pdf, other]

doi 10.1103/PhysRevLett.131.108301

Extreme Spontaneous Deformations of Active Crystals

Authors: Xia-qing Shi, Fu Cheng, Hugues Chaté

Abstract: We demonstrate that two-dimensional crystals made of active particles can experience extremely large spontaneous deformations without melting. Using particles mostly interacting via pairwise repulsive forces, we show that such active crystals maintain long-range bond order and algebraically-decaying positional order, but with an exponent $η$ not limited by the $\tfrac{1}{3}$ bound given by the (equilibrium) KTHNY theory. We rationalize our findings using linear elastic theory and show the existence of two well-defined effective temperatures quantifying respectively large-scale deformations and bond-order fluctuations. The root of these phenomena lies in the sole time-persistence of the intrinsic axes of particles, and they should thus be observed in many different situations.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 6 pages, 4 figures

MSC Class: 82D03

Journal ref: Phys. Rev. Lett. 131, 108301, Published 5 September 2023

arXiv:2308.11101 [pdf]

Discovery of smectic charge and pair-density-wave orders in topological monolayer 1T$^\prime$-MoTe$_2$

Authors: Li-Xuan Wei, Peng-Cheng Xiao, Fangsen Li, Li Wang, Bo-Yuan Deng, Fang-Jun Cheng, Fa-Wei Zheng, Ning Hao, Ping Zhang, Xu-Cun Ma, Qi-Kun Xue, Can-Li Song

Abstract: Electronic liquid-crystal phases are observed in numerous strongly-correlated systems including high-temperature superconductors. However, identifying these exotic phases and understanding their interplay with superconductivity in topological materials remain challenging. Here we employ cryogenic scanning tunneling microscopy to discover a smectic (stripe) charge order (CO) and a primary pair-density-wave (PDW) in topological monolayer 1T$^\prime$-MoTe$_2$. The two orders are spatially modulated unidirectionally at the same wavevector, but have a marked spatial phase difference of about 2$π$/5. Importantly, the primary PDW state features two-gap superconductivity below the transition temperature of 6.0 K and induces another unique particle-hole-symmetric CO at twice the PDW wavevector. Combining these results and our density functional calculations, we reveal that the two smectic orders are primarily driven by nesting behaviors between electron and hole pockets. Our findings establish monolayer 1T$^\prime$-MoTe$_2$ as a topological paradigm for exploring electronic smecticity, which intertwines with multiple preexisting symmetry-breaking states.

Submitted 5 April, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 16 pages, 4 figures, Supplementary materials

arXiv:2307.16777 [pdf, other]

The Perturbed Full Two-Body Problem: Application to Post-DART Didymos

Authors: Alex J. Meyer, Harrison F. Agrusa, Derek C. Richardson, R. Terik Daly, Oscar Fuentes-Muñoz, Masatoshi Hirabayashi, Patrick Michel, Colby C. Merrill, Ryota Nakano, Andrew F. Cheng, Brent Barbee, Olivier S. Barnouin, Steven R. Chesley, Carolyn M. Ernst, Ioannis Gkolias, Nicholas A. Moskovitz, Shantanu P. Naidu, Petr Pravec, Petr Scheirich, Cristina A. Thomas, Kleomenis Tsiganis, Daniel J. Scheeres

Abstract: With the successful impact of the NASA DART spacecraft in the Didymos-Dimorphos binary asteroid system, we provide an initial analysis of the post-impact perturbed binary asteroid dynamics. To compare our simulation results with observations, we introduce a set of "observable elements" calculated using only the physical separation of the binary asteroid, rather than traditional Keplerian elements. Using numerical methods that treat the fully spin-orbit-coupled dynamics, we estimate the system's mass and the impact-induced changes in orbital velocity, semimajor axis, and eccentricity. We find that the changes to the mutual orbit depend strongly on the separation distance between Didymos and Dimorphos at the time of impact. If Dimorphos enters a tumbling state after the impact, this may be observable through changes in the system's eccentricity and orbit period. We also find that any DART-induced reshaping of Dimorphos would generally reduce the required change in orbital velocity to achieve the measured post-impact orbit period and will be assessed by the ESA Hera mission in 2027.

Submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted for publication in PSJ

arXiv:2307.12199 [pdf, other]

Leveraging Historical Medical Records as a Proxy via Multimodal Modeling and Visualization to Enrich Medical Diagnostic Learning

Authors: Yang Ouyang, Yuchen Wu, He Wang, Chenyang Zhang, Furui Cheng, Chang Jiang, Lixia Jin, Yuanwu Cao, Quan Li

Abstract: Simulation-based Medical Education (SBME) has been developed as a cost-effective means of enhancing the diagnostic skills of novice physicians and interns, thereby mitigating the need for resource-intensive mentor-apprentice training. However, feedback provided in most SBME is often directed towards improving the operational proficiency of learners, rather than providing summative medical diagnoses that result from experience and time. Additionally, the multimodal nature of medical data during diagnosis poses significant challenges for interns and novice physicians, including the tendency to overlook or over-rely on data from certain modalities, and difficulties in comprehending potential associations between modalities. To address these challenges, we present DiagnosisAssistant, a visual analytics system that leverages historical medical records as a proxy for multimodal modeling and visualization to enhance the learning experience of interns and novice physicians. The system employs elaborately designed visualizations to explore different modality data, offer diagnostic interpretive hints based on the constructed model, and enable comparative analyses of specific patients. Our approach is validated through two case studies and expert interviews, demonstrating its effectiveness in enhancing medical training.

Submitted 22 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE VIS 2023

arXiv:2306.11251 [pdf, other]

Eliminating Lipschitz Singularities in Diffusion Models

Authors: Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

Abstract: Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models. However, the rationality of the diffusion process itself receives limited attention, leaving the question of whether the problem is well-posed and well-conditioned. In this paper, we uncover a vexing propensity of diffusion models: they frequently exhibit infinite Lipschitz constants near the zero point of timesteps. This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations. We provide a comprehensive evaluation of the issue from both theoretical and empirical perspectives. To address this challenge, we propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz singularity of the diffusion model near zero. Remarkably, our technique yields a substantial improvement in performance, e.g., on the high-resolution FFHQ dataset ($256\times256$). Moreover, as a byproduct of our method, we manage to achieve a dramatic reduction in the Fréchet Inception Distance of other acceleration methods relying on network Lipschitz, including DDIM and DPM-Solver, by over 33$\%$. We conduct extensive experiments on diverse datasets to validate our theory and method. Our work not only advances the understanding of the general diffusion process, but also provides insights for the design of diffusion models.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.09719 [pdf, other]

Pushing the Limits of ChatGPT on NLP Tasks

Authors: Xiaofei Sun, Linfeng Dong, Xiaoya Li, Zhen Wan, Shuhe Wang, Tianwei Zhang, Jiwei Li, Fei Cheng, Lingjuan Lyu, Fei Wu, Guoyin Wang

Abstract: Despite the success of ChatGPT, its performance on most NLP tasks is still well below the supervised baselines. In this work, we looked into the causes, and discovered that its subpar performance was caused by the following factors: (1) token limit in the prompt does not allow for the full utilization of the supervised datasets; (2) mismatch between the generation nature of ChatGPT and NLP tasks; (3) intrinsic pitfalls of LLMs, e.g., hallucination, excessive focus on certain keywords, etc. In this work, we propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks. Our proposed modules include (1) a one-input-multiple-prompts strategy that employs multiple prompts for one input to accommodate more demonstrations; (2) using fine-tuned models for better demonstration retrieval; (3) transforming tasks to formats that are more tailored to the generation nature; (4) employing reasoning strategies that are tailored to addressing the task-specific complexity; (5) the self-verification strategy to address the hallucination issue of LLMs; (6) the paraphrase strategy to improve the robustness of model predictions. We conduct experiments on 21 datasets of 10 representative NLP tasks, including question answering, commonsense reasoning, natural language inference, sentiment analysis, named entity recognition, entity-relation extraction, event extraction, dependency parsing, semantic role labeling, and part-of-speech tagging. Using the proposed ensemble of techniques, we are able to significantly boost the performance of ChatGPT on the selected NLP tasks, achieving performances comparable to or better than supervised baselines, or even existing SOTA performances.

Submitted 9 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.08308 [pdf]

New vision of convection induced freckle formation theory in Nickel-based superalloys by electron microscopy

Authors: Shuai Wang, Yuliang Jia, Yongzhe Wang, Yongjia Zhang, Lan Ma, Feng Cheng, Yi Zeng, Xu Shen, Yingliu Du, Binghui Ge

Abstract: Freckles, among the common defects in blades used in heavy-duty gas turbines, severely degrade the blades' mechanical properties and reliability under service conditions. Thermal-solutal convection theory is a widely adopted formation mechanism, but little solid experimental evidence has been reported. Here, for the first time, we systematically studied the microstructure of 117 grains in freckle chains from four different Nickel-based superalloys of either single crystal or directionally solidified alloys. The relationship between the internal stress and the misorientation throughout the freckle chains is studied by means of state-of-the-art electron microscopy. All results give new experimental proof to the theory of thermal-solutal convection, which is further supported by the fact that borides at the boundary are randomly oriented with respect to the alloys. Our results enrich the methodology of freckle study, providing new insight into the formation mechanism of casting defects.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.16896 [pdf, other]

MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting

Authors: Tatsuro Inaba, Hirokazu Kiyomaru, Fei Cheng, Sadao Kurohashi

Abstract: Large language models (LLMs) have achieved impressive performance on various reasoning tasks. To further improve the performance, we propose MultiTool-CoT, a novel framework that leverages chain-of-thought (CoT) prompting to incorporate multiple external tools, such as a calculator and a knowledge retriever, during the reasoning process. We apply MultiTool-CoT to the Task 2 dataset of NumGLUE, which requires both numerical reasoning and domain-specific knowledge. The experiments show that our method significantly outperforms strong baselines and achieves state-of-the-art performance.

Submitted 26 May, 2023; originally announced May 2023.

Comments: ACL 2023. Our code is available at https://github.com/InabaTatsuro/MultiTool-CoT

arXiv:2305.07475 [pdf, other]

Comprehensive Solution Program Centric Pretraining for Table-and-Text Hybrid Numerical Reasoning

Authors: Qianying Liu, Dongsheng Yang, Wenjie Zhong, Fei Cheng, Sadao Kurohashi

Abstract: Numerical reasoning over table-and-text hybrid passages, such as financial reports, poses significant challenges and has numerous potential applications. Noise and irrelevant variables in the model input have been a hindrance to its performance. Additionally, coarse-grained supervision of the whole solution program has impeded the model's ability to learn the underlying numerical reasoning process. In this paper, we propose three pretraining tasks that operate at both the whole program and sub-program level: Variable Integrity Ranking, which guides the model to focus on useful variables; Variable Operator Prediction, which decomposes the supervision into fine-grained single operator prediction; and Variable Keyphrase Masking, which encourages the model to identify key evidence that sub-programs are derived from. Experimental results demonstrate the effectiveness of our proposed methods, surpassing transformer-based model baselines.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 11 pages

arXiv:2305.02105 [pdf, other]

GPT-RE: In-context Learning for Relation Extraction using Large Language Models

Authors: Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi

Abstract: In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE). This is due to the two major shortcomings of LLMs in RE: (1) low relevance regarding entity and relation in retrieved demonstrations for in-context learning; and (2) the strong inclination to wrongly classify NULL examples into other pre-defined labels. In this paper, we propose GPT-RE to bridge the gap between LLMs and fully-supervised baselines. GPT-RE successfully addresses the aforementioned issues by (1) incorporating task-specific entity representations in demonstration retrieval; and (2) enriching the demonstrations with gold label-induced reasoning logic. We evaluate GPT-RE on four widely-used RE datasets, and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines. Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.

Submitted 8 December, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: Accepted by EMNLP 2023 Main Conference (long paper)

arXiv:2305.00946 [pdf]

doi 10.1021/acs.est.3c03063

Inflation Reduction Act impacts on the economics of clean hydrogen and liquid fuels

Authors: Fangwei Cheng, Hongxi Luo, Jesse D. Jenkins, Eric D. Larson

Abstract: The Inflation Reduction Act (IRA) in the United States provides unprecedented incentives for deploying low-carbon hydrogen and liquid fuels, among other low greenhouse gas (GHG) emissions technologies. To better understand the prospective competitiveness of low-carbon or negative-carbon hydrogen and liquid fuels under the IRA in the early 2030s, we examine the impacts of IRA provisions on costs of producing hydrogen and synthetic liquid fuel made from natural gas, electricity, short-cycle biomass (agricultural residues), and corn-ethanol. With IRA credits (45V or 45Q), but excluding incentives provided by other national or state policies, hydrogen produced by electrolysis using carbon-free electricity (green H2) and natural gas reforming with carbon capture and storage (CCS) (blue H2) are cost-competitive with the carbon-intensive benchmark gray H2 from steam methane reforming. Biomass-derived H2 with or without CCS is not cost-competitive under current IRA provisions. However, if IRA allowed biomass gasification with CCS to claim a 45V credit for carbon-neutral H2 and a 45Q credit for negative biogenic-CO2 emissions, this pathway would be less costly than gray H2. The IRA credit for clean fuels (45Z), currently stipulated to end in 2027, would need to be extended, or similar policy support provided by other national or state policies, for clean synthetic liquid fuel to be cost-competitive with petroleum-derived liquid fuels. Levelized IRA subsidies per unit of CO2 mitigated for all hydrogen and synthetic liquid fuel production pathways, except electricity-derived synthetic liquid fuel, range from 65 to 384 $/t CO2, which is within or below the range in U.S. federal government estimates of the Social Cost of Carbon (SCC) in the 2030 to 2040 timeframe.

Submitted 14 August, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.13914 [pdf]

doi 10.1103/PhysRevLett.130.216004

Evidence for Band Renormalizations in Strong-coupling Superconducting Alkali-fulleride Films

Authors: J. S. Zhou, R. Z. Xu, X. Q. Yu, F. J. Cheng, W. X. Zhao, X. Du, S. Z. Wang, Q. Q. Zhang, X. Gu, S. M. He, Y. D. Li, M. Q. Ren, X. C. Ma, Q. K. Xue, Y. L. Chen, C. L. Song, L. X. Yang

Abstract: There has been a long-standing debate about the mechanism of the unusual superconductivity in alkali-intercalated fulleride superconductors. In this work, using high-resolution angle-resolved photoemission spectroscopy, we systematically investigate the electronic structures of superconducting K3C60 thin films. We observe a dispersive energy band crossing the Fermi level with an occupied bandwidth of about 130 meV. The measured band structure shows prominent quasiparticle kinks and a replica band involving high-energy Jahn-Teller active Hg(8) phonon mode, reflecting strong electron-phonon coupling in the system. The electron-phonon coupling constant is estimated to be about 1.2, which dominates the quasiparticle mass renormalization. Moreover, we observe an isotropic nodeless superconducting gap beyond the mean-field estimation. Both the large electron-phonon coupling constant and large reduced superconducting gap suggest strong-coupling superconductivity in K3C60, while the electronic correlation effect is indicated by the observation of a waterfall-like band dispersion and the small bandwidth compared with the effective Coulomb interaction. Our results not only directly visualize the crucial band structure of superconducting fulleride but also provide important insights into the mechanism of the unusual superconductivity.

Submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted by Phys. Rev. Lett.

arXiv:2304.03928 [pdf]

doi 10.1039/D3NR02322B

Interpretable machine learning-accelerated seed treatment by nanomaterials for environmental stress alleviation

Authors: Hengjie Yu, Dan Luo, Sam F. Y. Li, Maozhen Qu, Da Liu, Yingchao He, Fang Cheng

Abstract: Crops are constantly challenged by different environmental conditions. Seed treatment by nanomaterials is a cost-effective and environmentally-friendly solution for environmental stress mitigation in crop plants. Here, 56 seed nanopriming treatments are used to alleviate environmental stresses in maize. Seven selected nanopriming treatments significantly increase the stress resistance index (SRI) by 13.9% and 12.6% under salinity stress and combined heat-drought stress, respectively. Metabolomics data reveal that ZnO nanopriming treatment, with the highest SRI value, mainly regulates the pathways of amino acid metabolism, secondary metabolite synthesis, carbohydrate metabolism, and translation. Understanding the mechanism of seed nanopriming is still difficult due to the variety of nanomaterials and the complexity of interactions between nanomaterials and plants. Using the nanopriming data, we present an interpretable structure-activity relationship (ISAR) approach based on interpretable machine learning for predicting and understanding its stress mitigation effects. The post hoc and model-based interpretation approaches of machine learning are combined to provide complementary benefits and give researchers or policymakers more illuminating or trustworthy results. The concentration, size, and zeta potential of nanoparticles are identified as dominant factors correlating with root dry weight under salinity stress, and their effects and interactions are explained. Additionally, a web-based interactive tool is developed for offering prediction-level interpretation and gathering more details about specific nanopriming treatments. This work offers a promising framework for accelerating the agricultural applications of nanomaterials and may profoundly contribute to nanosafety assessment.

Submitted 8 April, 2023; originally announced April 2023.

Comments: 30 pages, 6 figures

arXiv:2303.10318 [pdf, other]

Crowd Counting with Online Knowledge Learning

Authors: Shengqin Jiang, Bowen Li, Fengna Cheng, Qingshan Liu

Abstract: Efficient crowd counting models are urgently required for applications in scenarios with limited computing resources, such as edge computing and mobile devices. A straightforward method to achieve this is knowledge distillation (KD), which involves using a trained teacher network to guide the training of a student network. However, this traditional two-phase training method can be time-consuming, particularly for large datasets, and it is also challenging for the student network to mimic the learning process of the teacher network. To overcome these challenges, we propose an online knowledge learning method for crowd counting. Our method builds an end-to-end training framework that integrates two independent networks into a single architecture, which consists of a shared shallow module, a teacher branch, and a student branch. This approach is more efficient than the two-stage training technique of traditional KD. Moreover, we propose a feature relation distillation method which allows the student branch to more effectively comprehend the evolution of inter-layer features by constructing a new inter-layer relationship matrix. It is combined with response distillation and feature internal distillation to enhance the transfer of mutually complementary information from the teacher branch to the student branch. Extensive experiments on four challenging crowd counting datasets demonstrate the effectiveness of our method which achieves comparable performance to state-of-the-art methods despite using far fewer parameters.

Submitted 17 March, 2023; originally announced March 2023.

Comments: under review

arXiv:2303.07016 [pdf, other]

doi 10.1145/3544548.3581468

HOOV: Hand Out-Of-View Tracking for Proprioceptive Interaction using Inertial Sensing

Authors: Paul Streli, Rayan Armani, Yi Fei Cheng, Christian Holz

Abstract: Current Virtual Reality systems are designed for interaction under visual control. Using built-in cameras, headsets track the user's hands or hand-held controllers while they are inside the field of view. Current systems thus ignore the user's interaction with off-screen content -- virtual objects that the user could quickly access through proprioception without requiring laborious head motions to bring them into focus. In this paper, we present HOOV, a wrist-worn sensing method that allows VR users to interact with objects outside their field of view. Based on the signals of a single wrist-worn inertial sensor, HOOV continuously estimates the user's hand position in 3-space to complement the headset's tracking as the hands leave the tracking range. Our novel data-driven method predicts hand positions and trajectories from just the continuous estimation of hand orientation, which by itself is stable based solely on inertial observations. Our inertial sensing simultaneously detects finger pinching to register off-screen selection events, confirms them using a haptic actuator inside our wrist device, and thus allows users to select, grab, and drop virtual content. We compared HOOV's performance with a camera-based optical motion capture system in two evaluations. In the first evaluation, participants interacted based on tracking information from the motion capture system to assess the accuracy of their proprioceptive input, whereas in the second, they interacted based on HOOV's real-time estimations. We found that HOOV's target-agnostic estimations had a mean tracking error of 7.7 cm, which allowed participants to reliably access virtual objects around their body without first bringing them into focus. We demonstrate several applications that leverage the larger input space HOOV opens up for quick proprioceptive interaction, and conclude by discussing the potential of our technique.

Submitted 30 April, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted at 2023 CHI Conference on Human Factors in Computing Systems

ACM Class: I.2; I.5; H.5

arXiv:2303.03464 [pdf]

doi 10.1038/s41586-023-05878-z

Momentum Transfer from the DART Mission Kinetic Impact on Asteroid Dimorphos

Authors: Andrew F. Cheng, Harrison F. Agrusa, Brent W. Barbee, Alex J. Meyer, Tony L. Farnham, Sabina D. Raducan, Derek C. Richardson, Elisabetta Dotto, Angelo Zinzi, Vincenzo Della Corte, Thomas S. Statler, Steven Chesley, Shantanu P. Naidu, Masatoshi Hirabayashi, Jian-Yang Li, Siegfried Eggl, Olivier S. Barnouin, Nancy L. Chabot, Sidney Chocron, Gareth S. Collins, R. Terik Daly, Thomas M. Davison, Mallory E. DeCoster, Carolyn M. Ernst, Fabio Ferrari, et al. (44 additional authors not shown)

Abstract: The NASA Double Asteroid Redirection Test (DART) mission performed a kinetic impact on asteroid Dimorphos, the satellite of the binary asteroid (65803) Didymos, at 23:14 UTC on September 26, 2022 as a planetary defense test. DART was the first hypervelocity impact experiment on an asteroid at size and velocity scales relevant to planetary defense, intended to validate kinetic impact as a means of asteroid deflection. Here we report the first determination of the momentum transferred to an asteroid by kinetic impact. Based on the change in the binary orbit period, we find an instantaneous reduction in Dimorphos's along-track orbital velocity component of 2.70 +/- 0.10 mm/s, indicating enhanced momentum transfer due to recoil from ejecta streams produced by the impact. For a Dimorphos bulk density range of 1,500 to 3,300 kg/m$^3$, we find that the expected value of the momentum enhancement factor, $β$, ranges between 2.2 and 4.9, depending on the mass of Dimorphos. If Dimorphos and Didymos are assumed to have equal densities of 2,400 kg/m$^3$, $β$ = 3.61 +0.19/-0.25 (1 $σ$). These $β$ values indicate that significantly more momentum was transferred to Dimorphos from the escaping impact ejecta than was incident with DART. Therefore, the DART kinetic impact was highly effective in deflecting the asteroid Dimorphos.

Submitted 6 March, 2023; originally announced March 2023.

Comments: accepted by Nature

Showing 1–50 of 230 results for author: Cheng, F