Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Authors: Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Abstract: With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative. For example, a company deploying a client-facing chatbot must ensure that the model will not respond to client requests with profanity. Current evaluations approach this problem using small, domain-specific datasets with human-curated labels. These evaluation sets are often sampled from a narrow and simplified distribution, and data sources can unknowingly be leaked into the training set which can lead to misleading evaluations. To bypass these drawbacks, we propose a framework for self-supervised evaluation of LLMs by analyzing their sensitivity or invariance to transformations on the input text. Self-supervised evaluation can directly monitor LLM behavior on datasets collected in the wild or streamed during live model deployment. We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence, in addition to sensitivity to grammatical structure and tokenization errors. When comparisons to similar human-labeled benchmarks are available, we find strong correlations between self-supervised and human-supervised evaluations. The self-supervised paradigm complements current evaluation strategies that rely on labeled data.

Submitted 29 June, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

Comments: Code is available at https://github.com/neelsjain/BYOD. First two authors contributed equally. 21 pages, 22 figures

arXiv:2306.12457 [pdf, other]

Deep Dynamic Epidemiological Modelling for COVID-19 Forecasting in Multi-level Districts

Authors: Ruhan Liu, Jiajia Li, Yang Wen, Huating Li, Ping Zhang, Bin Sheng, David Dagan Feng

Abstract: Objective: COVID-19 has spread worldwide and had a huge impact across the world. Modeling the spread of COVID-19 is essential to understand the current condition and to formulate intervention measures. Epidemiological equations based on the SEIR model simulate disease development. The traditional parameter estimation method to solve SEIR equations could not precisely fit real-world data due to different situations, such as social distancing policies and intervention strategies. Additionally, learning-based models achieve outstanding fitting performance, but cannot visualize mechanisms. Methods: Thus, we propose a deep dynamic epidemiological (DDE) method that combines epidemiological equations and deep-learning advantages to obtain high accuracy and visualization. The DDE contains deep networks to fit the effect function to simulate the ever-changing situations based on the neural ODE method in solving variants' equations, ensuring the fitting performance of multi-level areas. Results: We introduce four SEIR variants to fit different situations in different countries and regions. We compare our DDE method with traditional parameter estimation methods (Nelder-Mead, BFGS, Powell, Truncated Newton Conjugate-Gradient, Neural ODE) in fitting the real-world data in the cases of countries (the USA, Colombia, South Africa) and regions (Wuhan in China, Piedmont in Italy). Our DDE method achieves the best Mean Square Error and Pearson coefficient in all five areas. Further, compared with the state-of-the-art learning-based approaches, the DDE outperforms all techniques, including LSTM, RNN, GRU, Random Forest, Extremely Random Trees, and Decision Tree. Conclusion: DDE presents outstanding predictive ability and visualized display of the changes in infection rates in different regions and countries.

Submitted 21 June, 2023; originally announced June 2023.
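
For readers unfamiliar with the SEIR equations that the abstract above builds on, the following is a minimal sketch of the standard textbook model, integrated with forward Euler steps. It is not the paper's DDE method: the rates `beta`, `sigma`, and `gamma` are held constant here (the abstract instead describes fitting an effect function with deep networks), and the function name and all parameter values are illustrative assumptions.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the textbook SEIR equations with forward Euler steps.

    beta  : transmission rate (assumed constant; hypothetical value)
    sigma : rate at which exposed individuals become infectious
    gamma : recovery rate
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    n = s0 + e0 + i0 + r0                # total population, conserved
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only; not taken from the paper.
    for day, state in enumerate(simulate_seir(0.5, 0.2, 0.1, 990, 0, 10, 0, days=5)):
        print(day, [round(x, 1) for x in state])
```

A method of the kind the abstract describes would replace the constant rates with time-varying, learned quantities; the compartment bookkeeping stays the same.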

arXiv:2306.11335 [pdf, other]

Surfer: Progressive Reasoning with World Models for Robotic Manipulation

Authors: Pengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Mas Ma, Xiaodan Liang

Abstract: Making the model accurately understand and follow natural language instructions and perform actions consistent with world knowledge is a key challenge in robot manipulation. This mainly includes human fuzzy instruction reasoning and the following of physical knowledge. Therefore, the embodied intelligence agent must have the ability to model world knowledge from training data. However, most existing vision and language robot manipulation methods mainly operate in less realistic simulators and language settings and lack explicit modeling of world knowledge. To bridge this gap, we introduce a novel and simple robot manipulation framework, called Surfer. It is based on the world model, treats robot manipulation as a state transfer of the visual scene, and decouples it into two parts: action and scene. Then, the generalization ability of the model on new instructions and new scenes is enhanced by explicit modeling of the action and scene prediction in multi-modal information. In addition to the framework, we also built a robot manipulation simulator that supports full physics execution based on the MuJoCo physics engine. It can automatically generate demonstration training data and test data, effectively reducing labor costs. To conduct a comprehensive and systematic evaluation of the robot manipulation model in terms of language understanding and physical execution, we also created a robotic manipulation benchmark with progressive reasoning tasks, called SeaWave. It contains 4 levels of progressive reasoning tasks and can provide a standardized testing platform for embedded AI agents in multi-modal environments. On average, Surfer achieved a success rate of 54.74% on the defined four levels of manipulation tasks, exceeding the best baseline performance of 47.64%.

Submitted 20 March, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.10255 [pdf, other]

doi 10.1029/2022GL102325

The First GECAM Observation Results on Terrestrial Gamma-ray Flashes and Terrestrial Electron Beams

Authors: Y. Zhao, J. C. Liu, S. L. Xiong, W. C. Xue, Q. B. Yi, G. P. Lu, W. Xu, F. C. Lyu, J. C. Sun, W. X. Peng, C. Zheng, Y. Q. Zhang, C. Cai, S. Xiao, S. L. Xie, C. W. Wang, W. J. Tan, Z. H. An, G. Chen, Y. Q. Du, Y. Huang, M. Gao, K. Gong, D. Y. Guo, J. J. He , et al. (37 additional authors not shown)

Abstract: Gravitational-wave high-energy Electromagnetic Counterpart All-sky Monitor (GECAM) is a space-borne instrument dedicated to monitoring high-energy transients, including Terrestrial Gamma-ray Flashes (TGFs) and Terrestrial Electron Beams (TEBs). We implemented a TGF/TEB search algorithm for GECAM, with which 147 bright TGFs, 2 typical TEBs and 2 special TEB-like events are identified during an effective observation time of $\sim$9 months. We show that, with gamma-ray and charged particle detectors, GECAM can effectively identify and distinguish TGFs and TEBs, and measure their temporal and spectral properties in detail. A very high TGF-lightning association rate of $\sim$80\% is obtained between GECAM and GLD360 in the East Asia region.

Submitted 17 June, 2023; originally announced June 2023.

Comments: The paper was accepted by Geophysical Research Letters on June 16th, 2023

arXiv:2306.08990 [pdf, other]

doi 10.1145/3610548.3618183

Emotional Speech-Driven Animation with Content-Emotion Disentanglement

Authors: Radek Daněček, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart

Abstract: To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions. Realistic facial animation requires lip-sync together with the natural expression of emotion. To that end, we propose EMOTE (Expressive Model Optimized for Talking with Emotion), which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion. To achieve this, we supervise EMOTE with decoupled losses for speech (i.e., lip-sync) and emotion. These losses are based on two key observations: (1) deformations of the face due to speech are spatially localized around the mouth and have high temporal frequency, whereas (2) facial expressions may deform the whole face and occur over longer intervals. Thus, we train EMOTE with a per-frame lip-reading loss to preserve the speech-dependent content, while supervising emotion at the sequence level. Furthermore, we employ a content-emotion exchange mechanism in order to supervise different emotions on the same audio, while maintaining the lip motion synchronized with the speech. To employ deep perceptual losses without getting undesirable artifacts, we devise a motion prior in the form of a temporal VAE. Due to the absence of high-quality aligned emotional 3D face datasets with speech, EMOTE is trained with 3D pseudo-ground-truth extracted from an emotional video dataset (i.e., MEAD). Extensive qualitative and perceptual evaluations demonstrate that EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data, while offering additional, high-quality emotional control.

Submitted 26 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: SIGGRAPH Asia 2023 Conference Paper

arXiv:2306.05437 [pdf, other]

One-step Multi-view Clustering with Diverse Representation

Authors: Xinhang Wan, Jiyuan Liu, Xinwang Liu, Siwei Wang, Yi Wen, Tianjiao Wan, Li Shen, En Zhu

Abstract: Multi-view clustering has attracted broad attention due to its capacity to utilize consistent and complementary information among views. Although tremendous progress has been made recently, most existing methods suffer from high complexity, preventing them from being applied to large-scale tasks. Multi-view clustering via matrix factorization is a representative approach to addressing this issue. However, most such methods map the data matrices into a fixed dimension, limiting the model's expressiveness. Moreover, a range of methods suffers from a two-step process, i.e., multimodal learning and the subsequent $k$-means, inevitably causing a sub-optimal clustering result. In light of this, we propose a one-step multi-view clustering with diverse representation method, which incorporates multi-view learning and $k$-means into a unified framework. Specifically, we first project original data matrices into various latent spaces to attain comprehensive information and auto-weight them in a self-supervised manner. Then we directly use the information matrices under diverse dimensions to obtain consensus discrete clustering labels. The unified work of representation learning and clustering boosts the quality of the final results. Furthermore, we develop an efficient optimization algorithm with proven convergence to solve the resultant problem. Comprehensive experiments on various datasets demonstrate the promising clustering performance of our proposed method.

Submitted 27 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.05429 [pdf]

Spin photonics on chip based on a twinning crystal metamaterial

Authors: Yan Li, Jingbo Sun, Yongzheng Wen, Xiaoyu Xiong, Ji Zhou

Abstract: Two-dimensional photonic circuits with high capacity are essential for a wide range of applications in next-generation photonic information technology and optoelectronics. Here we demonstrate a multi-channel spin-dependent photonic device based on a twinning crystal metamaterial. The structural symmetry and material symmetry of the twinning crystal metamaterial enable a total of 4 channels carrying different transverse spins because of the spin-momentum locking. The orientation of the anisotropy controls the propagation direction of each signal, and the rotation of the E-field with respect to energy flow determines the spin characteristics during input/output coupling. Leveraging this mechanism, the spin of an incident beam can be maintained during propagation on-chip and then delivered back into the free space, offering a new scheme for metamaterial-based spin-controlled nano-photonic applications.

Submitted 19 May, 2023; originally announced June 2023.

arXiv:2306.04634 [pdf, other]

On the Reliability of Watermarks for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

Abstract: As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a 1e-5 false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.

Submitted 1 May, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 9 pages in the main body. Published at ICLR 2024. Code is available at https://github.com/jwkirchenbauer/lm-watermarking
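
To make the "1e-5 false positive rate" in the abstract above concrete, here is a rough sketch of how such a rate maps to a detection threshold. It assumes a green-list style detector scored with a one-sided one-proportion z-test, which the abstract itself does not spell out; the function names, the green fraction of 0.25, and the token counts are hypothetical illustrations, not values from the paper or its repository.

```python
from math import sqrt
from statistics import NormalDist


def detection_threshold(false_positive_rate):
    """z-score above which unwatermarked text is flagged with the given
    probability, under a standard-normal null (an assumption)."""
    return NormalDist().inv_cdf(1.0 - false_positive_rate)


def green_token_z_score(green_count, total_tokens, green_fraction):
    """One-proportion z-test: how far the observed green-token count lies
    above what unwatermarked text would be expected to produce."""
    expected = green_fraction * total_tokens
    std = sqrt(total_tokens * green_fraction * (1.0 - green_fraction))
    return (green_count - expected) / std


if __name__ == "__main__":
    threshold = detection_threshold(1e-5)
    # Hypothetical counts: 260 green tokens out of 800 observed tokens.
    z = green_token_z_score(green_count=260, total_tokens=800, green_fraction=0.25)
    print(f"threshold={threshold:.3f}  z={z:.3f}  detected={z > threshold}")
```

Under these assumptions the score grows roughly with the square root of the number of tokens observed, which is consistent with the abstract's point that a diluted watermark can still be detected once enough tokens are seen.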

arXiv:2306.03424 [pdf, other]

GCD-DDPM: A Generative Change Detection Model Based on Difference-Feature Guided DDPM

Authors: Yihan Wen, Xianping Ma, Xiaokang Zhang, Man-On Pun

Abstract: Deep learning (DL)-based methods have recently shown great promise in bitemporal change detection (CD). Existing discriminative methods based on Convolutional Neural Networks (CNNs) and Transformers rely on discriminative representation learning for change recognition while struggling with exploring local and long-range contextual dependencies. As a result, it is still challenging to obtain fine-grained and robust CD maps in diverse ground scenes. To cope with this challenge, this work proposes a generative change detection model called GCD-DDPM to directly generate CD maps by exploiting the Denoising Diffusion Probabilistic Model (DDPM), instead of classifying each pixel into changed or unchanged categories. Furthermore, the Difference Conditional Encoder (DCE) is designed to guide the generation of CD maps by exploiting multi-level difference features. Leveraging the variational inference (VI) procedure, GCD-DDPM can adaptively re-calibrate the CD results through an iterative inference process, while accurately distinguishing subtle and irregular changes in diverse scenes. Finally, a Noise Suppression-based Semantic Enhancer (NSSE) is specifically designed to mitigate noise in the current step's change-aware feature representations from the CD Encoder. This refinement, serving as an attention map, can guide subsequent iterations while enhancing CD accuracy. Extensive experiments on four high-resolution CD datasets confirm the superior performance of the proposed GCD-DDPM. The code for this work will be available at https://github.com/udrs/GCD.

Submitted 2 March, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.03359 [pdf]

Optical vortices enabled by structural vortices

Authors: Yuanfeng Liu, Le Zhou, Mengfan Guo, Zongqi Xu, Jing Ma, Yongzheng Wen, Natalia M. Litchinitser, Yang Shen, Jingbo Sun, Ji Zhou

Abstract: The structural symmetry of solids plays an important role in defining their linear and nonlinear optical properties. The quest for versatile, cost-effective, large-scale, and defect-free approaches and materials platforms for tailoring structural and optical properties on demand has been underway for decades. We experimentally demonstrate a bottom-up self-assembly-based organic engineered material comprised of synthesized molecules with large dipole moments that are crystallized into a spherulite structure. The molecules align in an azimuthal direction, resulting in a vortex polarity with spontaneously broken symmetry leading to strong optical anisotropy and nonlinear optical responses. These unique polarization properties of the judiciously designed organic spherulite combined with the symmetry of structured optical beams enable a plethora of new linear and nonlinear light-matter interactions, including the generation of optical vortex beams with complex spin states and on-demand topological charges at the fundamental, doubled, and tripled frequencies. The results of this work are likely to enable numerous applications in areas such as high-dimensional quantum information processing, with large capacity and high security. The demonstrated spherulite crystals facilitate stand-alone micro-scale devices that rely on the unique micro-scale spontaneous vortex polarity that is likely to enable future applications for high-dimensional quantum information processing, spatiotemporal optical vortices, and a novel platform for optical manipulation and trapping.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.03034 [pdf, other]

Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination

Authors: Yang Li, Shao Zhang, Jichen Sun, Wenhao Zhang, Yali Du, Ying Wen, Xinbing Wang, Wei Pan

Abstract: Securing coordination between an AI agent and its teammates (human players or AI agents) in contexts involving unfamiliar humans continues to pose a significant challenge in Zero-Shot Coordination (ZSC). The issue of cooperative incompatibility becomes particularly prominent when an AI agent is unsuccessful in synchronizing with certain previously unknown partners. Traditional algorithms have aimed to collaborate with partners by optimizing fixed objectives within a population, fostering diversity in strategies and behaviors. However, these techniques may lead to learning loss and an inability to cooperate with specific strategies within the population, a phenomenon named cooperative incompatibility in learning. In order to solve cooperative incompatibility in learning and effectively address the problem in the context of ZSC, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy. We present two practical algorithms, specifically \algo and \algoR, which incorporate insights from game theory and graph theory. We also show that COLE could effectively overcome the cooperative incompatibility from theoretical and empirical analysis. Subsequently, we created an online Overcooked human-AI experiment platform, the COLE platform, which enables easy customization of questionnaires, model weights, and other aspects. Utilizing the COLE platform, we enlist 130 participants for human experiments. Our findings reveal a preference for our approach over state-of-the-art methods using a variety of subjective metrics. Moreover, objective experimental outcomes in the Overcooked game environment indicate that our method surpasses existing ones when coordinating with previously unencountered AI agents and the human proxy model.

Submitted 7 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 46 pages. arXiv admin note: substantial text overlap with arXiv:2302.04831

arXiv:2305.20030 [pdf, other]

Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust

Authors: Yuxin Wen, John Kirchenbauer, Jonas Geiping, Tom Goldstein

Abstract: Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content. In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. Unlike existing methods that perform post-hoc modifications to images after sampling, Tree-Ring Watermarking subtly influences the entire sampling process, resulting in a model fingerprint that is invisible to humans. The watermark embeds a pattern into the initial noise vector used for sampling. These patterns are structured in Fourier space so that they are invariant to convolutions, crops, dilations, flips, and rotations. After image generation, the watermark signal is detected by inverting the diffusion process to retrieve the noise vector, which is then checked for the embedded signal. We demonstrate that this technique can be easily applied to arbitrary diffusion models, including text-conditioned Stable Diffusion, as a plug-in with negligible loss in FID. Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed. Code is available at https://github.com/YuxinWenRick/tree-ring-watermark.

Submitted 3 July, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 16 pages, 8 figures, code is available at https://github.com/YuxinWenRick/tree-ring-watermark, fixed the repo link

arXiv:2305.19815 [pdf, other]

doi 10.1038/s41566-023-01212-1

Demonstration of the quantum principle of least action with single photons

Authors: Yong-Li Wen, Yunfei Wang, Li-Man Tian, Shanchao Zhang, Jianfeng Li, Jing-Song Du, Hui Yan, Shi-Liang Zhu

Abstract: The principle of least action is arguably the most fundamental principle in physics as it can be used to derive the equations of motion in various branches of physics. However, this principle has not been experimentally demonstrated at the quantum level because the propagators for Feynman's path integrals have never been observed. The propagator is a fundamental concept and contains various significant properties of a quantum system in path integral formulation, so its experimental observation is itself essential in quantum mechanics. Here we theoretically propose and experimentally observe single photons' propagators based on the method of directly measuring quantum wave-functions. Furthermore, we obtain the classical trajectories of the single photons in free space and in a harmonic trap based on the extremum of the observed propagators, thereby experimentally demonstrating the quantum principle of least action. Our work paves the way for experimentally exploring fundamental problems of quantum theory in the formulation of path integrals.

Submitted 31 May, 2023; originally announced May 2023.

Journal ref: Published online: 22 May 2023, Nature Photonics

arXiv:2305.18576 [pdf, other]

TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding

Authors: Zichen Liu, Xuyuan Liu, Yanlong Wen, Guoqing Zhao, Fen Xia, Xiaojie Yuan

Abstract: ICD coding is designed to assign the disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. In an attempt to improve the effectiveness and efficiency of manual coding, many methods have been proposed to automatically predict ICD codes from clinical notes. However, most previous works ignore the decisive information contained in structured medical data in EHRs, which is hard to capture from the noisy clinical notes. In this paper, we propose a Tree-enhanced Multimodal Attention Network (TreeMAN) to fuse tabular features and textual features into multimodal representations by enhancing the text representations with tree-based features via the attention mechanism. Tree-based features are constructed according to decision trees learned from structured multimodal medical data, which capture the decisive information about ICD coding. We can apply the same multi-label classifier from previous text models to the multimodal representations to predict ICD codes. Experiments on two MIMIC datasets show that our method outperforms prior state-of-the-art ICD coding approaches. The code is available at https://github.com/liu-zichen/TreeMAN.

Submitted 29 May, 2023; originally announced May 2023.

ACM Class: I.2.7

arXiv:2305.18498 [pdf, other]

ANPL: Towards Natural Programming with Interactive Decomposition

Authors: Di Huang, Ziyuan Nan, Xing Hu, Pengwei Jin, Shaohui Peng, Yuanbo Wen, Rui Zhang, Zidong Du, Qi Guo, Yewen Pu, Yunji Chen

Abstract: Though LLMs are capable of generating plausible programs, it's challenging to interact with the LLMs further to revise the program, especially if the user's specific requirements are different from the initial proposal. In this paper, we introduce ANPL, an interactive programming system that ensures users can always refine the generated code towards their specific programmatic intents via structured decompositions. Borrowing the paradigm of sketching from program synthesis, an ANPL program consists of a set of input-outputs that it must satisfy, a "sketch" (control/data flow expressed in precise code, e.g. Python), and "holes" (sub-modules to be implemented by the LLM, specified with natural language). The user revises an ANPL program by either modifying the sketch, changing the language used to describe the holes, or providing additional input-outputs to a particular hole, turning it into a sub-ANPL program that can be solved recursively. This workflow allows the users to offload programming burdens to the LLM as much as possible while retaining the ability to pinpoint and resolve bugs locally, without exposing the rest of the program to the LLM. We deploy ANPL on the Abstraction and Reasoning Corpus (ARC), a set of unique tasks that are challenging for state-of-the-art AI systems, showing it outperforms baseline programming systems that lack (a) the ability to decompose tasks interactively and (b) the guarantee that the modules can be correctly composed together. Additional evaluations on APPS, HumanEval, and real-world programming tasks have validated that the ANPL framework is applicable to multiple programming domains. We release the ANPL solutions to the ARC tasks as a dataset, providing insights into how humans decompose novel tasks programmatically. See our code at https://iprc-dip.github.io/ANPL/.

Submitted 30 November, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.13750 [pdf, other]

doi 10.1103/PhysRevA.109.023705

Tuning atom-field interaction via phase shaping

Authors: Y. -T. Cheng, C. -H. Chien, K. -M. Hsieh, Y. -H. Huang, P. Y. Wen, W. -J. Lin, Y. Lu, F. Aziz, C. -P. Lee, K. -T. Lin, C. -Y. Chen, J. C. Chen, C. -S. Chuu, A. F. Kockum, G. -D. Lin, Y. -H. Lin, I. -C. Hoi

Abstract: A coherent electromagnetic field can be described by its amplitude, frequency, and phase. All these properties can influence the interaction between the field and an atom. Here we demonstrate the phase shaping of microwaves that are scattered by a superconducting artificial atom coupled to the end of a semi-infinite 1D transmission line. In particular, we input a weak exponentially rising pulse with phase modulation to a transmon qubit. We observe that field-atom interaction can be tuned from nearly full interaction (interaction efficiency, i.e., amount of the field energy interacting with the atom, of 94.5%) to effectively no interaction (interaction efficiency 3.5%).

Submitted 26 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Journal ref: Physical Review A 109, 023705 (2024)

arXiv:2305.05686 [pdf, ps, other]

doi 10.1088/1751-8121/ace810

Separable Ball around any Full-Rank Multipartite Product State

Authors: Robin Yunfei Wen, Achim Kempf

Abstract: We show that around any $m$-partite product state $ρ_{\rm prod}=ρ_1\otimes...\otimesρ_m$ of full rank (that is ${\rm det}(ρ_{\rm prod})\neq 0)$, there exists a finite-sized closed ball of separable states centered around $ρ_{\rm prod}$ whose radius is $β:=2^{1-m/2}λ_{\rm min}(ρ_{\rm prod})$. Here, $λ_{\rm min}(ρ_{\rm prod})$ is the smallest eigenvalue of $ρ_{\rm prod}$. We are assuming that the total Hilbert space is finite dimensional and we use the notion of distance induced by the Frobenius norm. Applying a scaling relation, we also give a new and simple sufficient criterion for multipartite separability based on trace: ${\rm Tr}[ρρ_{\rm prod}]^2/{\rm Tr}[ρ^2]\geq {\rm Tr}[ρ_{\rm prod}^2]-β^2$. Using the separable balls around the full-rank product states, we discuss the existence and possible sizes of separable balls around any multipartite separable states, which are important features for the set of all separable states. We discuss the implication of these separable balls on entanglement dynamics.

Submitted 9 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 12 pages. v2: new sections, added references

Journal ref: 2023 J. Phys. A: Math. Theor. 56 335302
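
The trace-based criterion quoted in the abstract above can be checked numerically. The following is only a minimal sketch under stated assumptions, not the authors' code: it is restricted to real symmetric single-qubit factors, and all function names (`kron`, `trace_of_product`, `min_eig_2x2_sym`, `satisfies_trace_criterion`, `isotropic_two_qubit`) are hypothetical helpers introduced here for illustration. The test state, a mixture of a Bell state with the maximally mixed state, is likewise an illustrative choice that does not come from the listing.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def trace_of_product(a, b):
    """Tr[a b], computed without forming the full matrix product."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def min_eig_2x2_sym(r):
    """Smallest eigenvalue of a real symmetric 2x2 matrix (closed form)."""
    tr = r[0][0] + r[1][1]
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return (tr - math.sqrt(max(tr * tr - 4.0 * det, 0.0))) / 2.0

def satisfies_trace_criterion(rho, factors):
    """Evaluate Tr[rho rho_prod]^2 / Tr[rho^2] >= Tr[rho_prod^2] - beta^2.

    `factors` lists the single-qubit states rho_1, ..., rho_m of the product
    state (assumed real symmetric and full rank). The criterion is sufficient
    only: False means "inconclusive", not "entangled".
    """
    m = len(factors)
    rho_prod = factors[0]
    lam_min = min_eig_2x2_sym(factors[0])
    for f in factors[1:]:
        rho_prod = kron(rho_prod, f)
        # Eigenvalues of a tensor product of positive matrices are products
        # of the factors' eigenvalues, so the minimum is the product of minima.
        lam_min *= min_eig_2x2_sym(f)
    beta = 2.0 ** (1.0 - m / 2.0) * lam_min
    lhs = trace_of_product(rho, rho_prod) ** 2 / trace_of_product(rho, rho)
    rhs = trace_of_product(rho_prod, rho_prod) - beta ** 2
    return lhs >= rhs

def isotropic_two_qubit(p):
    """Illustrative test state: p |Phi+><Phi+| + (1 - p) I/4."""
    bell = [[0.5, 0.0, 0.0, 0.5],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.5, 0.0, 0.0, 0.5]]
    return [[p * bell[i][j] + ((1.0 - p) / 4.0 if i == j else 0.0)
             for j in range(4)] for i in range(4)]

maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]
print(satisfies_trace_criterion(isotropic_two_qubit(0.2), [maximally_mixed, maximally_mixed]))
print(satisfies_trace_criterion(isotropic_two_qubit(0.5), [maximally_mixed, maximally_mixed]))
```

With the maximally mixed two-qubit state as $ρ_{\rm prod}$, the radius is $β = 1/4$ and the inequality reduces to ${\rm Tr}[ρ^2] \leq 1/3$, so the first call satisfies the criterion and the second does not.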

arXiv:2304.14897 [pdf]

doi 10.1103/PhysRevApplied.19.054045

First-principles Prediction of Potential Candidate Materials MCu$_3$X$_4$ (M = V, Nb, Ta; X = S, Se, Te) for Neuromorphic Computing

Authors: Baoxing Zhai, Ruiqing Cheng, Tianxing Wang, Li Liu, Lei Yin, Yao Wen, Hao Wang, Sheng Chang, Jun He

Abstract: Inspired by the neuro-synaptic frameworks in the human brain, neuromorphic computing is expected to overcome the bottleneck of traditional von-Neumann architecture and be used in artificial intelligence. Here, we predict a class of potential candidate materials, MCu$_3$X$_4$ (M = V, Nb, Ta; X = S, Se, Te), for neuromorphic computing applications through first-principles calculations based on density functional theory. We find that when MCu$_3$X$_4$ are inserted with Li atom, the systems would transform from semiconductors to metals due to the considerable electron filling [~0.8 electrons per formula unit (f.u.)] and still maintain well structural stability. Meanwhile, the inserted Li atom also has a low diffusion barrier (~0.6 eV/f.u.), which ensures the feasibility to control the insertion/extraction of Li by gate voltage. These results establish that the system can achieve the reversible switching between two stable memory states, i.e., high/low resistance state, indicating that it could potentially be used to design synaptic transistor to enable neuromorphic computing. Our work provides inspiration for advancing the search of candidate materials related to neuromorphic computing from the perspective of theoretical calculations.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 28+8 pages, 18 figures

Journal ref: Phys. Rev. Applied 19, 054045 (2023)

arXiv:2304.13678 [pdf, other]

A marker-less human motion analysis system for motion-based biomarker discovery in knee disorders

Authors: Kai Armstrong, Lei Zhang, Yan Wen, Alexander P. Willmott, Paul Lee, Xujioing Ye

Abstract: In recent years the NHS has been having increased difficulty seeing all low-risk patients, this includes but not limited to suspected osteoarthritis (OA) patients. To help address the increased waiting lists and shortages of staff, we propose a novel method of automated biomarker identification for diagnosis of knee disorders and the monitoring of treatment progression. The proposed method allows for the measurement and analysis of biomechanics and analyse their clinical significance, in both a cheap and sensitive alternative to the currently available commercial alternatives. These methods and results validate the capabilities of standard RGB cameras in clinical environments to capture motion and show that when compared to alternatives such as depth cameras there is a comparable accuracy in the clinical environment. Biomarker identification using Principal Component Analysis (PCA) allows the reduction of the dimensionality to produce the most representative features from motion data, these new biomarkers can then be used to assess the success of treatment and track the progress of rehabilitation. This was validated by applying these techniques on a case study utilising the exploratory use of local anaesthetic applied on knee pain, this allows these new representative biomarkers to be validated as statistically significant (p-value < 0.05).

Submitted 26 April, 2023; originally announced April 2023.

Comments: 11 pages, 5 figures

arXiv:2304.11966 [pdf, other]

ICDAR 2023 Competition on Reading the Seal Title

Authors: Wenwen Yu, Mingyu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai

Abstract: Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants from academia and industry including 28 submissions for Task 1 and 25 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.

Submitted 5 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: ICDAR2023 Competition on ReST report (To appear in ICDAR 2023)

arXiv:2304.07281 [pdf, other]

doi 10.1088/1475-7516/2023/09/028

Sweeping Horndeski Canvas: New Growth-Rate Parameterization for Modified-Gravity Theories

Authors: Yuewei Wen, Nhat-Minh Nguyen, Dragan Huterer

Abstract: We propose and numerically validate a new fitting formula that is sufficiently accurate to model the growth of structure in Horndeski theories of modified gravity for upcoming Stage IV and V large-scale structure surveys. Based on an analysis of more than 18,000 Horndeski models and adopting the popular parameterization of the growth rate $f(z) = Ω_{M}(z)^γ$, we generalize the constant growth index $γ$ to a two-parameter redshift-dependent quantity, $γ(z)$, that more accurately fits these models. We demonstrate that the functional form $γ(z)=γ_0+γ_1z^2 / (1+z)$ improves the median $χ^2$ of the fit to viable Horndeski models by a factor of $\sim40$ relative to that of a constant $γ$, and is sufficient to obtain unbiased results even for precise measurements expected in Stage IV and V surveys. Finally, we constrain the parameters of the new fitting formula using current cosmological data.

Submitted 6 September, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: 23 pages, 6 figures; prepared for JCAP submission; comments welcome! v2: Add missing footnote on author role on the first page; fix latex package rendering error in a section title. v3: Match the version accepted to JCAP; correct repeated references

Report number: LCTP-23-05

Journal ref: JCAP09(2023)028
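
The parameterization in the abstract above, $f(z) = Ω_{M}(z)^{γ(z)}$ with $γ(z)=γ_0+γ_1z^2/(1+z)$, is simple to evaluate. The sketch below is illustrative only and rests on assumptions not stated in the listing: $Ω_{M}(z)$ is computed for a flat ΛCDM background, and the values $Ω_{M,0}=0.3$, $γ_0=0.55$ and $γ_1=0.1$ are placeholders, not the paper's fitted constraints. The function names are hypothetical.

```python
def omega_m(z, omega_m0=0.3):
    """Matter density parameter at redshift z.

    Assumes a flat LambdaCDM background (matter plus cosmological constant);
    this background choice is an assumption made for illustration.
    """
    matter = omega_m0 * (1.0 + z) ** 3
    return matter / (matter + 1.0 - omega_m0)

def growth_index(z, gamma0, gamma1):
    """Two-parameter redshift-dependent growth index gamma(z) = gamma0 + gamma1 z^2 / (1 + z)."""
    return gamma0 + gamma1 * z ** 2 / (1.0 + z)

def growth_rate(z, gamma0, gamma1, omega_m0=0.3):
    """Growth rate f(z) = Omega_M(z) ** gamma(z)."""
    return omega_m(z, omega_m0) ** growth_index(z, gamma0, gamma1)

# Placeholder parameter values, chosen only to exercise the functions.
for z in (0.0, 0.5, 1.0, 2.0):
    print(z, growth_index(z, 0.55, 0.1), growth_rate(z, 0.55, 0.1))
```

Because the $γ_1$ term vanishes at $z=0$, the formula reduces there to the constant-index case with $γ=γ_0$; the redshift dependence only matters at $z>0$.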

arXiv:2304.03217 [pdf, other]

doi 10.1063/5.0164930

Exploring the Spin Dynamics of a Room-Temperature Diamond Maser using an Extended rate Equation Model

Authors: Yongqiang Wen, Philip L. Diggle, Neil McN. Alford, Daan M. Arroo

Abstract: Masers - the microwave analogue of lasers - are coherent microwave sources that can act as oscillators or quantum-limited amplifiers. Masers have historically required high vacuum and cryogenic temperatures to operate, but recently masers based on diamond have been demonstrated to operate at room temperature and pressure, opening a route to new applications as ultra-low noise microwave amplifiers. For these new applications to become feasible at a mass scale, it is important to optimise diamond masers by minimising their size and maximising their gain, as well as the maximum input power of signals that can be amplified. Here, we develop and numerically solve an extended rate equation model to present a detailed phenomenology of masing dynamics and determine the optimal properties required for the cavity, resonator and gain medium in order to develop portable maser devices. We conclude by suggesting how the material parameters of the diamond gain media and dielectric resonators used in diamond masers can be optimised and how rate equation models could be further developed to incorporate the effects of temperature and nitrogen concentration on spin lifetimes.

Submitted 7 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 23 pages, 8 figures

Journal ref: J. Appl. Phys. 134, 194501 (2023)

arXiv:2304.02860 [pdf, other]

Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal

Authors: Tao Gao, Yuanbo Wen, Kaihao Zhang, Peng Cheng, Ting Chen

Abstract: Rain-by-snow weather removal is a specialized task in weather-degraded image restoration aiming to eliminate coexisting rain streaks and snow particles. In this paper, we propose RSFormer, an efficient and effective Transformer that addresses this challenge. Initially, we explore the proximity of convolution networks (ConvNets) and vision Transformers (ViTs) in hierarchical architectures and experimentally find they perform approximately at intra-stage feature learning. On this basis, we utilize a Transformer-like convolution block (TCB) that replaces the computationally expensive self-attention while preserving attention characteristics for adapting to input content. We also demonstrate that cross-stage progression is critical for performance improvement, and propose a global-local self-attention sampling mechanism (GLASM) that down-/up-samples features while capturing both global and local dependencies. Finally, we synthesize two novel rain-by-snow datasets, RSCityScape and RS100K, to evaluate our proposed RSFormer. Extensive experiments verify that RSFormer achieves the best trade-off between performance and time-consumption compared to other restoration methods. For instance, it outperforms Restormer with a 1.53% reduction in the number of parameters and a 15.6% reduction in inference time. Datasets, source code and pre-trained models are available at https://github.com/chdwyb/RSFormer.

Submitted 27 October, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: code is available at https://github.com/chdwyb/RSFormer

arXiv:2303.10362 [pdf, other]

doi 10.3847/1538-4357/acc4c1

The H$α$ broadband photometric reverberation mapping of four Seyfert 1 galaxies

Authors: Qinchun Ma, Xue-Bing Wu, Huapeng Gu, Yuhan Wen, Yuming Fu

Abstract: Broadband photometric reverberation mapping (PRM) have been investigated for AGNs in recent years, but mostly on accretion disk continuum RM. Due to the small fraction of broad emission lines in the broadband, PRM for emission lines is very challenging. Here we present an ICCF-Cut method for broadband PRM to obtain the H$α$ broad line lag and apply it to four Seyfert 1 galaxies, MCG+08-11-011, NGC 2617, 3C 120 and NGC 5548. All of them have high quality broadband lightcurves with daily/sub-daily cadence, which enables us to extract H$α$ lightcurves from the line band by subtracting the contributions from the continuum and host galaxy. Their extracted H$α$ lightcurves are compared with the lagged continuum band lightcurves, as well as the lagged H$β$ lightcurves obtained by spectroscopic RM (SRM) at the same epochs. The consistency of these lightcurves and the comparison with the SRM H$β$ lags provide supports to the H$α$ lags of these AGNs, in a range from 9 to 19 days, obtained by the ICCF-Cut, JAVELIN and $χ^2$ methods. The simulations to evaluate the reliability of H$α$ lags and the comparisons between SRM H$β$ and PRM H$α$ lags indicate that the consistency of the ICCF-Cut, JAVELIN and $χ^2$ results can ensure the reliability of the derived H$α$ lags. These methods may be used to estimate the broad line region sizes and black hole masses of a large sample of AGNs in the large multi-epoch high cadence photometric surveys such as LSST in the future.

Submitted 18 March, 2023; originally announced March 2023.

Comments: 22 pages, 19 figures, accepted for publication in ApJ

arXiv:2303.02489 [pdf, other]

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

Authors: Yanxin Long, Youpeng Wen, Jianhua Han, Hang Xu, Pengzhen Ren, Wei Zhang, Shen Zhao, Xiaodan Liang

Abstract: Benefiting from large-scale vision-language pre-training on image-text pairs, open-world detection methods have shown superior generalization ability under the zero-shot or few-shot detection settings. However, a pre-defined category space is still required during the inference stage of existing methods and only the objects belonging to that space will be predicted. To introduce a "real" open-world detector, in this paper, we propose a novel method named CapDet to either predict under a given category list or directly generate the category of predicted bounding boxes. Specifically, we unify the open-world detection and dense caption tasks into a single yet effective framework by introducing an additional dense captioning head to generate the region-grounded captions. Besides, adding the captioning task will in turn benefit the generalization of detection performance since the captioning dataset covers more concepts. Experiment results show that by unifying the dense caption task, our CapDet has obtained significant performance improvements (e.g., +2.1% mAP on LVIS rare classes) over the baseline method on LVIS (1203 classes). Besides, our CapDet also achieves state-of-the-art performance on dense captioning tasks, e.g., 15.44% mAP on VG V1.2 and 13.98% on the VG-COCO dataset.

Submitted 15 March, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR2023

arXiv:2303.01983 [pdf, ps, other]

Auto-weighted Multi-view Clustering for Large-scale Data

Authors: Xinhang Wan, Xinwang Liu, Jiyuan Liu, Siwei Wang, Yi Wen, Weixuan Liang, En Zhu, Zhe Liu, Lu Zhou

Abstract: Multi-view clustering has gained broad attention owing to its capacity to exploit complementary information across multiple data views. Although existing methods demonstrate delightful clustering performance, most of them are of high time complexity and cannot handle large-scale data. Matrix factorization-based models are a representative of solving this problem. However, they assume that the views share a dimension-fixed consensus coefficient matrix and view-specific base matrices, limiting their representability. Moreover, a series of large-scale algorithms that bear one or more hyperparameters are impractical in real-world applications. To address the two issues, we propose an auto-weighted multi-view clustering (AWMVC) algorithm. Specifically, AWMVC first learns coefficient matrices from corresponding base matrices of different dimensions, then fuses them to obtain an optimal consensus matrix. By mapping original features into distinctive low-dimensional spaces, we can attain more comprehensive knowledge, thus obtaining better clustering results. Moreover, we design a six-step alternative optimization algorithm proven to be convergent theoretically. Also, AWMVC shows excellent performance on various benchmark datasets compared with existing ones. The code of AWMVC is publicly available at https://github.com/wanxinhang/AAAI-2023-AWMVC.

Submitted 20 January, 2023; originally announced March 2023.

arXiv:2303.01691 [pdf, other]

doi 10.1109/TSG.2023.3340157

Improved Inner Approximation for Aggregating Power Flexibility in Active Distribution Networks and its Applications

Authors: Yilin Wen, Zechun Hu, Jinhua He, Yi Guo

Abstract: Concise and reliable modeling for aggregating power flexibility of distributed energy resources in active distribution networks (ADNs) is a crucial technique for coordinating transmission and distribution networks. Our recent research has successfully derived an explicit expression for the exact aggregation model (EAM) of power flexibility at the substation level under linearized distribution network constraints. The EAM, however, is impractical for decision-making purposes due to its exponential complexity. In this paper, we propose an inner approximation method for aggregating flexibility in ADNs that utilizes the properties of the EAM to improve performance. Specifically, the geometric prototype of the inner approximation model is defined according to a subset of the coefficient vector set of the EAM, which enhances the accuracy. On the other hand, the computation efficiency of the inner approximation is also significantly improved by exploiting the regularity of coefficient vectors in the EAM in the parameter calculation process. The inner approximated flexibility model of ADNs is further incorporated into the security-constrained unit commitment problem as an application. Numerical simulations verify the effectiveness of the proposed method.

Submitted 24 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 10 pages

arXiv:2303.01277 [pdf, other]

Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication

Authors: Meng Zhang, Qinghao Hu, Peng Sun, Yonggang Wen, Tianwei Zhang

Abstract: Training Graph Neural Networks (GNNs) on large graphs is challenging due to the conflict between the high memory demand and limited GPU memory. Recently, distributed full-graph GNN training has been widely adopted to tackle this problem. However, the substantial inter-GPU communication overhead can cause severe throughput degradation. Existing communication compression techniques mainly focus on traditional DNN training, whose bottleneck lies in synchronizing gradients and parameters. We find they do not work well in distributed GNN training as the barrier is the layer-wise communication of features during the forward pass & feature gradients during the backward pass. To this end, we propose an efficient distributed GNN training framework Sylvie, which employs one-bit quantization technique in GNNs and further pipelines the curtailed communication with computation to enormously shrink the overhead while maintaining the model quality. In detail, Sylvie provides a lightweight Low-bit Module to quantize the sent data and dequantize the received data back to full precision values in each layer. Additionally, we propose a Bounded Staleness Adaptor to control the introduced staleness to achieve further performance enhancement. We conduct theoretical convergence analysis and extensive experiments on various models & datasets to demonstrate Sylvie can considerably boost the training throughput by up to 28.1x.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2303.00270 [pdf, ps, other]

doi 10.1063/5.0130905

Stability and energy identity for Yang-Mills-Higgs pairs

Authors: Xiaoli Han, Xishen Jin, Yang Wen

Abstract: In this paper, we study the properties of the critical points of Yang-Mills-Higgs functional, which are called Yang-Mills-Higgs pairs. We first consider the properties of weakly stable Yang-Mills-Higgs pairs on a vector bundle over S^n (n > 3). When n > 3, we prove that the norm of its Higgs field is 1 and the connection is actually Yang-Mills. More precisely, its curvature vanishes when n > 4. We also use the bubble-neck decomposition to prove the energy identity of a sequence of Yang-Mills-Higgs pairs over a 4-dimensional compact manifold with uniformly bounded energy. We show there is a subsequence converges smoothly to a Yang-Mills-Higgs pair up to gauge modulo finitely many 4-dimensional spheres with Yang-Mills connections.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 23 pages, 0 figures

MSC Class: 53C07

Journal ref: J. Math. Phys. 64, 021511 (2023)

arXiv:2302.14511 [pdf, other]

A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation

Authors: Lin Li, Wendong Ding, Yongkun Wen, Yufei Liang, Yong Liu, Guowei Wan

Abstract: Pairwise point cloud registration is a critical task for many applications, which heavily depends on finding correct correspondences from the two point clouds. However, the low overlap between input point clouds causes the registration to fail easily, leading to mistaken overlapping and mismatched correspondences, especially in scenes where non-overlapping regions contain similar structures. In this paper, we present a unified bird's-eye view (BEV) model for jointly learning of 3D local features and overlap estimation to fulfill pairwise registration and loop closure. Feature description is performed by a sparse UNet-like network based on BEV representation, and 3D keypoints are extracted by a detection head for 2D locations, and a regression head for heights. For overlap detection, a cross-attention module is applied for interacting contextual information of input point clouds, followed by a classification head to estimate the overlapping region. We evaluate our unified model extensively on the KITTI dataset and Apollo-SouthBay dataset. The experiments demonstrate that our method significantly outperforms existing methods on overlap estimation, especially in scenes with small overlaps. It also achieves top registration performance on both datasets in terms of translation and rotation errors.

Submitted 14 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: 8 pages. Accepted by ICRA-2023

arXiv:2302.07450 [pdf, other]

FedABC: Targeting Fair Competition in Personalized Federated Learning

Authors: Dui Wang, Li Shen, Yong Luo, Han Hu, Kehua Su, Yonggang Wen, Dacheng Tao

Abstract: Federated learning aims to collaboratively train models without accessing their client's local private data. The data may be Non-IID for different clients and thus resulting in poor performance. Recently, personalized federated learning (PFL) has achieved great success in handling Non-IID data by enforcing regularization in local optimization or improving the model aggregation scheme on the server. However, most of the PFL approaches do not take into account the unfair competition issue caused by the imbalanced data distribution and lack of positive samples for some classes in each client. To address this issue, we propose a novel and generic PFL framework termed Federated Averaging via Binary Classification, dubbed FedABC. In particular, we adopt the ``one-vs-all'' training strategy in each client to alleviate the unfair competition between classes by constructing a personalized binary classification problem for each class. This may aggravate the class imbalance challenge and thus a novel personalized binary classification loss that incorporates both the under-sampling and hard sample mining strategies is designed. Extensive experiments are conducted on two popular datasets under different settings, and the results demonstrate that our FedABC can significantly outperform the existing counterparts.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 9 pages, 5 figures

Journal ref: AAAI2023

arXiv:2302.07442 [pdf, other]

Microwave amplification via interfering multi-photon processes in a half-waveguide quantum electrodynamics system

Authors: Fahad Aziz, Kuan Ting Lin, Ping Yi Wen, Samina, Yu Chen Lin, Emely Wiegand, Ching-Ping Lee, Yu-Ting Cheng, Ching-Yeh Chen, Chin-Hsun Chien, Kai-Min Hsieh, Yu-Huan Huang, Ian Hou, Jeng-Chung Chen, Yen-Hsiang Lin, Anton Frisk Kockum, Guin Dar Lin, Io-Chun Hoi

Abstract: We investigate the amplification of a microwave probe signal by a superconducting artificial atom, a transmon, strongly coupled to the end of a one-dimensional semi-infinite transmission line. The end of the transmission line acts as a mirror for microwave fields. Due to the weak anharmonicity of the artificial atom, a strong pump field creates multi-photon excitations among the dressed states. Transitions between these dressed states, Rabi sidebands, give rise to either amplification or attenuation of the weak probe. We obtain a maximum amplitude amplification of about 18 %, higher than in any previous experiment with a single artificial atom, due to constructive interference between Rabi sidebands. We also characterize the noise properties of the system by measuring the spectrum of spontaneous emission.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.06205 [pdf, other]

Order Matters: Agent-by-agent Policy Optimization

Authors: Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, Weinan Zhang

Abstract: While multi-agent trust region algorithms have achieved great success empirically in solving coordination tasks, most of them, however, suffer from a non-stationarity problem since agents update their policies simultaneously. In contrast, a sequential scheme that updates policies agent-by-agent provides another perspective and shows strong performance. However, sample inefficiency and lack of monotonic improvement guarantees for each agent are still the two significant challenges for the sequential scheme. In this paper, we propose the Agent-by-agent Policy Optimization (A2PO) algorithm to improve the sample efficiency and retain the guarantees of monotonic improvement for each agent during training. We justify the tightness of the monotonic improvement bound compared with other trust region algorithms. From the perspective of sequentially updating agents, we further consider the effect of agent updating order and extend the theory of non-stationarity into the sequential update scheme. To evaluate A2PO, we conduct a comprehensive empirical study on four benchmarks: StarCraftII, Multi-agent MuJoCo, Multi-agent Particle Environment, and Google Research Football full game scenarios. A2PO consistently outperforms strong baselines.

Submitted 26 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted by ICLR2023, https://openreview.net/forum?id=Q-neeWNVv1

arXiv:2302.04831 [pdf, other]

Cooperative Open-ended Learning Framework for Zero-shot Coordination

Authors: Yang Li, Shao Zhang, Jichen Sun, Yali Du, Ying Wen, Xinbing Wang, Wei Pan

Abstract: Zero-shot coordination in cooperative artificial intelligence (AI) remains a significant challenge, which means effectively coordinating with a wide range of unseen partners. Previous algorithms have attempted to address this challenge by optimizing fixed objectives within a population to improve strategy or behaviour diversity. However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, known as cooperative incompatibility. To address this issue, we propose the Cooperative Open-ended LEarning (COLE) framework, which constructs open-ended objectives in cooperative games with two players from the perspective of graph theory to assess and identify the cooperative ability of each strategy. We further specify the framework and propose a practical algorithm that leverages knowledge from game theory and graph theory. Furthermore, an analysis of the learning process of the algorithm shows that it can efficiently overcome cooperative incompatibility. The experimental results in the Overcooked game environment demonstrate that our method outperforms current state-of-the-art methods when coordinating with different-level partners. Our demo is available at https://sites.google.com/view/cole-2023.

Submitted 28 February, 2024; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: 15 pages with 9 pages main body

arXiv:2302.03668 [pdf, other]

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

Authors: Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, Tom Goldstein

Abstract: The strength of modern generative models lies in their ability to be controlled through text-based prompts. Typical "hard" prompts are made from interpretable words and tokens, and must be hand-crafted by humans. There are also "soft" prompts, which consist of continuous feature vectors. These can be discovered using powerful optimization methods, but they cannot be easily interpreted, re-used across models, or plugged into a text-based interface. We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization. Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications. In the text-to-image setting, the method creates hard prompts for diffusion models, allowing API users to easily generate, discover, and mix and match image concepts without prior knowledge on how to prompt the model. In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.

Submitted 1 June, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 15 pages, 12 figures, Code is available at https://github.com/YuxinWenRick/hard-prompts-made-easy

arXiv:2302.02184 [pdf, other]

Real-Time Image Demoireing on Mobile Devices

Authors: Yuxin Zhang, Mingbao Lin, Xunchao Li, Han Liu, Guozhi Wang, Fei Chao, Shuai Ren, Yafei Wen, Xiaoxin Chen, Rongrong Ji

Abstract: Moire patterns appear frequently when taking photos of digital screens, drastically degrading the image quality. Despite the advance of CNNs in image demoireing, existing networks are with heavy design, causing redundant computation burden for mobile devices. In this paper, we launch the first study on accelerating demoireing networks and propose a dynamic demoireing acceleration method (DDA) towards a real-time deployment on mobile devices. Our stimulus stems from a simple-yet-universal fact that moire patterns often unbalancedly distribute across an image. Consequently, excessive computation is wasted upon non-moire areas. Therefore, we reallocate computation costs in proportion to the complexity of image patches. In order to achieve this aim, we measure the complexity of an image patch by designing a novel moire prior that considers both colorfulness and frequency information of moire patterns. Then, we restore image patches with higher-complexity using larger networks and the ones with lower-complexity are assigned with smaller networks to relieve the computation burden. At last, we train all networks in a parameter-shared supernet paradigm to avoid additional parameter burden. Extensive experiments on several benchmarks demonstrate the efficacy of our proposed DDA. In addition, the acceleration evaluated on the VIVO X80 Pro smartphone equipped with a chip of Snapdragon 8 Gen 1 shows that our method can drastically reduce the inference time, leading to a real-time image demoireing on mobile devices. Source codes and models are released at https://github.com/zyxxmu/DDA

Submitted 4 February, 2023; originally announced February 2023.

Comments: To appear in the eleventh International Conference on Learning Representations (ICLR 2023)

arXiv:2302.01331 [pdf, other]

doi 10.1103/PhysRevLett.131.111001

Evidence for suppression of structure growth in the concordance cosmological model

Authors: Nhat-Minh Nguyen, Dragan Huterer, Yuewei Wen

Abstract: We present evidence for a suppressed growth rate of large-scale structure during the dark-energy dominated era. Modeling the growth rate of perturbations with the "growth index" $γ$, we find that current cosmological data strongly prefer a higher growth index than the value $γ=0.55$ predicted by general relativity in a flat $Λ$CDM cosmology. Both the cosmic microwave background data from Planck and the large-scale structure data from weak lensing, galaxy clustering, and cosmic velocities separately favor growth suppression. When combined, they yield $γ=0.633^{+0.025}_{-0.024}$, excluding $γ=0.55$ at a statistical significance of 3.7$σ$. The combination of $fσ_8$ and Planck measurements prefers an even higher growth index of $γ=0.639^{+0.024}_{-0.025}$, corresponding to a 4.2$σ$-tension with the concordance model. In Planck data, the suppressed growth rate offsets the preference for nonzero curvature and fits the data equally well as the latter model. A higher $γ$ leads to a higher matter fluctuation amplitude $S_8$ inferred from galaxy clustering and weak lensing measurements, and a lower $S_8$ from Planck data, effectively resolving the $S_8$ tension.

Submitted 7 August, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: 5 pages + references; 5 figures, 2 tables, 2900 words. Comments welcome! v2: a few more pages and figures, just enough number of words; PRL in press

Report number: LCTP-23-03

Journal ref: Phys. Rev. Lett. 131, 111001 (2023)
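
Note on the growth index: for readers unfamiliar with the parameter, the growth index is conventionally introduced through the linear growth rate. The relation below is the standard parameterization as commonly written, not a formula quoted from this listing; the symbols $D$ (linear growth factor), $a$ (scale factor), and $Ω_m(a)$ (matter density parameter) are assumed notation.

```latex
f(a) \equiv \frac{d\ln D(a)}{d\ln a} \simeq \Omega_m(a)^{\gamma},
\qquad \gamma \approx 0.55 \ \text{for general relativity in flat } \Lambda\text{CDM}
```

Because $Ω_m(a) < 1$ once dark energy becomes significant, a larger $γ$ gives a smaller $f$, which is the sense in which a higher growth index corresponds to suppressed growth.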

arXiv:2301.10226 [pdf, other]

A Watermark for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

Abstract: Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.

Submitted 1 May, 2024; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: 13 pages in the main body. Published at ICML 2023. Code is available at github.com/jwkirchenbauer/lm-watermarking
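
Illustrative sketch: the abstract above describes picking a randomized "green" token set before each token, softly promoting green tokens, and detecting the watermark with a statistical test. The following is a minimal toy version of that idea, not the authors' code (see their repository for that). Seeding the split from the previous token, the green fraction `gamma`, the logit bias `delta`, and the one-proportion z-test are all assumptions made for illustration.

```python
import hashlib
import math
import random


def green_list(prev_token, vocab_size, gamma=0.5):
    """Pseudo-randomly pick a 'green' subset of the vocabulary.

    Assumption: the split is seeded by the previous token id, so a detector
    can recompute it without access to the language model.
    """
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits, prev_token, gamma=0.5, delta=2.0):
    """Softly promote green tokens by adding delta to their logits."""
    green = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def z_score(tokens, vocab_size, gamma=0.5):
    """One-proportion z-test on the number of green tokens in a sequence.

    Under the null hypothesis (no watermark) each token is green with
    probability gamma, so a large positive z suggests watermarked text.
    """
    scored = len(tokens) - 1  # the first token has no predecessor
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab_size, gamma)
        for i in range(1, len(tokens))
    )
    return (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

In use, a generator would call `bias_logits` before sampling each token, and a detector would call `z_score` on the resulting token ids; converting the z-score to a p-value is a standard normal-tail calculation.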

arXiv:2301.06244 [pdf, other]

Haptic Transparency and Interaction Force Control for a Lower-Limb Exoskeleton

Authors: Emek Barış Küçüktabak, Yue Wen, Sangjoon J. Kim, Matthew Short, Daniel Ludvig, Levi Hargrove, Eric Perreault, Kevin Lynch, Jose Pons

Abstract: Controlling the interaction forces between a human and an exoskeleton is crucial for providing transparency or adjusting assistance or resistance levels. However, it is an open problem to control the interaction forces of lower-limb exoskeletons designed for unrestricted overground walking. For these types of exoskeletons, it is challenging to implement force/torque sensors at every contact between the user and the exoskeleton for direct force measurement. Moreover, it is important to compensate for the exoskeleton's whole-body gravitational and dynamical forces, especially for heavy lower-limb exoskeletons. Previous works either simplified the dynamic model by treating the legs as independent double pendulums, or they did not close the loop with interaction force feedback. The proposed whole-exoskeleton closed-loop compensation (WECC) method calculates the interaction torques during the complete gait cycle by using whole-body dynamics and joint torque measurements on a hip-knee exoskeleton. Furthermore, it uses a constrained optimization scheme to track desired interaction torques in a closed loop while considering physical and safety constraints. We evaluated the haptic transparency and dynamic interaction torque tracking of WECC control on three subjects. We also compared the performance of WECC with a controller based on a simplified dynamic model and a passive version of the exoskeleton. The WECC controller results in a consistently low absolute interaction torque error during the whole gait cycle for both zero and nonzero desired interaction torques. In contrast, the simplified controller yields poor performance in tracking desired interaction torques during the stance phase.

Submitted 22 January, 2024; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 19 pages, 13 figures. Accepted for publication in the IEEE Transactions on Robotics (T-RO)

arXiv:2301.05897 [pdf]

doi 10.1117/12.2644087

Model-based Transfer Learning for Automatic Optical Inspection based on domain discrepancy

Authors: Erik Isai Valle Salgado, Haoxin Yan, Yue Hong, Peiyuan Zhu, Shidong Zhu, Chengwei Liao, Yanxiang Wen, Xiu Li, Xiang Qian, Xiaohao Wang, Xinghui Li

Abstract: Transfer learning is a promising method for AOI applications since it can significantly shorten sample collection time and improve efficiency in today's smart manufacturing. However, related research enhanced the network models by applying TL without considering the domain similarity among datasets, the data long-tailedness of a source dataset, and mainly used linear transformations to mitigate the lack of samples. This research applies model-based TL via domain similarity to improve the overall performance and data augmentation in both target and source domains to enrich the data quality and reduce the imbalance. Given a group of source datasets from similar industrial processes, we define which group is the most related to the target through the domain discrepancy score and the number of samples each has. Then, we transfer the chosen pre-trained backbone weights to train and fine-tune the target network. Our research suggests increases in the F1 score and the PR curve up to 20% compared with TL using benchmark datasets.

Submitted 14 January, 2023; originally announced January 2023.

Comments: This is a fix of the published paper "Relational-based transfer learning for automatic optical inspection based on domain discrepancy"

Journal ref: Proc. SPIE 12317, Optoelectronic Imaging and Multimedia Technology IX, 2023

arXiv:2301.02445 [pdf, other]

IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling

Authors: Yilin Wen, Biao Luo, Yuqian Zhao

Abstract: Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data. However, for complex multimodal information and sparse training data, it is usually difficult to achieve interpretability and high accuracy simultaneously for most methods. To address this difficulty, a new model is developed in this paper, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM). First, a multi-modal fine-grained fusion method is proposed, and Vgg16 and Optical Character Recognition (OCR) techniques are adopted to effectively extract text information from images and images. Then, the knowledge graph link prediction task is modelled as an offline reinforcement learning Markov decision model, which is then abstracted into a unified sequence framework. An interactive perception-based reward expectation mechanism and a special causal masking mechanism are designed, which "converts" the query into an inference path. Then, an autoregressive dynamic gradient adjustment mechanism is proposed to alleviate the insufficient problem of multimodal optimization. Finally, two datasets are adopted for experiments, and the popular SOTA baselines are used for comparison. The results show that the developed IMKGA-SM achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.

Submitted 11 January, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: 12 pages, 10 figures

arXiv:2301.02403 [pdf, other]

CyberLoc: Towards Accurate Long-term Visual Localization

Authors: Liu Liu, Yukai Lin, Xiao Liang, Qichao Xu, Miao Jia, Yangdong Liu, Yuxiang Wen, Wei Luo, Jiangwei Li

Abstract: This technical report introduces CyberLoc, an image-based visual localization pipeline for robust and accurate long-term pose estimation under challenging conditions. The proposed method comprises four modules connected in a sequence. First, a mapping module is applied to build accurate 3D maps of the scene, one map for each reference sequence if there exist multiple reference sequences under different conditions. Second, a single-image-based localization pipeline (retrieval--matching--PnP) is performed to estimate 6-DoF camera poses for each query image, one for each 3D map. Third, a consensus set maximization module is proposed to filter out outlier 6-DoF camera poses, and outputs one 6-DoF camera pose for a query. Finally, a robust pose refinement module is proposed to optimize 6-DoF query poses, taking candidate global 6-DoF camera poses and their corresponding global 2D-3D matches, sparse 2D-2D feature matches between consecutive query images and SLAM poses of the query sequence as input. Experiments on the 4seasons dataset show that our method achieves high accuracy and robustness. In particular, our approach wins the localization challenge of ECCV 2022 workshop on Map-based Localization for Autonomous Driving (MLAD-ECCV2022).

Submitted 6 January, 2023; originally announced January 2023.

Comments: MLAD-ECCV 2022

arXiv:2301.02348 [pdf, other]

High-Speed High-Accuracy Spatial Curve Tracking Using Motion Primitives in Industrial Robots

Authors: Honglu He, Chen-lung Lu, Yunshi Wen, Glenn Saunders, Pinghai Yang, Jeffrey Schoonover, Agung Julius, John T. Wen

Abstract: Industrial robots are increasingly deployed in applications requiring an end effector tool to closely track a specified path, such as in spraying and welding. Performance and productivity present possibly conflicting objectives: tracking accuracy, path speed, and motion uniformity. Industrial robots are programmed through motion primitives consisting of waypoints connected by pre-defined motion segments, with specified parameters such as path speed and blending zone. The actual executed robot motion depends on the robot joint servo controller and joint motion constraints (velocity, acceleration, etc.) which are largely unknown to the users. Programming a robot to achieve the desired performance today is time-consuming and mostly manual, requiring tuning a large number of coupled parameters in the motion primitives. The performance also depends on the choice of additional parameters: possible redundant degrees of freedom, location of the target curve, and the robot configuration. This paper presents a systematic approach to optimize the robot motion primitives for performance. The approach first selects the static parameters, then the motion primitives, and finally iteratively update the waypoints to minimize the tracking error. The ultimate performance objective is to maximize the path speed subject to the tracking accuracy and speed uniformity constraints over the entire path. We have demonstrated the effectiveness of this approach in simulation for ABB and FANUC robots for two challenging example curves, and experimentally for an ABB robot. Comparing with the baseline using the current industry practice, the optimized performance shows over 200% performance improvement.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2301.02280 [pdf, other]

Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training

Authors: Filip Radenovic, Abhimanyu Dubey, Abhishek Kadian, Todor Mihaylov, Simon Vandenhende, Yash Patel, Yi Wen, Vignesh Ramanathan, Dhruv Mahajan

Abstract: Vision-language models trained with contrastive learning on large-scale noisy data are becoming increasingly popular for zero-shot recognition problems. In this paper we improve the following three aspects of the contrastive pre-training pipeline: dataset noise, model initialization and the training objective. First, we propose a straightforward filtering strategy titled Complexity, Action, and Text-spotting (CAT) that significantly reduces dataset size, while achieving improved performance across zero-shot vision-language tasks. Next, we propose an approach titled Concept Distillation to leverage strong unimodal representations for contrastive training that does not increase training complexity while outperforming prior work. Finally, we modify the traditional contrastive alignment objective, and propose an importance-sampling approach to up-sample the importance of hard-negatives without adding additional complexity. On an extensive zero-shot benchmark of 29 tasks, our Distilled and Hard-negative Training (DiHT) approach improves on 20 tasks compared to the baseline. Furthermore, for few-shot linear probing, we propose a novel approach that bridges the gap between zero-shot and few-shot performance, substantially improving over prior work. Models are available at https://github.com/facebookresearch/diht.

Submitted 29 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: CVPR 2023

arXiv:2301.01795 [pdf, other]

PACO: Parts and Attributes of Common Objects

Authors: Vignesh Ramanathan, Anmol Kalia, Vladan Petrovic, Yi Wen, Baixue Zheng, Baishan Guo, Rui Wang, Aaron Marquez, Rama Kovvuri, Abhishek Kadian, Amir Mousavi, Yiwen Song, Abhimanyu Dubey, Dhruv Mahajan

Abstract: Object models are gradually progressing from predicting just category labels to providing detailed descriptions of object instances. This motivates the need for large datasets which go beyond traditional object masks and provide richer annotations such as part masks and attributes. Hence, we introduce PACO: Parts and Attributes of Common Objects. It spans 75 object categories, 456 object-part categories and 55 attributes across image (LVIS) and video (Ego4D) datasets. We provide 641K part masks annotated across 260K object boxes, with roughly half of them exhaustively annotated with attributes as well. We design evaluation metrics and provide benchmark results for three tasks on the dataset: part mask segmentation, object and part attribute prediction and zero-shot instance detection. Dataset, models, and code are open-sourced at https://github.com/facebookresearch/paco.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2212.14249 [pdf]

Momentum-Resolved Sum-Frequency Vibrational Spectroscopy of Bonded Interface Layer at Charged Water Interfaces

Authors: Yao Hsiao, Ting-Han Chou, Animesh Patra, Yu-Chieh Wen

Abstract: Interface-specific hydrogen- (H-)bonding network of water next to a substrate (including air) directly controls the energy transfer and chemical reaction pathway at many charged aqueous interfaces. Yet, experimental characterization of such bonded water layer structure is still a challenge due to the presence of the ion diffuse layer. We now develop a sum-frequency (SF) spectroscopic scheme with varying photon momentums as an all-optic solution for retrieving the vibrational spectra of the bonded water layer and the diffuse layer, and hence microscopic structural and charging information about an interface. Application of the method to a charged surfactant-water interface reveals a hidden weakly-donor-H-bonded water species, suggesting an asymmetric hydration-shell structure of fully solvated surfactant headgroups. In another application to a zwitterionic phosphatidylcholine (PC) lipid monolayer-water interface, we find a highly polarized bonded water layer structure associating to the PC headgroup, while the diffuse layer contribution is experimentally proven to be negligible. Our all-optic method offers not only an in situ microscopic probe of the electrochemical and biological interfaces, but also a new opportunity for promoting these researches toward high spatial and temporal resolutions.

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.14226 [pdf, other]

doi 10.1103/PhysRevE.107.L032601

Activity-assisted barrier-crossing of self-propelled colloids over parallel microgrooves

Authors: Yan Wen, Zhihao Li, Haiqin Wang, Jing Zheng, Jinyao Tang, Pik-Yin Lai, Xinpeng Xu, Penger Tong

Abstract: We report a systematic study of the dynamics of self-propelled particles (SPPs) over a one-dimensional periodic potential landscape, which is fabricated on a microgroove-patterned polydimethylsiloxane (PDMS) substrate. From the measured non-equilibrium probability density function of the SPPs, we find that the escape dynamics of the slow-rotating SPPs across the potential landscape can be described by an effective potential, once the self-propulsion force is included into the potential under the fixed angle approximation. This work demonstrates that the parallel microgrooves provide a versatile platform for a quantitative understanding of the interplay among the self-propulsion force, spatial confinement by the potential landscape, and thermal noise, as well as its effects on activity-assisted escape dynamics and transport of the SPPs.

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.12669 [pdf, other]

On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective

Authors: Ying Wen, Ziyu Wan, Ming Zhou, Shufang Hou, Zhe Cao, Chenyang Le, Jingxiao Chen, Zheng Tian, Weinan Zhang, Jun Wang

Abstract: The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundational model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks, such as text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.

Submitted 16 May, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

Comments: 26 pages, 4 figures

arXiv:2212.09248 [pdf, other]

Natural Language to Code Generation in Interactive Data Science Notebooks

Authors: Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov, Charles Sutton

Abstract: Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems from the same notebook. It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states as well as previous turns of interaction. To establish a strong baseline on this challenging task, we develop PaChiNCo, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs. Finally, we explore few-shot prompting strategies to elicit better code with step-by-step decomposition and NL explanation, showing the potential to improve the diversity and explainability of model predictions.

Submitted 19 December, 2022; originally announced December 2022.

Comments: 46 pages. 32 figures

arXiv:2212.08059 [pdf, other]

Rethinking Vision Transformers for MobileNet Size and Speed

Authors: Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Abstract: With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches are proposed to accelerate attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, even true for the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constraint hardware. In this work, we investigate a central question, can transformer models run as fast as MobileNet and maintain a similar size? We revisit the design choices of ViTs and propose a novel supernet with low latency and high parameter efficiency. We further introduce a novel fine-grained joint search strategy for transformer models that can find efficient architectures by optimizing latency and number of parameters simultaneously. The proposed models, EfficientFormerV2, achieve 3.5% higher top-1 accuracy than MobileNetV2 on ImageNet-1K with similar latency and parameters. This work demonstrates that properly designed and optimized vision transformers can achieve high performance even with MobileNet-level size and speed.

Submitted 4 September, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: Code is available at: https://github.com/snap-research/EfficientFormer

Showing 251–300 of 622 results for author: Wen, Y