Search | arXiv e-print repository

High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

Authors: Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz

Abstract: We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. We introduce a novel scheme that utilizes the range-null space decomposition of the target image. We transmit the range-space of the image after encoding and employ DDPM to progressively refine its null space contents. Through extensive experiments, we demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method. We will publicly share our source code to facilitate further research and reproducibility.

Submitted 27 September, 2023; originally announced September 2023.

Comments: 6 pages, 4 figures

arXiv:2309.04842 [pdf, other]

Leveraging Large Language Models for Exploiting ASR Uncertainty

Authors: Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

Abstract: While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where the LLM's accuracy on SLU tasks is constrained by the accuracy of a fixed ASR system on the spoken input. Specifically, we tackle the speech-intent classification task, where a high word-error-rate can limit the LLM's ability to understand the spoken intent. Instead of chasing high accuracy by designing complex or specialized architectures regardless of deployment costs, we seek to answer how far we can go without substantially changing the underlying ASR and LLM, which can potentially be shared by multiple unrelated tasks. To this end, we propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis. We explore prompt engineering to explain the concept of n-best lists to the LLM, followed by the finetuning of Low-Rank Adapters on the downstream tasks. Our approach using n-best lists proves to be effective on a device-directed speech detection task as well as on a keyword spotting task, where systems using n-best list prompts outperform those using the 1-best ASR hypothesis, thus paving the way for an efficient method to exploit ASR uncertainty via LLMs for speech-based applications.

Submitted 12 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: Added references
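
The n-best prompting idea in the abstract above (arXiv:2309.04842) can be illustrated with a minimal sketch. The function name `build_nbest_prompt`, the prompt wording, and the numbering scheme are all hypothetical; the abstract does not give the authors' actual prompt template, so this only shows the general shape of passing several ranked ASR hypotheses to a language model instead of one.

```python
def build_nbest_prompt(hypotheses, task_instruction):
    """Format ranked ASR hypotheses into one text prompt.

    The template below is an illustrative assumption, not the
    prompt used in the paper.
    """
    if not hypotheses:
        raise ValueError("need at least one ASR hypothesis")
    lines = [
        task_instruction,
        "The following are ASR hypotheses for one utterance, best first:",
    ]
    # Number each hypothesis so the model can see its rank.
    for rank, text in enumerate(hypotheses, start=1):
        lines.append(f"{rank}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_nbest_prompt(
        ["set a timer", "said a timer", "set a time"],
        "Is this utterance directed at the device? Reply yes or no.",
    )
    print(prompt)
```

Passing a one-element list reproduces the 1-best baseline, which makes it easy to compare the two settings with the same template.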

arXiv:2309.03040 [pdf, other]

Automated CVE Analysis for Threat Prioritization and Impact Prediction

Authors: Ehsan Aghaei, Ehab Al-Shaer, Waseem Shadid, Xi Niu

Abstract: The Common Vulnerabilities and Exposures (CVE) are pivotal information for proactive cybersecurity measures, including service patching, security hardening, and more. However, CVEs typically offer low-level, product-oriented descriptions of publicly disclosed cybersecurity vulnerabilities, often lacking the essential attack semantic information required for comprehensive weakness characterization and threat impact estimation. This critical insight is essential for CVE prioritization and the identification of potential countermeasures, particularly when dealing with a large number of CVEs. Current industry practices involve manual evaluation of CVEs to assess their attack severities using the Common Vulnerability Scoring System (CVSS) and mapping them to Common Weakness Enumeration (CWE) for potential mitigation identification. Unfortunately, this manual analysis presents a major bottleneck in the vulnerability analysis process, leading to slowdowns in proactive cybersecurity efforts and the potential for inaccuracies due to human errors. In this research, we introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization. CVEDrill accurately estimates the CVSS vector for precise threat mitigation and priority ranking and seamlessly automates the classification of CVEs into the appropriate CWE hierarchy classes. By harnessing CVEDrill, organizations can now implement cybersecurity countermeasure mitigation with unparalleled accuracy and timeliness, surpassing in this domain the capabilities of state-of-the-art tools like ChatGPT.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.12062 [pdf, other]

Rationally Correcting Impurity Levels Positions Based on Electrostatic Potential Strategy for Photocatalytic Overall Water Splitting

Authors: Dazhong Sun, Wentao Li, Anqi Shi, Wenxia Zhang, Huabing Shu, Fengfeng Chi, Bing Wang, Xiuyun Zhang, Xianghong Niu

Abstract: Doping to induce suitable impurity levels is an effective strategy to achieve highly efficient photocatalytic overall water splitting (POWS). However, to predict the position of impurity levels, it is not enough to depend only on the projected density of states of the substituted atom, as in the traditional method. Herein, taking phosphorus-doped g-C3N5 as an example, we find that the impurity atom can change the electrostatic potential gradient and polarity, and then significantly affect the spatial electron density around the substituted atom, which further adjusts the impurity level position. Based on the redox potential requirement of POWS, we not only obtain suitable impurity levels, but also expand the visible light absorption range. Simultaneously, the strengthened polarity induced by doping further improves the redox ability of photogenerated carriers. Moreover, the enhanced surface dipoles clearly promote the adsorption and subsequent splitting of water molecules. Our study provides a more comprehensive view for realizing accurate regulation of impurity levels in doping engineering and gives reasonable strategies for designing an excellent catalyst for POWS.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 15 pages, 7 figures, 1 table, 37 reference articles

arXiv:2308.08244 [pdf, other]

A Hybrid Wireless Image Transmission Scheme with Diffusion

Authors: Xueyan Niu, Xu Wang, Deniz Gündüz, Bo Bai, Weichao Chen, Guohua Zhou

Abstract: We propose a hybrid joint source-channel coding (JSCC) scheme, in which the conventional digital communication scheme is complemented with a generative refinement component to improve the perceptual quality of the reconstruction. The input image is decomposed into two components: the first is a coarse compressed version, and is transmitted following the conventional separation-based approach. An additional component is obtained through the diffusion process by adding independent Gaussian noise to the input image, and is transmitted using DeepJSCC. The decoder combines the two signals to produce a high quality reconstruction of the source. Experimental results show that the hybrid design provides bandwidth savings and enables graceful performance improvement as the channel quality improves.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2308.07770 [pdf, other]

Multi-scale Promoted Self-adjusting Correlation Learning for Facial Action Unit Detection

Authors: Xin Liu, Kaishen Yuan, Xuesong Niu, Jingang Shi, Zitong Yu, Huanjing Yue, Jingyu Yang

Abstract: Facial Action Unit (AU) detection is a crucial task in affective computing and social robotics as it helps to identify emotions expressed through facial expressions. Anatomically, there are innumerable correlations between AUs, which contain rich information and are vital for AU detection. Previous methods used fixed AU correlations based on expert experience or statistical rules on specific benchmarks, but it is challenging to comprehensively reflect complex correlations between AUs via hand-crafted settings. There are alternative methods that employ a fully connected graph to learn these dependencies exhaustively. However, these approaches can result in a computational explosion and a high dependency on large datasets. To address these challenges, this paper proposes a novel self-adjusting AU-correlation learning (SACL) method with less computation for AU detection. This method adaptively learns and updates AU correlation graphs by efficiently leveraging the characteristics of different levels of AU motion and emotion representation information extracted in different stages of the network. Moreover, this paper explores the role of multi-scale learning in correlation information extraction, and designs a simple yet effective multi-scale feature learning (MSFL) method to promote better performance in AU detection. By integrating AU correlation information with multi-scale features, the proposed method obtains a more robust feature representation for the final AU detection. Extensive experiments show that the proposed method outperforms the state-of-the-art methods on widely used AU detection benchmark datasets, with only 28.7% and 12.0% of the parameters and FLOPs of the best method, respectively. The code for this method is available at https://github.com/linuxsino/Self-adjusting-AU.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 13 pages, 7 figures

arXiv:2308.02257 [pdf]

Enhancing Cell Proliferation and Migration by MIR-Carbonyl Vibrational Coupling: Insights from Transcriptome Profiling

Authors: Xingkun Niu, Feng Gao, Shaojie Hou, Shihao Liu, Xinmin Zhao, Jun Guo, Liping Wang, Feng Zhang

Abstract: Cell proliferation and migration are closely related to normal tissue self-healing, so controlling them artificially is highly significant. Recently, vibrational strong coupling between biomolecules and mid-infrared (MIR) light photons has been successfully used to modify in vitro bioreactions, neuronal signaling and even animal behavior. However, the synergistic effects from molecules to cells remain unclear, and the regulation of cells by MIR light needs to be explained at the molecular level. Herein, the proliferation rate and migration capacity of fibroblasts were increased by 156% and 162.5%, respectively, by vibrational coupling of 5.6 micrometer photons with carbonyl groups in biomolecules. Through transcriptome sequencing analysis, the regulatory mechanism of infrared light at 5.6 micrometers was explained at the level of signal pathways and cell components. High-power 5.6 micrometer lasers can regulate cell function through vibrational strong coupling while minimizing photothermal damage. This work not only sheds light on the non-thermal effect of MIR light on wound healing, but also provides new evidence for future frequency medicine.

Submitted 3 August, 2023; originally announced August 2023.

Comments: 20 pages, 5 figures

arXiv:2308.00183 [pdf, other]

Hovering Control of Flapping Wings in Tandem with Multi-Rotors

Authors: Aniket Dhole, Bibek Gupta, Adarsh Salagame, Xuejian Niu, Yizhe Xu, Kaushik Venkatesh, Paul Ghanem, Ioannis Mandralis, Eric Sihite, Alireza Ramezani

Abstract: This work briefly covers our efforts to stabilize the flight dynamics of Northeastern's tailless bat-inspired micro aerial vehicle, Aerobat. Flapping robots are not new. A plethora of examples is mainly dominated by insect-style design paradigms that are passively stable. However, Aerobat, in addition to being tailless, possesses morphing wings that add to the inherent complexity of flight control. The robot can dynamically adjust its wing platform configurations during gait cycles, increasing its efficiency and agility. We employ a guard design with manifold small thrusters to stabilize Aerobat's position and orientation in hovering, a flapping system in tandem with a multi-rotor. For flight control purposes, we take an approach based on assuming the guard cannot observe Aerobat's states. Then, we propose an observer to estimate the unknown states of the guard which are then used for closed-loop hovering control of the Guard-Aerobat platform.

Submitted 31 July, 2023; originally announced August 2023.

arXiv:2307.11196 [pdf, other]

Exact Community Recovery in the Geometric SBM

Authors: Julia Gaudio, Xiaochun Niu, Ermin Wei

Abstract: We study the problem of exact community recovery in the Geometric Stochastic Block Model (GSBM), where each vertex has an unknown community label as well as a known position, generated according to a Poisson point process in $\mathbb{R}^d$. Edges are formed independently conditioned on the community labels and positions, where vertices may only be connected by an edge if they are within a prescribed distance of each other. The GSBM thus favors the formation of dense local subgraphs, which commonly occur in real-world networks, a property that makes the GSBM qualitatively very different from the standard Stochastic Block Model (SBM). We propose a linear-time algorithm for exact community recovery, which succeeds down to the information-theoretic threshold, confirming a conjecture of Abbe, Baccelli, and Sankararaman. The algorithm involves two phases. The first phase exploits the density of local subgraphs to propagate estimated community labels among sufficiently occupied subregions, and produces an almost-exact vertex labeling. The second phase then refines the initial labels using a Poisson testing procedure. Thus, the GSBM enjoys local to global amplification just as the SBM, with the advantage of admitting an information-theoretically optimal, linear-time algorithm.

Submitted 5 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.10554 [pdf, other]

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

Authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan

Abstract: Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search for optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies exhibit weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic proxy search framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We propose a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance to state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.

Submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV2023

arXiv:2307.06632 [pdf]

FF-LINS: A Consistent Frame-to-Frame Solid-State-LiDAR-Inertial State Estimator

Authors: Hailiang Tang, Tisheng Zhang, Xiaoji Niu, Liqiang Wang, Linfu Wei, Jingnan Liu

Abstract: Most of the existing LiDAR-inertial navigation systems are based on frame-to-map registrations, leading to inconsistency in state estimation. The newest solid-state LiDAR with a non-repetitive scanning pattern makes it possible to achieve a consistent LiDAR-inertial estimator by employing a frame-to-frame data association. In this letter, we propose a robust and consistent frame-to-frame LiDAR-inertial navigation system (FF-LINS) for solid-state LiDARs. With the INS-centric LiDAR frame processing, the keyframe point-cloud map is built using the accumulated point clouds to construct the frame-to-frame data association. The LiDAR frame-to-frame and the inertial measurement unit (IMU) preintegration measurements are tightly integrated using the factor graph optimization, with online calibration of the LiDAR-IMU extrinsic and time-delay parameters. The experiments on the public and private datasets demonstrate that the proposed FF-LINS achieves superior accuracy and robustness compared with the state-of-the-art systems. Besides, the LiDAR-IMU extrinsic and time-delay parameters are estimated effectively, and the online calibration notably improves the pose accuracy. The proposed FF-LINS and the employed datasets are open-sourced on GitHub (https://github.com/i2Nav-WHU/FF-LINS).

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.01192 [pdf, other]

doi 10.1103/PhysRevD.108.115023

NANOGrav signal from axion inflation

Authors: Xuce Niu, Moinul Hossain Rahat

Abstract: Several pulsar timing arrays including NANOGrav, EPTA, PPTA, and CPTA have recently reported the observation of a stochastic background of gravitational wave spectrum in the nano-Hz frequencies. An inflationary interpretation of this observation is challenging from various aspects. We report that such a signal can arise from the Chern-Simons coupling in axion inflation, where a pseudoscalar inflaton couples to a (massive) $U(1)$ gauge field, leading to efficient production of a transverse gauge mode. Such tachyonic particle production during inflation exponentially enhances the primordial perturbations and leads to a unique parity-violating gravitational wave spectrum that remains flat near the CMB scales but becomes blue-tilted at smaller scales. We identify the parameter space consistent with various cosmological constraints and show that the resultant gravitational wave signals can provide an extra contribution on top of the standard astrophysical contribution from inspiraling supermassive black hole binaries towards explaining the observed excess at NANOGrav. The parity-violating nature of the signal can be probed in future interferometers, distinguishing it from most other new physics signals attempting to explain the NANOGrav result.

Submitted 15 December, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: 11 pages + references, 6 figures, minor changes to match published version

Journal ref: Phys.Rev.D 108 (2023) 11, 115023

arXiv:2307.00813 [pdf]

Regulating the Hydrophobic Domain in Peptide-Catecholamine Coassembled Nanostructures for Fluorescence Enhancement

Authors: Ruoyang Zhao, Feng Gao, Maoyu Li, Xingkun Niu, Shihao Liu, Xinmin Zhao, Liping Wang, Jun Guo, Feng Zhang

Abstract: Hydrophobic domains provide a specific microenvironment for essential functional activities in life. Herein, we studied how the coassembling of peptides with catecholamines regulates the hydrophobic domain-containing nanostructures for fluorescence enhancement. By peptide encoding and coassembling with catecholamines of different hydrophilicities, a series of hierarchical assembling systems were constructed. In combination with molecular dynamics simulation, we experimentally discovered that the hydrophobic domain of the chromophore microenvironment regulates the fluorescence of coassembled nanostructures. Our results shed light on the rational design of fluorescent bio-coassembled nanoprobes for biomedical applications.

Submitted 3 July, 2023; originally announced July 2023.

Comments: 19 pages, 5 figures

arXiv:2306.03824 [pdf, other]

Understanding Generalization of Federated Learning via Stability: Heterogeneity Matters

Authors: Zhenyu Sun, Xiaochun Niu, Ermin Wei

Abstract: Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications. Good generalization indicates the model can predict unseen data correctly when trained on a limited amount of data. Federated learning (FL), which has emerged as a popular distributed learning framework, allows multiple devices or clients to train a shared model without violating privacy requirements. While the existing literature has studied extensively the generalization performances of centralized machine learning algorithms, similar analysis in the federated settings is either absent or relies on very restrictive assumptions on the loss functions. In this paper, we aim to analyze the generalization performances of federated learning by means of algorithmic stability, which measures the change of the output model of an algorithm when perturbing one data point. Three widely-used algorithms are studied, including FedAvg, SCAFFOLD, and FedProx, under convex and non-convex loss functions. Our analysis shows that the generalization performances of models trained by these three algorithms are closely related to the heterogeneity of clients' datasets as well as the convergence behaviors of the algorithms. Particularly, in the i.i.d. setting, our results recover the classical results of stochastic gradient descent (SGD).

Submitted 6 June, 2023; originally announced June 2023.

Comments: Submitted to NeurIPS 2023

arXiv:2306.02568 [pdf, other]

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

Authors: Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

Abstract: We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. Finally, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP.

Submitted 25 June, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2024

arXiv:2305.17131 [pdf, other]

doi 10.18653/v1/2023.acl-short.126

RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation

Authors: Gabriele Sarti, Phu Mon Htut, Xing Niu, Benjamin Hsu, Anna Currey, Georgiana Dinu, Maria Nadejde

Abstract: Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised methods. To address this limitation, we propose Retrieval and Attribute-Marking enhanced Prompting (RAMP), which leverages large multilingual language models to perform ACT in few-shot and zero-shot settings. RAMP improves generation accuracy over the standard prompting approach by (1) incorporating a semantic similarity retrieval component for selecting similar in-context examples, and (2) marking in-context examples with attribute annotations. Our comprehensive experiments show that RAMP is a viable approach in both zero-shot and few-shot settings.

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023

Journal ref: Proceedings of ACL (2023) 1476-1490

arXiv:2305.13547 [pdf, other]

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

Authors: Haoqi Zheng, Qihuang Zhong, Liang Ding, Zhiliang Tian, Xin Niu, Dongsheng Li, Dacheng Tao

Abstract: Text classification tasks often encounter few-shot scenarios with limited labeled data, and addressing data scarcity is crucial. Data augmentation with mixup has been shown to be effective on various text classification tasks. However, most mixup methods do not consider the varying degree of learning difficulty in different stages of training, and they generate new samples with one-hot labels, resulting in model overconfidence. In this paper, we propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification, which can generate more adaptive and model-friendly pseudo samples for model training. SE focuses on the variation of the model's learning ability. To alleviate model overconfidence, we introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixing up. Through experimental analysis, we demonstrate that, in addition to improving classification accuracy, SE also enhances the model's generalization ability.

Submitted 27 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.
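
The instance-specific label smoothing mentioned in the abstract above can be illustrated with a minimal sketch. The abstract states only that the model's output and the one-hot label are linearly interpolated; the weight `alpha`, the function name `smooth_label`, and the example values are assumptions made here for illustration, not the authors' implementation.

```python
def smooth_label(one_hot, model_probs, alpha=0.1):
    """Blend a one-hot label with the model's predicted distribution.

    alpha is a hypothetical interpolation weight (not given in the abstract):
    alpha = 0 keeps the one-hot label, alpha = 1 uses the model output alone.
    """
    if len(one_hot) != len(model_probs):
        raise ValueError("label and prediction must have the same length")
    return [(1.0 - alpha) * y + alpha * p for y, p in zip(one_hot, model_probs)]


# Example: the true class is index 1; the model assigns it probability 0.7.
soft = smooth_label([0.0, 1.0, 0.0], [0.2, 0.7, 0.1], alpha=0.5)
```

Because both inputs sum to one, the interpolated label is again a probability distribution, so it can be fed to a standard cross-entropy loss or mixed with another sample's soft label.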

arXiv:2305.12644 [pdf]

PO-VINS: An Efficient Pose-Only LiDAR-Enhanced Visual-Inertial State Estimator

Authors: Hailiang Tang, Xiaoji Niu, Tisheng Zhang, Liqiang Wang, Guan Wang, Jingnan Liu

Abstract: The pose-only (PO) visual representation has been proven to be equivalent to the classical multiple-view geometry, while significantly improving computational efficiency. However, its applicability for real-world navigation in large-scale complex environments has not yet been demonstrated. In this study, we present an efficient pose-only LiDAR-enhanced visual-inertial navigation system (PO-VINS) to enhance the real-time performance of the state estimator. In the visual-inertial state estimator (VISE), we propose a pose-only visual-reprojection measurement model that only contains the inertial measurement unit (IMU) pose and extrinsic-parameter states. We further integrated the LiDAR-enhanced method to construct a pose-only LiDAR-depth measurement model. Real-world experiments were conducted in large-scale complex environments, demonstrating that the proposed PO-VISE and LiDAR-enhanced PO-VISE reduce computational complexity by more than 50% and over 20%, respectively. Additionally, the PO-VINS yields the same accuracy as conventional methods. These results indicate that the pose-only solution is efficient and applicable for real-time visual-inertial state estimation.

Submitted 21 May, 2023; originally announced May 2023.

arXiv:2305.11808 [pdf, other]

Pseudo-Label Training and Model Inertia in Neural Machine Translation

Authors: Benjamin Hsu, Anna Currey, Xing Niu, Maria Nădejde, Georgiana Dinu

Abstract: Like many other machine learning applications, neural machine translation (NMT) benefits from over-parameterized deep neural models. However, these models have been observed to be brittle: NMT model predictions are sensitive to small input changes and can show significant variation across re-training or incremental model updates. This work studies a frequently used method in NMT, pseudo-label training (PLT), which is common to the related techniques of forward-translation (or self-training) and sequence-level knowledge distillation. While the effect of PLT on quality is well-documented, we highlight a lesser-known effect: PLT can enhance a model's stability to model updates and input perturbations, a set of properties we call model inertia. We study inertia effects under different training settings and we identify distribution simplification as a mechanism behind the observed results.

Submitted 19 May, 2023; originally announced May 2023.

Comments: accepted ICLR 2023

arXiv:2305.10029 [pdf, other]

TextSLAM: Visual SLAM with Semantic Planar Text Features

Authors: Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei, Wenxian Yu

Abstract: We propose a novel visual SLAM method that integrates text objects tightly by treating them as semantic features via fully exploring their geometric and semantic prior. The text object is modeled as a texture-rich planar patch whose semantic meaning is extracted and updated on the fly for better data association. With the full exploration of locally planar characteristics and semantic meaning of text objects, the SLAM system becomes more accurate and robust even under challenging conditions such as image blurring, large viewpoint changes, and significant illumination variations (day and night). We tested our method in various scenes with the ground truth data. The results show that integrating texture features leads to a superior SLAM system that can match images across day and night. The reconstructed semantic 3D text map could be useful for navigation and scene understanding in robotic and mixed reality applications. Our project page: https://github.com/SJTU-ViSYS/TextSLAM.

Submitted 3 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: 19 pages, 23 figures. Whole project page: https://leeby68.github.io/TextSLAM/

arXiv:2305.09318 [pdf, other]

Conditional Rate-Distortion-Perception Trade-Off

Authors: Xueyan Niu, Deniz Gündüz, Bo Bai, Wei Han

Abstract: Recent advances in machine learning-aided lossy compression are incorporating perceptual fidelity into the rate-distortion theory. In this paper, we study the rate-distortion-perception trade-off when the perceptual quality is measured by the total variation distance between the empirical and product distributions of the discrete memoryless source and its reconstruction. We consider the general setting, where two types of resources are available at both the encoder and decoder: a common side information sequence, correlated with the source sequence, and common randomness. We consider both the strong perceptual constraint and the weaker empirical perceptual constraint. The required communication rate for achieving the distortion and empirical perceptual constraint is the minimum conditional mutual information, and a similar result holds for the strong perceptual constraint when sufficient common randomness is provided and the output along with the side information is constrained to an independent and identically distributed sequence.

Submitted 22 May, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.04381 [pdf, other]

Estimating and Correcting Degree Ratio Bias in the Network Scale-up Method

Authors: Ian Laga, Jessica P. Kunke, Tyler H. McCormick, Xiaoyue Niu

Abstract: The Network Scale-up Method (NSUM) uses social networks and answers to "How many X's do you know?" questions to estimate sizes of groups excluded by standard surveys. This paper addresses the bias caused by varying average social network sizes across populations, commonly referred to as the degree ratio bias. This bias is especially important for marginalized populations like sex workers and drug users, where members tend to have smaller social networks than the average person. We show how the degree ratio affects size estimates and provide a method to estimate degree ratios without collecting additional data. We demonstrate that our adjustment procedure improves the accuracy of NSUM size estimates using simulations and data from two data sources.

Submitted 25 March, 2024; v1 submitted 7 May, 2023; originally announced May 2023.

arXiv:2304.14611 [pdf, other]

Computation of Rate-Distortion-Perception Functions With Wasserstein Barycenter

Authors: Chunhui Chen, Xueyan Niu, Wenhao Ye, Shitong Wu, Bo Bai, Weichao Chen, Sian-Jheng Lin

Abstract: The nascent field of Rate-Distortion-Perception (RDP) theory is seeing a surge of research interest due to the application of machine learning techniques in the area of lossy compression. The information RDP function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality measured by discrepancy between probability distributions. However, computing RDP functions has been a challenge due to the introduction of the perceptual constraint, and existing research often resorts to data-driven methods. In this paper, we show that the information RDP function can be transformed into a Wasserstein Barycenter problem. The non-strict convexity brought by the perceptual constraint can be regularized by an entropy regularization term. We prove that the entropy regularized model converges to the original problem. Furthermore, we propose an alternating iteration method based on the Sinkhorn algorithm to numerically solve the regularized optimization problem. Experimental results demonstrate the efficiency and accuracy of the proposed algorithm.

Submitted 27 April, 2023; originally announced April 2023.
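
For readers unfamiliar with the Sinkhorn algorithm named in the abstract above, the following is a minimal sketch of the plain Sinkhorn iteration for entropy-regularized optimal transport between two discrete distributions. It is not the paper's alternating scheme for RDP functions; the function name, the regularization strength `eps`, the iteration count, and the toy marginals and cost matrix are all assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, iters=200):
    """Return an approximate transport plan between marginals a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    K[i][j] = exp(-cost[i][j] / eps) so that the plan matches both marginals.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Row scaling: enforce the source marginal a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: enforce the target marginal b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem: two points on each side with unit cost for moving mass across.
plan = sinkhorn([0.5, 0.5], [0.3, 0.7], [[0.0, 1.0], [1.0, 0.0]])
```

Smaller values of `eps` bring the plan closer to the unregularized optimum but slow convergence and can underflow the kernel, which is why practical implementations often work in the log domain.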

arXiv:2304.10254 [pdf, other]

Image-text Retrieval via Preserving Main Semantics of Vision

Authors: Xu Zhang, Xinzheng Niu, Philippe Fournier-Viger, Xudong Dai

Abstract: Image-text retrieval is one of the major tasks of cross-modal retrieval. Several approaches for this task map images and texts into a common space to create correspondences between the two modalities. However, due to the content (semantics) richness of an image, redundant secondary information in an image may cause false matches. To address this issue, this paper presents a semantic optimization approach, implemented as a Visual Semantic Loss (VSL), to assist the model in focusing on an image's main content. This approach is inspired by how people typically annotate the content of an image by describing its main content. Thus, we leverage the annotated texts corresponding to an image to assist the model in capturing the main content of the image, reducing the negative impact of secondary content. Extensive experiments on two benchmark datasets (MSCOCO and Flickr30K) demonstrate the superior performance of our method. The code is available at: https://github.com/ZhangXu0963/VSL.

Submitted 28 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: 6 pages, 3 figures, accepted by ICME2023

arXiv:2304.05604 [pdf, ps, other]

doi 10.1016/j.ijplas.2023.103700

A Continuum Model for Dislocation Climb

Authors: Chutian Huang, Shuyang Dai, Xiaohua Niu, Tianpeng Jiang, Zhijian Yang, Yejun Gu, Yang Xiang

Abstract: Dislocation climb plays an important role in understanding plastic deformation of metallic materials at high temperature. In this paper, we present a continuum formulation for dislocation climb velocity based on densities of dislocations. The obtained continuum formulation is an accurate approximation of the Green's function based discrete dislocation dynamics method (Gu et al. J. Mech. Phys. Solids 83:319-337, 2015). The continuum dislocation climb formulation has the advantage of accounting for both the long-range effect of vacancy bulk diffusion and that of the Peach-Koehler climb force, and the two long-range effects are canceled into a short-range effect (integral with fast-decaying kernel) and in some special cases, a completely local effect. This significantly simplifies the calculation in the Green's function based discrete dislocation dynamics method, in which a linear system has to be solved over the entire system for the long-range effect of vacancy diffusion and the long-range Peach-Koehler climb force has to be calculated. This obtained continuum dislocation climb velocity can be applied in any available continuum dislocation dynamics frameworks. We also present numerical validations for this continuum climb velocity and simulation examples for implementation in continuum dislocation dynamics frameworks.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.00137 [pdf, ps, other]

doi 10.1103/PhysRevD.109.L121101

Measurement of the cosmic p+He energy spectrum from 50 GeV to 0.5 PeV with the DAMPE space mission

Authors: DAMPE Collaboration, F. Alemanno, C. Altomare, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, M. Deliyergiyev , et al. (130 additional authors not shown)

Abstract: Recent observations of the light component of the cosmic-ray spectrum have revealed unexpected features that motivate further and more precise measurements up to the highest energies. The Dark Matter Particle Explorer is a satellite-based cosmic-ray experiment that has been operational since December 2015, continuously collecting data on high-energy cosmic particles with very good statistics, energy resolution, and particle identification capabilities. In this work, the latest measurements of the energy spectrum of proton+helium in the energy range from 46 GeV to 464 TeV are presented. Among the most distinctive features of the spectrum, a spectral hardening at 600 GeV has been observed, along with a softening at 29 TeV measured with a 6.6σ significance. Moreover, the detector features and the analysis approach allowed for the extension of the spectral measurement up to the sub-PeV region. Although the statistical significance is small owing to the low number of events, the data suggest a new spectral hardening at about 150 TeV.

Submitted 14 August, 2024; v1 submitted 31 March, 2023; originally announced April 2023.

Comments: Published on PRD

arXiv:2303.07490 [pdf, other]

Comparing the Robustness of Simple Network Scale-Up Method (NSUM) Estimators

Authors: Jessica P. Kunke, Ian Laga, Xiaoyue Niu, Tyler H. McCormick

Abstract: The network scale-up method (NSUM) is a cost-effective approach to estimating the size or prevalence of a group of people that is hard to reach through a standard survey. The basic NSUM involves two steps: estimating respondents' degrees by one of various methods (in this paper we focus on the probe group method which uses the number of people a respondent knows in various groups of known size), and estimating the prevalence of the hard-to-reach population of interest using respondents' estimated degrees and the number of people they report knowing in the hard-to-reach group. Each of these two steps involves taking either an average of ratios or a ratio of averages. Using the ratio of averages for each step has so far been the most common approach. However, we present theoretical arguments that using the average of ratios at the second, prevalence-estimation step often has lower mean squared error when the random mixing assumption is violated, which seems likely in practice; this estimator which uses the ratio of averages for degree estimates and the average of ratios for prevalence was proposed early in NSUM development but has largely been unexplored and unused. Simulation results using an example network data set also support these findings. Based on this theoretical and empirical evidence, we suggest that future surveys that use a simple estimator may want to use this mixed estimator, and estimation methods based on this estimator may produce new improvements.

Submitted 17 January, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Main paper 29 pages, 3 figures, 2 tables; supplement 14 pages, 5 figures
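
The two-step structure described in the abstract above (probe-group degree estimates, then a prevalence estimate formed either as a ratio of averages or as an average of ratios) can be sketched as follows. The function names and the toy survey numbers are assumptions made for illustration; the formulas are the textbook forms of these simple estimators rather than code taken from the paper.

```python
def degree_ratio_of_averages(known_counts, known_sizes, population):
    """Probe-group degree estimate for one respondent.

    known_counts[k] is how many people the respondent reports knowing in
    probe group k, whose true size is known_sizes[k].
    """
    return population * sum(known_counts) / sum(known_sizes)


def prevalence_ratio_of_averages(hidden_counts, degrees):
    """Ratio of averages: total hidden-group contacts over total degree."""
    return sum(hidden_counts) / sum(degrees)


def prevalence_average_of_ratios(hidden_counts, degrees):
    """Average of ratios: mean of each respondent's own proportion."""
    return sum(y / d for y, d in zip(hidden_counts, degrees)) / len(degrees)


# Toy survey: one respondent's degree, then two respondents' prevalence data.
degree = degree_ratio_of_averages([2, 3], [1000, 4000], population=100000)
roa = prevalence_ratio_of_averages([1, 2], [100, 400])
aor = prevalence_average_of_ratios([1, 2], [100, 400])
```

In this toy example the two prevalence estimates differ (0.006 versus 0.0075) because the average of ratios weights every respondent equally, whereas the ratio of averages weights respondents in proportion to their degree.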

arXiv:2303.05955 [pdf, other]

Neuron Structure Modeling for Generalizable Remote Physiological Measurement

Authors: Hao Lu, Zitong Yu, Xuesong Niu, Yingcong Chen

Abstract: Remote photoplethysmography (rPPG) technology has drawn increasing attention in recent years. It can extract Blood Volume Pulse (BVP) from facial videos, making many applications like health monitoring and emotional analysis more accessible. However, as the BVP signal is easily affected by environmental changes, existing methods struggle to generalize well for unseen domains. In this paper, we systematically address the domain shift problem in the rPPG measurement task. We show that most domain generalization methods do not work well in this problem, as domain labels are ambiguous in complicated environmental changes. In light of this, we propose a domain-label-free approach called NEuron STructure modeling (NEST). NEST improves the generalization capacity by maximizing the coverage of feature space during training, which reduces the chance for under-optimized feature activation during inference. Besides, NEST can also enrich and enhance domain invariant features across multiple domains. We create and benchmark a large-scale domain generalization protocol for the rPPG measurement task. Extensive experiments show that our approach outperforms the state-of-the-art methods on both cross-dataset and intra-dataset settings.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR2023

arXiv:2301.10038 [pdf, other]

Progressive Meta-Pooling Learning for Lightweight Image Classification Model

Authors: Peijie Dong, Xin Niu, Zhiliang Tian, Lujun Li, Xiaodong Wang, Zimian Wei, Hengyue Pan, Dongsheng Li

Abstract: Practical networks for edge devices adopt shallow depth and small convolutional kernels to save memory and computational cost, which leads to a restricted receptive field. Conventional efficient learning methods focus on lightweight convolution designs, ignoring the role of the receptive field in neural network design. In this paper, we propose the Meta-Pooling framework to make the receptive field learnable for a lightweight network, which consists of parameterized pooling-based operations. Specifically, we introduce a parameterized spatial enhancer, which is composed of pooling operations to provide versatile receptive fields for each layer of a lightweight model. Then, we present a Progressive Meta-Pooling Learning (PMPL) strategy for the parameterized spatial enhancer to acquire a suitable receptive field size. The results on the ImageNet dataset demonstrate that MobileNetV2 using Meta-Pooling achieves top-1 accuracy of 74.6%, which outperforms MobileNetV2 by 2.3%.

Submitted 24 January, 2023; originally announced January 2023.

Comments: 5 pages, 2 figures, ICASSP23

arXiv:2301.09850 [pdf, other]

RD-NAS: Enhancing One-shot Supernet Ranking Ability via Ranking Distillation from Zero-cost Proxies

Authors: Peijie Dong, Xin Niu, Lujun Li, Zhiliang Tian, Xiaodong Wang, Zimian Wei, Hengyue Pan, Dongsheng Li

Abstract: Neural architecture search (NAS) has made tremendous progress in the automatic design of effective neural network structures but suffers from a heavy computational burden. One-shot NAS significantly alleviates the burden through weight sharing and improves computational efficiency. Zero-shot NAS further reduces the cost by predicting the performance of the network from its initial state, which conducts no training. Both methods aim to distinguish between "good" and "bad" architectures, i.e., ranking consistency of predicted and true performance. In this paper, we propose Ranking Distillation one-shot NAS (RD-NAS) to enhance ranking consistency, which utilizes zero-cost proxies as the cheap teacher and adopts the margin ranking loss to distill the ranking knowledge. Specifically, we propose a margin subnet sampler to distill the ranking knowledge from zero-shot NAS to one-shot NAS by introducing Group distance as margin. Our evaluation of the NAS-Bench-201 and ResNet-based search space demonstrates that RD-NAS achieves 10.7% and 9.65% improvements in ranking ability, respectively. Our code is available at https://github.com/pprp/CVPR2022-NAS-competition-Track1-3th-solution

Submitted 24 January, 2023; originally announced January 2023.

Comments: 6 pages, 2 figures, 4 tables, ICASSP 2023
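
The margin ranking loss mentioned in the abstract above has a standard pairwise form, sketched below for a single pair of architectures. Using a zero-cost proxy score to set the target sign follows the abstract's description of the proxy as teacher; the function names, the fixed `margin` value (the paper derives its margin from a group distance), and the example scores are assumptions for illustration only.

```python
def margin_ranking_loss(score_a, score_b, target, margin=0.0):
    """Standard pairwise margin ranking loss.

    target is +1 if a should rank above b and -1 otherwise; the loss is zero
    once the scores are ordered correctly by at least the margin.
    """
    return max(0.0, -target * (score_a - score_b) + margin)


def distill_pair(student_a, student_b, proxy_a, proxy_b, margin=0.1):
    """Penalize the student when its ordering disagrees with the proxy's."""
    target = 1.0 if proxy_a >= proxy_b else -1.0
    return margin_ranking_loss(student_a, student_b, target, margin)


# The proxy (teacher) prefers architecture a in both cases.
agree = distill_pair(0.8, 0.3, proxy_a=5.0, proxy_b=2.0)     # student agrees
disagree = distill_pair(0.3, 0.8, proxy_a=5.0, proxy_b=2.0)  # student disagrees
```

When the student already agrees with the proxy by more than the margin the loss vanishes, so only misranked or barely separated pairs contribute gradient in a real training loop.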

arXiv:2301.06823 [pdf, other]

A phase field model for the motion of prismatic dislocation loops by both climb and self-climb

Authors: Xiaohua Niu, Xiaodong Yan

Abstract: We study the sharp interface limit and well-posedness of a phase field model for self-climb of prismatic dislocation loops in periodic settings. The model is set up in a Cahn-Hilliard/Allen-Cahn framework featuring degenerate phase-dependent diffusion mobility with an additional stabilizing function. Moreover, a nonlocal climb force is added to the chemical potential. We introduce a notion of weak solutions for the nonlinear model. The existence result is obtained by approximations of the proposed model with nondegenerate mobilities. Lastly, numerical simulations are performed to validate the phase field model, and the simulation results show large differences in the evolution time and pattern of prismatic dislocation loops with and without the self-climb contribution.

Submitted 7 February, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 36 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2202.13492

arXiv:2212.13766

OVO: One-shot Vision Transformer Search with Online distillation

Authors: Zimian Wei, Hengyue Pan, Xin Niu, Dongsheng Li

Abstract: Pure transformers have shown great potential for vision tasks recently. However, their accuracy in small or medium datasets is not satisfactory. Although some existing methods introduce a CNN as a teacher to guide the training process by distillation, the gap between teacher and student networks would lead to sub-optimal performance. In this work, we propose a new One-shot Vision transformer search framework with Online distillation, namely OVO. OVO samples sub-nets for both teacher and student networks for better distillation results. Benefiting from the online distillation, thousands of subnets in the supernet are well-trained without extra finetuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100, respectively.

Submitted 24 November, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: The work is not implemented

arXiv:2212.13337 [pdf]

Comprehensive evaluations of a prototype full field-of-view photon counting CT system through phantom studies

Authors: Xiaohui Zhan, Ruoqiao Zhang, Xiaofeng Niu, Ilmar Hein, Brent Budden, Shuoxing Wu, Nicolay Markov, Cameron Clarke, Yi Qiang, Hiroki Taguchi, Keiichi Nomura, Yoshihisa Muramatsu, Zhou Yu, Tatsushi Kobayashi, Richard Thompson, Hiroaki Miyazaki, Hiroaki Nakai

Abstract: Photon counting CT (PCCT) has been a research focus in the last two decades. Recent studies and advancements have demonstrated that systems using semiconductor-based photon counting detectors (PCDs) have the potential to provide better contrast, noise and spatial resolution performance compared to conventional scintillator-based systems. With multi-energy threshold detection, PCD can simultaneously provide the photon energy measurement and enable material decomposition for spectral imaging. In this work, we report a performance evaluation of our first CdZnTe-based prototype full-size photon counting CT system through various phantom imaging studies. This prototype system supports a 500 mm scan field-of-view (FOV) and 10 mm cone coverage at isocenter. Phantom scans were acquired using 120 kVp from 50 to 400 mAs to assess the imaging performance on: CT number accuracy, uniformity, noise, spatial resolution, material differentiation and quantification. Both qualitative and quantitative evaluations show that PCCT has superior imaging performance with lower noise and improved spatial resolution compared to conventional energy integrating detector (EID)-CT. Using a projection-domain material decomposition approach with multiple energy bin measurements, PCCT virtual monoenergetic images (VMIs) have lower noise, and superior performance in quantifying iodine and calcium concentrations. These improvements lead to increased contrast-to-noise ratio (CNR) for both high and low contrast study objects compared to EID-CT. PCCT can also generate super-high resolution (SHR) images using much smaller detector pixel size than EID-CT and dramatically improve image spatial resolution.

Submitted 10 April, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.02638 [pdf, other]

DISH: A Distributed Hybrid Optimization Method Leveraging System Heterogeneity

Authors: Xiaochun Niu, Ermin Wei

Abstract: We study distributed optimization problems over multi-agent networks, including consensus and network flow problems. Existing distributed methods neglect the heterogeneity among agents' computational capabilities, limiting their effectiveness. To address this, we propose DISH, a distributed hybrid method that leverages system heterogeneity. DISH allows agents with higher computational capabilities or lower computational costs to perform local Newton-type updates while others adopt simpler gradient-type updates. Notably, DISH covers existing methods like EXTRA, DIGing, and ESOM-0 as special cases. To analyze DISH's performance with general update directions, we formulate distributed problems as minimax problems and introduce GRAND (gradient-related ascent and descent) and its alternating version, Alt-GRAND, for solving these problems. GRAND generalizes DISH to centralized minimax settings, accommodating various descent ascent update directions, including gradient-type, Newton-type, scaled gradient, and other general directions, within acute angles to the partial gradients. Theoretical analysis establishes global sublinear and linear convergence rates for GRAND and Alt-GRAND in strongly-convex-nonconcave and strongly-convex-PL settings, providing linear rates for DISH. In addition, we derive the local superlinear convergence of Newton-based variations of GRAND in centralized settings. Numerical experiments validate the effectiveness of our methods.

Submitted 1 August, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.14478 [pdf, other]

DynaVIG: Monocular Vision/INS/GNSS Integrated Navigation and Object Tracking for AGV in Dynamic Scenes

Authors: Ronghe Jin, Yan Wang, Zhi Gao, Xiaoji Niu, Li-Ta Hsu, Jingnan Liu

Abstract: Visual-Inertial Odometry (VIO) usually suffers from drift over long runs, and its accuracy is easily affected by dynamic objects. We propose DynaVIG, a navigation and object tracking system based on the integration of Monocular Vision, Inertial Navigation System (INS), and Global Navigation Satellite System (GNSS). Our system aims to provide an accurate global estimation of the navigation states and object poses for the automated ground vehicle (AGV) in dynamic scenes. Due to the scale ambiguity of the object, a prior height model is proposed to initialize the object pose, and the scale is continuously estimated with the aid of GNSS and INS. To precisely track objects with complex motion, we establish an accurate dynamics model according to the object's motion state. Then the multi-sensor observations are optimized in a unified framework. Experiments on the KITTI dataset demonstrate that the multi-sensor fusion can effectively improve the accuracy of navigation and object tracking, compared to state-of-the-art methods. In addition, the proposed system achieves good estimation of objects that change speed or direction.

Submitted 25 November, 2022; originally announced November 2022.

arXiv:2211.14331 [pdf, other]

doi 10.1088/1475-7516/2023/02/013

Gravitational Wave Probes of Massive Gauge Bosons at the Cosmological Collider

Authors: Xuce Niu, Moinul Hossain Rahat, Karthik Srinivasan, Wei Xue

Abstract: We extend the reach of the ``cosmological collider'' for massive gauge boson production during inflation from the CMB scales to the interferometer scales. Considering a Chern-Simons coupling between the gauge bosons and the pseudoscalar inflaton, one of the transverse gauge modes is efficiently produced and its inverse decay leaves an imprint in the primordial scalar and tensor perturbations. We study the correlation functions of these perturbations and derive the updated constraints on the parameter space from CMB observables. We then extrapolate the tensor power spectrum to smaller scales consistently taking into account the impact of the gauge field on inflationary dynamics. Our results show that the presence of massive gauge fields during inflation can be detected from characteristic gravitational wave signals encompassing the whole range of current and planned interferometers.

Submitted 13 December, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: 35 pages, 7 figures, references added, minor modification in gravitational wave sensitivity plots

Journal ref: JCAP02(2023)013

arXiv:2211.14324 [pdf, other]

doi 10.1088/1475-7516/2023/05/018

Parity-Odd and Even Trispectrum from Axion Inflation

Authors: Xuce Niu, Moinul Hossain Rahat, Karthik Srinivasan, Wei Xue

Abstract: The four-point correlation function of primordial scalar perturbations has parity-even and parity-odd contributions, and the parity-odd signal in cosmological observations is opening a novel window to look for new physics in the inflationary epoch. We study the distinct parity-odd and parity-even predictions of the axion inflation model, in which the inflaton couples to a vector field via a Chern-Simons interaction, and the vector field is considered to be either approximately massless ($m_A \ll H$, where $H$ is the Hubble scale) or very massive ($m_A \sim H$). The parity-odd signal arises due to one transverse mode of the vector field being predominantly produced during inflation. We adopt the in-in formalism to evaluate the correlation functions. Considering the vector field mode function to be dominated by its real part up to a constant phase, we simplify the formulas for numerical computations. The numerical studies show that the massive and massless vector fields give significant parity-even signals, while the parity-odd contribution is about one to two orders of magnitude smaller.

Submitted 22 December, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: 22 pages, 10 figures; minor changes, references added

arXiv:2211.04233 [pdf, other]

doi 10.1088/1751-8121/ad330c

Topological extension including quantum jump

Authors: Xiangyu Niu, Junjie Wang

Abstract: Non-Hermitian systems and the Lindblad form master equation have always been regarded as reliable tools in dissipative modeling. Intriguingly, existing literature often obtains an equivalent non-Hermitian Hamiltonian by neglecting the quantum jump terms in the master equation. However, the effects of the discarded terms, and the unified connection between these two approaches, have not been investigated. In this work, we study the Su-Schrieffer-Heeger model with collective loss and gain from a topological perspective. When the system undergoes no quantum jump events, the corresponding shape matrix exhibits the same topological properties as in the traditional non-Hermitian theory. Conversely, the occurrence of quantum jumps can result in a shift in the positions of the phase transition. Our study provides a qualitative analysis of the impact of quantum jump terms and reveals their unique role in quantum systems.

Submitted 26 March, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

Journal ref: J. Phys. A: Math. Theor. 57 (2024) 145302

arXiv:2211.03174 [pdf, other]

Wheel-SLAM: Simultaneous Localization and Terrain Mapping Using One Wheel-mounted IMU

Authors: Yibin Wu, Jian Kuang, Xiaoji Niu, Jens Behley, Lasse Klingbeil, Heiner Kuhlmann

Abstract: A reliable pose estimator robust to environmental disturbances is desirable for mobile robots. To this end, inertial measurement units (IMUs) play an important role because they can perceive the full motion state of the vehicle independently. However, they suffer from accumulated error due to inherent noise and bias instability, especially for low-cost sensors. In our previous studies on Wheel-INS \cite{niu2021, wu2021}, we proposed to limit the error drift of the pure inertial navigation system (INS) by mounting an IMU to the wheel of the robot to take advantage of rotation modulation. However, Wheel-INS still drifted over a long period of time due to the lack of external correction signals. In this letter, we propose to exploit the environmental perception ability of Wheel-INS to achieve simultaneous localization and mapping (SLAM) with only one IMU. To be specific, we use the road bank angles (mirrored by the robot roll angles estimated by Wheel-INS) as terrain features to enable the loop closure with a Rao-Blackwellized particle filter. The road bank angle is sampled and stored according to the robot position in the grid maps maintained by the particles. The weights of the particles are updated according to the difference between the currently estimated roll sequence and the terrain map. Field experiments suggest that performing SLAM in Wheel-INS using the robot roll angle estimates is feasible. In addition, the positioning accuracy is improved significantly (by more than 30%) over Wheel-INS. The source code of our implementation is publicly available (https://github.com/i2Nav-WHU/Wheel-SLAM).

Submitted 29 November, 2022; v1 submitted 6 November, 2022; originally announced November 2022.

Comments: Accepted to IEEE Robotics and Automation Letters

arXiv:2211.01355 [pdf, other]

MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation

Authors: Anna Currey, Maria Nădejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu

Abstract: As generic machine translation (MT) quality has improved, the need for targeted benchmarks that explore fine-grained aspects of quality has increased. In particular, gender accuracy in translation can have implications in terms of output fluency, translation accuracy, and ethics. In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight widely-spoken languages. MT-GenEval complements existing benchmarks by providing realistic, gender-balanced, counterfactual data in eight language pairs where the gender of individuals is unambiguous in the input segment, including multi-sentence segments requiring inter-sentential gender agreement. Our data and code are publicly available under a CC BY SA 3.0 license.

Submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted at EMNLP 2022. Data and code: https://github.com/amazon-research/machine-translation-gender-eval

arXiv:2210.12134 [pdf, other]

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

Authors: Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

Abstract: Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to predict the user's intent (the user speaking to the device or not) directly from acoustic and textual information encoded at subword tokens which are obtained via an end-to-end ASR model. Modeling the subword tokens directly, compared to modeling the phonemes and/or full words, has at least two advantages: (i) it provides a unique vocabulary representation, where each token has a semantic meaning, in contrast to the phoneme-level representations, and (ii) each subword token has a reusable "sub"-word acoustic pattern (that can be used to construct multiple full words), resulting in a much smaller vocabulary space than that of the full words. To learn the subword representations for the audio-to-intent classification, we extract: (i) acoustic information from an E2E-ASR model, which provides frame-level CTC posterior probabilities for the subword tokens, and (ii) textual information from a pre-trained continuous bag-of-words model capturing the semantic meaning of the subword tokens. The key to our approach is the way it combines acoustic subword-level posteriors with text information using the notion of positional-encoding in order to account for multiple ASR hypotheses simultaneously. We show that our approach provides more robust and richer representations for audio-to-intent classification, and is highly accurate, correctly mitigating 93.3% of unintended user audio from invoking the smart assistant at a 99% true positive rate.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2209.07738 [pdf, other]

DMFormer: Closing the Gap Between CNN and Vision Transformers

Authors: Zimian Wei, Hengyue Pan, Lujun Li, Menglong Lu, Xin Niu, Peijie Dong, Dongsheng Li

Abstract: Vision transformers have shown excellent performance in computer vision tasks. As the computation cost of their self-attention mechanism is high, recent works have tried to replace the self-attention mechanism in vision transformers with convolutional operations, which are more efficient and have a built-in inductive bias. However, these efforts either ignore multi-level features or lack dynamic properties, leading to sub-optimal performance. In this paper, we propose a Dynamic Multi-level Attention mechanism (DMA), which captures different patterns of input images by multiple kernel sizes and enables input-adaptive weights with a gating mechanism. Based on DMA, we present an efficient backbone network named DMFormer. DMFormer adopts the overall architecture of vision transformers, while replacing the self-attention mechanism with our proposed DMA. Extensive experimental results on the ImageNet-1K and ADE20K datasets demonstrate that DMFormer achieves state-of-the-art performance, outperforming similar-sized vision transformers (ViTs) and convolutional neural networks (CNNs).

Submitted 28 November, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

arXiv:2209.04260 [pdf, other]

doi 10.1103/PhysRevD.106.063026

Search for relativistic fractionally charged particles in space

Authors: DAMPE Collaboration, F. Alemanno, C. Altomare, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De-Benedittis, I. De Mitri, F. de Palma, M. Deliyergiyev, A. Di Giovanni, M. Di Santo , et al. (126 additional authors not shown)

Abstract: More than a century after the oil drop experiment was performed, the possible existence of fractionally charged particles (FCPs) still remains unsettled. The search for FCPs is crucial for some extensions of the Standard Model in particle physics. Most of the previously conducted searches for FCPs in cosmic rays were based on experiments underground or at high altitudes. However, there have been few searches for FCPs in cosmic rays carried out in orbit other than AMS-01 flown by a space shuttle and BESS by a balloon at the top of the atmosphere. In this study, we conduct an FCP search in space based on on-orbit data obtained using the DArk Matter Particle Explorer (DAMPE) satellite over a period of five years. Unlike underground experiments, which require an FCP energy of the order of hundreds of GeV, our FCP search starts at only a few GeV. An upper limit of $6.2\times 10^{-10}~\mathrm{cm^{-2}\,sr^{-1}\,s^{-1}}$ is obtained for the flux. Our results demonstrate that DAMPE exhibits higher sensitivity than experiments of similar types by three orders of magnitude, which more stringently restricts the conditions for the existence of FCPs in primary cosmic rays.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 19 pages, 6 figures, accepted by PRD

Report number: 106, 063026

Journal ref: Physical Review D 106.6 (2022): 063026

arXiv:2209.00283 [pdf, other]

Conditional graph entropy as an alternating minimization problem

Authors: Viktor Harangi, Xueyan Niu, Bo Bai

Abstract: Conditional graph entropy is known to be the minimal rate for a natural functional compression problem with side information at the receiver. In this paper we show that it can be formulated as an alternating minimization problem, which gives rise to a simple iterative algorithm for numerically computing (conditional) graph entropy. This also leads to a new formula which shows that conditional graph entropy is part of a more general framework: the solution of an optimization problem over a convex corner. In the special case of graph entropy (i.e., unconditioned version) this was known due to Csiszár, Körner, Lovász, Marton, and Simonyi. In that case the role of the convex corner was played by the so-called vertex packing polytope. In the conditional version it is a more intricate convex body but the function to minimize is the same. Furthermore, we describe a dual problem that leads to an optimality check and an error bound for the iterative algorithm.

Submitted 11 September, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

MSC Class: 94A17; 94A29; 05C69

arXiv:2208.14899 [pdf, other]

doi 10.1016/j.ejc.2023.103779

Generalizing Körner's graph entropy to graphons

Authors: Viktor Harangi, Xueyan Niu, Bo Bai

Abstract: Körner introduced the notion of graph entropy in 1973 as the minimal code rate of a natural coding problem where not all pairs of letters can be distinguished in the alphabet. Later it turned out that it can be expressed as the solution of a minimization problem over the so-called vertex-packing polytope. In this paper we generalize this notion to graphons. We show that the analogous minimization problem provides an upper bound for graphon entropy. We also give a lower bound in the shape of a maximization problem. The main result of the paper is that for most graphons these two bounds actually coincide and hence precisely determine the entropy in question. Furthermore, graphon entropy has a nice connection to the fractional chromatic number and the fractional clique number.

Submitted 18 August, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

MSC Class: 94A29; 05C69; 05C80

Journal ref: European Journal of Combinatorics, Volume 114, December 2023, 103779

arXiv:2208.11184 [pdf, other]

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota , et al. (28 additional authors not shown)

Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Camera-ready version

arXiv:2207.05851 [pdf, ps, other]

Sockeye 3: Fast Neural Machine Translation with PyTorch

Authors: Felix Hieber, Michael Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, Xing Niu, Cuong Hoang, Ke Tran, Benjamin Hsu, Maria Nadejde, Surafel Lakew, Prashant Mathur, Anna Currey, Marcello Federico

Abstract: Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a further streamlined codebase. This enables broader experimentation with faster iteration, efficient training of stronger and faster models, and the flexibility to move new ideas quickly from research to production. When running comparable models, Sockeye 3 is up to 126% faster than other PyTorch implementations on GPUs and up to 292% faster on CPUs. Sockeye 3 is open source software released under the Apache 2.0 license.

Submitted 2 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

arXiv:2206.13329 [pdf, other]

Prior-Guided One-shot Neural Architecture Search

Authors: Peijie Dong, Xin Niu, Lujun Li, Linzhen Xie, Wenbin Zou, Tian Ye, Zimian Wei, Hengyue Pan

Abstract: Neural architecture search methods seek optimal candidates with efficient weight-sharing supernet training. However, recent studies indicate poor ranking consistency between the performance of stand-alone architectures and that of shared-weight networks. In this paper, we present Prior-Guided One-shot NAS (PGONAS) to strengthen the ranking correlation of supernets. Specifically, we first explore the effect of activation functions and propose a balanced sampling strategy based on the Sandwich Rule to alleviate weight coupling in the supernet. Then, FLOPs and Zen-Score are adopted to guide the training of the supernet with a ranking correlation loss. Our PGONAS ranked 3rd in the supernet track of the CVPR 2022 Second Lightweight NAS Challenge. Code is available at https://github.com/pprp/CVPR2022-NAS?competition-Track1-3th-solution.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Official 3rd Place Solution for the Second workshop Neural Architecture Search Second lightweight NAS Challenge 2022 - Track1 Supernet Track. Official leaderboard: https://aistudio.baidu.com/aistudio/competition/detail/149/0/leaderboard CVPR 2022 Workshop: https://cvpr-nas.com/competition

arXiv:2206.03624 [pdf, other]

doi 10.1109/CDC51059.2022.9993156

DISH: A Distributed Hybrid Primal-Dual Optimization Framework to Utilize System Heterogeneity

Authors: Xiaochun Niu, Ermin Wei

Abstract: We consider solving distributed consensus optimization problems over multi-agent networks. Current distributed methods fail to capture the heterogeneity among agents' local computation capacities. We propose DISH as a distributed hybrid primal-dual algorithmic framework to handle and utilize system heterogeneity. Specifically, DISH allows those agents with higher computational capabilities or cheaper computational costs to implement Newton-type updates locally, while other agents can adopt the much simpler gradient-type updates. We show that DISH is a general framework and includes EXTRA, DIGing, and ESOM-0 as special cases. Moreover, when all agents take both primal and dual Newton-type updates, DISH approximates Newton's method by estimating both primal and dual Hessians. Theoretically, we show that DISH achieves a linear (Q-linear) convergence rate to the exact optimal solution for strongly convex functions, regardless of agents' choices of gradient-type and Newton-type updates. Finally, we perform numerical studies to demonstrate the efficacy of DISH in practice. To the best of our knowledge, DISH is the first hybrid method allowing heterogeneous local updates for distributed consensus optimization under general network topology with provable convergence and rate guarantees.

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2205.08047 [pdf, other]

Perfect Spectral Clustering with Discrete Covariates

Authors: Jonathan Hehir, Xiaoyue Niu, Aleksandra Slavkovic

Abstract: Among community detection methods, spectral clustering enjoys two desirable properties: computational efficiency and theoretical guarantees of consistency. Most studies of spectral clustering consider only the edges of a network as input to the algorithm. Here we consider the problem of performing community detection in the presence of discrete node covariates, where network structure is determined by a combination of a latent block model structure and homophily on the observed covariates. We propose a spectral algorithm that we prove achieves perfect clustering with high probability on a class of large, sparse networks with discrete covariates, effectively separating latent network structure from homophily on observed covariates. To our knowledge, our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering in the setting where edge formation is dependent on both latent and observed factors.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 23 pages, 1 figure

Showing 51–100 of 436 results for author: Niu, X