-
Applications of Moments of Dirichlet Coefficients in Elliptic Curve Families
Authors:
Zoë Batterman,
Aditya Jambhale,
Steven J. Miller,
Akash L. Narayanan,
Kishan Sharma,
Andrew Yang,
Chris Yao
Abstract:
The moments of the coefficients of elliptic curve L-functions are related to numerous arithmetic problems. Rosen and Silverman proved a conjecture of Nagao relating the first moment of one-parameter families satisfying Tate's conjecture to the rank of the corresponding elliptic surface over Q(T); one can also construct families of moderate rank by finding families with large first moments. Michel proved that if j(T) is not constant, then the second moment of the family is of size p^2 + O(p^(3/2)); these two moments show that for suitably small support the behavior of zeros near the central point agrees with that of eigenvalues from random matrix ensembles, with the higher moments impacting the rate of convergence.
In his thesis, Miller noticed a negative bias in the second moment of every one-parameter family of elliptic curves over the rationals whose second moment had a calculable closed-form expression, specifically the first lower order term which does not average to zero is on average negative. This Bias Conjecture is confirmed for many families; however, these are highly non-generic families whose resulting Legendre sums can be determined. Inspired by the recent successes by Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, Alexey Pozdnyakov and others in investigations of murmurations of elliptic curve coefficients with machine learning techniques, we pose a similar problem for trying to understand the Bias Conjecture. As a start to this program, we numerically investigate the Bias Conjecture for a family whose bias is positive for half the primes. Since the numerics do not offer conclusive evidence that negative bias for the other half is enough to overwhelm the positive bias, the Bias Conjecture cannot be verified for the family.
Submitted 17 June, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Anomalous hot electron generation from two-plasmon decay instability driven by broadband laser pulses with intensity modulations
Authors:
C. Yao,
J. Li,
L. Hao,
R. Yan,
C. Wang,
A. Lei,
Y-K. Ding,
J. Zheng
Abstract:
We investigate the hot electrons generated from two-plasmon decay (TPD) instability driven by laser pulses with intensity modulated by a frequency $Δω_m$. Our primary focus lies on scenarios where $Δω_m$ is of the same order as the TPD growth rate $γ_0$ ($Δω_m \sim γ_0$), corresponding to moderate laser frequency bandwidths for TPD mitigation. With $Δω_m$ conveniently modeled by a basic two-color scheme of the laser wave fields in fully-kinetic particle-in-cell simulations, we demonstrate that the energies of TPD modes and hot electrons exhibit intermittent evolution at the frequency $Δω_m$, particularly when $Δω_m \sim γ_0$. With the dynamic TPD behavior, the overall ratio of hot electron energy to the incident laser energy, $f_{hot}$, changes significantly with $Δω_m$. While $f_{hot}$ drops notably with increasing $Δω_m$ in the large-$Δω_m$ limit as expected, it goes anomalously beyond the hot electron energy ratio for a single-frequency incident laser pulse with the same average intensity when $Δω_m$ falls below a specific threshold frequency $Δω_c$. We find this threshold frequency primarily depends on $γ_0$ and the collisional damping rate of plasma waves, with relatively lower sensitivity to the density scale length. We develop a scaling model characterizing the relation of $Δω_c$ and laser plasma conditions, enabling the potential extension of our findings to more complex and realistic scenarios.
Submitted 25 November, 2023;
originally announced November 2023.
-
Meta Prompting for AI Systems
Authors:
Yifan Zhang,
Yang Yuan,
Andrew Chi-Chih Yao
Abstract:
In this work, we present a comprehensive study of Meta Prompting (MP), an innovative technique reshaping the utilization of language models (LMs) and AI systems in problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting, sets it apart from few-shot prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting for complex reasoning tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency, and enabling more equitable problem-solving comparisons, especially against few-shot prompting methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner. Empirical experiments, including using a Qwen-72B base language model equipped with a meta prompt without instruction-tuning to solve MATH problems with accuracy at 46.3%, which surpasses the supervised fine-tuned counterpart trained with extensive mathematical QA instruction pairs and even the initial version of GPT-4, solving GSM8K problems with 83.5% accuracy with a zero-shot meta-prompted Qwen-72B base language model, and solving the Game of 24 tasks with a 100% success rate using GPT-4, demonstrate Meta Prompting's efficacy in achieving high accuracy and efficiency, showcasing its transformative impact on AI problem-solving. The code is available at https://github.com/meta-prompting/meta-prompting.
Submitted 15 June, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Probabilistic Constellation Shaping for OFDM-Based ISAC Signaling
Authors:
Zhen Du,
Fan Liu,
Yifeng Xiong,
Tony Xiao Han,
Weijie Yuan,
Yuanhao Cui,
Changhua Yao,
Yonina C. Eldar
Abstract:
Integrated Sensing and Communications (ISAC) has garnered significant attention as a promising technology for the upcoming sixth-generation wireless communication systems (6G). In pursuit of this goal, a common strategy is that a unified waveform, such as Orthogonal Frequency Division Multiplexing (OFDM), should serve dual-functional roles by enabling simultaneous sensing and communications (S&C) operations. However, the sensing performance of an OFDM communication signal is substantially affected by the randomness of the data symbols mapped from bit streams. Therefore, achieving a balance between preserving communication capability (i.e., the randomness) while improving sensing performance remains a challenging task. To cope with this issue, in this paper we analyze the ambiguity function of the OFDM communication signal modulated by random data. Subsequently, a probabilistic constellation shaping (PCS) method is proposed to devise the probability distributions of constellation points, which is able to strike a scalable S&C tradeoff of the random transmitted signal. Finally, the superiority of the proposed PCS method over conventional uniformly distributed constellations is validated through numerical simulations.
Submitted 27 October, 2023;
originally announced October 2023.
-
Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting
Authors:
Chengzhi Yao,
Zhi Li,
Junbo Wang
Abstract:
Traffic forecasting, which benefits from mobile Internet development and position technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods usually leverage graph-based deep learning networks to model the complex road network for traffic forecasting shallowly. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance. Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.
Submitted 24 October, 2023;
originally announced October 2023.
-
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
Authors:
Cong Yao
Abstract:
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines. Specifically, basic capabilities, including text detection, text recognition, table structure recognition and layout analysis, are provided. Upon these basic capabilities, we also build a set of fully functional pipelines for document parsing, i.e., general text reading, table parsing, and document structurization, to drive various applications related to documents in real-world scenarios. Moreover, DocXChain is concise, modularized and flexible, such that it can be readily integrated with existing tools, libraries or models (such as LangChain and ChatGPT), to construct more powerful systems that can accomplish more complicated and challenging tasks. The code of DocXChain is publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/Applications/DocXChain.
Submitted 18 October, 2023;
originally announced October 2023.
-
Self-Pro: A Self-Prompt and Tuning Framework for Graph Neural Networks
Authors:
Chenghua Gong,
Xiang Li,
Jianxiang Yu,
Cheng Yao,
Jiaqi Tan,
Chengcheng Yu
Abstract:
Graphs have become an important modeling tool for web applications, and Graph Neural Networks (GNNs) have achieved great success in graph representation learning. However, the performance of traditional GNNs heavily relies on a large amount of supervision. Recently, ``pre-train, fine-tune'' has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in ``negative transfer'' and poor performance. Inspired by prompt learning in Natural Language Processing (NLP), many studies have turned to bridging the gap and fully leveraging the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on randomly initialized prompts, which negatively impacts stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning for pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from the pre-training phase as the self adapter and introduce self-prompts based on the graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our code at https://github.com/gongchenghua/Self-Pro.
Submitted 4 June, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Age Estimation Based on Graph Convolutional Networks and Multi-head Attention Mechanisms
Authors:
Miaomiao Yang,
Changwei Yao,
Shijin Yan
Abstract:
Age estimation technology is a part of facial recognition and has been applied to identity authentication. This technology enables the development and application of a juvenile anti-addiction system by authenticating users in the game. Convolutional Neural Network (CNN) and Transformer algorithms are widely used in this application scenario. However, these two models cannot flexibly extract and model features of faces with irregular shapes, and they are ineffective in capturing key information. Furthermore, the above methods capture a lot of background information while extracting features, which interferes with the model. As a consequence, it is easy to extract redundant information from images. In this paper, a new modeling idea is proposed to solve this problem, which can flexibly model irregular objects. The Graph Convolutional Network (GCN) is used to extract features from irregular face images effectively, and multi-head attention mechanisms are added to avoid redundant features and capture key region information in the image. This model can effectively improve the accuracy of age estimation and reduce the mean absolute error (MAE) to about 3.64, which is better than current age estimation models, thereby improving the accuracy of face recognition and identity authentication.
Submitted 12 October, 2023;
originally announced October 2023.
-
A Trustworthy and Consistent Blockchain Oracle Scheme for Industrial Internet of Things
Authors:
Peng Liu,
Youquan Xian,
Chuanjian Yao,
Peng Wang,
Li-e Wang,
Xianxian Li
Abstract:
Blockchain provides decentralization and trustlessness features for the Industrial Internet of Things (IIoT), which expands the application scenarios of IIoT. To address the problem that the blockchain cannot actively obtain off-chain data, the blockchain oracle is proposed as a bridge between the blockchain and external data. However, existing oracle schemes struggle to solve the problem of low quality of service caused by frequent data changes and heterogeneous devices in IIoT, and current oracle node selection schemes struggle to balance security and quality of service. To tackle these problems, this paper proposes a secure and reliable oracle scheme that can obtain high-quality off-chain data. Specifically, we first design an oracle node selection algorithm based on a Verifiable Random Function (VRF) and a reputation mechanism to securely select high-quality nodes. Second, we propose a data filtering algorithm based on a sliding window to further improve the consistency of the collected data. We verify the security of the proposed scheme through security analysis. The experimental results show that the proposed scheme can effectively improve the service quality of the oracle.
Submitted 7 October, 2023;
originally announced October 2023.
-
Femtosecond electron diffraction reveals local disorder and local anharmonicity in thermoelectric SnSe
Authors:
Jingjun Li,
Yingpeng Qi,
Qing Yang,
Luye Yue,
Changyuan Yao,
Zijing Chen,
Sheng Meng,
Dao Xiang,
Jianming Cao
Abstract:
The microscopic arrangement of atoms and molecules is the determining factor in how materials behave and perform. Beyond the long-range periodicity, the local disorder with local structures deviating from the average lattice structure plays a vital role in determining the physical properties of the phonon, electron and spin subsystems in crystalline functional materials. Experimentally characterizing the 3D atomic configuration of such local disorder and correlating it with the advanced functions remain a big challenge. Time-domain evolution of the local disorder, either static or dynamical, is lost due to the characterization at equilibrium state with conventional probing techniques. With the combination of femtosecond electron diffraction, structure factor calculation and TDDFT-MD simulation, we exclusively identify the static local disorder and the local anharmonicity of it in thermoelectric SnSe. The ultrafast structural dynamics in time domain reveal a dominant static off-symmetry displacement of Sn (~0.4 angstrom) and the anharmonicity of this local disorder induces an ultrafast atomic displacement within 100 fs after photoexcitation. The microscopic picture of the local anharmonicity indicates a direct and first signature of the THz Einstein oscillators in real space. Therefore, a glass-like thermal transport channel with the local disorder, the Einstein oscillators and the local anharmonicity, updates the fundamental insight into the long-debated ultralow thermal conductivity in SnSe. The local disorder over one to a few unit cells is pervasive and indispensable in thermoelectric materials, multiferroic materials and correlated electronic materials. Our method of revealing the 3D local disorder and the local correlated interactions by ultrafast structural dynamics will inspire broad interest in construction of the structure-property relationship in material science.
Submitted 2 October, 2023;
originally announced October 2023.
-
IBVC: Interpolation-driven B-frame Video Compression
Authors:
Chenming Xu,
Meiqin Liu,
Chao Yao,
Weisi Lin,
Yao Zhao
Abstract:
Learned B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction. However, previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation or video frame interpolation. They suffer from inaccurate quantized motions and inefficient motion compensation. To address these issues, we propose a simple yet effective structure called Interpolation-driven B-frame Video Compression (IBVC). Our approach only involves two major operations: video frame interpolation and artifact reduction compression. IBVC introduces a bit-rate free MEMC based on interpolation, which avoids optical-flow quantization and additional compression distortions. Later, to reduce duplicate bit-rate consumption and focus on unaligned artifacts, a residual guided masking encoder is deployed to adaptively select the meaningful contexts with interpolated multi-scale dependencies. In addition, a conditional spatio-temporal decoder is proposed to eliminate location errors and artifacts instead of using MEMC coding in other methods. The experimental results on B-frame coding demonstrate that IBVC has significant improvements compared to the relevant state-of-the-art methods. Meanwhile, our approach can save bit rates compared with the random access (RA) configuration of H.266 (VTM). The code will be available at https://github.com/ruhig6/IBVC.
Submitted 14 March, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development
Authors:
Runkai Zhao,
Yuwen Heng,
Heng Wang,
Yuanda Gao,
Shilei Liu,
Changhao Yao,
Jiawen Chen,
Weidong Cai
Abstract:
Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making. However, their application in 3D lane detection for effective driving environment perception is hindered by the lack of comprehensive LiDAR datasets. The sparse nature of LiDAR point cloud data prevents an efficient manual annotation process. To solve this problem, we present LiSV-3DLane, a large-scale 3D lane dataset that comprises 20k frames of surround-view LiDAR point clouds with enriched semantic annotation. Unlike existing datasets confined to a frontal perspective, LiSV-3DLane provides a full 360-degree spatial panorama around the ego vehicle, capturing complex lane patterns in both urban and highway environments. We leverage the geometric traits of lane lines and the intrinsic spatial attributes of LiDAR data to design a simple yet effective automatic annotation pipeline for generating finer lane labels. To propel future research, we propose a novel LiDAR-based 3D lane detection model, LiLaDet, incorporating the spatial geometry learning of the LiDAR point cloud into Bird's Eye View (BEV) based lane identification. Experimental results indicate that LiLaDet outperforms existing camera- and LiDAR-based approaches in the 3D lane detection task on the K-Lane dataset and our LiSV-3DLane.
Submitted 15 March, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
The Reversed Zeckendorf Game
Authors:
Zoë X. Batterman,
Aditya Jambhale,
Steven J. Miller,
Akash L. Narayanan,
Kishan Sharma,
Andrew K. Yang,
Chris Yao
Abstract:
Zeckendorf proved that every natural number $n$ can be expressed uniquely as a sum of non-consecutive Fibonacci numbers, called its Zeckendorf decomposition. Baird-Smith, Epstein, Flint, and Miller created the Zeckendorf game, a two-player game played on partitions of $n$ into Fibonacci numbers which always terminates at a Zeckendorf decomposition, and proved that Player 2 has a winning strategy for $n\geq 3$. Since their proof was non-constructive, other authors have studied the game to find a constructive winning strategy, and lacking success there turned to related problems. For example, Cheigh, Moura, Jeong, Duke, Milgrim, Miller, and Ngamlamai studied minimum and maximum game lengths and randomly played games. We explore a new direction and introduce the reversed Zeckendorf game, which starts at the ending state of the Zeckendorf game and flips all the moves, so the reversed game ends with all pieces in the first bin. We show that Player 1 has a winning strategy for $n = F_{i+1} + F_{i-2}$ and solve various modified games.
Submitted 4 October, 2023; v1 submitted 22 September, 2023;
originally announced September 2023.
-
Vision Grid Transformer for Document Layout Analysis
Authors:
Cheng Da,
Chuwei Luo,
Qi Zheng,
Cong Yao
Abstract:
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI. However, for the document layout analysis (DLA) task, existing document pre-trained models, even those pre-trained in a multi-modal fashion, usually rely on either textual features or visual features. Grid-based models for DLA are multi-modality but largely neglect the effect of pre-training. To fully leverage multi-modal information and exploit pre-training techniques to learn better representation for DLA, in this paper, we present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. Furthermore, a new dataset named D$^4$LA, which is so far the most diverse and detailed manually-annotated benchmark for document layout analysis, is curated and released. Experiment results have illustrated that the proposed VGT model achieves new state-of-the-art results on DLA tasks, e.g. PubLayNet ($95.7\% \rightarrow 96.2\%$), DocBank ($79.6\% \rightarrow 84.1\%$), and D$^4$LA ($67.7\% \rightarrow 68.8\%$). The code and models as well as the D$^4$LA dataset will be made publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.
Submitted 28 August, 2023;
originally announced August 2023.
-
LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
Authors:
Changxu Cheng,
Peng Wang,
Cheng Da,
Qi Zheng,
Cong Yao
Abstract:
The diversity in length constitutes a significant characteristic of text. Due to the long-tail distribution of text lengths, most existing methods for scene text recognition (STR) only work well on short or seen-length text, lacking the capability of recognizing longer text or performing length extrapolation. This is a crucial issue, since the lengths of the text to be recognized are usually not given in advance in real-world applications, but it has not been adequately investigated in previous works. Therefore, we propose in this paper a method called Length-Insensitive Scene TExt Recognizer (LISTER), which remedies the limitation regarding the robustness to various text lengths. Specifically, a Neighbor Decoder is proposed to obtain accurate character attention maps with the assistance of a novel neighbor matrix regardless of the text lengths. Besides, a Feature Enhancement Module is devised to model the long-range dependency with low computation cost, which is able to perform iterations with the neighbor decoder to enhance the feature map progressively. To the best of our knowledge, we are the first to achieve effective length-insensitive scene text recognition. Extensive experiments demonstrate that the proposed LISTER algorithm exhibits obvious superiority on long text recognition and the ability for length extrapolation, while comparing favourably with the previous state-of-the-art methods on standard benchmarks for STR (mainly short text).
Submitted 24 August, 2023;
originally announced August 2023.
-
The dissolving limit and large volume limit of Einstein-Bogomol'nyi metrics
Authors:
Chengjian Yao
Abstract:
We study the limits of Einstein-Bogomol'nyi metrics on $\mathbf{P}^1$, which is the solution to a dimensional reduction of Einstein-Maxwell-Higgs system in dimension four, in two regimes. In one regime called the "dissolving limit" where the volume of the metrics is approaching the admissible lower bound, it exhibits a pattern that all the vortices are dissolving similar to the Bradlow limit in the study of vortices on Riemann surfaces. In another regime called the "large volume limit" where the volume of the metrics is approaching infinity, the magnetic field is concentrating around the zeros of the Higgs field. In the meantime, the volume-normalized underlying metric is approaching the Euclidean cone metric determined by the Higgs field in the case of stable Higgs field. Moreover, by studying the large volume limit of Yang's solution for a strictly polystable Higgs field, for each natural number $N'$ we recover the Einstein-Bogomol'nyi metrics on $\mathbf{C}$ which are asymptotically cylindrical at exponential rate and with total string number $N'$, first discovered by Linet and Yang.
Submitted 18 August, 2023;
originally announced August 2023.
-
Cumulative Reasoning with Large Language Models
Authors:
Yifan Zhang,
Jingqin Yang,
Yang Yuan,
Andrew Chi-Chih Yao
Abstract:
Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
Submitted 1 April, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Direct Power Flow Controller with Continuous Full Regulation Range
Authors:
Chong Yao,
Youjun Zhang
Abstract:
To enhance power flow control in power transmission, a simplified new structure of direct power flow controller with continuous full regulation range (F-DPFC) is proposed. It has only one-stage power conversion and comprises a three-phase transformer in parallel and a three-phase transformer in series with the grid, three single-phase full-bridge ac units, and a three-phase filter. Compared with the previous DPFC, the proposed one dispenses with two complex three-phase selection switches that connect directly to the high-voltage grid, and achieves a continuous 360° adjustment range of the compensation voltage by replacing the buck-type ac unit with a full-bridge ac unit, which expands the limit of its duty cycle from [0,1] to [-1,1]. Within a large smooth zone that replaces six separate zones, the proposed F-DPFC can regulate the amplitude and phase angle of the grid node voltage both separately and simultaneously, so that the active and reactive power flow in the grid can be controlled smoothly and effectively. The new structure lends itself to modular expansion, which enables it to operate under high-voltage and high-power conditions. Its structure and operating principle were analyzed in detail, and a prototype was developed. The experimental results verified the feasibility of the design and the correctness of the theoretical analysis.
Submitted 27 July, 2023;
originally announced July 2023.
-
Influence of cation vacancy concentrations on ultra-low thermal conductivity in $(1-x)$BiVO$_4$-$x$Bi$_{2/3}$MoO$_4$ scheelite solid solutions
Authors:
Guillaume F. Nataf,
Hicham Ait Laasri,
Damien Brault,
Tatiana Chartier,
Chalit Ya,
Fabian Delorme,
Isabelle Monot-Laffez,
Fabien Giovannelli
Abstract:
Bismuth vanadate - bismuth molybdate solid-solution was prepared to elaborate ceramics with different amounts of cation vacancies. Dense ceramics with similar microstructures were obtained and the evolution of their melting point, specific heat, thermal diffusivity, and conductivity as a function of the amount of vacancy was evaluated. At room temperature, the thermal conductivity decreases from 1.74 W m$^{-1}$ K$^{-1}$ for BiVO$_{4}$ (x=0) to 1.12 W m$^{-1}$ K$^{-1}$ for Bi$_{0.867}$$\square$$_{0.133}$Mo$_{0.4}$V$_{0.6}$O$_{4}$ (x=0.4). Moreover, we show that a very small amount of vacancy (1.7%, x=0.05) is enough to provide a large decrease in thermal conductivity by more than 15%, in agreement with a mass fluctuation scattering model. However, the temperature of the melting point also decreases with increasing amount of vacancy. Our results suggest adding only a very small amount of vacancy as the best strategy to obtain superior materials for thermal barriers and thermoelectric devices, with ultra-low thermal conductivity and high-temperature stability.
Submitted 27 July, 2023;
originally announced July 2023.
-
Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition
Authors:
Cheng Da,
Peng Wang,
Cong Yao
Abstract:
Due to the enormous technical challenges and wide range of applications, scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this tough problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet functionally powerful vision STR model, which is built upon ViT and a tailored Adaptive Addressing and Aggregation (A$^3$) module. It already outperforms most previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character level representation, while no independent language model (LM) is adopted. To produce the final recognition results, two strategies for effectively fusing the multi-granularity predictions are devised. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, MGP-STR achieves an average recognition accuracy of $94\%$ on standard benchmarks for scene text recognition. Moreover, it also achieves state-of-the-art results on widely-used handwritten benchmarks as well as more challenging scene text datasets, demonstrating the generality of the proposed MGP-STR algorithm. The source code and models will be available at: https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR.
Submitted 25 July, 2023;
originally announced July 2023.
-
Hilbert series for ALP EFTs
Authors:
Christophe Grojean,
Jonathan Kley,
Chang-Yuan Yao
Abstract:
Axions and axion-like particles (ALPs) are ubiquitous in popular attempts to solve supercalifragilisticexpialidocious puzzles of Nature. A widespread and vivid experimental programme spanning a vast range of mass scales and decades of couplings strives to find evidence for these elusive but theoretically well-motivated particles. In the absence of a clear guiding principle, effective field theories (EFTs) prove to be an efficient tool in this experimental quest. Hilbert series technologies are a privileged instrument of the EFT toolbox to enumerate and classify operators. In this work, we explicitly compute the Hilbert series capturing the interactions of a generic ALP with the Standard Model particles above and below the electroweak symmetry scale, which allows us to build bases of operators up to dimension 8. In particular, we reveal a remarkable structure of the Hilbert series that isolates the shift-symmetry breaking and preserving interactions. In addition, with the Hilbert series method, we enumerate the sources of CP violation in terms of CP-even, CP-odd and CP-violating operators. Furthermore, we provide an ancillary file of the Hilbert series up to dimension 15 to supplement our findings, which can be used for further analysis and exploration.
Submitted 30 August, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
FedDCT: A Dynamic Cross-Tier Federated Learning Scheme in Wireless Communication Networks
Authors:
Peng Liu,
Youquan Xian,
Chuanjian Yao,
Xiaoyun Gan,
Lianghaojie Zhou,
Jianyong Jiang,
Dongcheng Li
Abstract:
With the rapid proliferation of Internet of Things (IoT) devices and the growing concern for data privacy among the public, Federated Learning (FL) has gained significant attention as a privacy-preserving machine learning paradigm. FL enables the training of a global model among clients without exposing local data. However, when a federated learning system runs on wireless communication networks, limited wireless resources, heterogeneity of clients, and network transmission failures affect its performance and accuracy. In this study, we propose a novel dynamic cross-tier FL scheme, named FedDCT to increase training accuracy and performance in wireless communication networks. We utilize a tiering algorithm that dynamically divides clients into different tiers according to specific indicators and assigns specific timeout thresholds to each tier to reduce the training time required. To improve the accuracy of the model without increasing the training time, we introduce a cross-tier client selection algorithm that can effectively select the tiers and participants. Simulation experiments show that our scheme can make the model converge faster and achieve a higher accuracy in wireless communication networks.
Submitted 10 July, 2023;
originally announced July 2023.
-
Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks
Authors:
Xu Han,
Anmin Liu,
Chenxuan Yao,
Yanbo Fan,
Kun He
Abstract:
Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to the benign input. After attack success rates of nearly 100% were achieved in the white-box setting, more focus has shifted to black-box attacks, in which the transferability of adversarial examples has gained significant attention. In either case, the common gradient-based methods generally use the sign function to generate perturbations on the gradient update, which offers a roughly correct direction and has achieved great success. But little work has paid attention to its possible limitations. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute the sign function without extra computational cost. We further propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method can be used in any gradient-based attack and can be integrated with various input transformation or ensemble methods to further improve the adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method can significantly boost the transferability of gradient-based attacks and outperform the state-of-the-art baselines.
Submitted 6 July, 2023;
originally announced July 2023.
-
Searching for heavy neutral lepton and lepton number violation through VBS at high-energy muon colliders
Authors:
Tong Li,
Chang-Yuan Yao,
Man Yuan
Abstract:
A high-energy muon collider can act as an emitter of electroweak gauge bosons and thus leads to substantial vector boson scattering (VBS) processes. In this work, we investigate the production of a heavy neutral lepton (HNL) $N$ and the lepton number violation (LNV) signature through VBS at high-energy muon colliders. VBS induces LNV processes $W^\pm Z/γ\to \ell^\pm N \to \ell^\pm \ell^\pm W^\mp\to \ell^\pm \ell^\pm q\bar{q}'$ with an on-shell HNL $N$ at $μ^+μ^-$ colliders. In analogy to neutrinoless double-beta decay with the HNL in the t-channel, the LNV signature $W^+W^+\to \ell^+\ell^+$ can also occur via VBS at a same-sign muon collider. These processes provide clean and robust LNV signatures that reveal the nature of Majorana HNLs and are thus more advantageous than direct $μμ$ annihilation. We analyze the potential of searching for Majorana HNLs and obtain the exclusion limits on the mixing $V_{\ell N}$. Based on this same-sign lepton signature, we also obtain the sensitivity of the muon collider to the Weinberg operator.
Submitted 3 September, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Some invariants of $U(1,1;\mathbb{H})$ and diagonalization
Authors:
Cailing Yao,
Bingzhe Hou,
Xiaoqi Feng
Abstract:
Denote by $\mathbb{H}$ the set of all quaternions. We are interested in the group $U(1,1;\mathbb{H})$, which is a subgroup of the $2\times 2$ quaternionic matrix group and is sometimes called $Sp(1,1)$. As is well known, $U(1,1;\mathbb{H})$ corresponds to the quaternionic Möbius transformations on the unit ball in $\mathbb{H}$. In this article, some similar invariants on $U(1,1;\mathbb{H})$ are discussed. Our main result shows that each matrix $T\in U(1,1;\mathbb{H})$ that corresponds to an elliptic quaternionic Möbius transformation $g_T(z)$ is $U(1,1;\mathbb{H})$-similar to a diagonal matrix.
Submitted 21 June, 2023;
originally announced June 2023.
-
Conditional Text Image Generation with Diffusion Models
Authors:
Yuanzhi Zhu,
Zhaohai Li,
Tianwei Wang,
Mengchao He,
Cong Yao
Abstract:
Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating enough real text images. In this paper, we explore the problem of text image generation, by taking advantage of the powerful abilities of Diffusion Models in generating photo-realistic and diverse image samples with given conditions, and propose a method called Conditional Text Image Generation with Diffusion Models (CTIG-DM for short). To conform to the characteristics of text images, we devise three conditions: image condition, text condition, and style condition, which can be used to control the attributes, contents, and styles of the samples in the image generation process. Specifically, four text image generation modes, namely: (1) synthesis mode, (2) augmentation mode, (3) recovery mode, and (4) imitation mode, can be derived by combining and configuring these three conditions. Extensive experiments on both handwritten and scene text demonstrate that the proposed CTIG-DM is able to produce image samples that simulate real-world complexity and diversity, and thus can boost the performance of existing text recognizers. Besides, CTIG-DM shows its appealing potential in domain adaptation and generating images containing Out-Of-Vocabulary (OOV) words.
Submitted 19 June, 2023;
originally announced June 2023.
-
Redesigning spectroscopic sensors with programmable photonic circuits
Authors:
Chunhui Yao,
Kangning Xu,
Wanlu Zhang,
Minjia Chen,
Qixiang Cheng,
Richard Penty
Abstract:
Optical spectroscopic sensors are a powerful tool to reveal light-matter interactions in many fields, such as physics, biology, chemistry, and astronomy. Miniaturizing the currently bulky spectrometers has become imperative for the wide range of applications that demand in situ or even in vitro characterization systems, a field that is growing rapidly. Benchtop spectrometers are capable of offering superior resolution and spectral range, but at the expense of requiring a large size. In this paper, we propose a novel method that redesigns spectroscopic sensors via the use of programmable photonic circuits. Drawing from compressive sensing theory, we start by investigating the most ideal sampling matrix for a reconstructive spectrometer and reveal that a sufficiently large number of sampling channels is a prerequisite for both fine resolution and low reconstruction error. This number is, however, still considerably smaller than that of the reconstructed spectral pixels, benefitting from the nature of reconstruction algorithms. We then show that the cascading of a few engineered MZI elements can be readily programmed to create an exponentially scalable number of such sampling spectral responses over an ultra-broad bandwidth, allowing for ultra-high resolution down to single-digit picometers without incurring additional hardware costs. Experimentally, we implement an on-chip spectrometer with a fully-programmable 6-stage cascaded MZI structure and demonstrate a < 10 pm resolution with a > 200 nm bandwidth using only 729 sampling channels. This achieves a bandwidth-to-resolution ratio of over 20,000, which is, to our best knowledge, about one order of magnitude greater than any reported miniaturized spectrometers to date. We further illustrate that by employing dispersion-engineered waveguide components, the device bandwidth can be extended to over 400 nm.
Submitted 9 June, 2023;
originally announced June 2023.
-
ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
Authors:
Chun-Han Yao,
Amit Raj,
Wei-Chih Hung,
Yuanzhen Li,
Michael Rubinstein,
Ming-Hsuan Yang,
Varun Jampani
Abstract:
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging due to the ambiguities of camera viewpoint, pose, texture, lighting, etc. We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild. Specifically, ARTIC3D is built upon a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we enhance the input images with occlusions/truncation via 2D diffusion to obtain cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are of high-fidelity and faithful to input images. We also propose a novel technique to calculate more stable image-level gradients via diffusion models compared to existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets as well as newly introduced noisy web image collections with occlusions and truncation demonstrate that ARTIC3D outputs are more robust to noisy images, higher quality in terms of shape and texture details, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/
Submitted 7 June, 2023;
originally announced June 2023.
-
A globally convergent difference-of-convex algorithmic framework and application to log-determinant optimization problems
Authors:
Chaorui Yao,
Xin Jiang
Abstract:
The difference-of-convex algorithm (DCA) is a conceptually simple method for the minimization of (possibly) nonconvex functions that are expressed as the difference of two convex functions. At each iteration, DCA constructs a global overestimator of the objective and solves the resulting convex subproblem. Despite its conceptual simplicity, the theoretical understanding and algorithmic framework of DCA need further investigation. In this paper, global convergence of DCA at a linear rate is established under an extended Polyak--Łojasiewicz condition. The proposed condition holds for a class of DC programs with a bounded, closed, and convex constraint set, for which global convergence of DCA cannot be covered by existing analyses. Moreover, the DCProx computational framework is proposed, in which the DCA subproblems are solved by a primal--dual proximal algorithm with Bregman distances. With a suitable choice of Bregman distances, DCProx has simple update rules with cheap per-iteration complexity. As an application, DCA is applied to several fundamental problems in network information theory, for which no existing numerical methods are able to compute the global optimum. For these problems, our analysis proves the global convergence of DCA, and more importantly, DCProx solves the DCA subproblems efficiently. Numerical experiments are conducted to verify the efficiency of DCProx.
Submitted 3 June, 2023;
originally announced June 2023.
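The DCA iteration summarized in the abstract above (linearize the subtracted convex part to obtain a global overestimator, then minimize the resulting convex surrogate) can be illustrated on a one-dimensional toy problem. This is only a sketch and not the authors' DCProx framework: the objective f(x) = x^4 - 2x^2, its split into g(x) = x^4 and h(x) = 2x^2, and the names `f` and `dca` are assumptions chosen so that each subproblem has a closed-form solution.

```python
import math


def f(x):
    """Toy DC objective f = g - h with g(x) = x**4 and h(x) = 2*x**2 (an assumption)."""
    return x**4 - 2.0 * x**2


def dca(x0, iters=60):
    """Run the basic DCA iteration on the toy objective.

    At the iterate x_k, h is replaced by its tangent line, giving the convex
    overestimator g(x) - h(x_k) - h'(x_k) * (x - x_k). Its minimizer solves
    4 * x**3 = h'(x_k), which is available in closed form.
    """
    x = x0
    history = [f(x)]
    for _ in range(iters):
        slope = 4.0 * x  # h'(x_k) for h(x) = 2*x**2
        # Real cube root of slope / 4, keeping the sign of the slope.
        x = math.copysign(abs(slope / 4.0) ** (1.0 / 3.0), slope)
        history.append(f(x))
    return x, history


if __name__ == "__main__":
    x_final, values = dca(0.5)
    print(x_final, values[-1])
```

Starting from x0 = 0.5, the iterates approach the minimizer x = 1, where f = -1, and the objective values never increase from one iteration to the next, which is the descent property that the overestimator construction provides.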
-
I/O-efficient iterative matrix inversion with photonic integrated circuits
Authors:
Minjia Chen,
Yizhi Wang,
Chunhui Yao,
Adrian Wonfor,
Shuai Yang,
Richard Penty,
Qixiang Cheng
Abstract:
Photonic integrated circuits have been extensively explored for optical processing with the aim of breaking the speed bottleneck of digital electronics. However, the input/output (IO) bottleneck remains one of the key barriers. Here we report a novel photonic iterative processor (PIP) for matrix-inversion-intensive applications. The direct reuse of inputted data in the optical domain unlocks the potential to break the IO bottleneck. We demonstrate notable IO advantages with a lossless PIP for real-valued matrix inversion and integral-differential equation solving, as well as a coherent PIP with optical loops integrated on-chip, enabling complex-valued computation and a net inversion time of 1.2 ns. Furthermore, we estimate at least an order of magnitude enhancement in IO efficiency of a PIP over photonic single-pass processors and the state-of-the-art electronic processors for reservoir training tasks and MIMO precoding tasks, indicating the huge potential of PIP technology in practical applications.
Submitted 22 May, 2024; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Improved Projection-free Online Continuous Submodular Maximization
Authors:
Yucheng Liao,
Yuanyu Wan,
Chang Yao,
Mingli Song
Abstract:
We investigate the problem of online learning with monotone and continuous DR-submodular reward functions, which has received great attention recently. To efficiently handle this problem, especially in the case with complicated decision sets, previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations and linear optimization steps in total. However, it only attains a $(1-1/e)$-regret bound of $O(T^{4/5})$. In this paper, we propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity as Mono-FW. Instead of modifying Mono-FW, our key idea is to make a novel combination of a projection-based algorithm called online boosting gradient ascent, an infeasible projection technique, and a blocking technique. Furthermore, we consider the decentralized setting and develop a variant of POBGA, which not only reduces the current best regret bound of efficient projection-free algorithms for this setting from $O(T^{4/5})$ to $O(T^{3/4})$, but also reduces the total communication complexity from $O(T)$ to $O(\sqrt{T})$.
Submitted 28 May, 2023;
originally announced May 2023.
-
Mask Attack Detection Using Vascular-weighted Motion-robust rPPG Signals
Authors:
Chenglin Yao,
Jianfeng Ren,
Ruibin Bai,
Heshan Du,
Jiang Liu,
Xudong Jiang
Abstract:
Detecting 3D mask attacks to a face recognition system is challenging. Although genuine faces and 3D face masks show significantly different remote photoplethysmography (rPPG) signals, rPPG-based face anti-spoofing methods often suffer from performance degradation due to unstable face alignment in the video sequence and weak rPPG signals. To enhance the rPPG signal in a motion-robust way, a landmark-anchored face stitching method is proposed to align the faces robustly and precisely at the pixel-wise level by using both SIFT keypoints and facial landmarks. To better encode the rPPG signal, a weighted spatial-temporal representation is proposed, which emphasizes the face regions with rich blood vessels. In addition, characteristics of rPPG signals in different color spaces are jointly utilized. To improve the generalization capability, a lightweight EfficientNet with a Gated Recurrent Unit (GRU) is designed to extract both spatial and temporal features from the rPPG spatial-temporal representation for classification. The proposed method is compared with the state-of-the-art methods on five benchmark datasets under both intra-dataset and cross-dataset evaluations. The proposed method shows a significant and consistent improvement in performance over other state-of-the-art rPPG-based methods for face spoofing detection.
Submitted 25 May, 2023;
originally announced May 2023.
-
Non-stationary Online Convex Optimization with Arbitrary Delays
Authors:
Yuanyu Wan,
Chang Yao,
Mingli Song,
Lijun Zhang
Abstract:
Online convex optimization (OCO) with arbitrary delays, in which gradients or other information of functions could be arbitrarily delayed, has received increasing attention recently. Different from previous studies that focus on stationary environments, this paper investigates the delayed OCO in non-stationary environments, and aims to minimize the dynamic regret with respect to any sequence of comparators. To this end, we first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order. Despite its simplicity, our novel analysis shows that the dynamic regret of DOGD can be automatically bounded by $O(\sqrt{\bar{d}T}(P_T+1))$ under mild assumptions, and $O(\sqrt{dT}(P_T+1))$ in the worst case, where $\bar{d}$ and $d$ denote the average and maximum delay respectively, $T$ is the time horizon, and $P_T$ is the path-length of comparators. Furthermore, we develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$ and $O(\sqrt{dT(P_T+1)})$, respectively. The key idea is to run multiple DOGD with different learning rates, and utilize a meta-algorithm to track the best one based on their delayed performance. Finally, we demonstrate that our improved algorithm is optimal in the worst case by deriving a matching lower bound.
Submitted 23 June, 2024; v1 submitted 20 May, 2023;
originally announced May 2023.
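The update rule described in this abstract, one gradient descent step per delayed gradient in arrival order, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dogd` and `project_ball`, the Euclidean-ball feasible set, the fixed learning rate `eta`, and the convention that a gradient with delay `d` queried in round `t` becomes available at the end of round `t + d - 1` are all assumptions made for illustration.

```python
import numpy as np


def project_ball(x, radius):
    """Project x onto the Euclidean ball of the given radius (assumed feasible set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def dogd(grad_fn, delays, dim, eta, radius=1.0):
    """Sketch of delayed online gradient descent.

    grad_fn(t, x) returns the gradient of the round-t loss at the point x.
    delays[t] >= 1 is the delay of the round-t gradient; it is assumed to
    arrive at the end of round t + delays[t] - 1.
    Returns the list of decisions played in rounds 0..T-1.
    """
    T = len(delays)
    x = np.zeros(dim)
    arrivals = [[] for _ in range(T)]  # gradients that arrive in each round
    decisions = []
    for t in range(T):
        decisions.append(x.copy())
        g = grad_fn(t, x)  # gradient queried at the point played in round t
        arrive = t + delays[t] - 1
        if arrive < T:
            arrivals[arrive].append(g)
        # One descent step per received gradient, in arrival order.
        for g_late in arrivals[t]:
            x = project_ball(x - eta * g_late, radius)
    return decisions
```

With a fixed quadratic loss such as `grad_fn = lambda t, x: x - c`, the iterates approach `c` for a sufficiently small `eta`, even when every gradient is delayed by several rounds. The improved algorithm mentioned in the abstract, which runs several such learners with different learning rates under a meta-algorithm, is not covered by this sketch.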
-
Screentone-Aware Manga Super-Resolution Using DeepLearning
Authors:
Chih-Yuan Yao,
Husan-Ting Chou,
Yu-Sheng Lin,
Kuo-wei Chen
Abstract:
Manga, as a widely beloved form of entertainment around the world, have shifted from paper to electronic screens with the proliferation of handheld devices. However, as the demand for image quality increases with screen development, high-quality images can hinder transmission and affect the viewing experience. Traditional vectorization methods require a significant amount of manual parameter adjustment to process screentone. Using deep learning, lines and screentone can be automatically extracted and image resolution can be enhanced. Super-resolution can convert low-resolution images to high-resolution images while maintaining low transmission rates and providing high-quality results. However, traditional super-resolution methods for improving manga resolution do not consider the meaning of screentone density, resulting in changes to screentone density and loss of meaning. In this paper, we aim to address this issue by first classifying the regions and lines of different screentones in the manga using a deep learning algorithm, then applying the corresponding super-resolution model to each block according to its classification, and finally combining the results to obtain images that preserve the meaning of screentone and lines in the manga while improving image resolution.
Submitted 14 May, 2023;
originally announced May 2023.
-
A Diagonal Splitting Algorithm for Adaptive Group Testing
Authors:
Chaorui Yao,
Pavlos Nikolopoulos,
Christina Fragouli
Abstract:
Group testing makes it possible to identify infected individuals in a population using fewer tests than individual testing. To achieve this, group testing algorithms commonly assume knowledge of the number of infected individuals; nonadaptive and several adaptive algorithms fall in this category. Some adaptive algorithms, like binary splitting, operate without this assumption, but require a number of stages that may scale linearly with the size of the population. In this paper we contribute a new algorithm that enables a balance between the number of tests and the number of stages used, and which we term diagonal group testing. Diagonal group testing, like binary splitting, does not require knowledge of the number of infected individuals, yet unlike binary splitting, is order-optimal w.r.t. the expected number of tests it requires and is guaranteed to succeed in a small number of stages that scales at most logarithmically with the size of the population. Numerical evaluations, for diagonal group testing and a hybrid approach we propose, support our theoretical findings.
Submitted 14 May, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
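For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of a simple binary splitting variant. It is not the diagonal group testing algorithm proposed in the paper, and the names `binary_splitting` and `is_positive` are hypothetical. It assumes a noiseless pooled test that returns positive exactly when the tested group contains at least one infected individual.

```python
from typing import Callable, List, Sequence, Tuple


def binary_splitting(
    population: Sequence[int],
    is_positive: Callable[[Sequence[int]], bool],
) -> Tuple[List[int], int]:
    """Identify all infected individuals with a simple binary splitting variant.

    is_positive(group) is an assumed noiseless pooled test.
    Returns (infected individuals in order of discovery, number of tests used).
    """
    remaining = list(population)
    infected: List[int] = []
    tests = 0
    while remaining:
        tests += 1
        if not is_positive(remaining):
            break  # nobody left in the pool is infected
        group = remaining
        # Halve a positive group until a single infected individual is isolated.
        while len(group) > 1:
            half = group[: len(group) // 2]
            tests += 1
            group = half if is_positive(half) else group[len(half):]
        infected.append(group[0])
        remaining = [i for i in remaining if i != group[0]]
    return infected, tests
```

Every test here depends on the outcome of the previous one, so each test is its own stage. This illustrates why the number of stages can grow with the population size, and the number of infected individuals is never needed as an input.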
-
Topological regularity for solutions to the generalised Hopf equation
Authors:
Gaven Martin,
Cong Yao
Abstract:
The generalised Hopf equation is the first order nonlinear equation with data $Φ$ a holomorphic function and $η\geq 1$ a positive weight, \[ h_w\,\overline{h_{\overline{w}}}\,η(w) = Φ.\] The Hopf equation is the special case $η(w)=\tilde{η}(h(w))$ and reflects that $h$ is harmonic with respect to the conformal metric $\sqrt{\tilde{η}(z)}|dz|$. This article obtains conditions on the data to ensure that a solution is open and discrete. We also prove a strong uniqueness result.
Submitted 28 April, 2023;
originally announced April 2023.
-
Quantum transport theory of hybrid superconducting systems
Authors:
Chuan-Zhe Yao,
Hon-Lam Lai,
Wei-Min Zhang
Abstract:
We present a quantum transport theory for hybrid superconducting systems based on our exact master equation approach. The total transient transport current is decomposed into components that describe coherent transport through different paths of particle and hole channels. We show that the coherent transport results from the interference of numerous repeated tunneling processes and cannot be rendered as a simple normal transmission or Andreev reflection, as is usual in descriptions of quantum transport involving superconductivity. As a practical application, we find that the coherent transport currents passing through a pair of well-separated Majorana zero modes vanish due to the totally destructive interference between the particle and hole channels.
Submitted 2 November, 2023; v1 submitted 22 April, 2023;
originally announced April 2023.
-
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Authors:
Chuwei Luo,
Changxu Cheng,
Qi Zheng,
Cong Yao
Abstract:
Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most of the existing models learn the geometric representation in an implicit way, which has been found insufficient for the RE task since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting the performance of RE lies in the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM
Submitted 21 April, 2023;
originally announced April 2023.
-
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Authors:
Zhibo Yang,
Rujiao Long,
Pengfei Wang,
Sibo Song,
Humen Zhong,
Wenqing Cheng,
Xiang Bai,
Cong Yao
Abstract:
Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to its wide range of real-world applications. Numerous works have previously been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common. All these factors may lead to failures in information extraction. Therefore, as the second contribution, we explore an alternative approach to precisely and robustly extract key information from document images under such tough conditions. Specifically, in contrast to previous methods, which usually either incorporate visual information into a multi-modal architecture or train text spotting and information extraction in an end-to-end fashion, we explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities, which could largely benefit entity labeling and linking. Extensive experiments on standard benchmarks in this field as well as the proposed dataset demonstrate that the proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models. The dataset is available at https://www.modelscope.cn/datasets/damo/SIBR/summary.
Submitted 28 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
SigVIC: Spatial Importance Guided Variable-Rate Image Compression
Authors:
Jiaming Liang,
Meiqin Liu,
Chao Yao,
Chunyu Lin,
Yao Zhao
Abstract:
Variable-rate mechanism has improved the flexibility and efficiency of learning-based image compression that trains multiple models for different rate-distortion tradeoffs. One of the most common approaches for variable-rate is to channel-wisely or spatial-uniformly scale the internal features. However, the diversity of spatial importance is instructive for bit allocation of image compression. In this paper, we introduce a Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed for adaptively learning a spatial importance mask. Then, a spatial scaling network (SSN) takes the spatial importance mask to guide the feature scaling and bit allocation for variable-rate. Moreover, to improve the quality of decoded image, Top-K shallow features are selected to refine the decoded features through a shallow feature fusion module (SFFM). Experiments show that our method outperforms other learning-based methods (whether variable-rate or not) and traditional codecs, with storage saving and high flexibility.
Submitted 16 March, 2023;
originally announced March 2023.
-
Limit of the Wulff crystal when approaching criticality for isoperimetry in 2D percolation
Authors:
Chang-Long Yao
Abstract:
We consider isoperimetric sets, i.e., sets with minimal vertex boundary for a prescribed volume, of the infinite cluster of supercritical site percolation on the triangular lattice. Let $p$ be the percolation parameter and let $p_c$ be the critical point. By adapting the proof of Biskup, Louidor, Procaccia and Rosenthal [6] for isoperimetry in bond percolation on the square lattice, we show that the isoperimetric sets, when suitably rescaled, converge almost surely to a translation of the normalized Wulff crystal $\widehat{W}_p$. More importantly, we prove that $\widehat{W}_p$ tends to a Euclidean disk as $p\downarrow p_c$. This settles the site version of a conjecture proposed in [6]. A key input to the proof is the convergence of the limit shapes for near-critical Bernoulli first-passage percolation proved by the author recently.
Submitted 22 November, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
LORE: Logical Location Regression Network for Table Structure Recognition
Authors:
Hangdi Xing,
Feiyu Gao,
Rujiao Long,
Jiajun Bu,
Qi Zheng,
Liangcheng Li,
Cong Yao,
Zhi Yu
Abstract:
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or learning to generate the corresponding markup sequences from the table images. However, they either count on additional heuristic rules to recover the table structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time combines logical location regression together with spatial location regression of table cells. Our proposed LORE is conceptually simpler, easier to train and more accurate than previous TSR models of other paradigms. Experiments on standard benchmarks demonstrate that LORE consistently outperforms prior arts. Code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LORE-TSR.
Submitted 7 March, 2023;
originally announced March 2023.
-
Video Question Answering Using CLIP-Guided Visual-Text Attention
Authors:
Shuhong Ye,
Weikai Kong,
Chenglin Yao,
Jianfeng Ren,
Xudong Jiang
Abstract:
Cross-modal learning of video and text plays a key role in Video Question Answering (VideoQA). In this paper, we propose a visual-text attention mechanism to utilize the Contrastive Language-Image Pre-training (CLIP) trained on lots of general domain language-image pairs to guide the cross-modal learning for VideoQA. Specifically, we first extract video features using a TimeSformer and text features using a BERT from the target application domain, and utilize CLIP to extract a pair of visual-text features from the general-knowledge domain through the domain-specific learning. We then propose a Cross-domain Learning to extract the attention information between visual and linguistic features across the target domain and general domain. The set of CLIP-guided visual-text features are integrated to predict the answer. The proposed method is evaluated on MSVD-QA and MSRVTT-QA datasets, and outperforms state-of-the-art methods.
Submitted 8 March, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset
Authors:
Weikai Kong,
Shuhong Ye,
Chenglin Yao,
Jianfeng Ren
Abstract:
Deep neural networks facilitate video question answering (VideoQA), but the real-world applications on video streams such as CCTV and live cast place higher demands on the solver. To address the challenges of VideoQA on long videos of unknown length, we define a new set of problems called Online Open-ended Video Question Answering (O^2VQA). It requires an online state-updating mechanism for the solver to decide if the collected information is sufficient to conclude an answer. We then propose a Confidence-based Event-centric Online Video Question Answering (CEO-VQA) model to solve this problem. Furthermore, a dataset called Answer Target in Background Stream (ATBS) is constructed to evaluate this newly developed online VideoQA application. Compared to the baseline VideoQA method that watches the whole video, the experimental results show that the proposed method achieves a significant performance gain.
Submitted 7 March, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
A Comprehensive Evaluation Study on Risk Level Classification of Melanoma by Computer Vision on ISIC 2016-2020 Datasets
Authors:
Chengdong Yao
Abstract:
Skin cancer is the most common type of cancer. Specifically, melanoma is the cause of 75% of skin cancer deaths, although it is the least common skin cancer. Better detection of melanoma could have a positive impact on millions of people. The ISIC archive contains the largest publicly available collection of dermatoscopic images of skin lesions. In this research, we investigate the efficacy of applying advanced deep learning techniques in computer vision to identify melanoma in images of skin lesions. Through reviewing previous methods, including pre-trained models, deep-learning classifiers, transfer learning, etc., we demonstrate the applicability of the popular deep learning methods on critical clinical problems such as identifying melanoma. Finally, we proposed a processing flow with a validation AUC greater than 94% and a sensitivity greater than 90% on ISIC 2016 - 2020 datasets.
Submitted 19 February, 2023;
originally announced February 2023.
-
Designing a Wayfinding Robot for People with Visual Impairments
Authors:
Shuijing Liu,
Aamir Hasan,
Kaiwen Hong,
Chun-Kai Yao,
Justin Lin,
Weihang Liang,
Megan A. Bayles,
Wendy A. Rogers,
Katherine Driggs-Campbell
Abstract:
People with visual impairments (PwVI) often have difficulties navigating through unfamiliar indoor environments. However, current wayfinding tools are fairly limited. In this short paper, we present our in-progress work on a wayfinding robot for PwVI. The robot takes an audio command from the user that specifies the intended destination. Then, the robot autonomously plans a path to navigate to the goal. We use sensors to estimate the real-time position of the user, which is fed to the planner to improve the safety and comfort of the user. In addition, the robot describes the surroundings to the user periodically to prevent disorientation and potential accidents. We demonstrate the feasibility of our design in a public indoor environment. Finally, we analyze the limitations of our current design, as well as our insights and future work. A demonstration video can be found at https://youtu.be/BS9r5bkIass.
Submitted 17 February, 2023;
originally announced February 2023.
-
Revealing the origin of neutrino masses through the Type II Seesaw mechanism at high-energy muon colliders
Authors:
Tong Li,
Chang-Yuan Yao,
Man Yuan
Abstract:
The future muon collider can serve as an ideal machine to search for new physics at high energies. In this work, we study the search potential of the heavy Higgs triplet in the Type II Seesaw mechanism at muon colliders with high collision energy and high luminosity. The latest neutrino oscillation data are taken into account for realizing the leptonic decay modes of the charged Higgs bosons $(H^{\pm\pm},~H^{\pm})$ in the Type II Seesaw. We show the impact of neutrino mass and mixing parameters on the purely leptonic decays. The pair production of doubly charged Higgs $H^{++}H^{--}$ proceeds through direct $μ^+μ^-$ annihilation and vector boson fusion (VBF) processes at a muon collider. The associated production $H^{\pm\pm}H^{\mp}$ can only be induced by VBF processes. We simulate both the purely leptonic and bosonic signal channels of charged Higgs bosons in Type II Seesaw, together with the Standard Model backgrounds. We show the required luminosity for the discovery of the charged Higgses and the reachable limits on the leptonic decay branching fractions.
Submitted 27 February, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Giant Nernst effect in the crossover between Fermi liquid and strange metal
Authors:
Yusen Yang,
Qian Tao,
Yuqiang Fang,
Guoxiong Tang,
Chao Yao,
Xiaoxian Yan,
Chenxi Jiang,
Xiangfan Xu,
Fuqiang Huang,
Wenxin Ding,
Yu Wang,
Zhiqiang Mao,
Hui Xing,
Zhu-An Xu
Abstract:
The strange-metal state is a crucial problem in condensed matter physics highlighted by its ubiquity in almost all major correlated systems[1-7]. Its understanding could provide important insight into high-Tc superconductivity[2] and quantum criticality[8]. However, with the Fermi liquid theory failing in strange metals, understanding the highly unconventional behaviors has been a long-standing challenge. Fundamental aspects of strange metals remain elusive, including the nature of their charge carriers[1]. Here, we report the observation of a giant Nernst response in the strange-metal state in a two-dimensional superconductor 2M-WS2. A giant Nernst coefficient comparable to the vortex Nernst signal in superconducting cuprates, and its high sensitivity to carrier mobility, are found when the system enters the strange-metal state from the Fermi liquid state. The temperature and magnetic field dependence of the giant Nernst peak rules out the relevance of both Landau quasiparticles and superconductivity. Instead, the giant Nernst peak at the crossover indicates a dramatic change in carrier entropy when entering the strange-metal state. The presence of such an anomalous Nernst response is further confirmed in other iconic strange metals, suggesting its universality and placing stringent experimental constraints on the mechanism of strange metals.
Submitted 16 January, 2023;
originally announced January 2023.
-
Systematic study of one-loop realizations of $d=7$ long-range $0νββ$ decay operators
Authors:
Ping-Tao Chen,
Gui-Jun Ding,
Chang-Yuan Yao
Abstract:
We study the systematical one-loop decomposition of the dimension-7 long-range $0νββ$ decay operators. We find that there are 3 genuine one-loop topologies and 8 diagrams. The procedure to determine the SM quantum number assignments for both internal and external fields is presented. The Majorana neutrino mass in long-range $0νββ$ models is discussed. We also present a one-loop $0νββ$ decay model…
▽ More
We study the systematical one-loop decomposition of the dimension-7 long-range $0νββ$ decay operators. We find that there are 3 genuine one-loop topologies and 8 diagrams. The procedure to determine the SM quantum number assignments for both internal and external fields is presented. The Majorana neutrino mass in long-range $0νββ$ models is discussed. We also present a one-loop $0νββ$ decay model which produces Majorana neutrino mass at three-loop level. The phenomenological predictions for light neutrino mass and $0νββ$ decay half-life time including both mass mechanism and long-range contribution are studied.
△ Less
Submitted 23 March, 2023; v1 submitted 6 January, 2023;
originally announced January 2023.
-
Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble
Authors:
Chun-Han Yao,
Wei-Chih Hung,
Yuanzhen Li,
Michael Rubinstein,
Ming-Hsuan Yang,
Varun Jampani
Abstract:
Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem. Most prior methods rely on large-scale image datasets, dense temporal correspondence, or human annotations like camera pose, 2D keypoints, and shape templates. We propose Hi-LASSIE, which performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates. We follow the recent work of LASSIE that tackles a similar problem setting and make two significant advances. First, instead of relying on a manually annotated 3D skeleton, we automatically estimate a class-specific skeleton from the selected reference image. Second, we improve the shape reconstructions with novel instance-specific optimization strategies that allow reconstructions to faithfully fit each instance while preserving the class-specific priors learned across all images. Experiments on in-the-wild image ensembles show that Hi-LASSIE obtains higher-fidelity, state-of-the-art 3D reconstructions despite requiring minimal user input.
Submitted 25 March, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.