
Applications of Moments of Dirichlet Coefficients in Elliptic Curve Families

Authors: Zoë Batterman, Aditya Jambhale, Steven J. Miller, Akash L. Narayanan, Kishan Sharma, Andrew Yang, Chris Yao

Abstract: The moments of the coefficients of elliptic curve L-functions are related to numerous arithmetic problems. Rosen and Silverman proved a conjecture of Nagao relating the first moment of one-parameter families satisfying Tate's conjecture to the rank of the corresponding elliptic surface over Q(T); one can also construct families of moderate rank by finding families with large first moments. Michel proved that if j(T) is not constant, then the second moment of the family is of size p^2 + O(p^(3/2)); these two moments show that for suitably small support the behavior of zeros near the central point agrees with that of eigenvalues from random matrix ensembles, with the higher moments impacting the rate of convergence. In his thesis, Miller noticed a negative bias in the second moment of every one-parameter family of elliptic curves over the rationals whose second moment had a calculable closed-form expression; specifically, the first lower order term which does not average to zero is on average negative. This Bias Conjecture is confirmed for many families; however, these are highly non-generic families whose resulting Legendre sums can be determined. Inspired by the recent successes by Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, Alexey Pozdnyakov and others in investigations of murmurations of elliptic curve coefficients with machine learning techniques, we pose a similar problem for trying to understand the Bias Conjecture. As a start to this program, we numerically investigate the Bias Conjecture for a family whose bias is positive for half the primes.
Since the numerics do not offer conclusive evidence that negative bias for the other half is enough to overwhelm the positive bias, the Bias Conjecture cannot be verified for the family.

Submitted 17 June, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

MSC Class: 11G05; 11G40

arXiv:2311.14956 [pdf, other]

Anomalous hot electron generation from two-plasmon decay instability driven by broadband laser pulses with intensity modulations

Authors: C. Yao, J. Li, L. Hao, R. Yan, C. Wang, A. Lei, Y-K. Ding, J. Zheng

Abstract: We investigate the hot electrons generated from two-plasmon decay (TPD) instability driven by laser pulses with intensity modulated by a frequency $Δω_m$. Our primary focus lies on scenarios where $Δω_m$ is on the same order as the TPD growth rate $γ_0$ ($Δω_m \sim γ_0$), corresponding to moderate laser frequency bandwidths for TPD mitigation. With $Δω_m$ conveniently modeled by a basic two-color scheme of the laser wave fields in fully-kinetic particle-in-cell simulations, we demonstrate that the energies of TPD modes and hot electrons exhibit intermittent evolution at the frequency $Δω_m$, particularly when $Δω_m \sim γ_0$. With the dynamic TPD behavior, the overall ratio of hot electron energy to the incident laser energy, $f_{hot}$, changes significantly with $Δω_m$. While $f_{hot}$ drops notably with increasing $Δω_m$ in the large-$Δω_m$ limit as expected, it anomalously exceeds the hot electron energy ratio for a single-frequency incident laser pulse with the same average intensity when $Δω_m$ falls below a specific threshold frequency $Δω_c$. We find this threshold frequency primarily depends on $γ_0$ and the collisional damping rate of plasma waves, with relatively lower sensitivity to the density scale length. We develop a scaling model characterizing the relation of $Δω_c$ and laser plasma conditions, enabling the potential extension of our findings to more complex and realistic scenarios.

Submitted 25 November, 2023; originally announced November 2023.
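
As a rough illustration of why a two-color field yields an intensity modulated at $Δω_m$, the following worked equation assumes two equal-amplitude components at frequencies $\omega_1$ and $\omega_2 = \omega_1 + \Delta\omega_m$. The equal amplitudes and the symbols $E_0$, $\omega_1$, $\omega_2$, $\bar\omega$ are assumptions made for this sketch; the abstract does not specify the exact field configuration.

```latex
\begin{align}
E(t) &= E_0\cos(\omega_1 t) + E_0\cos(\omega_2 t)
      = 2E_0\cos\!\left(\tfrac{\Delta\omega_m t}{2}\right)\cos(\bar\omega t),
      \qquad \bar\omega = \tfrac{\omega_1+\omega_2}{2}, \\
\langle E^2(t)\rangle_{\text{fast}}
     &= 2E_0^2\cos^2\!\left(\tfrac{\Delta\omega_m t}{2}\right)
      = E_0^2\left[1 + \cos(\Delta\omega_m t)\right].
\end{align}
```

Under these assumptions, the intensity averaged over the fast optical cycle oscillates at $\Delta\omega_m$ between zero and twice its mean value $E_0^2$, so the instantaneous drive periodically rises above that of a single-frequency pulse with the same average intensity.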

arXiv:2311.11482 [pdf, other]

Meta Prompting for AI Systems

Authors: Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao

Abstract: In this work, we present a comprehensive study of Meta Prompting (MP), an innovative technique reshaping the utilization of language models (LMs) and AI systems in problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting, sets it apart from few-shot prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting for complex reasoning tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency, and enabling more equitable problem-solving comparisons, especially against few-shot prompting methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner.
Empirical experiments demonstrate Meta Prompting's efficacy in achieving high accuracy and efficiency, showcasing its transformative impact on AI problem-solving: a Qwen-72B base language model equipped with a meta prompt, without instruction-tuning, solves MATH problems with 46.3% accuracy, which surpasses the supervised fine-tuned counterpart trained with extensive mathematical QA instruction pairs and even the initial version of GPT-4; the zero-shot meta-prompted Qwen-72B base language model solves GSM8K problems with 83.5% accuracy; and GPT-4 solves the Game of 24 tasks with a 100% success rate. The code is available at https://github.com/meta-prompting/meta-prompting.

Submitted 15 June, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2310.18090 [pdf, ps, other]

Probabilistic Constellation Shaping for OFDM-Based ISAC Signaling

Authors: Zhen Du, Fan Liu, Yifeng Xiong, Tony Xiao Han, Weijie Yuan, Yuanhao Cui, Changhua Yao, Yonina C. Eldar

Abstract: Integrated Sensing and Communications (ISAC) has garnered significant attention as a promising technology for the upcoming sixth-generation wireless communication systems (6G). In pursuit of this goal, a common strategy is that a unified waveform, such as Orthogonal Frequency Division Multiplexing (OFDM), should serve dual-functional roles by enabling simultaneous sensing and communications (S&C) operations. However, the sensing performance of an OFDM communication signal is substantially affected by the randomness of the data symbols mapped from bit streams. Therefore, achieving a balance between preserving communication capability (i.e., the randomness) while improving sensing performance remains a challenging task. To cope with this issue, in this paper we analyze the ambiguity function of the OFDM communication signal modulated by random data. Subsequently, a probabilistic constellation shaping (PCS) method is proposed to devise the probability distributions of constellation points, which is able to strike a scalable S&C tradeoff of the random transmitted signal. Finally, the superiority of the proposed PCS method over conventional uniformly distributed constellations is validated through numerical simulations.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.16070 [pdf, other]

Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting

Authors: Chengzhi Yao, Zhi Li, Junbo Wang

Abstract: Traffic forecasting, which benefits from mobile Internet development and positioning technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods leverage graph-based deep learning networks to model the complex road network for traffic forecasting only shallowly. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of the traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance.
Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.12430 [pdf, other]

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

Authors: Cong Yao

Abstract: In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines. Specifically, basic capabilities, including text detection, text recognition, table structure recognition and layout analysis, are provided. Upon these basic capabilities, we also build a set of fully functional pipelines for document parsing, i.e., general text reading, table parsing, and document structurization, to drive various applications related to documents in real-world scenarios. Moreover, DocXChain is concise, modularized and flexible, such that it can be readily integrated with existing tools, libraries or models (such as LangChain and ChatGPT), to construct more powerful systems that can accomplish more complicated and challenging tasks. The code of DocXChain is publicly available at: https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/Applications/DocXChain

Submitted 18 October, 2023; originally announced October 2023.

Comments: 4 pages, 4 figures, 2 tables

arXiv:2310.10362 [pdf, other]

Self-Pro: A Self-Prompt and Tuning Framework for Graph Neural Networks

Authors: Chenghua Gong, Xiang Li, Jianxiang Yu, Cheng Yao, Jiaqi Tan, Chengcheng Yu

Abstract: Graphs have become an important modeling tool for web applications, and Graph Neural Networks (GNNs) have achieved great success in graph representation learning. However, the performance of traditional GNNs heavily relies on a large amount of supervision. Recently, "pre-train, fine-tune" has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in "negative transfer" and poor performance. Inspired by prompt learning in Natural Language Processing (NLP), many studies turn to bridging the gap and fully leveraging the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on randomly initialized prompts, which negatively impact stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning for pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from the pre-training phase as the self adapter and introduce self-prompts based on the graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our codes at https://github.com/gongchenghua/Self-Pro.

Submitted 4 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted at ECML-PKDD 2024

arXiv:2310.08064 [pdf]

Age Estimation Based on Graph Convolutional Networks and Multi-head Attention Mechanisms

Authors: Miaomiao Yang, Changwei Yao, Shijin Yan

Abstract: Age estimation technology is a part of facial recognition and has been applied to identity authentication. This technology enables the development and application of a juvenile anti-addiction system by authenticating users in the game. Convolutional Neural Network (CNN) and Transformer algorithms are widely used in this application scenario. However, these two models cannot flexibly extract and model features of faces with irregular shapes, and they are ineffective in capturing key information. Furthermore, the above methods capture a lot of background information while extracting features, which interferes with the model. As a consequence, it is easy to extract redundant information from images. In this paper, a new modeling idea is proposed to solve this problem, which can flexibly model irregular objects. The Graph Convolutional Network (GCN) is used to extract features from irregular face images effectively, and multi-head attention mechanisms are added to avoid redundant features and capture key region information in the image. The model improves the accuracy of age estimation, reducing the mean absolute error (MAE) to about 3.64, which is better than current age estimation models, and thereby improves the accuracy of face recognition and identity authentication.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.04975 [pdf, ps, other]

A Trustworthy and Consistent Blockchain Oracle Scheme for Industrial Internet of Things

Authors: Peng Liu, Youquan Xian, Chuanjian Yao, Peng Wang, Li-e Wang, Xianxian Li

Abstract: Blockchain provides decentralization and trustlessness features for the Industrial Internet of Things (IIoT), which expands the application scenarios of IIoT. To address the problem that the blockchain cannot actively obtain off-chain data, the blockchain oracle is proposed as a bridge between the blockchain and external data. However, existing oracle schemes struggle to solve the problem of low quality of service caused by frequent data changes and heterogeneous devices in IIoT, and current oracle node selection schemes struggle to balance security and quality of service. To tackle these problems, this paper proposes a secure and reliable oracle scheme that can obtain high-quality off-chain data. Specifically, we first design an oracle node selection algorithm based on Verifiable Random Function (VRF) and reputation mechanism to securely select high-quality nodes. Second, we propose a data filtering algorithm based on a sliding window to further improve the consistency of the collected data. We verify the security of the proposed scheme through security analysis. The experimental results show that the proposed scheme can effectively improve the service quality of the oracle.

Submitted 7 October, 2023; originally announced October 2023.

Comments: Rejected after the third round of review of IEEE Internet of Things Journal

arXiv:2310.00890 [pdf]

doi 10.1002/adma.202313742

Femtosecond electron diffraction reveals local disorder and local anharmonicity in thermoelectric SnSe

Authors: Jingjun Li, Yingpeng Qi, Qing Yang, Luye Yue, Changyuan Yao, Zijing Chen, Sheng Meng, Dao Xiang, Jianming Cao

Abstract: The microscopic arrangement of atoms and molecules is the determining factor in how materials behave and perform. Beyond the long-range periodicity, the local disorder with local structures deviating from the average lattice structure plays a vital role in determining the physical properties of the phonon, electron and spin subsystems in crystalline functional materials. Experimentally characterizing the 3D atomic configuration of such local disorder and correlating it with the advanced functions remain a big challenge. Time-domain evolution of the local disorder, either static or dynamical, is lost due to the characterization at equilibrium state with conventional probing techniques. With the combination of femtosecond electron diffraction, structure factor calculation and TDDFT-MD simulation, we exclusively identify the static local disorder and the local anharmonicity of it in thermoelectric SnSe. The ultrafast structural dynamics in time domain reveal a dominant static off-symmetry displacement of Sn (~0.4 angstrom) and the anharmonicity of this local disorder induces an ultrafast atomic displacement within 100 fs after photoexcitation. The microscopic picture of the local anharmonicity indicates a direct and first signature of the THz Einstein oscillators in real space. Therefore, a glass-like thermal transport channel with the local disorder, the Einstein oscillators and the local anharmonicity, updates the fundamental insight into the long-debated ultralow thermal conductivity in SnSe.
The local disorder over one to a few unit cells is pervasive and indispensable in thermoelectric materials, multiferroic materials and correlated electronic materials. Our method of revealing the 3D local disorder and the local correlated interactions by ultrafast structural dynamics will inspire broad interest in construction of the structure-property relationship in material science.

Submitted 2 October, 2023; originally announced October 2023.

Report number: 2313742

Journal ref: Adv. Mater. 2313742 (2024)

arXiv:2309.13835 [pdf, other]

doi 10.1016/j.patcog.2024.110465

IBVC: Interpolation-driven B-frame Video Compression

Authors: Chenming Xu, Meiqin Liu, Chao Yao, Weisi Lin, Yao Zhao

Abstract: Learned B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction. However, previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation or video frame interpolation. They suffer from inaccurate quantized motions and inefficient motion compensation. To address these issues, we propose a simple yet effective structure called Interpolation-driven B-frame Video Compression (IBVC). Our approach only involves two major operations: video frame interpolation and artifact reduction compression. IBVC introduces a bit-rate free MEMC based on interpolation, which avoids optical-flow quantization and additional compression distortions. Later, to reduce duplicate bit-rate consumption and focus on unaligned artifacts, a residual guided masking encoder is deployed to adaptively select the meaningful contexts with interpolated multi-scale dependencies. In addition, a conditional spatio-temporal decoder is proposed to eliminate location errors and artifacts instead of using MEMC coding in other methods. The experimental results on B-frame coding demonstrate that IBVC has significant improvements compared to the relevant state-of-the-art methods. Meanwhile, our approach can save bit rates compared with the random access (RA) configuration of H.266 (VTM). The code will be available at https://github.com/ruhig6/IBVC.

Submitted 14 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: Submitted to Pattern Recognition

arXiv:2309.13596 [pdf, other]

Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development

Authors: Runkai Zhao, Yuwen Heng, Heng Wang, Yuanda Gao, Shilei Liu, Changhao Yao, Jiawen Chen, Weidong Cai

Abstract: Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making. However, their application in 3D lane detection for effective driving environment perception is hindered by the lack of comprehensive LiDAR datasets. The sparse nature of LiDAR point cloud data prevents an efficient manual annotation process. To solve this problem, we present LiSV-3DLane, a large-scale 3D lane dataset that comprises 20k frames of surround-view LiDAR point clouds with enriched semantic annotation. Unlike existing datasets confined to a frontal perspective, LiSV-3DLane provides a full 360-degree spatial panorama around the ego vehicle, capturing complex lane patterns in both urban and highway environments. We leverage the geometric traits of lane lines and the intrinsic spatial attributes of LiDAR data to design a simple yet effective automatic annotation pipeline for generating finer lane labels. To propel future research, we propose a novel LiDAR-based 3D lane detection model, LiLaDet, incorporating the spatial geometry learning of the LiDAR point cloud into Bird's Eye View (BEV) based lane identification. Experimental results indicate that LiLaDet outperforms existing camera- and LiDAR-based approaches in the 3D lane detection task on the K-Lane dataset and our LiSV-3DLane.

Submitted 15 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: Accepted by ICRA2024

arXiv:2309.12748 [pdf, other]

The Reversed Zeckendorf Game

Authors: Zoë X. Batterman, Aditya Jambhale, Steven J. Miller, Akash L. Narayanan, Kishan Sharma, Andrew K. Yang, Chris Yao

Abstract: Zeckendorf proved that every natural number $n$ can be expressed uniquely as a sum of non-consecutive Fibonacci numbers, called its Zeckendorf decomposition. Baird-Smith, Epstein, Flint, and Miller created the Zeckendorf game, a two-player game played on partitions of $n$ into Fibonacci numbers which always terminates at a Zeckendorf decomposition, and proved that Player 2 has a winning strategy for $n\geq 3$. Since their proof was non-constructive, other authors have studied the game to find a constructive winning strategy, and lacking success there turned to related problems. For example, Cheigh, Moura, Jeong, Duke, Milgrim, Miller, and Ngamlamai studied minimum and maximum game lengths and randomly played games. We explore a new direction and introduce the reversed Zeckendorf game, which starts at the ending state of the Zeckendorf game and flips all the moves, so the reversed game ends with all pieces in the first bin. We show that Player 1 has a winning strategy for $n = F_{i+1} + F_{i-2}$ and solve various modified games.

Submitted 4 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 25 Pages, 4 figures

MSC Class: 11B39 (Primary); 05C57; 65Q30; 91A05; 91A46 (Secondary)
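
The Zeckendorf decomposition mentioned in the abstract above can be computed with the standard greedy algorithm: repeatedly subtract the largest Fibonacci number that still fits. The sketch below is illustrative only and is not taken from the paper; the function name `zeckendorf` and the indexing convention (Fibonacci numbers 1, 2, 3, 5, 8, ...) are assumptions made for this example.

```python
def zeckendorf(n: int) -> list[int]:
    """Return the Zeckendorf decomposition of n, largest term first.

    Uses the Fibonacci sequence 1, 2, 3, 5, 8, ... (an assumed convention)
    and the greedy rule: always take the largest Fibonacci number <= n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")

    # Build the Fibonacci numbers that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])

    # Greedily subtract from the largest candidate downward.
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts


print(zeckendorf(10))   # [8, 2]
print(zeckendorf(100))  # [89, 8, 3]
```

The greedy choice never selects two consecutive Fibonacci numbers: after taking the largest $F_k \le n$, the remainder is smaller than $F_{k-1}$, so $F_{k-1}$ cannot be chosen next.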

arXiv:2308.14978 [pdf, other]

Vision Grid Transformer for Document Layout Analysis

Authors: Cheng Da, Chuwei Luo, Qi Zheng, Cong Yao

Abstract: Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI. However, for the document layout analysis (DLA) task, existing document pre-trained models, even those pre-trained in a multi-modal fashion, usually rely on either textual features or visual features. Grid-based models for DLA are multi-modality but largely neglect the effect of pre-training. To fully leverage multi-modal information and exploit pre-training techniques to learn better representation for DLA, in this paper, we present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. Furthermore, a new dataset named D$^4$LA, which is so far the most diverse and detailed manually-annotated benchmark for document layout analysis, is curated and released. Experiment results have illustrated that the proposed VGT model achieves new state-of-the-art results on DLA tasks, e.g. PubLayNet ($95.7\% \rightarrow 96.2\%$), DocBank ($79.6\% \rightarrow 84.1\%$), and D$^4$LA ($67.7\% \rightarrow 68.8\%$). The code and models as well as the D$^4$LA dataset will be made publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.

Submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.12774 [pdf, other]

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Authors: Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao

Abstract: The diversity in length constitutes a significant characteristic of text. Due to the long-tail distribution of text lengths, most existing methods for scene text recognition (STR) only work well on short or seen-length text, lacking the capability of recognizing longer text or performing length extrapolation. This is a crucial issue, since the lengths of the text to be recognized are usually not given in advance in real-world applications, but it has not been adequately investigated in previous works. Therefore, we propose in this paper a method called Length-Insensitive Scene TExt Recognizer (LISTER), which remedies the limitation regarding the robustness to various text lengths. Specifically, a Neighbor Decoder is proposed to obtain accurate character attention maps with the assistance of a novel neighbor matrix regardless of the text lengths. Besides, a Feature Enhancement Module is devised to model the long-range dependency with low computation cost, which is able to perform iterations with the neighbor decoder to enhance the feature map progressively. To the best of our knowledge, we are the first to achieve effective length-insensitive scene text recognition. Extensive experiments demonstrate that the proposed LISTER algorithm exhibits obvious superiority on long text recognition and the ability for length extrapolation, while comparing favourably with the previous state-of-the-art methods on standard benchmarks for STR (mainly short text).

Submitted 24 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2308.09365 [pdf, ps, other]

The dissolving limit and large volume limit of Einstein-Bogomol'nyi metrics

Authors: Chengjian Yao

Abstract: We study the limits of Einstein-Bogomol'nyi metrics on $\mathbf{P}^1$, which are solutions to a dimensional reduction of the Einstein-Maxwell-Higgs system in dimension four, in two regimes. In one regime, called the "dissolving limit", where the volume of the metrics approaches the admissible lower bound, all the vortices dissolve, similar to the Bradlow limit in the study of vortices on Riemann surfaces. In the other regime, called the "large volume limit", where the volume of the metrics approaches infinity, the magnetic field concentrates around the zeros of the Higgs field. Meanwhile, the volume-normalized underlying metric approaches the Euclidean cone metric determined by the Higgs field in the case of a stable Higgs field. Moreover, by studying the large volume limit of Yang's solution for a strictly polystable Higgs field, for each natural number $N'$ we recover the Einstein-Bogomol'nyi metric on $\mathbf{C}$ that is asymptotically cylindrical at an exponential rate with total string number $N'$, first discovered by Linet and Yang.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 26 pages, 2 figures

MSC Class: 53C07; 53C25 (Primary) 83C22; 83C50 (Secondary)

arXiv:2308.04371 [pdf, other]

Cumulative Reasoning with Large Language Models

Authors: Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao

Abstract: Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.

Submitted 1 April, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.15272 [pdf]

Direct Power Flow Controller with Continuous Full Regulation Range

Authors: Chong Yao, Youjun Zhang

Abstract: To enhance power flow control in power transmission, a simplified new structure of direct power flow controller with continuous full regulation range (F-DPFC) is proposed. It has only one-stage power conversion and comprises a three-phase transformer in parallel and a three-phase transformer in series with the grid, three single-phase full-bridge ac units, and a three-phase filter. Compared with the previous DPFC, the proposed one dispenses with two complex three-phase selection switches that connect directly to the high-voltage grid, and has a continuous 360° adjustment range of compensation voltage, obtained by replacing the buck-type ac unit with a full-bridge ac unit and thereby expanding the range of its duty cycle from [0,1] to [-1,1]. Within a large smooth zone replacing six separate zones, the proposed F-DPFC can regulate the amplitude and phase angle of the grid node voltage independently and simultaneously, so that the active and reactive power flow in the grid can be controlled smoothly and effectively. The new structure lends itself to modular expansion, which enables it to operate under high voltage and power conditions. Its structure and operational principle were analyzed in detail, and a prototype was developed. The experimental results verified the feasibility and the correctness of the theoretical analysis.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 9 pages, 20 figures

arXiv:2307.14641 [pdf]

doi 10.1016/j.oceram.2023.100406

Influence of cation vacancy concentrations on ultra-low thermal conductivity in $(1-x)$BiVO$_4$-$x$Bi$_{2/3}$MoO$_4$ scheelite solid solutions

Authors: Guillaume F. Nataf, Hicham Ait Laasri, Damien Brault, Tatiana Chartier, Chalit Ya, Fabian Delorme, Isabelle Monot-Laffez, Fabien Giovannelli

Abstract: Bismuth vanadate - bismuth molybdate solid-solution was prepared to elaborate ceramics with different amounts of cation vacancies. Dense ceramics with similar microstructures were obtained and the evolution of their melting point, specific heat, thermal diffusivity, and conductivity as a function of the amount of vacancy was evaluated. At room temperature, the thermal conductivity decreases from 1.74 W m$^{-1}$ K$^{-1}$ for BiVO$_{4}$ (x=0) to 1.12 W m$^{-1}$ K$^{-1}$ for Bi$_{0.867}$$\square$$_{0.133}$Mo$_{0.4}$V$_{0.6}$O$_{4}$ (x=0.4). Moreover, we show that a very small amount of vacancy (1.7%, x=0.05) is enough to provide a large decrease in thermal conductivity by more than 15%, in agreement with a mass fluctuation scattering model. However, the temperature of the melting point also decreases with increasing amount of vacancy. Our results suggest adding only a very small amount of vacancy as the best strategy to obtain superior materials for thermal barriers and thermoelectric devices, with ultra-low thermal conductivity and high-temperature stability.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 17 pages, 5 figures

Journal ref: Open Ceramics 15, 100406 (2023)

arXiv:2307.13244 [pdf, other]

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

Authors: Cheng Da, Peng Wang, Cong Yao

Abstract: Due to the enormous technical challenges and wide range of applications, scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this tough problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet functionally powerful vision STR model, which is built upon ViT and a tailored Adaptive Addressing and Aggregation (A$^3$) module. It already outperforms most previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character level representation, while no independent language model (LM) is adopted. To produce the final recognition results, two strategies for effectively fusing the multi-granularity predictions are devised. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, MGP-STR achieves an average recognition accuracy of $94\%$ on standard benchmarks for scene text recognition. Moreover, it also achieves state-of-the-art results on widely-used handwritten benchmarks as well as more challenging scene text datasets, demonstrating the generality of the proposed MGP-STR algorithm. The source code and models will be available at: https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR.

Submitted 25 July, 2023; originally announced July 2023.

Comments: submitted to TPAMI; an extension to our previous ECCV 2022 paper arXiv:2209.03592

arXiv:2307.08563 [pdf, other]

Hilbert series for ALP EFTs

Authors: Christophe Grojean, Jonathan Kley, Chang-Yuan Yao

Abstract: Axions and axion-like particles (ALPs) are ubiquitous in popular attempts to solve supercalifragilisticexpialidocious puzzles of Nature. A widespread and vivid experimental programme spanning a vast range of mass scales and decades of couplings strives to find evidence for these elusive but theoretically well-motivated particles. In the absence of a clear guiding principle, effective field theories (EFTs) prove to be an efficient tool in this experimental quest. Hilbert series technologies are a privileged instrument of the EFT toolbox to enumerate and classify operators. In this work, we compute explicitly the Hilbert series capturing the interactions of a generic ALP with the Standard Model particles above and below the electroweak symmetry scale, which allows us to build bases of operators up to dimension 8. In particular, we reveal a remarkable structure of the Hilbert series that isolates the shift-symmetry breaking and preserving interactions. In addition, with the Hilbert series method, we enumerate the sources of CP violation in terms of CP-even, CP-odd and CP-violating operators. Furthermore, we provide an ancillary file of the Hilbert series up to dimension 15 to supplement our findings, which can be used for further analysis and exploration.

Submitted 30 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 33 pages + appendices, 2 figures, 13 tables, added discussion about CP, updated the ancillary file

Report number: DESY-23-098, HU-EP-23/39

arXiv:2307.04420 [pdf, ps, other]

FedDCT: A Dynamic Cross-Tier Federated Learning Scheme in Wireless Communication Networks

Authors: Peng Liu, Youquan Xian, Chuanjian Yao, Xiaoyun Gan, Lianghaojie Zhou, Jianyong Jiang, Dongcheng Li

Abstract: With the rapid proliferation of Internet of Things (IoT) devices and the growing concern for data privacy among the public, Federated Learning (FL) has gained significant attention as a privacy-preserving machine learning paradigm. FL enables the training of a global model among clients without exposing local data. However, when a federated learning system runs on wireless communication networks, limited wireless resources, heterogeneity of clients, and network transmission failures affect its performance and accuracy. In this study, we propose a novel dynamic cross-tier FL scheme, named FedDCT, to increase training accuracy and performance in wireless communication networks. We utilize a tiering algorithm that dynamically divides clients into different tiers according to specific indicators and assigns specific timeout thresholds to each tier to reduce the training time required. To improve the accuracy of the model without increasing the training time, we introduce a cross-tier client selection algorithm that can effectively select the tiers and participants. Simulation experiments show that our scheme can make the model converge faster and achieve a higher accuracy in wireless communication networks.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2307.02828 [pdf, other]

Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks

Authors: Xu Han, Anmin Liu, Chenxuan Yao, Yanbo Fan, Kun He

Abstract: Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to the benign input. After achieving nearly 100% attack success rates in the white-box setting, more focus has shifted to black-box attacks, in which the transferability of adversarial examples has gained significant attention. In either case, the common gradient-based methods generally use the sign function to generate perturbations on the gradient update, which offers a roughly correct direction and has achieved great success. But little work pays attention to its possible limitation. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute the sign function without extra computational cost. We further propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method could be used in any gradient-based attacks and is extensible to be integrated with various input transformation or ensemble methods to further improve the adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method could significantly boost the transferability of gradient-based attacks and outperform the state-of-the-art baselines.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 10 pages, 6 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2204.02887

arXiv:2306.17368 [pdf, other]

Searching for heavy neutral lepton and lepton number violation through VBS at high-energy muon colliders

Authors: Tong Li, Chang-Yuan Yao, Man Yuan

Abstract: A high-energy muon collider can act as an emitter of electroweak gauge bosons and thus leads to substantial vector boson scattering (VBS) processes. In this work, we investigate the production of a heavy neutral lepton (HNL) $N$ and the lepton number violation (LNV) signature through VBS at high-energy muon colliders. VBS induces LNV processes $W^\pm Z/\gamma\to \ell^\pm N \to \ell^\pm \ell^\pm W^\mp\to \ell^\pm \ell^\pm q\bar{q}'$ with an on-shell HNL $N$ at $\mu^+\mu^-$ colliders. In analogy to neutrinoless double-beta decay with the HNL in the t-channel, the LNV signature $W^+W^+\to \ell^+\ell^+$ can also happen via VBS at a same-sign muon collider. They provide clean and robust LNV signatures to tell the nature of Majorana HNLs and are thus more advantageous than direct $\mu\mu$ annihilation. We analyze the potential of searching for Majorana HNLs and obtain the exclusion limits on the mixing $V_{\ell N}$. Based on this same-sign lepton signature, we also obtain the sensitivity of the muon collider to the Weinberg operator.

Submitted 3 September, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 24 pages, 8 figures, 2 tables. Accepted for publication in JHEP

Report number: DESY-23-092

arXiv:2306.12052 [pdf, ps, other]

Some invariants of $U(1,1;\mathbb{H})$ and diagonalization

Authors: Cailing Yao, Bingzhe Hou, Xiaoqi Feng

Abstract: Denote by $\mathbb{H}$ the set of all quaternions. We are interested in the group $U(1,1;\mathbb{H})$, which is a subgroup of the $2\times 2$ quaternionic matrix group and is sometimes called $Sp(1,1)$. As is well known, $U(1,1;\mathbb{H})$ corresponds to the quaternionic Möbius transformations on the unit ball in $\mathbb{H}$. In this article, some similar invariants on $U(1,1;\mathbb{H})$ are discussed. Our main result shows that each matrix $T\in U(1,1;\mathbb{H})$, which corresponds to an elliptic quaternionic Möbius transformation $g_T(z)$, could be $U(1,1;\mathbb{H})$-similar to a diagonal matrix.

Submitted 21 June, 2023; originally announced June 2023.

Comments: 17 pages

MSC Class: 15A20; 15B33; 30G35 (Primary); 15A18; 16R30 (Secondary)

arXiv:2306.10804 [pdf, other]

Conditional Text Image Generation with Diffusion Models

Authors: Yuanzhi Zhu, Zhaohai Li, Tianwei Wang, Mengchao He, Cong Yao

Abstract: Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating enough real text images. In this paper, we explore the problem of text image generation, by taking advantage of the powerful abilities of Diffusion Models in generating photo-realistic and diverse image samples with given conditions, and propose a method called Conditional Text Image Generation with Diffusion Models (CTIG-DM for short). To conform to the characteristics of text images, we devise three conditions: image condition, text condition, and style condition, which can be used to control the attributes, contents, and styles of the samples in the image generation process. Specifically, four text image generation modes, namely: (1) synthesis mode, (2) augmentation mode, (3) recovery mode, and (4) imitation mode, can be derived by combining and configuring these three conditions. Extensive experiments on both handwritten and scene text demonstrate that the proposed CTIG-DM is able to produce image samples that simulate real-world complexity and diversity, and thus can boost the performance of existing text recognizers. Besides, CTIG-DM shows its appealing potential in domain adaptation and generating images containing Out-Of-Vocabulary (OOV) words.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.05729 [pdf]

Redesigning spectroscopic sensors with programmable photonic circuits

Authors: Chunhui Yao, Kangning Xu, Wanlu Zhang, Minjia Chen, Qixiang Cheng, Richard Penty

Abstract: Optical spectroscopic sensors are a powerful tool to reveal light-matter interactions in many fields, such as physics, biology, chemistry, and astronomy. Miniaturizing the currently bulky spectrometers has become imperative for the wide range of applications that demand in situ or even in vitro characterization systems, a field that is growing rapidly. Benchtop spectrometers are capable of offering superior resolution and spectral range, but at the expense of requiring a large size. In this paper, we propose a novel method that redesigns spectroscopic sensors via the use of programmable photonic circuits. Drawing from compressive sensing theory, we start by investigating the most ideal sampling matrix for a reconstructive spectrometer and reveal that a sufficiently large number of sampling channels is a prerequisite for both fine resolution and low reconstruction error. This number is, however, still considerably smaller than that of the reconstructed spectral pixels, benefitting from the nature of reconstruction algorithms. We then show that the cascading of a few engineered MZI elements can be readily programmed to create an exponentially scalable number of such sampling spectral responses over an ultra-broad bandwidth, allowing for ultra-high resolution down to single-digit picometers without incurring additional hardware costs. Experimentally, we implement an on-chip spectrometer with a fully-programmable 6-stage cascaded MZI structure and demonstrate a < 10 pm resolution with a > 200 nm bandwidth using only 729 sampling channels. This achieves a bandwidth-to-resolution ratio of over 20,000, which is, to our best knowledge, about one order of magnitude greater than any reported miniaturized spectrometers to date. We further illustrate that by employing dispersion-engineered waveguide components, the device bandwidth can be extended to over 400 nm.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.04619 [pdf, other]

ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections

Authors: Chun-Han Yao, Amit Raj, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Abstract: Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging due to the ambiguities of camera viewpoint, pose, texture, lighting, etc. We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild. Specifically, ARTIC3D is built upon a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we enhance the input images with occlusions/truncation via 2D diffusion to obtain cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are of high-fidelity and faithful to input images. We also propose a novel technique to calculate more stable image-level gradients via diffusion models compared to existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets as well as newly introduced noisy web image collections with occlusions and truncation demonstrate that ARTIC3D outputs are more robust to noisy images, higher quality in terms of shape and texture details, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/

Submitted 7 June, 2023; originally announced June 2023.

Comments: Project page: https://chhankyao.github.io/artic3d/

arXiv:2306.02001 [pdf, ps, other]

A globally convergent difference-of-convex algorithmic framework and application to log-determinant optimization problems

Authors: Chaorui Yao, Xin Jiang

Abstract: The difference-of-convex algorithm (DCA) is a conceptually simple method for the minimization of (possibly) nonconvex functions that are expressed as the difference of two convex functions. At each iteration, DCA constructs a global overestimator of the objective and solves the resulting convex subproblem. Despite its conceptual simplicity, the theoretical understanding and algorithmic framework of DCA need further investigation. In this paper, global convergence of DCA at a linear rate is established under an extended Polyak--Łojasiewicz condition. The proposed condition holds for a class of DC programs with a bounded, closed, and convex constraint set, for which global convergence of DCA cannot be covered by existing analyses. Moreover, the DCProx computational framework is proposed, in which the DCA subproblems are solved by a primal--dual proximal algorithm with Bregman distances. With a suitable choice of Bregman distances, DCProx has simple update rules with cheap per-iteration complexity. As an application, DCA is applied to several fundamental problems in network information theory, for which no existing numerical methods are able to compute the global optimum. For these problems, our analysis proves the global convergence of DCA, and more importantly, DCProx solves the DCA subproblems efficiently. Numerical experiments are conducted to verify the efficiency of DCProx.

Submitted 3 June, 2023; originally announced June 2023.
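
Illustrative note on the entry above: the DCA iteration described in the abstract (linearize the concave part, then solve the resulting convex subproblem) can be sketched on a one-dimensional toy problem. Everything in this sketch is an assumption chosen for illustration and is not taken from the paper: the objective f(x) = x^4/4 - x^2/2, its split into g(x) = x^4/4 and h(x) = x^2/2, and the function names. It does not implement DCProx or the paper's extended Polyak--Łojasiewicz analysis.

```python
import math


def objective(x: float) -> float:
    """Toy DC objective f = g - h with g(x) = x**4 / 4 and h(x) = x**2 / 2."""
    return 0.25 * x**4 - 0.5 * x**2


def dca_step(x_k: float) -> float:
    """One DCA iteration on the toy problem.

    Linearizing h at x_k gives the convex surrogate
        g(x) - h(x_k) - h'(x_k) * (x - x_k),
    which overestimates f everywhere because h is convex.
    With h'(x_k) = x_k, the surrogate is minimized where x**3 = x_k,
    so the subproblem has the closed-form solution cbrt(x_k).
    """
    return math.copysign(abs(x_k) ** (1.0 / 3.0), x_k)


def dca(x0: float, iters: int = 60) -> list[float]:
    """Run a fixed number of DCA iterations and return all iterates."""
    xs = [x0]
    for _ in range(iters):
        xs.append(dca_step(xs[-1]))
    return xs


if __name__ == "__main__":
    iterates = dca(0.1)
    print(iterates[-1], objective(iterates[-1]))
```

On this toy problem the iterates move toward the minimizer with the same sign as the starting point (x = 1 or x = -1), and the objective value does not increase from one iterate to the next, which is the descent property that the overestimator construction provides.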

arXiv:2305.18548 [pdf]

I/O-efficient iterative matrix inversion with photonic integrated circuits

Authors: Minjia Chen, Yizhi Wang, Chunhui Yao, Adrian Wonfor, Shuai Yang, Richard Penty, Qixiang Cheng

Abstract: Photonic integrated circuits have been extensively explored for optical processing with the aim of breaking the speed bottleneck of digital electronics. However, the input/output (IO) bottleneck remains one of the key barriers. Here we report a novel photonic iterative processor (PIP) for matrix-inversion-intensive applications. The direct reuse of inputted data in the optical domain unlocks the potential to break the IO bottleneck. We demonstrate notable IO advantages with a lossless PIP for real-valued matrix inversion and integral-differential equation solving, as well as a coherent PIP with optical loops integrated on-chip, enabling complex-valued computation and a net inversion time of 1.2 ns. Furthermore, we estimate at least an order of magnitude enhancement in IO efficiency of a PIP over photonic single-pass processors and the state-of-the-art electronic processors for reservoir training tasks and MIMO precoding tasks, indicating the huge potential of PIP technology in practical applications.

Submitted 22 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.18442 [pdf, other]

Improved Projection-free Online Continuous Submodular Maximization

Authors: Yucheng Liao, Yuanyu Wan, Chang Yao, Mingli Song

Abstract: We investigate the problem of online learning with monotone and continuous DR-submodular reward functions, which has received great attention recently. To efficiently handle this problem, especially in the case with complicated decision sets, previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations and linear optimization steps in total. However, it only attains a $(1-1/e)$-regret bound of $O(T^{4/5})$. In this paper, we propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity as Mono-FW. Instead of modifying Mono-FW, our key idea is to make a novel combination of a projection-based algorithm called online boosting gradient ascent, an infeasible projection technique, and a blocking technique. Furthermore, we consider the decentralized setting and develop a variant of POBGA, which not only reduces the current best regret bound of efficient projection-free algorithms for this setting from $O(T^{4/5})$ to $O(T^{3/4})$, but also reduces the total communication complexity from $O(T)$ to $O(\sqrt{T})$.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.15940 [pdf, other]

Mask Attack Detection Using Vascular-weighted Motion-robust rPPG Signals

Authors: Chenglin Yao, Jianfeng Ren, Ruibin Bai, Heshan Du, Jiang Liu, Xudong Jiang

Abstract: Detecting 3D mask attacks to a face recognition system is challenging. Although genuine faces and 3D face masks show significantly different remote photoplethysmography (rPPG) signals, rPPG-based face anti-spoofing methods often suffer from performance degradation due to unstable face alignment in the video sequence and weak rPPG signals. To enhance the rPPG signal in a motion-robust way, a landmark-anchored face stitching method is proposed to align the faces robustly and precisely at the pixel-wise level by using both SIFT keypoints and facial landmarks. To better encode the rPPG signal, a weighted spatial-temporal representation is proposed, which emphasizes the face regions with rich blood vessels. In addition, characteristics of rPPG signals in different color spaces are jointly utilized. To improve the generalization capability, a lightweight EfficientNet with a Gated Recurrent Unit (GRU) is designed to extract both spatial and temporal features from the rPPG spatial-temporal representation for classification. The proposed method is compared with the state-of-the-art methods on five benchmark datasets under both intra-dataset and cross-dataset evaluations. The proposed method shows a significant and consistent improvement in performance over other state-of-the-art rPPG-based methods for face spoofing detection.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.12131 [pdf, other]

Non-stationary Online Convex Optimization with Arbitrary Delays

Authors: Yuanyu Wan, Chang Yao, Mingli Song, Lijun Zhang

Abstract: Online convex optimization (OCO) with arbitrary delays, in which gradients or other information of functions could be arbitrarily delayed, has received increasing attention recently. Different from previous studies that focus on stationary environments, this paper investigates the delayed OCO in non-stationary environments, and aims to minimize the dynamic regret with respect to any sequence of comparators. To this end, we first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order. Despite its simplicity, our novel analysis shows that the dynamic regret of DOGD can be automatically bounded by $O(\sqrt{\bar{d}T}(P_T+1))$ under mild assumptions, and $O(\sqrt{dT}(P_T+1))$ in the worst case, where $\bar{d}$ and $d$ denote the average and maximum delay respectively, $T$ is the time horizon, and $P_T$ is the path-length of comparators. Furthermore, we develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$ and $O(\sqrt{dT(P_T+1)})$, respectively. The key idea is to run multiple DOGD with different learning rates, and utilize a meta-algorithm to track the best one based on their delayed performance. Finally, we demonstrate that our improved algorithm is optimal in the worst case by deriving a matching lower bound.

Submitted 23 June, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: Camera-ready Version for ICML2024

arXiv:2305.08325 [pdf, other]

Screentone-Aware Manga Super-Resolution Using DeepLearning

Authors: Chih-Yuan Yao, Husan-Ting Chou, Yu-Sheng Lin, Kuo-wei Chen

Abstract: Manga, as a widely beloved form of entertainment around the world, have shifted from paper to electronic screens with the proliferation of handheld devices. However, as the demand for image quality increases with screen development, high-quality images can hinder transmission and affect the viewing experience. Traditional vectorization methods require a significant amount of manual parameter adjustment to process screentone. Using deep learning, lines and screentone can be automatically extracted and image resolution can be enhanced. Super-resolution can convert low-resolution images to high-resolution images while maintaining low transmission rates and providing high-quality results. However, traditional super-resolution methods for improving manga resolution do not consider the meaning of screentone density, resulting in changes to screentone density and loss of meaning. In this paper, we aim to address this issue by first classifying the regions and lines of different screentone in the manga using a deep learning algorithm, then using corresponding super-resolution models for quality enhancement based on the different classifications of each block, and finally combining them to obtain images that maintain the meaning of screentone and lines in the manga while improving image resolution.

Submitted 14 May, 2023; originally announced May 2023.

arXiv:2305.06737 [pdf, other]

A Diagonal Splitting Algorithm for Adaptive Group Testing

Authors: Chaorui Yao, Pavlos Nikolopoulos, Christina Fragouli

Abstract: Group testing makes it possible to identify infected individuals in a population using a smaller number of tests than individual testing. To achieve this, group testing algorithms commonly assume knowledge of the number of infected individuals; nonadaptive and several adaptive algorithms fall in this category. Some adaptive algorithms, like binary splitting, operate without this assumption, but require a number of stages that may scale linearly with the size of the population. In this paper we contribute a new algorithm that enables a balance between the number of tests and the number of stages used, and which we term diagonal group testing. Diagonal group testing, like binary splitting, does not require knowledge of the number of infected individuals, yet unlike binary splitting, is order-optimal w.r.t. the expected number of tests it requires and is guaranteed to succeed in a small number of stages that scales at most logarithmically with the size of the population. Numerical evaluations, for diagonal group testing and a hybrid approach we propose, support our theoretical findings.

Submitted 14 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2304.14761 [pdf, other]

Topological regularity for solutions to the generalised Hopf equation

Authors: Gaven Martin, Cong Yao

Abstract: The generalised Hopf equation is the first order nonlinear equation with data $Φ$ a holomorphic function and $η\geq 1$ a positive weight, \[ h_w\,\overline{h_{\bar{w}}}\,η(w) = Φ.\] The Hopf equation is the special case $η(w)=\tilde{η}(h(w))$ and reflects that $h$ is harmonic with respect to the conformal metric $\sqrt{\tilde{η}(z)}|dz|$. This article obtains conditions on the data to ensure that a solution is open and discrete. We also prove a strong uniqueness result.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2304.11402 [pdf, other]

doi 10.1103/PhysRevB.108.195402

Quantum transport theory of hybrid superconducting systems

Authors: Chuan-Zhe Yao, Hon-Lam Lai, Wei-Min Zhang

Abstract: We present a quantum transport theory for hybrid superconducting systems based on our exact master equation approach. The total transient transport current is decomposed into components that describe coherent transports through different paths of particle and hole channels. We show that the coherent transports are resultant interferences of numerous repeated tunneling processes and cannot be rendered as a simple normal transmission or Andreev reflection, as quantum transport involving superconductivity is usually described. As a practical application, we find that the coherent transport currents passing through a pair of well-separated Majorana zero modes vanish due to the totally destructive interference between the particle and hole channels.

Submitted 2 November, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

Journal ref: Phys. Rev. B 108, 195402 (2023)

arXiv:2304.10759 [pdf, other]

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Authors: Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

Abstract: Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most of the existing models learn the geometric representation in an implicit way, which has been found insufficient for the RE task since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting the performance of RE lies in the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state-of-the-art methods for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM

Submitted 21 April, 2023; originally announced April 2023.

Comments: CVPR 2023 Highlight

arXiv:2303.13095 [pdf, other]

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

Authors: Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

Abstract: Recently, Visual Information Extraction (VIE) has become increasingly important in both academia and industry, due to the wide range of real-world applications. Previously, numerous works have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common. All these factors may lead to failures in information extraction. Therefore, as the second contribution, we explore an alternative approach to precisely and robustly extract key information from document images under such tough conditions. Specifically, in contrast to previous methods, which usually either incorporate visual information into a multi-modal architecture or train text spotting and information extraction in an end-to-end fashion, we explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities, which could largely benefit entity labeling and linking. Extensive experiments on standard benchmarks in this field as well as the proposed dataset demonstrate that the proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models. Dataset is available at https://www.modelscope.cn/datasets/damo/SIBR/summary.

Submitted 28 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.09112 [pdf, other]

SigVIC: Spatial Importance Guided Variable-Rate Image Compression

Authors: Jiaming Liang, Meiqin Liu, Chao Yao, Chunyu Lin, Yao Zhao

Abstract: Variable-rate mechanism has improved the flexibility and efficiency of learning-based image compression that trains multiple models for different rate-distortion tradeoffs. One of the most common approaches for variable-rate is to channel-wisely or spatial-uniformly scale the internal features. However, the diversity of spatial importance is instructive for bit allocation of image compression. In this paper, we introduce a Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed for adaptively learning a spatial importance mask. Then, a spatial scaling network (SSN) takes the spatial importance mask to guide the feature scaling and bit allocation for variable-rate. Moreover, to improve the quality of decoded image, Top-K shallow features are selected to refine the decoded features through a shallow feature fusion module (SFFM). Experiments show that our method outperforms other learning-based methods (whether variable-rate or not) and traditional codecs, with storage saving and high flexibility.

Submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE ICASSP2023 (Camera Ready)

arXiv:2303.04401 [pdf, other]

doi 10.1214/23-EJP1061

Limit of the Wulff crystal when approaching criticality for isoperimetry in 2D percolation

Authors: Chang-Long Yao

Abstract: We consider isoperimetric sets, i.e., sets with minimal vertex boundary for a prescribed volume, of the infinite cluster of supercritical site percolation on the triangular lattice. Let $p$ be the percolation parameter and let $p_c$ be the critical point. By adapting the proof of Biskup, Louidor, Procaccia and Rosenthal [6] for isoperimetry in bond percolation on the square lattice, we show that the isoperimetric sets, when suitably rescaled, converge almost surely to a translation of the normalized Wulff crystal $\widehat{W}_p$. More importantly, we prove that $\widehat{W}_p$ tends to a Euclidean disk as $p\downarrow p_c$. This settles the site version of a conjecture proposed in [6]. A key input to the proof is the convergence of the limit shapes for near-critical Bernoulli first-passage percolation proved by the author recently.

Submitted 22 November, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 20 pages, 6 figures. To appear in Electronic Journal of Probability

MSC Class: 60K35; 82B43

Journal ref: Electronic Journal of Probability 28, no. 165, 1-20 (2023)

arXiv:2303.03730 [pdf, other]

LORE: Logical Location Regression Network for Table Structure Recognition

Authors: Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

Abstract: Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or learning to generate the corresponding markup sequences from the table images. However, they either count on additional heuristic rules to recover the table structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time combines logical location regression together with spatial location regression of table cells. Our proposed LORE is conceptually simpler, easier to train and more accurate than previous TSR models of other paradigms. Experiments on standard benchmarks demonstrate that LORE consistently outperforms prior arts. Code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LORE-TSR.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.03131 [pdf, other]

Video Question Answering Using CLIP-Guided Visual-Text Attention

Authors: Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, Xudong Jiang

Abstract: Cross-modal learning of video and text plays a key role in Video Question Answering (VideoQA). In this paper, we propose a visual-text attention mechanism to utilize the Contrastive Language-Image Pre-training (CLIP) trained on lots of general domain language-image pairs to guide the cross-modal learning for VideoQA. Specifically, we first extract video features using a TimeSformer and text features using a BERT from the target application domain, and utilize CLIP to extract a pair of visual-text features from the general-knowledge domain through the domain-specific learning. We then propose a Cross-domain Learning to extract the attention information between visual and linguistic features across the target domain and general domain. The set of CLIP-guided visual-text features are integrated to predict the answer. The proposed method is evaluated on MSVD-QA and MSRVTT-QA datasets, and outperforms state-of-the-art methods.

Submitted 8 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Submitted to the 2023 IEEE International Conference on Image Processing (ICIP 2023)

ACM Class: I.2.10

arXiv:2303.03105 [pdf, other]

Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset

Authors: Weikai Kong, Shuhong Ye, Chenglin Yao, Jianfeng Ren

Abstract: Deep neural networks facilitate video question answering (VideoQA), but the real-world applications on video streams such as CCTV and live cast place higher demands on the solver. To address the challenges of VideoQA on long videos of unknown length, we define a new set of problems called Online Open-ended Video Question Answering (O^2VQA). It requires an online state-updating mechanism for the solver to decide if the collected information is sufficient to conclude an answer. We then propose a Confidence-based Event-centric Online Video Question Answering (CEO-VQA) model to solve this problem. Furthermore, a dataset called Answer Target in Background Stream (ATBS) is constructed to evaluate this newly developed online VideoQA application. Compared to the baseline VideoQA method that watches the whole video, the experimental results show that the proposed method achieves a significant performance gain.

Submitted 7 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted for publication at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

arXiv:2302.09528 [pdf]

A Comprehensive Evaluation Study on Risk Level Classification of Melanoma by Computer Vision on ISIC 2016-2020 Datasets

Authors: Chengdong Yao

Abstract: Skin cancer is the most common type of cancer. Specifically, melanoma is the cause of 75% of skin cancer deaths, although it is the least common skin cancer. Better detection of melanoma could have a positive impact on millions of people. The ISIC archive contains the largest publicly available collection of dermatoscopic images of skin lesions. In this research, we investigate the efficacy of applying advanced deep learning techniques in computer vision to identify melanoma in images of skin lesions. Through reviewing previous methods, including pre-trained models, deep-learning classifiers, transfer learning, etc., we demonstrate the applicability of the popular deep learning methods on critical clinical problems such as identifying melanoma. Finally, we proposed a processing flow with a validation AUC greater than 94% and a sensitivity greater than 90% on ISIC 2016 - 2020 datasets.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 9 pages, 12 figures, 11 tables

arXiv:2302.09144 [pdf]

Designing a Wayfinding Robot for People with Visual Impairments

Authors: Shuijing Liu, Aamir Hasan, Kaiwen Hong, Chun-Kai Yao, Justin Lin, Weihang Liang, Megan A. Bayles, Wendy A. Rogers, Katherine Driggs-Campbell

Abstract: People with visual impairments (PwVI) often have difficulties navigating through unfamiliar indoor environments. However, current wayfinding tools are fairly limited. In this short paper, we present our in-progress work on a wayfinding robot for PwVI. The robot takes an audio command from the user that specifies the intended destination. Then, the robot autonomously plans a path to navigate to the goal. We use sensors to estimate the real-time position of the user, which is fed to the planner to improve the safety and comfort of the user. In addition, the robot describes the surroundings to the user periodically to prevent disorientation and potential accidents. We demonstrate the feasibility of our design in a public indoor environment. Finally, we analyze the limitations of our current design, as well as our insights and future work. A demonstration video can be found at https://youtu.be/BS9r5bkIass.

Submitted 17 February, 2023; originally announced February 2023.

Comments: Presented at ICRA 2022 Workshop on Intelligent Control Methods and Machine Learning Algorithms for Human-Robot Interaction and Assistive Robotics

arXiv:2301.07274 [pdf, other]

doi 10.1007/JHEP03(2023)137

Revealing the origin of neutrino masses through the Type II Seesaw mechanism at high-energy muon colliders

Authors: Tong Li, Chang-Yuan Yao, Man Yuan

Abstract: The future muon collider can serve as an ideal machine to search for new physics at high energies. In this work, we study the search potential of the heavy Higgs triplet in the Type II Seesaw mechanism at muon colliders with high collision energy and high luminosity. The latest neutrino oscillation data are taken into account for realizing the leptonic decay modes of the charged Higgs bosons $(H^{\pm\pm},~H^{\pm})$ in the Type II Seesaw. We show the impact of neutrino mass and mixing parameters on the purely leptonic decays. The pair production of doubly charged Higgs $H^{++}H^{--}$ is through direct $μ^+μ^-$ annihilation and vector boson fusion (VBF) processes at a muon collider. The associated production $H^{\pm\pm}H^{\mp}$ can only be induced by VBF processes. We simulate both the purely leptonic and bosonic signal channels of charged Higgs bosons in Type II Seesaw, together with the Standard Model backgrounds. We show the required luminosity for the discovery of the charged Higgses and the reachable limits on the leptonic decay branching fractions.

Submitted 27 February, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 33 pages, 18 figures, 4 tables. Version accepted for publication in JHEP

Report number: DESY-23-006

arXiv:2301.06523 [pdf]

doi 10.1038/s41567-022-01904-5

Giant Nernst effect in the crossover between Fermi liquid and strange metal

Authors: Yusen Yang, Qian Tao, Yuqiang Fang, Guoxiong Tang, Chao Yao, Xiaoxian Yan, Chenxi Jiang, Xiangfan Xu, Fuqiang Huang, Wenxin Ding, Yu Wang, Zhiqiang Mao, Hui Xing, Zhu-An Xu

Abstract: The strange-metal state is a crucial problem in condensed matter physics highlighted by its ubiquity in almost all major correlated systems[1-7]. Its understanding could provide important insight into high-Tc superconductivity[2] and quantum criticality[8]. However, with the Fermi liquid theory failing in strange metals, understanding the highly unconventional behaviors has been a long-standing challenge. Fundamental aspects of strange metals remain elusive, including the nature of their charge carriers[1]. Here, we report the observation of a giant Nernst response in the strange-metal state in a two-dimensional superconductor 2M-WS2. A giant Nernst coefficient comparable to the vortex Nernst signal in superconducting cuprates, and its high sensitivity to carrier mobility, are found when the system enters the strange-metal state from the Fermi liquid state. The temperature and magnetic field dependence of the giant Nernst peak rule out the relevance of both Landau quasiparticles and superconductivity. Instead, the giant Nernst peak at the crossover indicates a dramatic change in carrier entropy when entering the strange-metal state. The presence of such an anomalous Nernst response is further confirmed in other iconic strange metals, suggesting its universality and placing stringent experimental constraints on the mechanism of strange metals.

Submitted 16 January, 2023; originally announced January 2023.

Journal ref: Further revised version published in Nature Physics 2023

arXiv:2301.02503 [pdf, other]

doi 10.1007/JHEP03(2023)138

Systematic study of one-loop realizations of $d=7$ long-range $0νββ$ decay operators

Authors: Ping-Tao Chen, Gui-Jun Ding, Chang-Yuan Yao

Abstract: We study the systematic one-loop decomposition of the dimension-7 long-range $0νββ$ decay operators. We find that there are 3 genuine one-loop topologies and 8 diagrams. The procedure to determine the SM quantum number assignments for both internal and external fields is presented. The Majorana neutrino mass in long-range $0νββ$ models is discussed. We also present a one-loop $0νββ$ decay model which produces Majorana neutrino mass at three-loop level. The phenomenological predictions for light neutrino mass and $0νββ$ decay half-life time including both mass mechanism and long-range contribution are studied.

Submitted 23 March, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: 42 pages, 19 figures

Report number: DESY-22-211

arXiv:2212.11042 [pdf, other]

Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble

Authors: Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Abstract: Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem. Most prior methods rely on large-scale image datasets, dense temporal correspondence, or human annotations like camera pose, 2D keypoints, and shape templates. We propose Hi-LASSIE, which performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates. We follow the recent work of LASSIE that tackles a similar problem setting and make two significant advances. First, instead of relying on a manually annotated 3D skeleton, we automatically estimate a class-specific skeleton from the selected reference image. Second, we improve the shape reconstructions with novel instance-specific optimization strategies that allow reconstructions to faithfully fit each instance while preserving the class-specific priors learned across all images. Experiments on in-the-wild image ensembles show that Hi-LASSIE obtains higher fidelity state-of-the-art 3D reconstructions despite requiring minimum user input.

Submitted 25 March, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: Project page: https://chhankyao.github.io/hi-lassie/

Showing 51–100 of 330 results for author: Yao, C