Search | arXiv e-print repository

Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM

Authors: Peiran Yao, Kostyantyn Guzhva, Denilson Barbosa

Abstract: Symbolic sentence meaning representations, such as AMR (Abstract Meaning Representation), provide expressive and structured semantic graphs that act as intermediates that simplify downstream NLP tasks. However, the instruction-following capability of large language models (LLMs) offers a shortcut to effectively solve NLP tasks, questioning the utility of semantic graphs. Meanwhile, recent work has also shown the difficulty of using meaning representations merely as a helpful auxiliary for LLMs. We revisit the position of semantic graphs in syntactic simplification, the task of simplifying sentence structures while preserving their meaning, which requires semantic understanding, and evaluate it on a new complex and natural dataset. The AMR-based method that we propose, AMRS$^3$, demonstrates that state-of-the-art meaning representations can lead to easy-to-implement simplification methods with competitive performance and unique advantages in cost, interpretability, and generalization. With AMRS$^3$ as an anchor, we discover that syntactic simplification is a task where semantic graphs are helpful in LLM prompting. We propose AMRCoC prompting that guides LLMs to emulate graph algorithms for explicit symbolic reasoning on AMR graphs, and show its potential for improving LLMs on semantic-centered tasks like syntactic simplification.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted at TextGraphs-17 @ ACL 2024

arXiv:2406.14629 [pdf, other]

Can LLMs Learn by Teaching? A Preliminary Study

Authors: Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching not only improves students but also improves teachers. We ask: Can LLMs also learn by teaching (LbT)? If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this ambitious agenda. We show that LbT ideas can be incorporated into existing LLM training/prompting pipelines and provide noticeable improvements. Specifically, we design three methods, each mimicking one of the three levels of LbT in humans: observing students' feedback, learning from the feedback, and learning iteratively, with the goals of improving answer accuracy without training and improving models' inherent capability with fine-tuning. The findings are encouraging. For example, similar to LbT in humans, we see that: (1) LbT can induce weak-to-strong generalization: strong models can improve themselves by teaching other weak models; (2) Diversity in students might help: teaching multiple students could be better than teaching one student or the teacher itself. We hope that this early promise can inspire future research on LbT and more broadly adopting the advanced techniques in education to improve LLMs. The code is available at https://github.com/imagination-research/lbt.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2405.16702 [pdf, other]

Accurate and Nuanced Open-QA Evaluation Through Textual Entailment

Authors: Peiran Yao, Denilson Barbosa

Abstract: Open-domain question answering (Open-QA) is a common task for evaluating large language models (LLMs). However, current Open-QA evaluations are criticized for the ambiguity in questions and the lack of semantic understanding in evaluators. Complex evaluators, powered by foundation models or LLMs and pertaining to semantic equivalence, still deviate from human judgments by a large margin. We propose to study the entailment relations of answers to identify more informative and more general system answers, offering a much closer evaluation to human judgment on both NaturalQuestions and TriviaQA while being learning-free. The entailment-based evaluation we propose allows the assignment of bonus or partial marks by quantifying the inference gap between answers, enabling a nuanced ranking of answer correctness that has higher AUC than current methods.

Submitted 26 May, 2024; originally announced May 2024.

Comments: To appear at ACL 2024 (Findings)

arXiv:2404.18771 [pdf, other]

KBX: Verified Model Synchronization via Formal Bidirectional Transformation

Authors: Jianhong Zhao, Yongwang Zhao, Peisen Yao, Fanlang Zeng, Bohua Zhan, Kui Ren

Abstract: Complex safety-critical systems require multiple models for a comprehensive description, resulting in error-prone development and laborious verification. Bidirectional transformation (BX) is an approach to automatically synchronizing these models. However, existing BX frameworks lack formal verification to enforce these models' consistency rigorously. This paper introduces KBX, a formal bidirectional transformation framework for verified model synchronization. First, we present a matching logic-based BX model, providing a logical foundation for constructing BX definitions within the $\mathbb{K}$ framework. Second, we propose algorithms to synthesize formal BX definitions from unidirectional ones, which allows developers to focus on crafting the unidirectional definitions while disregarding the reverse direction and missing information recovery for synchronization. Afterward, we harness $\mathbb{K}$ to generate a formal synchronizer from the synthesized definitions for consistency maintenance and verification. To evaluate the effectiveness of KBX, we conduct a comparative analysis against existing BX frameworks. Furthermore, we demonstrate the application of KBX in constructing a BX between UML and HCSP for real-world scenarios, showcasing an 82.8\% reduction in BX development effort compared to manual specification writing in $\mathbb{K}$.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2401.00454 [pdf, other]

Quantum and Classical Communication Complexity of Permutation-Invariant Functions

Authors: Ziyi Guan, Yunqi Huang, Penghui Yao, Zekun Ye

Abstract: This paper gives a nearly tight characterization of the quantum communication complexity of the permutation-invariant Boolean functions. With such a characterization, we show that the quantum and randomized communication complexity of the permutation-invariant Boolean functions are quadratically equivalent (up to a logarithmic factor). Our results extend a recent line of research regarding query complexity [AA14, Cha19, BCG+20] to communication complexity, showing symmetry prevents exponential quantum speedups. Furthermore, we show the Log-rank Conjecture holds for any non-trivial total permutation-invariant Boolean function. Moreover, we establish a relationship between the quantum/classical communication complexity and the approximate rank of permutation-invariant Boolean functions. This implies the correctness of the Log-approximate-rank Conjecture for permutation-invariant Boolean functions in both randomized and quantum settings (up to a logarithmic factor).

Submitted 31 December, 2023; originally announced January 2024.

Comments: accepted in STACS 2024

arXiv:2312.04360 [pdf, other]

The Computational Advantage of MIP* Vanishes in the Presence of Noise

Authors: Yangjing Dong, Honghao Fu, Anand Natarajan, Minglong Qin, Haochen Xu, Penghui Yao

Abstract: Quantum multiprover interactive proof systems with entanglement MIP* are much more powerful than their classical counterpart MIP (Babai et al. '91, Ji et al. '20): while MIP = NEXP, the quantum class MIP* is equal to RE, a class including the halting problem. This is because the provers in MIP* can share unbounded quantum entanglement. However, recent works of Qin and Yao '21 and '23 have shown that this advantage is significantly reduced if the provers' shared state contains noise. This paper attempts to exactly characterize the effect of noise on the computational power of quantum multiprover interactive proof systems. We investigate the quantum two-prover one-round interactive system MIP*[poly, O(1)], where the verifier sends polynomially many bits to the provers and the provers send back constantly many bits. We show noise completely destroys the computational advantage given by shared entanglement in this model. Specifically, we show that if the provers are allowed to share arbitrarily many EPR states, where each EPR state is affected by an arbitrarily small constant amount of noise, the resulting complexity class is contained in NEXP = MIP. This improves significantly on the previous best-known bound of NEEEXP (nondeterministic triply exponential time) by Qin and Yao '21. We also show that this collapse in power is due to the noise, rather than the O(1) answer size, by showing that allowing for noiseless EPR states gives the class the full power of RE = MIP*[poly, poly]. Along the way, we develop two technical tools of independent interest. First, we give a new, deterministic tester for the positivity of an exponentially large matrix, provided it has a low-degree Fourier decomposition in terms of Pauli matrices. Second, we develop a new invariance principle for smooth matrix functions having bounded third-order Fréchet derivatives or which are Lipschitz continuous.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Comments are welcome!

arXiv:2311.04723 [pdf, other]

Communication Complexity of Common Randomness Generation with Isotropic States

Authors: Yangjing Dong, Penghui Yao

Abstract: This paper addresses the problem of generating a common random string with min-entropy k using an unlimited supply of noisy EPR pairs or quantum isotropic states, with minimal communication between Alice and Bob. The paper considers two communication models -- one-way classical communication and one-way quantum communication, and derives upper bounds on the optimal common randomness rate for both models. We show that in the case of classical communication, quantum isotropic states have no advantage over noisy classical correlation [GR16]. In the case of quantum communication, we demonstrate that the common randomness rate can be increased by using superdense coding on quantum isotropic states. We also prove an upper bound on the optimal common randomness rate achievable by using one-way quantum communication. As an application, our result yields upper bounds on the classical capacity of the noiseless quantum channel assisted by noisy entanglement [HHH+01].

Submitted 24 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 20 pages, 2 figures. Update funding information

arXiv:2310.14464 [pdf, ps, other]

A Cryptographic Perspective on the Verifiability of Quantum Advantage

Authors: Nai-Hui Chia, Honghao Fu, Fang Song, Penghui Yao

Abstract: In recent years, achieving verifiable quantum advantage on a NISQ device has emerged as an important open problem in quantum information. The sampling-based quantum advantages are not known to have efficient verification methods. This paper investigates the verification of quantum advantage from a cryptographic perspective. We establish a strong connection between the verifiability of quantum advantage and cryptographic and complexity primitives, including efficiently samplable, statistically far but computationally indistinguishable pairs of (mixed) quantum states ($\mathsf{EFI}$), pseudorandom states ($\mathsf{PRS}$), and variants of minimum circuit size problems ($\mathsf{MCSP}$). Specifically, we prove that a) a sampling-based quantum advantage is either verifiable or can be used to build $\mathsf{EFI}$ and even $\mathsf{PRS}$ and b) polynomial-time algorithms for a variant of $\mathsf{MCSP}$ would imply efficient verification of quantum advantages. Our work shows that the quest for verifiable quantum advantages may lead to applications of quantum cryptography, and the construction of quantum primitives can provide new insights into the verifiability of quantum advantages.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 21 pages, 2 figures

arXiv:2310.11478 [pdf, other]

ASP: Automatic Selection of Proxy dataset for efficient AutoML

Authors: Peng Yao, Chao Liao, Jiyuan Jia, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang

Abstract: Deep neural networks have gained great success due to the increasing amounts of data, and diverse effective neural network designs. However, it also brings a heavy computing burden as the amount of training data is proportional to the training time. In addition, a well-behaved model requires repeated trials of different structure designs and hyper-parameters, which may take a large amount of time even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) algorithms and neural architecture search (NAS) algorithms. In this paper, we propose an Automatic Selection of Proxy dataset framework (ASP) aimed to dynamically find the informative proxy subsets of training data at each epoch, reducing the training data size as well as saving the AutoML processing time. We verify the effectiveness and generalization of ASP on CIFAR10, CIFAR100, ImageNet16-120, and ImageNet-1k, across various public model benchmarks. The experiment results show that ASP can obtain better results than other data selection methods at all selection ratios. ASP can also enable much more efficient AutoML processing with a speedup of 2x-20x while obtaining better architectures and better hyper-parameters compared to utilizing the entire dataset.

Submitted 17 October, 2023; originally announced October 2023.

Comments: This paper was actually finished in 2021

arXiv:2310.11117 [pdf, other]

USDC: Unified Static and Dynamic Compression for Visual Transformer

Authors: Huan Yuan, Chao Liao, Jianchao Tan, Peng Yao, Jiyuan Jia, Bin Chen, Chengru Song, Di Zhang

Abstract: Visual Transformers have achieved great success in almost all vision tasks, such as classification, detection, and so on. However, the model complexity and the inference speed of the visual transformers hinder their deployments in industrial products. Various model compression techniques focus on directly compressing the visual transformers into a smaller one while maintaining the model performance; however, the performance drops dramatically when the compression ratio is large. Furthermore, several dynamic network techniques have also been applied to dynamically compress the visual transformers to obtain input-adaptive efficient sub-structures during the inference stage, which can achieve a better trade-off between the compression ratio and the model performance. The upper bound of memory of dynamic models is not reduced in the practical deployment since the whole original visual transformer model and the additional control gating modules should be loaded onto devices together for inference. To alleviate the disadvantages of both categories of methods, we propose to unify the static compression and dynamic compression techniques jointly to obtain an input-adaptive compressed model, which can better balance the total compression ratio and the model performance. Moreover, in practical deployment, the batch sizes of the training and inference stages are usually different, which causes the model inference performance to be worse than the model training performance, an issue not addressed by previous dynamic network papers. We propose a sub-group gates augmentation technique to solve this performance drop problem. Extensive experiments demonstrate the superiority of our method on various baseline visual transformers such as DeiT, T2T-ViT, and so on.

Submitted 17 October, 2023; originally announced October 2023.

Comments: This paper was actually finished in 2021

arXiv:2309.11279 [pdf, other]

On the Fine-Grained Query Complexity of Symmetric Functions

Authors: Supartha Podder, Penghui Yao, Zekun Ye

Abstract: This paper explores a fine-grained version of the Watrous conjecture, including the randomized and quantum algorithms with success probabilities arbitrarily close to $1/2$. Our contributions include the following: i) An analysis of the optimal success probability of quantum and randomized query algorithms of two fundamental partial symmetric Boolean functions given a fixed number of queries. We prove that for any quantum algorithm computing these two functions using $T$ queries, there exist randomized algorithms using $\mathsf{poly}(T)$ queries that achieve the same success probability as the quantum algorithm, even if the success probability is arbitrarily close to $1/2$. ii) We establish that for any total symmetric Boolean function $f$, if a quantum algorithm uses $T$ queries to compute $f$ with success probability $1/2+β$, then there exists a randomized algorithm using $O(T^2)$ queries to compute $f$ with success probability $1/2+Ω(δβ^2)$ on a $1-δ$ fraction of inputs, where $β,δ$ can be arbitrarily small positive values. As a corollary, we prove a randomized version of the Aaronson-Ambainis Conjecture for total symmetric Boolean functions in the regime where the success probability of algorithms can be arbitrarily close to $1/2$. iii) We present polynomial equivalences for several fundamental complexity measures of partial symmetric Boolean functions. Specifically, we first prove that for certain partial symmetric Boolean functions, quantum query complexity is at most quadratic in approximate degree for any error arbitrarily close to $1/2$. Next, we show exact quantum query complexity is at most quadratic in degree. Additionally, we give the tight bounds of several complexity measures, indicating their polynomial equivalence.

Submitted 21 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: accepted in ISAAC 2023

arXiv:2309.08941 [pdf, ps, other]

Quantum Pseudorandom Scramblers

Authors: Chuhan Lu, Minglong Qin, Fang Song, Penghui Yao, Mingnan Zhao

Abstract: Quantum pseudorandom state generators (PRSGs) have stimulated exciting developments in recent years. A PRSG, on a fixed initial (e.g., all-zero) state, produces an output state that is computationally indistinguishable from a Haar random state. However, pseudorandomness of the output state is not guaranteed on other initial states. In fact, known PRSG constructions provably fail on some initial state. In this work, we propose and construct quantum Pseudorandom State Scramblers (PRSSs), which can produce a pseudorandom state on an arbitrary initial state. In the information-theoretical setting, we obtain a scrambler which maps an arbitrary initial state to a distribution of quantum states that is close to Haar random in total variation distance. As a result, our PRSS exhibits a dispersing property. Loosely, it can span an $ε$-net of the state space. This significantly strengthens what standard PRSGs can induce, as they may only concentrate on a small region of the state space as long as the average output state approximates a Haar random state in total variation distance. Our PRSS construction develops a parallel extension of the famous Kac's walk, and we show that it mixes exponentially faster than the standard Kac's walk. This constitutes the core of our proof. We also describe a few applications of PRSSs. While our PRSS construction assumes a post-quantum one-way function, PRSSs are potentially a weaker primitive and can be separated from one-way functions in a relativized world similar to standard PRSGs.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.05683 [pdf, other]

EANet: Expert Attention Network for Online Trajectory Prediction

Authors: Pengfei Yao, Tianlu Mao, Min Shi, Jingkai Sun, Zhaoqi Wang

Abstract: Trajectory prediction plays a crucial role in autonomous driving. Existing mainstream research and continual learning-based methods all require training on complete datasets, leading to poor prediction accuracy when sudden changes in scenarios occur and failing to promptly respond and update the model. Whether these methods can make a prediction in real time and use data instances to update the model immediately (i.e., online learning settings) remains a question. The problem of gradient explosion or vanishing caused by data instance streams also needs to be addressed. Inspired by the Hedge Propagation algorithm, we propose Expert Attention Network, a complete online learning framework for trajectory prediction. We introduce expert attention, which adjusts the weights of network layers at different depths, preventing slow model updates caused by gradient problems and enabling fast learning of a new scenario's knowledge to restore prediction accuracy. Furthermore, we propose a short-term motion trend kernel function which is sensitive to scenario change, allowing the model to respond quickly. To the best of our knowledge, this work is the first attempt to address the online learning problem in trajectory prediction. The experimental results indicate that traditional methods suffer from gradient problems and that our method can quickly reduce prediction errors and reach state-of-the-art prediction accuracy.

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.05264 [pdf, other]

Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)

Authors: Pingchuan Ma, Zhenlan Ji, Peisen Yao, Shuai Wang, Kui Ren

Abstract: Causal discovery is a powerful technique for identifying causal relationships among variables in data. It has been widely used in various applications in software engineering. Causal discovery extensively involves conditional independence (CI) tests. Hence, its output quality highly depends on the performance of CI tests, which can often be unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed. Despite the distinct nature between unreliable and excessive CI tests, this paper identifies a unified and principled approach to addressing both of them. Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms, which are a set of well-established integrity constraints on conditional independence. Hence, we can either detect erroneous CI statements if they violate Pearl's axioms or prune excessive CI statements if they are logically entailed by Pearl's axioms. Holistically, both problems boil down to reasoning about the consistency of CI statements under Pearl's axioms (referred to as CIR problem). We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency, and only resort to costly SMT-based reasoning when necessary. Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively. [abridged due to length limit]
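
To make the idea of pruning CI tests that are "logically entailed by Pearl's axioms" concrete, here is a minimal sketch in Python. It is not CICheck's SMT encoding or its four-stage procedure; it only checks whether a query statement follows from a single known statement via three of the standard semi-graphoid axioms (symmetry, decomposition, weak union), and it assumes the three variable sets of each statement are pairwise disjoint. The names `ci`, `follows_from`, and `is_entailed` are hypothetical helpers introduced for illustration.

```python
from typing import FrozenSet, Iterable, Tuple

# A CI statement "X is independent of Y given Z" as three variable sets.
CI = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def ci(x: Iterable[str], y: Iterable[str], z: Iterable[str] = ()) -> CI:
    """Build a CI statement from iterables of variable names (hypothetical helper)."""
    return (frozenset(x), frozenset(y), frozenset(z))


def follows_from(known: CI, query: CI) -> bool:
    """True if `query` follows from `known` by symmetry, decomposition, weak union.

    Weak union may move variables from either side into the conditioning set;
    decomposition may then shrink either side. So the query sides must be
    subsets of the known sides, and any extra conditioning variables must come
    from the parts of the known sides that the query dropped.
    """
    a, b, c = query
    z = known[2]
    # Try both orientations of the known statement (symmetry).
    for x, y in ((known[0], known[1]), (known[1], known[0])):
        moved = c - z  # variables that must have been moved by weak union
        if a <= x and b <= y and z <= c and moved <= ((x - a) | (y - b)):
            return True
    return False


def is_entailed(known_facts: Iterable[CI], query: CI) -> bool:
    """A query test can be skipped if some known statement already entails it."""
    return any(follows_from(k, query) for k in known_facts)


if __name__ == "__main__":
    facts = [ci(["A"], ["B", "C"], ["D"])]
    print(is_entailed(facts, ci(["A"], ["B"], ["D"])))       # decomposition
    print(is_entailed(facts, ci(["A"], ["B"], ["C", "D"])))  # weak union
    print(is_entailed(facts, ci(["A"], ["B"])))              # not derivable here
```

The same check could flag a potential inconsistency: if a test reports dependence for a statement that `is_entailed` says must hold, one of the recorded outcomes is suspect. A complete treatment would also need the contraction axiom and reasoning over multiple statements at once, which this sketch omits.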

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.15987 [pdf, other]

FPTQ: Fine-grained Post-Training Quantization for Large Language Models

Authors: Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie

Abstract: In the era of large-scale language models, the substantial parameter size poses significant challenges for deployment. Being a prevalent compression technique, quantization has emerged as the mainstream practice to tackle this issue, which is mainly centered on two recipes W8A8 and W4A16 (i.e. weights and activations in such bit widths). In this study, we propose a novel W4A8 post-training quantization method for the available open-sourced LLMs, which combines the advantages of both recipes. Therefore, we can leverage the benefit in the I/O utilization of 4-bit weight quantization and the acceleration due to 8-bit matrix computation. Nevertheless, the W4A8 faces notorious performance degradation. As a remedy, we involve layerwise activation quantization strategies which feature a novel logarithmic equalization for most intractable layers, and we combine them with fine-grained weight quantization. Without bells and whistles, we eliminate the necessity for further fine-tuning and obtain the state-of-the-art W4A8 quantized performance on BLOOM, LLaMA, and LLaMA-2 on standard benchmarks. We confirm that the W4A8 quantization is achievable for the deployment of large language models, fostering their wide-spreading real-world applications.
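
As a rough illustration of what 4-bit "fine-grained weight quantization" can look like, the sketch below quantizes a row of weights in small groups, each with its own scale. This is an assumption-laden toy, not the FPTQ method: the abstract does not specify the grouping scheme, so group-wise symmetric quantization is assumed here, and the logarithmic activation equalization is not covered. The function names and the group size are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float]  # (quantized integers, per-group scale)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Symmetric quantization of one group to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize_row(row: List[float], group_size: int = 4, bits: int = 4) -> List[Group]:
    """Split a weight row into groups and quantize each independently."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups: List[Group]) -> List[float]:
    """Recover approximate weights by multiplying integers by their group scale."""
    return [q * scale for qs, scale in groups for q in qs]


if __name__ == "__main__":
    row = [0.1, -0.2, 0.05, 0.7, 10.0, -3.0, 2.0, 0.5]
    groups = quantize_row(row, group_size=4)
    for original, restored in zip(row, dequantize_row(groups)):
        print(f"{original:8.3f} -> {restored:8.3f}")
```

The point of the per-group scale is that the large values in the second group do not inflate the quantization step used for the small values in the first group, which a single per-row scale would do.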

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2307.14588 [pdf]

MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation

Authors: Liang Xu, Mingxiao Chen, Yi Cheng, Pengfei Shao, Shuwei Shen, Peng Yao, Ronald X. Xu

Abstract: The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by effectively capturing global feature correlations. However, the integration of the Transformer modules may result in the loss of local contextual information during the global feature fusion process. To overcome these challenges, we propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA). The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures the local correlations using multiple Multi-scale Cross Perceptron modules, facilitating the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure to address the semantic segmentation of the image involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including the open large-scale dataset of CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance. The code is available at https://github.com/simonustc/MCPA-for-2D-Medical-Image-Segmentation.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.07928 [pdf]

Reinforced Disentanglement for Face Swapping without Skip Connection

Authors: Xiaohang Ren, Xingyu Chen, Pengfei Yao, Heung-Yeung Shum, Baoyuan Wang

Abstract: The SOTA face swap models still suffer from the problem of either target identity (i.e., shape) being leaked or the target non-identity attributes (i.e., background, hair) failing to be fully preserved in the final results. We show that this insufficient disentanglement is caused by two flawed designs that were commonly adopted in prior models: (1) counting on only one compressed encoder to represent both the semantic-level non-identity facial attributes (i.e., pose) and the pixel-level non-facial region details, which is contradictory to satisfy at the same time; (2) highly relying on long skip-connections between the encoder and the final generator, leaking a certain amount of target face identity into the result. To fix them, we introduce a new face swap framework called 'WSC-swap' that gets rid of skip connections and uses two target encoders to respectively capture the pixel-level non-facial region attributes and the semantic non-identity attributes in the face region. To further reinforce the disentanglement learning for the target encoder, we employ both identity removal loss via adversarial training (i.e., GAN) and the non-identity preservation loss via prior 3DMM models like [11]. Extensive experiments on both FaceForensics++ and CelebA-HQ show that our results significantly outperform previous works on a rich set of metrics, including one novel metric for measuring identity consistency that was completely neglected before.

Submitted 3 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023

arXiv:2305.12097 [pdf, ps, other]

On Testing and Learning Quantum Junta Channels

Authors: Zongbo Bao, Penghui Yao

Abstract: We consider the problems of testing and learning quantum $k$-junta channels, which are $n$-qubit to $n$-qubit quantum channels acting non-trivially on at most $k$ out of $n$ qubits and leaving the rest of qubits unchanged. We show the following. 1. An $O\left(k\right)$-query algorithm to distinguish whether the given channel is $k$-junta channel or is far from any $k$-junta channels, and a lower bound $\Omega\left(\sqrt{k}\right)$ on the number of queries; 2. An $\widetilde{O}\left(4^k\right)$-query algorithm to learn a $k$-junta channel, and a lower bound $\Omega\left(4^k/k\right)$ on the number of queries. This gives the first junta channel testing and learning results, and partially answers an open problem raised by Chen et al. (2023). In order to settle these problems, we develop a Fourier analysis framework over the space of superoperators and prove several fundamental properties, which extends the Fourier analysis over the space of operators introduced in Montanaro and Osborne (2010). Besides, we introduce $\textit{Influence-Sample}$ to replace $\textit{Fourier-Sample}$ proposed in Atici and Servedio (2007). Our $\textit{Influence-Sample}$ includes only single-qubit operations and results in only constant-factor decrease in efficiency.

Submitted 19 December, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

arXiv:2305.04316 [pdf, other]

Synthesizing Conjunctive Queries for Code Search

Authors: Chengpeng Wang, Peisen Yao, Wensheng Tang, Gang Fan, Charles Zhang

Abstract: This paper presents Squid, a new conjunctive query synthesis algorithm for searching code with target patterns. Given positive and negative examples along with a natural language description, Squid analyzes the relations derived from the examples by a Datalog-based program analyzer and synthesizes a conjunctive query expressing the search intent. The synthesized query can be further used to search for desired grammatical constructs in the editor. To achieve high efficiency, we prune the huge search space by removing unnecessary relations and enumerating query candidates via refinement. We also introduce two quantitative metrics for query prioritization to select the queries from multiple candidates, yielding desired queries for code search. We have evaluated Squid on over thirty code search tasks. It is shown that Squid successfully synthesizes the conjunctive queries for all the tasks, taking only 2.56 seconds on average.

Submitted 11 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: 32 pages, 7 figures, and 1 table. Accepted by ECOOP 2023

arXiv:2304.12690 [pdf, ps, other]

The Generations of Classical Correlations via Quantum Schemes

Authors: Zhenyu Chen, Lijinzhi Lin, Xiaodie Lin, Zhaohui Wei, Penghui Yao

Abstract: Suppose two separated parties, Alice and Bob, share a bipartite quantum state or a classical correlation called a \emph{seed}, and they try to generate a target classical correlation by performing local quantum or classical operations on the seed, i.e., no communication is allowed. We consider the following fundamental problem about this setting: whether Alice and Bob can use a given seed to generate a target classical correlation. We show that this problem has rich mathematical structures. Firstly, we prove that even if the seed is a pure bipartite state, the above decision problem is already NP-hard and a similar conclusion can also be drawn when the seed is also a classical correlation, implying that this problem is hard to solve generally. Furthermore, we prove that when the seed is a pure quantum state, solving the problem is equivalent to finding out whether the target classical correlation has some diagonal form of positive semi-definite factorizations that matches the seed pure state, revealing an interesting connection between the current problem and optimization theory. Based on this observation and other insights, we give several necessary conditions that the seed pure state has to satisfy to generate the target classical correlation, and it turns out that these conditions can also be generalized to the case that the seed is a mixed quantum state.
Lastly, since diagonal forms of positive semi-definite factorizations play a crucial role in solving the problem, we develop an algorithm that can compute them for an arbitrary classical correlation, which has decent performance on the cases we test.

Submitted 13 May, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 18 pages, no figures. To appear in IEEE Transactions on Information Theory. Comments are welcome

arXiv:2303.01410 [pdf, other]

doi 10.18653/v1/2023.eacl-demo.3

NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools

Authors: Peiran Yao, Matej Kosmajac, Abeer Waheed, Kostyantyn Guzhva, Natalie Hervieux, Denilson Barbosa

Abstract: NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon the latest pre-trained models and open source systems from academia that provide semantic analysis functionalities, including but not limited to entity linking, sentiment analysis, semantic parsing, and relation extraction. Its extensible design enables researchers and developers to smoothly replace an existing model or integrate a new one. To improve efficiency, we employ a microservice architecture that facilitates allocation of acceleration hardware and parallelization of computation. This paper presents the architecture of NLP Workbench and discusses the challenges we faced in designing it. We also discuss diverse use cases of NLP Workbench and the benefits of using it over other approaches. The platform is under active development, with its source code released under the MIT license. A website and a short video demonstrating our platform are also available.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Camera-ready version for EACL 2023: System Demonstrations

arXiv:2301.11011 [pdf, other]

Verifying Data Constraint Equivalence in FinTech Systems

Authors: Chengpeng Wang, Gang Fan, Peisen Yao, Fuxiong Pan, Charles Zhang

Abstract: Data constraints are widely used in FinTech systems for monitoring data consistency and diagnosing anomalous data manipulations. However, many equivalent data constraints are created redundantly during the development cycle, slowing down the FinTech systems and causing unnecessary alerts. We present EqDAC, an efficient decision procedure to determine the data constraint equivalence. We first propose the symbolic representation for semantic encoding and then introduce two lightweight analyses to refute and prove the equivalence, respectively, which are proved to run in polynomial time. We evaluate EqDAC upon 30,801 data constraints in a FinTech system. It is shown that EqDAC detects 11,538 equivalent data constraints in three hours. It also supports efficient equivalence searching with an average time cost of 1.22 seconds, enabling the system to check new data constraints upon submission.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 14 pages, 11 figures, accepted by ICSE 2023

arXiv:2206.02767 [pdf, other]

doi 10.1145/3519270.3538441

Quantum Complexity of Weighted Diameter and Radius in CONGEST Networks

Authors: Xudong Wu, Penghui Yao

Abstract: This paper studies the round complexity of computing the weighted diameter and radius of a graph in the quantum CONGEST model. We present a quantum algorithm that $(1+o(1))$-approximates the diameter and radius with round complexity $\widetilde O\left(\min\left\{n^{9/10}D^{3/10},n\right\}\right)$, where $D$ denotes the unweighted diameter. This exhibits the advantages of quantum communication over classical communication since computing a $(3/2-\varepsilon)$-approximation of the diameter and radius in a classical CONGEST network takes $\widetilde{\Omega}(n)$ rounds, even if $D$ is constant [Abboud, Censor-Hillel, and Khoury, DISC '16]. We also prove a lower bound of $\widetilde{\Omega}(n^{2/3})$ for $(3/2-\varepsilon)$-approximating the weighted diameter/radius in quantum CONGEST networks, even if $D=\Theta(\log n)$. Thus, in quantum CONGEST networks, computing weighted diameter and weighted radius of graphs with small $D$ is strictly harder than unweighted ones due to Le Gall and Magniez's $\widetilde O\left(\sqrt{nD}\right)$-round algorithm for unweighted diameter/radius [PODC '18].

Submitted 26 September, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 24 pages. Accepted by PODC 2022

Journal ref: Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing (PODC 2022), pp. 120-130, 2022

arXiv:2206.02766 [pdf, other]

doi 10.1142/S2010324721400075

Complexity of Eccentricities and All-Pairs Shortest Paths in the Quantum CONGEST Model

Authors: ChengSheng Wang, Xudong Wu, Penghui Yao

Abstract: Computing the distance parameters of a network, including the diameter, radius, eccentricities and the all-pairs shortest paths (APSP), is a central problem in distributed computing. This paper investigates the distance parameters in the quantum CONGEST models and establishes almost linear lower bounds on eccentricities and APSP, which match the classical upper bounds. Our results imply that there is no quantum speedup for these two problems. In contrast, for the diameter and radius, exchanging quantum messages is able to save communication when the networks have low diameters [Le Gall and Magniez, PODC 2018]. We obtain the lower bounds via a reduction from the two-way quantum communication complexity of the set intersection [Razborov, Izvestiya Mathematics 2003].

Submitted 6 June, 2022; originally announced June 2022.

Comments: 16 pages. Invited paper in SPIN, Special Issue on Quantum Algorithms and Software

arXiv:2205.12117 [pdf, other]

Phased Progressive Learning with Coupling-Regulation-Imbalance Loss for Imbalanced Data Classification

Authors: Liang Xu, Yi Cheng, Fan Zhang, Bingxuan Wu, Pengfei Shao, Peng Liu, Shuwei Shen, Peng Yao, Ronald X. Xu

Abstract: Deep convolutional neural networks often perform poorly when faced with datasets that suffer from quantity imbalances and classification difficulties. Despite advances in the field, existing two-stage approaches still exhibit dataset bias or domain shift. To counter this, a phased progressive learning schedule has been proposed that gradually shifts the emphasis from representation learning to training the upper classifier. This approach is particularly beneficial for datasets with larger imbalances or fewer samples. A second new method, a coupling-regulation-imbalance loss function, is also proposed; it combines three parts: a correction term, Focal loss, and LDAM loss. This loss is effective in addressing quantity imbalances and outliers, while regulating the focus of attention on samples with varying classification difficulties. These approaches have yielded satisfactory results on several benchmark datasets, including Imbalanced CIFAR10, Imbalanced CIFAR100, ImageNet-LT, and iNaturalist 2018, and can be easily generalized to other imbalanced classification models.

Submitted 15 March, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

arXiv:2201.12772 [pdf, ps, other]

Polynomial-Time Approximation of Zero-Free Partition Functions

Authors: Penghui Yao, Yitong Yin, Xinyuan Zhang

Abstract: The zero-free based algorithm is a major technique for deterministic approximate counting. In Barvinok's original framework [Bar17], by calculating truncated Taylor expansions, a quasi-polynomial time algorithm was given for estimating zero-free partition functions. Patel and Regts [PR17] later gave a refinement of Barvinok's framework, which gave a polynomial-time algorithm for a class of zero-free graph polynomials that can be expressed as counting induced subgraphs in bounded-degree graphs. In this paper, we give a polynomial-time algorithm for estimating classical and quantum partition functions specified by local Hamiltonians with bounded maximum degree, assuming a zero-free property for the temperature. Consequently, when the inverse temperature is close enough to zero by a constant gap, we have a polynomial-time approximation algorithm for all such partition functions. Our result is based on a new abstract framework that extends and generalizes the approach of Patel and Regts.

Submitted 30 January, 2022; originally announced January 2022.

arXiv:2110.10423 [pdf, other]

ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-cost Proxies

Authors: Yu Shen, Yang Li, Jian Zheng, Wentao Zhang, Peng Yao, Jixiang Li, Sen Yang, Ji Liu, Bin Cui

Abstract: Designing neural architectures requires immense manual efforts. This has promoted the development of neural architecture search (NAS) to automate the design. Previous NAS methods achieve promising results but run slowly, whereas zero-cost proxies run extremely fast but are less promising. Therefore, there is great potential in accelerating NAS via those zero-cost proxies. The existing method has two limitations, which are unforeseeable reliability and one-shot usage. To address the limitations, we present ProxyBO, an efficient Bayesian optimization (BO) framework that utilizes the zero-cost proxies to accelerate neural architecture search. We apply the generalization ability measurement to estimate the fitness of proxies on the task during each iteration and design a novel acquisition function to combine BO with zero-cost proxies based on their dynamic influence. Extensive empirical studies show that ProxyBO consistently outperforms competitive baselines on five tasks from three public benchmarks. Concretely, ProxyBO achieves up to 5.41x and 3.86x speedups over the state-of-the-art approaches REA and BRP-NAS.

Submitted 13 March, 2023; v1 submitted 20 October, 2021; originally announced October 2021.

Comments: Accepted by AAAI 2023

arXiv:2109.07923 [pdf, other]

Efficient Path-Sensitive Data-Dependence Analysis

Authors: Peisen Yao, Jinguo Zhou, Xiao Xiao, Qingkai Shi, Rongxin Wu, Charles Zhang

Abstract: This paper presents a scalable path- and context-sensitive data-dependence analysis. The key is to address the aliasing-path-explosion problem via a sparse, demand-driven, and fused approach that piggybacks the computation of pointer information with the resolution of data dependence. Specifically, our approach decomposes the computational efforts of disjunctive reasoning into 1) a context- and semi-path-sensitive analysis that concisely summarizes data dependence as the symbolic and storeless value-flow graphs, and 2) a demand-driven phase that resolves transitive data dependence over the graphs. We have applied the approach to two clients, namely thin slicing and value flow analysis. Using a suite of 16 programs ranging from 13 KLoC to 8 MLoC, we compare our techniques against a diverse group of state-of-the-art analyses, illustrating significant precision and scalability advantages of our approach.

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2107.13200 [pdf]

An explainable two-dimensional single model deep learning approach for Alzheimer's disease diagnosis and brain atrophy localization

Authors: Fan Zhang, Bo Pan, Pengfei Shao, Peng Liu, Shuwei Shen, Peng Yao, Ronald X. Xu

Abstract: Early and accurate diagnosis of Alzheimer's disease (AD) and its prodromal period, mild cognitive impairment (MCI), is essential for delaying disease progression and improving patients' quality of life. The emerging computer-aided diagnostic methods that combine deep learning with structural magnetic resonance imaging (sMRI) have achieved encouraging results, but some of them are limited by issues such as data leakage and unexplainable diagnosis. In this research, we propose a novel end-to-end deep learning approach for automated diagnosis of AD and localization of important brain regions related to the disease from sMRI data. This approach is based on a 2D single model strategy and has the following differences from the current approaches: 1) Convolutional Neural Network (CNN) models of different structures and capacities are evaluated systematically and the most suitable model is adopted for AD diagnosis; 2) a data augmentation strategy named Two-stage Random RandAugment (TRRA) is proposed to alleviate the overfitting issue caused by limited training data and to improve the classification performance in AD diagnosis; 3) an explainable method of Grad-CAM++ is introduced to generate the visually explainable heatmaps that localize and highlight the brain regions that our model focuses on and to make our model more transparent. Our approach has been evaluated on two publicly accessible datasets for two classification tasks of AD vs. cognitively normal (CN) and progressive MCI (pMCI) vs. stable MCI (sMCI).
The experimental results indicate that our approach outperforms the state-of-the-art approaches, including those using multi-model and 3D CNN methods. The resultant localization heatmaps from our approach also highlight the lateral ventricle and some disease-relevant regions of cortex, coincident with the commonly affected regions during the development of AD.

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2107.03660 [pdf, other]

Duplicate-sensitivity Guided Transformation Synthesis for DBMS Correctness Bug Detection

Authors: Yushan Zhang, Peisen Yao, Rongxin Wu, Charles Zhang

Abstract: The Database Management System (DBMS) plays a core role in modern software from mobile apps to online banking. It is critical that a DBMS provide correct data to all applications. When the DBMS returns incorrect data, a correctness bug is triggered. Current production-level DBMSs still suffer from insufficient testing due to the limited hand-written test cases. Recently, several works proposed to automatically generate many test cases with query transformation, a process of generating an equivalent query pair and testing a DBMS by checking whether the system returns the same result set for both queries. However, all of them still heavily rely on manual work to provide a transformation, which largely confines their exploration of the valid input query space. This paper introduces duplicate-sensitivity guided transformation synthesis, which automatically finds new transformations by first synthesizing many candidates and then filtering out the nonequivalent ones. Our automated synthesis is achieved by mutating a query while keeping its duplicate sensitivity, which is a necessary condition for query equivalence. After candidate synthesis, we keep the mutant query which is equivalent to the given one by using a query equivalence checker. Furthermore, we have implemented our idea in a tool Eqsql and used it to test the production-level DBMSs. In two months, we detected in total 30 newly confirmed and unique bugs in MySQL, TiDB and CynosDB.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 11 pages, 6 figures, 7 tables

arXiv:2106.14300 [pdf, other]

ASK: Adversarial Soft k-Nearest Neighbor Attack and Defense

Authors: Ren Wang, Tianqi Chen, Philip Yao, Sijia Liu, Indika Rajapakse, Alfred Hero

Abstract: K-Nearest Neighbor (kNN)-based deep learning methods have been applied to many applications due to their simplicity and geometric interpretability. However, the robustness of kNN-based classification models has not been thoroughly explored and kNN attack strategies are underdeveloped. In this paper, we propose an Adversarial Soft kNN (ASK) loss to both design more effective kNN attack strategies and to develop better defenses against them. Our ASK loss approach has two advantages. First, ASK loss can better approximate the kNN's probability of classification error than objectives proposed in previous works. Second, the ASK loss is interpretable: it preserves the mutual information between the perturbed input and the in-class-reference data. We use the ASK loss to generate a novel attack method called the ASK-Attack (ASK-Atk), which shows superior attack efficiency and accuracy degradation relative to previous kNN attacks. Based on the ASK-Atk, we then derive an ASK-Defense (ASK-Def) method that optimizes the worst-case training loss induced by ASK-Atk. Experiments on CIFAR-10 (ImageNet) show that (i) ASK-Atk achieves $\geq 13\%$ ($\geq 13\%$) improvement in attack success rate over previous kNN attacks, and (ii) ASK-Def outperforms the conventional adversarial training method by $\geq 6.9\%$ ($\geq 3.5\%$) in terms of robustness improvement.

Submitted 26 September, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

arXiv:2105.05956 [pdf]

doi 10.1088/2634-4386/ac4a83

2022 Roadmap on Neuromorphic Computing and Engineering

Authors: Dennis V. Christensen, Regina Dittmann, Bernabé Linares-Barranco, Abu Sebastian, Manuel Le Gallo, Andrea Redaelli, Stefan Slesazeck, Thomas Mikolajick, Sabina Spiga, Stephan Menzel, Ilia Valov, Gianluca Milano, Carlo Ricciardi, Shi-Jun Liang, Feng Miao, Mario Lanza, Tyler J. Quill, Scott T. Keene, Alberto Salleo, Julie Grollier, Danijela Marković, Alice Mizrahi, Peng Yao, J. Joshua Yang, Giacomo Indiveri, et al. (34 additional authors not shown)

Abstract: Modern computation based on the von Neumann architecture is today a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation computer technology is expected to solve problems at the exascale with $10^{18}$ calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this Roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics.
The Roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges. We hope that this Roadmap will be a useful resource to readers outside this field, for those who are just entering the field, and for those who are well established in the neuromorphic community. https://doi.org/10.1088/2634-4386/ac4a83

Submitted 13 January, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

Journal ref: Neuromorph. Comput. Eng. 2 022501 (2022)

arXiv:2102.02307 [pdf, other]

Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out

Authors: Peiran Yao, Denilson Barbosa

Abstract: Factual knowledge graphs (KGs) such as DBpedia and Wikidata have served as part of various downstream tasks and are also widely adopted by artificial intelligence research communities as benchmark datasets. However, we found these KGs to be surprisingly noisy. In this study, we question the quality of these KGs, where the typing error rate is estimated to be 27% for coarse-grained types on average, and even 73% for certain fine-grained types. In pursuit of solutions, we propose an active typing error detection algorithm that maximizes the utilization of both gold and noisy labels. We also comprehensively discuss and compare unsupervised, semi-supervised, and supervised paradigms to deal with typing errors in factual KGs. The outcomes of this study provide guidelines for researchers to use noisy factual KGs. To help practitioners deploy the techniques and conduct further research, we published our code and data.

Submitted 3 February, 2021; originally announced February 2021.

Comments: 9 pages, 3 figures Camera-ready for WWW2021

arXiv:2102.01284 [pdf]

doi 10.1109/TMI.2021.3136682

Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion Classification

Authors: Peng Yao, Shuwei Shen, Mengjuan Xu, Peng Liu, Fan Zhang, Jinyu Xing, Pengfei Shao, Benjamin Kaffenberger, Ronald X. Xu

Abstract: Deep convolutional neural network (DCNN) models have been widely explored for skin disease diagnosis and some of them have achieved the diagnostic outcomes comparable or even superior to those of dermatologists. However, broad implementation of DCNN in skin disease detection is hindered by small size and data imbalance of the publicly accessible skin lesion datasets. This paper proposes a novel single-model based strategy for classification of skin lesions on small and imbalanced datasets. First, various DCNNs are trained on different small and imbalanced datasets to verify that the models with moderate complexity outperform the larger models. Second, regularization DropOut and DropBlock are added to reduce overfitting and a Modified RandAugment augmentation strategy is proposed to deal with the defects of sample underrepresentation in the small dataset. Finally, a novel Multi-Weighted New Loss (MWNL) function and an end-to-end cumulative learning strategy (CLS) are introduced to overcome the challenge of uneven sample size and classification difficulty and to reduce the impact of abnormal samples on training. By combining Modified RandAugment, MWNL and CLS, our single DCNN model method achieved the classification accuracy comparable or superior to those of multiple ensembling models on different dermoscopic image datasets.
Our study shows that this method is able to achieve a high classification performance at a low cost of computational resources and inference time, potentially suitable to implement in mobile devices for automated screening of skin lesions and many other malignancies in low resource settings.

Submitted 11 February, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

Journal ref: IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021

arXiv:2101.08141 [pdf, ps, other]

Positive spectrahedra: Invariance principles and Pseudorandom generators

Authors: Srinivasan Arunachalam, Penghui Yao

Abstract: In a recent work, O'Donnell, Servedio and Tan (STOC 2019) gave explicit pseudorandom generators (PRGs) for arbitrary $m$-facet polytopes in $n$ variables with seed length poly-logarithmic in $m,n$, concluding a sequence of works in the last decade, that was started by Diakonikolas, Gopalan, Jaiswal, Servedio, Viola (SICOMP 2010) and Meka, Zuckerman (SICOMP 2013) for fooling linear and polynomial threshold functions, respectively. In this work, we consider a natural extension of PRGs for intersections of positive spectrahedrons. A positive spectrahedron is a Boolean function $f(x)=[x_1A^1+\cdots +x_nA^n \preceq B]$ where the $A^i$s are $k\times k$ positive semidefinite matrices. We construct explicit PRGs that $δ$-fool "regular" width-$M$ positive spectrahedrons (i.e., when none of the $A^i$s are dominant) over the Boolean space with seed length $\textsf{poly}(\log k,\log n, M, 1/δ)$. Our main technical contributions are the following: We first prove an invariance principle for positive spectrahedra via the well-known Lindeberg method. As far as we are aware such a generalization of the Lindeberg method was unknown. Second, we prove an upper bound on noise sensitivity and a Littlewood-Offord theorem for positive spectrahedra. Using these results, we give applications for constructing PRGs for positive spectrahedra, learning theory, discrepancy sets for positive spectrahedra (over the Boolean cube) and PRGs for intersections of structured polynomial threshold functions.

Submitted 1 June, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Comments: 63 pages. v2: Minor revisions and Improvements in presentation

arXiv:2101.02353 [pdf]

Low-cost and high-performance data augmentation for deep-learning-based skin lesion classification

Authors: Shuwei Shen, Mengjuan Xu, Fan Zhang, Pengfei Shao, Honghong Liu, Liang Xu, Chi Zhang, Peng Liu, Zhihong Zhang, Peng Yao, Ronald X. Xu

Abstract: Although deep convolutional neural networks (DCNNs) have achieved significant accuracy in skin lesion classification comparable or even superior to those of dermatologists, practical implementation of these models for skin cancer screening in low resource settings is hindered by their limitations in computational cost and training dataset. To overcome these limitations, we propose a low-cost and high-performance data augmentation strategy that includes two consecutive stages of augmentation search and network search. At the augmentation search stage, the augmentation strategy is optimized in the search space of Low-Cost-Augment (LCA) under the criteria of balanced accuracy (BACC) with 5-fold cross validation. At the network search stage, the DCNNs are fine-tuned with the full training set in order to select the model with the highest BACC. The efficiency of the proposed data augmentation strategy is verified on the HAM10000 dataset using EfficientNets as a baseline. With the proposed strategy, we are able to reduce the search space to 60 and achieve a high BACC of 0.853 by using a single DCNN model without external database, suitable to be implemented in mobile devices for DCNN-based skin lesion detection in low resource settings.

Submitted 6 January, 2021; originally announced January 2021.

Comments: 8 pages, 5 figures

arXiv:2012.07101 [pdf, other]

Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D Human Pose Estimation

Authors: Kun Zhang, Rui Wu, Ping Yao, Kai Deng, Ding Li, Renbiao Liu, Chuanguang Yang, Ge Chen, Min Du, Tianyao Zheng

Abstract: The target of 2D human pose estimation is to locate the keypoints of body parts from input 2D images. State-of-the-art methods for pose estimation usually construct pixel-wise heatmaps from keypoints as labels for learning convolution neural networks, which are usually initialized randomly or using classification models on ImageNet as their backbones. We note that 2D pose estimation task is highly dependent on the contextual relationship between image patches, thus we introduce a self-supervised method for pretraining 2D pose estimation networks. Specifically, we propose Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext-task, whose target is to learn the location of each patch from an image composed of shuffled patches. During our pretraining process, we only use images of person instances in MS-COCO, rather than introducing extra and much larger ImageNet dataset. A heatmap-style label for patch location is designed and our learning process is in a non-contrastive way. The weights learned by HSJP pretext task are utilised as backbones of 2D human pose estimator, which are then finetuned on MS-COCO human keypoints dataset. With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate mAP score on both MS-COCO validation and test-dev datasets. Our experiments show that downstream pose estimators with our self-supervised pretraining obtain much better performance than those trained from scratch, and are comparable to those using ImageNet classification models as their initial backbones.

Submitted 13 December, 2020; originally announced December 2020.

arXiv:2007.10673 [pdf, ps, other]

Quantum and Classical Hybrid Generations for Classical Correlations

Authors: Xiaodie Lin, Zhaohui Wei, Penghui Yao

Abstract: We consider two-stage hybrid protocols that combine quantum resource and classical resource to generate classical correlations shared by two separated players. Our motivation is twofold. First, in the near future the scale of quantum information processing is quite limited, and when quantum resource available is not sufficient for certain tasks, a possible way to strengthen the capability of quantum schemes is introducing extra classical resource. We analyze the mathematical structures of these hybrid protocols, and characterize the relation between the amount of quantum resource and classical resource needed. Second, a fundamental open problem in communication complexity theory is to describe the advantages of sharing prior quantum entanglement over sharing prior randomness, which is still widely open. It turns out that our quantum and classical hybrid protocols provide new insight into this important problem.

Submitted 21 July, 2020; originally announced July 2020.

Comments: 13 pages

arXiv:2003.08897 [pdf, other]

Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Authors: Longteng Guo, Jing Liu, Xinxin Zhu, Peng Yao, Shichen Lu, Hanqing Lu

Abstract: Self-attention (SA) network has shown profound value in image captioning. In this paper, we improve SA from two aspects to promote the performance of image captioning. First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that brings the benefits of normalization inside SA. While normalization is previously only applied outside SA, we introduce a novel normalization method and demonstrate that it is both possible and beneficial to perform it on the hidden activations inside SA. Second, to compensate for the major limit of Transformer that it fails to model the geometry structure of the input objects, we propose a class of Geometry-aware Self-Attention (GSA) that extends SA to explicitly and efficiently consider the relative geometry relations between the objects in the image. To construct our image captioning model, we combine the two modules and apply it to the vanilla self-attention network. We extensively evaluate our proposals on MS-COCO image captioning dataset and superior results are achieved when comparing to state-of-the-art approaches. Further experiments on three challenging tasks, i.e. video captioning, machine translation, and visual question answering, show the generality of our methods.

Submitted 19 March, 2020; originally announced March 2020.

Comments: Accepted by CVPR 2020

arXiv:2001.02818 [pdf, other]

Capacity Approaching Coding for Low Noise Interactive Quantum Communication, Part I: Large Alphabets

Authors: Debbie Leung, Ashwin Nayak, Ala Shayeghi, Dave Touchette, Penghui Yao, Nengkun Yu

Abstract: We consider the problem of implementing two-party interactive quantum communication over noisy channels, a necessary endeavor if we wish to fully reap quantum advantages for communication. For an arbitrary protocol with $n$ messages, designed for a noiseless qudit channel over a $\mathrm{poly}(n)$ size alphabet, our main result is a simulation method that fails with probability less than $2^{-Θ(nε)}$ and uses a qudit channel over the same alphabet $n\left(1+Θ\left(\sqrt{ε}\right)\right)$ times, of which an $ε$ fraction can be corrupted adversarially. The simulation is thus capacity achieving to leading order, and we conjecture that it is optimal up to a constant factor in the $\sqrt{ε}$ term. Furthermore, the simulation is in a model that does not require pre-shared resources such as randomness or entanglement between the communicating parties. Our work improves over the best previously known quantum result where the overhead is a non-explicit large constant [Brassard et al., FOCS'14] for low $ε$.

Submitted 8 January, 2020; originally announced January 2020.

Comments: 94 pages, 7 figures

arXiv:1910.11102 [pdf, other]

Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning

Authors: Xinxin Zhu, Longteng Guo, Peng Yao, Shichen Lu, Wei Liu, Jing Liu

Abstract: This report describes our solution for the VATEX Captioning Challenge 2020, which requires generating descriptions for the videos in both English and Chinese languages. We identified three crucial factors that improve the performance, namely: multi-view features, hybrid reward, and diverse ensemble. Based on our method of VATEX 2019 challenge, we achieved significant improvements this year with more advanced model architectures, combination of appearance and motion features, and careful hyper-parameters tuning. Our method achieves very competitive results on both of the Chinese and English video captioning tracks.

Submitted 23 June, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

Comments: 4 pages, 2 figures

arXiv:1909.05090 [pdf, other]

doi 10.1109/ICIP40778.2020.9191174

Learning Enhanced Resolution-wise features for Human Pose Estimation

Authors: Kun Zhang, Peng He, Ping Yao, Ge Chen, Rui Wu, Min Du, Huimin Li, Li Fu, Tianyao Zheng

Abstract: Recently, multi-resolution networks (such as Hourglass, CPN, HRNet, etc.) have achieved significant performance on pose estimation by combining feature maps of various resolutions. In this paper, we propose a Resolution-wise Attention Module (RAM) and Gradual Pyramid Refinement (GPR), to learn enhanced resolution-wise feature maps for precise pose estimation. Specifically, RAM learns a group of weights to represent the different importance of feature maps across resolutions, and the GPR gradually merges every two feature maps from low to high resolutions to regress final human keypoint heatmaps. With the enhanced resolution-wise features learnt by CNN, we obtain more accurate human keypoint locations. The efficacies of our proposed methods are demonstrated on MS-COCO dataset, achieving state-of-the-art performance with average precision of 77.7 on COCO val2017 set and 77.0 on test-dev2017 set without using extra human keypoint training dataset.

Submitted 13 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: Published on ICIP 2020

arXiv:1904.08832 [pdf, other]

A doubly exponential upper bound on noisy EPR states for binary games

Authors: Penghui Yao

Abstract: This paper initiates the study of a class of entangled games, mono-state games, denoted by $(G,ψ)$, where $G$ is a two-player one-round game and $ψ$ is a bipartite state independent of the game $G$. In the mono-state game $(G,ψ)$, the players are only allowed to share arbitrary copies of $ψ$. This paper provides a doubly exponential upper bound on the copies of $ψ$ for the players to approximate the value of the game to an arbitrarily small constant precision for any mono-state binary game $(G,ψ)$, if $ψ$ is a noisy EPR state, which is a two-qubit state with completely mixed states as marginals and maximal correlation less than $1$. In particular, it includes $(1-ε)|Ψ\rangle\langleΨ|+ε\frac{I_2}{2}\otimes\frac{I_2}{2}$, an EPR state with an arbitrary depolarizing noise $ε>0$. The structure of the proofs is built on the recent framework about the decidability of the non-interactive simulation of joint distributions, which is completely different from all previous optimization-based approaches or "Tsirelson's problem"-based approaches. This paper develops a series of new techniques about the Fourier analysis on matrix spaces and proves a quantum invariance principle and a hypercontractive inequality of random operators. This novel approach provides a new angle to study the decidability of the complexity class MIP$^*$, a longstanding open problem in quantum complexity theory.

Submitted 15 September, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

Comments: The proof of Lemma C.9 is corrected. The presentation is improved. Some typos are corrected

arXiv:1903.10153 [pdf, other]

DenseBody: Directly Regressing Dense 3D Human Pose and Shape From a Single Color Image

Authors: Pengfei Yao, Zheng Fang, Fan Wu, Yao Feng, Jiwei Li

Abstract: Recovering 3D human body shape and pose from 2D images is a challenging task due to the high complexity and flexibility of the human body, and the relative scarcity of 3D labeled data. Previous methods addressing these issues typically rely on predicting intermediate results such as body part segmentation, 2D/3D joints, silhouette mask to decompose the problem into multiple sub-tasks in order to utilize more 2D labels. Most previous works incorporated parametric body shape model in their methods and predict parameters in low-dimensional space to represent human body. In this paper, we propose to directly regress the 3D human mesh from a single color image using Convolutional Neural Network (CNN). We use an efficient representation of 3D human shape and pose which can be predicted through an encoder-decoder neural network. The proposed method achieves state-of-the-art performance on several 3D human body datasets including Human3.6M, SURREAL and UP-3D with even faster running speed.

Submitted 28 March, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

Comments: 10 pages, 6 figures

arXiv:1901.00984 [pdf, other]

Quantum Insertion-Deletion Channels

Authors: Janet Leahy, Dave Touchette, Penghui Yao

Abstract: We introduce a model of quantum insertion-deletion (insdel) channels. Insdel channels are meant to represent, for example, synchronization errors arising in data transmission. In the classical setting, they represent a strict generalization of the better-understood corruption error channels, and until recently, had mostly resisted effort toward a similar understanding as their corruption counterparts. They have received considerable attention in recent years. Very recently, Haeupler and Shahrasbi developed a framework, using what they call synchronisation strings, that allows one to turn insdel-type errors into corruption-type errors. These can then be handled by the use of standard error-correcting codes. We show that their framework can be extended to the quantum setting, providing a way to turn quantum insdel errors into quantum corruption errors, which can be handled with standard quantum error-correcting codes.

Submitted 4 January, 2019; originally announced January 2019.

arXiv:1808.06449 [pdf, other]

doi 10.1109/TIT.2020.2965114

On the compression of messages in the multi-party setting

Authors: Anurag Anshu, Penghui Yao

Abstract: We consider the following communication task in the multi-party setting, which involves a joint random variable $XYZMN$ with the property that $M$ is independent of $YZN$ conditioned on $X$ and $N$ is independent of $XZM$ conditioned on $Y$. Three parties Alice, Bob and Charlie, respectively, observe samples $x,y$ and $z$ from $XYZ$. Alice and Bob communicate messages to Charlie with the goal that Charlie can output a sample from $MN$ having correct correlation with $XYZ$. This task reflects the simultaneous message passing model of communication complexity. Furthermore, it is a generalization of some well studied problems in information theory, such as distributed source coding, source coding with a helper and one sender and one receiver message compression. It is also closely related to the lossy distributed source coding task. Our main result is an achievable communication region for this task in the one-shot setting, through which we obtain a near optimal characterization using auxiliary random variables of bounded size. We employ our achievability result to provide a near-optimal one-shot communication region for the task of lossy distributed source coding, in terms of auxiliary random variables of bounded size. Finally, we show that interaction is necessary to achieve the optimal expected communication cost for our main task.

Submitted 20 August, 2018; originally announced August 2018.

Comments: version 1, 34 pages, 2 figures

Journal ref: IEEE Transactions on Information Theory ( Volume: 66 , Issue: 4 , April 2020 )

arXiv:1806.00751 [pdf, ps, other]

An Efficient Graph Accelerator with Parallel Data Conflict Management

Authors: Pengcheng Yao

Abstract: Graph-specific computing with the support of dedicated accelerators has greatly boosted graph processing in both efficiency and energy. Nevertheless, their data conflict management is still essentially sequential when some vertex needs a large number of conflicting updates at the same time, leading to prohibitive performance degradation. This is particularly true for processing natural graphs. In this paper, we have the insight that the atomic operations for the vertex updating of many graph algorithms (e.g., BFS, PageRank and WCC) are typically incremental and simplex. This hence allows us to parallelize the conflicting vertex updates in an accumulative manner. We architect a novel graph-specific accelerator that can simultaneously process atomic vertex updates for massive parallelism on the conflicting data access while ensuring the correctness. A parallel accumulator is designed to remove the serialization in atomic protection for conflicting vertex updates through merging their results in parallel. Our implementation on Xilinx Virtex UltraScale+ XCVU9P with a wide variety of typical graph algorithms shows that our accelerator achieves an average throughput of 2.36 GTEPS as well as up to 3.14x performance speedup in comparison with state-of-the-art ForeGraph (with single-chip version).

Submitted 3 June, 2018; originally announced June 2018.

arXiv:1611.08946 [pdf, other]

doi 10.1145/3055399.3055401

Exponential Separation of Quantum Communication and Classical Information

Authors: Anurag Anshu, Dave Touchette, Penghui Yao, Nengkun Yu

Abstract: We exhibit a Boolean function for which the quantum communication complexity is exponentially larger than the classical information complexity. An exponential separation in the other direction was already known from the work of Kerenidis et al. [SICOMP 44, pp. 1550-1572], hence our work implies that these two complexity measures are incomparable. As classical information complexity is an upper bound on quantum information complexity, which in turn is equal to amortized quantum communication complexity, our work implies that a tight direct sum result for distributional quantum communication complexity cannot hold. The function we use to present such a separation is the Symmetric k-ary Pointer Jumping function introduced by Rao and Sinha [ECCC TR15-057], whose classical communication complexity is exponentially larger than its classical information complexity. In this paper, we show that the quantum communication complexity of this function is polynomially equivalent to its classical communication complexity. The high-level idea behind our proof is arguably the simplest so far for such an exponential separation between information and communication, driven by a sequence of round-elimination arguments, allowing us to simplify further the approach of Rao and Sinha. As another application of the techniques that we develop, we give a simple proof for an optimal trade-off between Alice's and Bob's communication while computing the related Greater-Than function on n bits: say Bob communicates at most b bits, then Alice must send n/exp(O(b)) bits to Bob.
This holds even when allowing pre-shared entanglement. We also present a classical protocol achieving this bound.

Submitted 27 November, 2016; originally announced November 2016.

Comments: v1, 36 pages, 3 figures

Journal ref: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017

arXiv:1506.02936 [pdf, other]

Parity Decision Tree Complexity and 4-Party Communication Complexity of XOR-functions Are Polynomially Equivalent

Authors: Penghui Yao

Abstract: In this note, we study the relation between the parity decision tree complexity of a boolean function $f$, denoted by $\mathrm{D}_{\oplus}(f)$, and the $k$-party number-in-hand multiparty communication complexity of the XOR functions $F(x_1,\ldots, x_k)= f(x_1\oplus\cdots\oplus x_k)$, denoted by $\mathrm{CC}^{(k)}(F)$. It is known that $\mathrm{CC}^{(k)}(F)\leq k\cdot\mathrm{D}_{\oplus}(f)$ because the players can simulate the parity decision tree that computes $f$. In this note, we show that \[\mathrm{D}_{\oplus}(f)\leq O\big(\mathrm{CC}^{(4)}(F)^5\big).\] Our main tool is a recent result from additive combinatorics due to Sanders. As $\mathrm{CC}^{(k)}(F)$ is non-decreasing as $k$ grows, the parity decision tree complexity of $f$ and the communication complexity of the corresponding $k$-argument XOR functions are polynomially equivalent whenever $k\geq 4$. Remark: After the first version of this paper was finished, we discovered that Hatami and Lovett had already discovered the same result a few years ago, without writing it up.

Submitted 28 June, 2015; v1 submitted 9 June, 2015; originally announced June 2015.

arXiv:1405.6015 [pdf, ps, other]

Multipartite Quantum Correlation and Communication Complexities

Authors: Rahul Jain, Zhaohui Wei, Penghui Yao, Shengyu Zhang

Abstract: The concepts of quantum correlation complexity and quantum communication complexity were recently proposed to quantify the minimum amount of resources needed in generating bipartite classical or quantum states in the single-shot setting. The former is the minimum size of the initially shared state $σ$ on which local operations by the two parties (without communication) can generate the target state $ρ$, and the latter is the minimum amount of communication needed when initially sharing nothing. In this paper, we generalize these two concepts to multipartite cases, for both exact and approximate state generation. Our results are summarized as follows. (1) For multipartite pure states, the correlation complexity can be completely characterized by local ranks of subsystems. (2) We extend the notion of PSD-rank of matrices to that of tensors, and use it to bound the quantum correlation complexity for generating multipartite classical distributions. (3) For generating multipartite mixed quantum states, communication complexity is not always equal to correlation complexity (as opposed to the bipartite case), but they differ by at most a factor of 2. Generating a multipartite mixed quantum state has the same communication complexity as generating its optimal purification, but the correlation complexities of these two tasks can be different (though still related by less than a factor of 2). (4) To generate a bipartite classical distribution $P(x,y)$ approximately, the quantum communication complexity is completely characterized by the approximate PSD-rank of $P$. The quantum correlation complexity of approximately generating multipartite pure states is bounded by approximate local ranks.

Submitted 17 July, 2014; v1 submitted 23 May, 2014; originally announced May 2014.

Comments: 19 pages; some typos are corrected

Showing 1–50 of 59 results for author: Yao, P