Search | arXiv e-print repository

Verifiable cloud-based variational quantum algorithms

Authors: Junhong Yang, Banghai Wang, Junyu Quan, Qin Li

Abstract: Variational quantum algorithms (VQAs) have shown potential for quantum advantage with noisy intermediate-scale quantum (NISQ) devices for quantum machine learning (QML). However, given the high cost and limited availability of quantum resources, delegating VQAs via cloud networks is a more practical solution for clients with limited quantum capabilities. Recently, Shingu et al. [Physical Review A, 105, 022603 (2022)] proposed a variational secure cloud quantum computing protocol, utilizing ancilla-driven quantum computation (ADQC) for cloud-based VQAs with minimal quantum resource consumption. However, their protocol lacks verifiability, which exposes it to potential malicious behaviors by the server. Additionally, channel loss requires frequent re-delegation as the size of the delegated variational circuit grows, complicating verification due to increased circuit complexity. This paper introduces a new protocol to address these challenges and enhance both verifiability and tolerance to channel loss in cloud-based VQAs.

Submitted 3 September, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.12908 [pdf, other]

Carrier Mobility of Strongly Anharmonic Materials from First Principles

Authors: Jingkai Quan, Christian Carbogno, Matthias Scheffler

Abstract: First-principles approaches for phonon-limited electronic transport are typically based on many-body perturbation theory and transport equations. As such, they rely on the validity of the quasi-particle picture for electrons and phonons, which is known to fail in strongly anharmonic systems. In this work, we demonstrate the relevance of effects beyond the quasi-particle picture by combining ab initio molecular dynamics and the Kubo-Greenwood (KG) formalism to establish a non-perturbative, stochastic method to calculate carrier mobilities while accounting for all orders of anharmonic and electron-vibrational couplings. In particular, we propose and exploit several numerical strategies that overcome the notoriously slow convergence of the KG formalism for both electronic and nuclear degrees of freedom in crystalline solids. The capability of this method is demonstrated by calculating the temperature-dependent electron mobility of the strongly anharmonic oxide perovskites SrTiO3 and BaTiO3 across a wide range of temperatures. We show that the temperature-dependence of the mobility is largely driven by anharmonic, higher-order coupling effects and rationalize these trends in terms of the non-perturbative electronic spectral functions.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 21 pages, 13 Figures

arXiv:2408.07032 [pdf]

QIris: Quantum Implementation of Rainbow Table Attacks

Authors: Lee Jun Quan, Tan Jia Ye, Goh Geok Ling, Vivek Balachandran

Abstract: This paper explores the use of Grover's Algorithm in the classical rainbow table, uncovering the potential of integrating quantum computing techniques with conventional cryptographic methods to develop a Quantum Rainbow Table Proof-of-Concept. This leverages quantum concepts and algorithms, including the principles of qubit superposition, entanglement and teleportation, coupled with Grover's Algorithm to enable a more efficient search through the rainbow table. The paper also details the hardware constraints and the workarounds used to produce better results in the implementation stages. Through this work we develop a working prototype of a quantum rainbow table and demonstrate how quantum computing could significantly improve the speed of cyber tools such as password crackers and thus impact the cyber security landscape.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.17025 [pdf, other]

Distinct moiré trions in a twisted semiconductor homobilayer

Authors: Zhida Liu, Haonan Wang, Xiaohui Liu, Yue Ni, Frank Gao, Saba Arash, Dong Seob Kim, Xiangcheng Liu, Yongxin Zeng, Jiamin Quan, Di Huang, Kenji Watanabe, Takashi Taniguchi, Edoardo Baldini, Allan H. MacDonald, Chih-Kang Shih, Li Yang, Xiaoqin Li

Abstract: Many fascinating properties discovered in graphene and transition metal dichalcogenide (TMD) moiré superlattices originate from flat bands and enhanced many-body effects. Here, we discover new many-electron excited states in TMD homobilayers. As optical resonances evolve with twist angle and doping in MoSe$_2$ bilayers, a unique type of "charge-transfer" trions is observed when gradual changes in atomic alignment between the layers occur. In real space, the optically excited electron-hole pair mostly resides in a different site from the doped hole in a moiré supercell. In momentum space, the electron-hole pair forms in the single-particle-band $K$-valley, while the hole occupies the $Γ$-valley. The rich internal structure of this trion resonance arises from the ultra-flatness of the first valence band and the distinct influence of moiré potential modulation on holes and excitons. Our findings open new routes to realizing photon-spin transduction or implementing moiré quantum simulators with independently tunable fermion and boson densities.

Submitted 24 July, 2024; originally announced July 2024.

Comments: 11 pages, 10 figures

arXiv:2406.09813 [pdf, other]

Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station

Authors: Hai Jin, Junjie Mao, Liubiao Chen, Naihui Chen, Wei Cui, Bo Gao, Jinjin Li, Xinfeng Li, Jiejia Liu, Jia Quan, Chunyang Jiang, Guole Wang, Le Wang, Qian Wang, Sifan Wang, Aimin Xiao, Shuo Zhang

Abstract: DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan a large fraction of the sky. High-resolution X-ray spectroscopy, enabled by superconducting microcalorimeters based on the transition-edge sensor (TES) technology, will probe the physical properties (e.g., temperature, density, elemental abundances, kinematics) of the Galactic hot baryons. This will complement the high-resolution imaging data obtained with the eROSITA mission. Here we present the preliminary design of DIXE. The payload consists mainly of a detector assembly and a cryogenic cooling system. The key components of the detector assembly are a microcalorimeter array and frequency-domain multiplexing readout electronics. To provide a working temperature for the detector assembly, the cooling system consists of an adiabatic demagnetization refrigerator and a mechanical cryocooler system.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 6 figures, the full version is published by Journal of Low Temperature Physics

arXiv:2401.01195 [pdf, ps, other]

Deep Learning Driven Buffer-Aided Cooperative Networks for B5G/6G: Challenges, Solutions, and Future Opportunities

Authors: Peng Xu, Gaojie Chen, Jianping Quan, Chong Huang, Ioannis Krikidis, Kai-Kit Wong, Chan-Byoung Chae

Abstract: Buffer-aided cooperative networks (BACNs) have garnered significant attention due to their potential applications in beyond fifth generation (B5G) or sixth generation (6G) critical scenarios. This article explores various typical application scenarios of buffer-aided relaying in B5G/6G networks to emphasize the importance of incorporating BACN. Additionally, we delve into the crucial technical challenges in BACN, including stringent delay constraints, high reliability, imperfect channel state information (CSI), transmission security, and integrated network architecture. To address the challenges, we propose leveraging deep learning-based methods for the design and operation of B5G/6G networks with BACN, deviating from conventional buffer-aided relay selection approaches. In particular, we present two case studies to demonstrate the efficacy of centralized deep reinforcement learning (DRL) and decentralized DRL in buffer-aided non-terrestrial networks. Finally, we outline future research directions in B5G/6G that pertain to the utilization of BACN.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 9 Pages, accepted for publication in IEEE Wireless Communications

arXiv:2312.09187 [pdf, other]

Vision-Language Models as a Source of Rewards

Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, et al. (2 additional authors not shown)

Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.

Submitted 12 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 10 pages, 5 figures

arXiv:2301.10433 [pdf, other]

Delegated variational quantum algorithms based on quantum homomorphic encryption

Authors: Qin Li, Junyu Quan, Jinjing Shi, Shichao Zhang, Xuelong Li

Abstract: Variational quantum algorithms (VQAs) are considered one of the most promising candidates for achieving quantum advantage on quantum devices in the noisy intermediate-scale quantum (NISQ) era. They have been developed for numerous applications such as image processing and solving linear systems of equations. The range of VQA applications can be greatly enlarged if users with limited quantum capabilities can run them on powerful remote quantum computers. However, the private data of clients may be leaked to quantum servers in such a quantum cloud model. To solve this problem, a novel quantum homomorphic encryption (QHE) scheme that is client-friendly and suitable for VQAs is constructed, allowing quantum servers to compute on encrypted data. Delegated VQAs are then proposed based on the given QHE scheme, where the server can train the ansatz circuit using the client's data without knowing the client's real input and output. Furthermore, a delegated variational quantum classifier to identify handwritten digit images is given as a specific example of delegated VQAs and simulated on the cloud platform of Original Quantum to show its feasibility.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 12 pages, 15 figures

arXiv:2301.07593 [pdf, other]

doi 10.1038/s41586-023-06275-2

Cavity-controlled magneto-optical properties of a strongly coupled van der Waals magnet

Authors: Florian Dirnberger, Jiamin Quan, Rezlind Bushati, Geoffrey Diederich, Matthias Florian, Julian Klein, Kseniia Mosina, Zdenek Sofer, Xiaodong Xu, Akashdeep Kamra, Francisco J. García-Vidal, Andrea Alù, Vinod M. Menon

Abstract: Controlling the properties of quantum materials with light is of fundamental and technological importance. While high-power lasers may achieve this goal, more practical strategies aim to exploit the strong coupling of light and matter in optical cavities, which has recently been shown to affect elemental physical phenomena, like superconductivity, phase transitions, and topological protection. Here we report the capacity of strong light-matter coupling to modify and control the magneto-optical properties of magnets. Tuning the hybridization of magnetic excitons and cavity photons allows us to realize distinct optical signatures of external magnetic fields and magnons in the archetypal van der Waals magnetic semiconductor CrSBr. These results highlight novel directions for cavity-controlled magneto-optics and the manipulation of quantum material properties by strong light-matter coupling.

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2212.00734 [pdf]

doi 10.1038/s41467-023-39339-y

Interaction-driven transport of dark excitons in 2D semiconductors with phonon-mediated optical readout

Authors: Saroj B. Chand, John M. Woods, Jiamin Quan, Enrique Mejia, Takashi Taniguchi, Kenji Watanabe, Andrea Alù, Gabriele Grosso

Abstract: The growing field of quantum information technology requires propagation of information over long distances with efficient readout mechanisms. Excitonic quantum fluids have emerged as a powerful platform for this task due to their straightforward electro-optical conversion. In two-dimensional transition metal dichalcogenides, the coupling between spin and valley provides exciting opportunities for harnessing, manipulating and storing bits of information. However, the large inhomogeneity of single layers cannot be overcome by the properties of bright excitons, hindering spin-valley transport. Nonetheless, the rich band structure supports dark excitonic states with strong binding energy and longer lifetime, ideally suited for long-range transport. Here we show that dark excitons can diffuse over several micrometers and prove that this repulsion-driven propagation is robust across non-uniform samples. The long-range propagation of dark states with an optical readout mediated by chiral phonons provides a new concept of excitonic devices for applications in both classical and quantum information technology.

Submitted 10 April, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.08727 [pdf]

Coherent Perfect Absorption in Chaotic Optical Microresonators for Efficient Modal Control

Authors: Xuefeng Jiang, Shixiong Yin, Huanan Li, Jiamin Quan, Michele Cotrufo, Julius Kullig, Jan Wiersig, Andrea Alù

Abstract: Non-Hermitian wave engineering has attracted a surge of interest in photonics in recent years. One of the prominent phenomena is coherent perfect absorption (CPA), in which the annihilation of electromagnetic scattering occurs by destructive interference of multiple incident waves. This concept has been implemented in various platforms to demonstrate real-time control of absorption, scattering and radiation by varying the relative phase of the excitation signals. However, so far these studies have been limited to simple photonic systems involving single or few modes at well-defined resonant frequencies. Realizing CPA in more complex photonic systems is challenging because it typically requires engineering the interplay of a large number of resonances featuring large spatial complexity within a narrow frequency range. Here, we extend the paradigm of coherent control of light to a complex photonic system involving more than 1,000 optical modes in a chaotic microresonator. We efficiently model the optical fields within a quasi-normal mode (QNM) expansion, and experimentally demonstrate chaotic CPA states, as well as their non-Hermitian degeneracies, which we leverage to efficiently control the cavity excitation through the input phases of multiple excitation channels. Our results shed light on the universality of non-Hermitian physics beyond simple resonant systems, paving the way for new opportunities in the science and technology of complex nanophotonic systems by chaotic wave interference.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 22 pages, 5 figures

arXiv:2210.09878 [pdf, other]

Ancilla-driven blind quantum computation for clients with different quantum capabilities

Authors: Qunfeng Dai, Junyu Quan, Xiaoping Lou, Qin Li

Abstract: Blind quantum computation (BQC) allows a client with limited quantum power to delegate his quantum computational task to a powerful server and still keep his input, output, and algorithm private. There are mainly two kinds of models about BQC, namely circuit-based and measurement-based models. In addition, a hybrid model called ancilla-driven universal blind quantum computing (ADBQC) was proposed by combining the properties of both circuit-based and measurement-based models, where all unitary operations on the register qubits can be realized with the aid of single ancillae coupled to the register qubits. However, in the ADBQC model, the quantum capability of the client is strictly limited to preparing single qubits. If a client can only perform single-qubit measurements or a few simple quantum gates, he may also want to delegate his computation to a remote server via ADBQC. This paper solves the problem and extends the existing model by proposing two types of ADBQC protocols for clients with different quantum capabilities, such as performing single-qubit measurements or single-qubit gates. Furthermore, in the proposed two ADBQC protocols, clients can detect whether servers are honest or not with a high probability by using corresponding verifiable techniques.

Submitted 18 October, 2022; originally announced October 2022.

Comments: 14 pages, 6 figures

arXiv:2210.09830 [pdf, other]

Verifiable blind quantum computation with identity authentication for different types of clients

Authors: Junyu Quan, Qin Li, Lvzhou Li

Abstract: Quantum computing has considerable advantages in solving some problems over its classical counterpart. Currently various physical systems are being developed to construct quantum computers, but this remains challenging, and the first use of quantum computers may adopt the cloud style. Blind quantum computing (BQC) provides a solution for clients with limited quantum capabilities to delegate their quantum computation to remote quantum servers while keeping input, output, and even algorithm private. In this paper, we propose three multi-party verifiable blind quantum computing (VBQC) protocols with identity authentication to handle clients with varying quantum capabilities in quantum networks, such as those who can just make measurements, prepare single qubits, or perform a few single-qubit gates. They are client-friendly and flexible since the clients can achieve BQC depending on their own quantum devices and resist both insider and outsider attacks in quantum networks. Furthermore, all three proposed protocols are verifiable, namely that the clients can verify the correctness of their calculations.

Submitted 18 October, 2022; originally announced October 2022.

Comments: 27 pages, 3 figures

arXiv:2207.02884 [pdf, other]

doi 10.1021/acsnano.2c07655

Sensing the local magnetic environment through optically active defects in a layered magnetic semiconductor

Authors: Julian Klein, Zhigang Song, Benjamin Pingault, Florian Dirnberger, Hang Chi, Jonathan B. Curtis, Rami Dana, Rezlind Bushati, Jiamin Quan, Lukas Dekanovsky, Zdenek Sofer, Andrea Alù, Vinod M. Menon, Jagadeesh S. Moodera, Marko Lončar, Prineha Narang, Frances M. Ross

Abstract: Atomic-level defects in van der Waals (vdW) materials are essential building blocks for quantum technologies and quantum sensing applications. The layered magnetic semiconductor CrSBr is an outstanding candidate for exploring optically active defects owing to a direct gap in addition to a rich magnetic phase diagram including a recently hypothesized defect-induced magnetic order at low temperature. Here, we show optically active defects in CrSBr that are probes of the local magnetic environment. We observe spectrally narrow (1 meV) defect emission in CrSBr that is correlated with both the bulk magnetic order and an additional low temperature defect-induced magnetic order. We elucidate the origin of this magnetic order in the context of local and non-local exchange coupling effects. Our work establishes vdW magnets like CrSBr as an exceptional platform to optically study defects that are correlated with the magnetic lattice. We anticipate that controlled defect creation allows for tailor-made complex magnetic textures and phases with the unique ingredient of direct optical access.

Submitted 6 July, 2022; originally announced July 2022.

Comments: main: 12 pages, 5 figures; SI: 14 pages, 11 figures

Journal ref: ACS Nano 17, 288-299 (2023)

arXiv:2206.11629 [pdf, other]

Global Sensing and Measurements Reuse for Image Compressed Sensing

Authors: Zi-En Fan, Feng Lian, Jia-Ni Quan

Abstract: Recently, deep network-based image compressed sensing methods achieved high reconstruction quality and reduced computational overhead compared with traditional methods. However, existing methods obtain measurements only from partial features in the network and use them only once for image reconstruction. They ignore that there are low-, mid-, and high-level features in the network \cite{zeiler2014visualizing} and that all of them are essential for high-quality reconstruction. Moreover, using measurements only once may not be enough to extract richer information from them. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet), which employs a Global Sensing Module (GSM) to collect features at all levels for efficient sensing and a Measurements Reuse Block (MRB) to reuse measurements multiple times at multiple scales. Finally, experimental results on three benchmark datasets show that our model can significantly outperform state-of-the-art methods.

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2206.00730 [pdf, other]

The Phenomenon of Policy Churn

Authors: Tom Schaul, André Barreto, John Quan, Georg Ostrovski

Abstract: We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states within a handful of learning updates (in a typical deep RL set-up such as DQN on Atari). We characterise the phenomenon empirically, verifying that it is not limited to specific algorithm or environment properties. A number of ablations help whittle down the plausible explanations on why churn occurs to just a handful, all related to deep learning. Finally, we hypothesise that policy churn is a beneficial but overlooked form of implicit exploration that casts $ε$-greedy exploration in a fresh light, namely that $ε$-noise plays a much smaller role than expected.

Submitted 20 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Published at NeurIPS 2022

MSC Class: 68T07; ACM Class: I.2.6

arXiv:2205.13456 [pdf, other]

doi 10.1021/acsnano.2c07316

The bulk van der Waals layered magnet CrSBr is a quasi-1D material

Authors: Julian Klein, Benjamin Pingault, Matthias Florian, Marie-Christin Heißenbüttel, Alexander Steinhoff, Zhigang Song, Kierstin Torres, Florian Dirnberger, Jonathan B. Curtis, Mads Weile, Aubrey Penn, Thorsten Deilmann, Rami Dana, Rezlind Bushati, Jiamin Quan, Jan Luxa, Zdenek Sofer, Andrea Alù, Vinod M. Menon, Ursula Wurstbauer, Michael Rohlfing, Prineha Narang, Marko Lončar, Frances M. Ross

Abstract: Correlated quantum phenomena in one-dimensional (1D) systems that exhibit competing electronic and magnetic order are of strong interest for studying fundamental interactions and excitations, such as Tomonaga-Luttinger liquids and topological orders and defects with properties completely different from the quasiparticles expected in their higher-dimensional counterparts. However, clean 1D electronic systems are difficult to realize experimentally, particularly magnetically ordered systems. Here, we show that the van der Waals layered magnetic semiconductor CrSBr behaves like a quasi-1D material embedded in a magnetically ordered environment. The strong 1D electronic character originates from the Cr-S chains and the combination of weak interlayer hybridization and anisotropy in effective mass and dielectric screening with an effective electron mass ratio of $m^e_X/m^e_Y \sim 50$. This extreme anisotropy experimentally manifests in strong electron-phonon and exciton-phonon interactions, a Peierls-like structural instability and a Fano resonance from a van Hove singularity of similar strength to that of metallic carbon nanotubes. Moreover, due to the reduced dimensionality and interlayer coupling, CrSBr hosts spectrally narrow (1 meV) excitons of high binding energy and oscillator strength that inherit the 1D character. Overall, CrSBr is best understood as a stack of weakly hybridized monolayers and appears to be an experimentally attractive candidate for the study of exotic exciton and 1D correlated many-body physics in the presence of magnetic order.

Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: main: 16 pages, 5 figures; SI: 15 pages, 9 figures

Journal ref: ACS Nano (2023)

arXiv:2203.16189 [pdf]

doi 10.1103/PhysRevB.106.125302

Quantitative determination of interlayer electronic coupling at various critical points in bilayer MoS2

Authors: Wei-Ting Hsu, Jiamin Quan, Chi-Ruei Pan, Peng-Jen Chen, Mei-Yin Chou, Wen-Hao Chang, Allan H MacDonald, Xiaoqin Li, Jung-Fu Lin, Chih-Kang Shih

Abstract: Tailoring interlayer coupling has emerged as a powerful tool to tune the electronic structure of van der Waals (vdW) bilayers. One example is the usage of the moire pattern to create controllable two-dimensional electronic superlattices through the configurational dependence of interlayer electronic couplings. This approach has led to some remarkable discoveries in twisted graphene bilayers, and transition metal dichalcogenide (TMD) homo- and hetero-bilayers. However, a largely unexplored factor is the interlayer distance, d, which can impact the interlayer coupling strength exponentially. In this letter, we quantitatively determine the coupling strengths as a function of interlayer spacing at various critical points of the Brillouin zone in bilayer MoS2. The exponential dependence of the coupling parameter on the gap distance is demonstrated. Most significantly, we achieved a 280% enhancement of K-valley coupling strength with an 8% reduction of the vdW gap, pointing to a new strategy in designing a novel electronic system in vdW bilayers.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.10728 [pdf, ps, other]

Parametric Euler Sums of Harmonic Numbers

Authors: Junjie Quan, Xiyu Wang, Xiaoxue Wei, Ce Xu

Abstract: We define a parametric variant of generalized Euler sums and construct contour integration to give some explicit evaluations of these parametric Euler sums. In particular, we establish several explicit formulas of (Hurwitz) zeta functions, linear and quadratic parametric Euler sums. Furthermore, we also give an explicit evaluation of alternating double zeta values $\zeta(\overline{2j},2m+1)$ in terms of a combination of alternating Riemann zeta values by using the parametric Euler sums.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.10001 [pdf, other]

FORCE: A Framework of Rule-Based Conversational Recommender System

Authors: Jun Quan, Ze Wei, Qiang Gan, Jingqi Yao, Jingyi Lu, Yuchen Dong, Yiming Liu, Yi Zeng, Chao Zhang, Yongzhi Li, Huang Hu, Yingying He, Yang Yang, Daxin Jiang

Abstract: The conversational recommender systems (CRSs) have received extensive attention in recent years. However, most of the existing works focus on various deep learning models, which are largely limited by the requirement of large-scale human-annotated datasets. Such methods are not able to deal with the cold-start scenarios in industrial products. To alleviate the problem, we propose FORCE, a Framework Of Rule-based Conversational Recommender system that helps developers to quickly build CRS bots by simple configuration. We conduct experiments on two datasets in different languages and domains to verify its effectiveness and usability.

Submitted 18 March, 2022; originally announced March 2022.

Comments: AAAI 2022 (Demonstration Track)

arXiv:2111.00950 [pdf, other]

Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation

Authors: Jianning Quan, A. Ben Hamza

Abstract: Estimating a 3D human pose has proven to be a challenging task, primarily because of the complexity of the human body joints, occlusions, and variability in lighting conditions. In this paper, we introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation. Using multi-hop neighborhoods for node feature aggregation, our model is able to capture the long-range dependencies between body joints. Moreover, our approach leverages residual connections, which are integrated by design in our network architecture, ensuring that the learned feature representations retain important information from the initial features of the input layer as the network depth increases. Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model, achieving superior performance over strong baseline methods for 3D human pose estimation.

Submitted 1 November, 2021; originally announced November 2021.

Journal ref: British Machine Vision Conference, 2021

arXiv:2108.02296 [pdf]

doi 10.1038/s41467-021-25311-1

Superior Photo-carrier Diffusion Dynamics in Organic-inorganic Hybrid Perovskites Revealed by Spatiotemporal Conductivity Imaging

Authors: Xuejian Ma, Fei Zhang, Zhaodong Chu, Ji Hao, Xihan Chen, Jiamin Quan, Zhiyuan Huang, Xiaoming Wang, Xiaoqin Li, Yanfa Yan, Kai Zhu, Keji Lai

Abstract: The outstanding performance of organic-inorganic metal trihalide solar cells benefits from the exceptional photo-physical properties of both electrons and holes in the material. Here, we directly probe the free-carrier dynamics in Cs-doped FAPbI3 thin films by spatiotemporal photoconductivity imaging. Using charge transport layers to selectively quench one type of carriers, we show that the two relaxation times on the order of 1 microsecond and 10 microseconds correspond to the lifetimes of electrons and holes in FACsPbI3, respectively. Strikingly, the diffusion mapping indicates that the difference in electron/hole lifetimes is largely compensated by their disparate mobility. Consequently, the long diffusion lengths (3 ~ 5 micrometers) of both carriers are comparable to each other, a feature closely related to the unique charge trapping and de-trapping processes in hybrid trihalide perovskites. Our results unveil the origin of superior diffusion dynamics in this material, crucially important for solar-cell applications.

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2104.06272 [pdf, other]

Podracer architectures for scalable Reinforcement Learning

Authors: Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt

Abstract: Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems. Popular training pipelines that use these frameworks for deep learning typically focus on (un-)supervised learning. How to best train reinforcement learning (RL) agents at scale is still an active research area. In this report we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way. Specifically we describe two architectures designed to make the best use of the resources available on a TPU Pod (a special configuration in a Google data center that features multiple TPU devices connected to each other by extremely low latency communication channels).

Submitted 13 April, 2021; originally announced April 2021.

arXiv:2102.08553 [pdf, ps, other]

Integrating Pre-trained Model into Rule-based Dialogue Management

Authors: Jun Quan, Meng Yang, Qiang Gan, Deyi Xiong, Yiming Liu, Yuchen Dong, Fangxin Ouyang, Jun Tian, Ruiling Deng, Yongzhi Li, Yang Yang, Daxin Jiang

Abstract: Rule-based dialogue management is still the most popular solution for industrial task-oriented dialogue systems for their interpretability. However, it is hard for developers to maintain the dialogue logic when the scenarios get more and more complex. On the other hand, data-driven dialogue systems, usually with end-to-end structures, are popular in academic research and easier to deal with complex conversations, but such methods require plenty of training data and the behaviors are less interpretable. In this paper, we propose a method that leverages the strength of both rule-based and data-driven dialogue managers (DM). We firstly introduce the DM of Carina Dialog System (CDS, an advanced industrial dialogue system built by Microsoft). Then we propose the "model-trigger" design to make the DM trainable thus scalable to scenario changes. Furthermore, we integrate pre-trained models and empower the DM with few-shot capability. The experimental results demonstrate the effectiveness and strong few-shot capability of our method.

Submitted 16 February, 2021; originally announced February 2021.

Comments: AAAI 2021 Demo Paper

arXiv:2101.03294 [pdf]

Observation of protected localized states induced by curved space in acoustic topological insulators

Authors: H. W. Wu, J. Q. Quan, Y. K. Liu, Y. Pan, Z. Q. Sheng, L. W. Jing

Abstract: Topological insulators (TIs) with robust boundary states against perturbations and disorders provide a unique approach for manipulating waves, whereas curved space can effectively control the wave propagation on curved surfaces by the geometric potential effect as well. In general, two-dimensional (2D) TIs are designed on a flat surface; however, in most practical cases, curved topological structures are required. In this study, we design a 2D curved acoustic TI by perforation on a curved rigid plate. We experimentally demonstrate that a topological localized state stands erect in the bulk gap, and the corresponding pressure distributions are confined at the position with the maximal curvature. Moreover, we experimentally verify the robustness of the topological localized state by introducing defects near the localized position. To understand the underlying mechanism of the topological localized state, a tight-binding model considering the geometric potential effect is proposed. The interaction between the geometrical curvature and topology in the system provides a novel scheme for manipulating and trapping wave propagation along the boundary of curved TIs, thereby offering potential applications in flexible devices.

Submitted 9 January, 2021; originally announced January 2021.

arXiv:2101.00315 [pdf, other]

A phase defect framework for the analysis of cardiac arrhythmia patterns

Authors: Louise Arno, Jan Quan, Nhan T. Nguyen, Maarten Vanmarcke, Elena G. Tolkacheva, Hans Dierckx

Abstract: During cardiac arrhythmias, dynamical patterns of electrical activation form and evolve, which are of interest to understand and cure heart rhythm disorders. The analysis of these patterns is commonly performed by calculating the local activation phase and searching for phase singularities (PSs), i.e. points around which all phases are present. Here we propose an alternative framework, which focuses on phase defect lines (PDLs) and surfaces (PDSs) as more general mechanisms, which include PSs as a specific case. The proposed framework enables two conceptual unifications: between the local activation time and phase description, and between conduction block lines and the central regions of linear-core rotors. A simple PDL detection method is proposed and applied to data from simulations and optical mapping experiments. Our analysis of ventricular tachycardia in rabbit hearts $(n=6)$ shows that nearly all detected PSs were found on PDLs, but the PDLs had a significantly longer lifespan than the detected PSs. Since the proposed framework revisits basic building blocks of cardiac activation patterns, it can become a useful tool for further theory development and experimental analysis.

Submitted 8 September, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

Comments: second version, modified according to reviewer comments

arXiv:2012.08217 [pdf]

doi 10.1002/lpor.202100469

Photonic Floquet time crystals

Authors: Bing Wang, Jiaqi Quan, Jianfei Han, Xiaopeng Shen, Hongwei Wu, Yiming Pan

Abstract: The public and scientists constantly have different perspectives. While on a time crystal, they stand in line and ask: What is a time crystal? Show me a material that is spontaneously crystalline in time? This study synthesizes a photonic material of Floquet time crystals and experimentally observes its indicative period-2T beating. We explicitly reconstruct a discrete time-crystalline ground state and reveal using an appropriately-designed photonic Floquet simulator the rigid period-doubling as a signature of the spontaneous breakage of the discrete time-translational symmetry. Unlike the result of the exquisite many-body interaction, the photonic time crystal is derived from a single-particle topological phase that can be extensively accessed by many pertinent nonequilibrium and periodically-driven platforms. Our observation will drive theoretical and technological interests toward condensed matter physics and topological photonics, and demystify time crystals for the non-scientific public.

Submitted 19 January, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: 39 pages, 5 figures, supplementary materials, 6 suppl. figures

Journal ref: Laser & Photonics Reviews (2022): 2100469

arXiv:2010.08738 [pdf, other]

RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling

Authors: Jun Quan, Shian Zhang, Qian Cao, Zizhong Li, Deyi Xiong

Abstract: In order to alleviate the shortage of multi-domain data and to capture discourse phenomena for task-oriented dialogue modeling, we propose RiSAWOZ, a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations. RiSAWOZ contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains, which is larger than all previous annotated H2H conversational datasets. Both single- and multi-domain dialogues are constructed, accounting for 65% and 35%, respectively. Each dialogue is labeled with comprehensive dialogue annotations, including dialogue goal in the form of natural language description, domain, dialogue states and acts at both the user and system side. In addition to traditional dialogue annotations, we especially provide linguistic annotations on discourse phenomena, e.g., ellipsis and coreference, in dialogues, which are useful for dialogue coreference and ellipsis resolution tasks. Apart from the fully annotated dataset, we also present a detailed description of the data collection procedure, statistics and analysis of the dataset. A series of benchmark models and results are reported, including natural language understanding (intent detection & slot filling), dialogue state tracking and dialogue context-to-text generation, as well as coreference and ellipsis resolution, which facilitate the baseline comparison for future research on this corpus.

Submitted 17 October, 2020; originally announced October 2020.

Comments: EMNLP 2020 (long paper)

arXiv:2009.10650 [pdf, other]

doi 10.1038/s41563-021-00960-1

Phonon Renormalization in Reconstructed MoS$_2$ Moiré Superlattices

Authors: Jiamin Quan, Lukas Linhart, Miao-Ling Lin, Daehun Lee, Jihang Zhu, Chun-Yuan Wang, Wei-Ting Hsu, Junho Choi, Jacob Embley, Carter Young, Takashi Taniguchi, Kenji Watanabe, Chih-Kang Shih, Keji Lai, Allan H. MacDonald, Ping-Heng Tan, Florian Libisch, Xiaoqin Li

Abstract: In moiré crystals formed by stacking van der Waals (vdW) materials, surprisingly diverse correlated electronic phases and optical properties can be realized by a subtle change in the twist angle. Here, we discover that phonon spectra are also renormalized in MoS$_2$ twisted bilayers, adding a new perspective to moiré physics. Over a range of small twist angles, the phonon spectra evolve rapidly due to ultra-strong coupling between different phonon modes and atomic reconstructions of the moiré pattern. We develop a new low-energy continuum model for phonons that overcomes the outstanding challenge of calculating properties of large moiré supercells and successfully captures essential experimental observations. Remarkably, simple optical spectroscopy experiments can provide information on strain and lattice distortions in moiré crystals with nanometer-size supercells. The newly developed theory promotes a comprehensive and unified understanding of structural, optical, and electronic properties of moiré superlattices.

Submitted 22 September, 2020; originally announced September 2020.

Comments: 21 pages, 4 figures

arXiv:2007.15196 [pdf, other]

doi 10.1103/PhysRevLett.126.047401

Twist Angle Dependent Interlayer Exciton Lifetimes in van der Waals Heterostructures

Authors: Junho Choi, Matthias Florian, Alexander Steinhoff, Daniel Erben, Kha Tran, Dong Seob Kim, Liuyang Sun, Jiamin Quan, Robert Claassen, Somak Majumder, Jennifer A. Hollingsworth, Takashi Taniguchi, Kenji Watanabe, Keiji Ueno, Akshay Singh, Galan Moody, Frank Jahnke, Xiaoqin Li

Abstract: In van der Waals (vdW) heterostructures formed by stacking two monolayers of transition metal dichalcogenides, multiple exciton resonances with highly tunable properties are formed and subject to both vertical and lateral confinement. We investigate how a unique control knob, the twist angle between the two monolayers, can be used to control the exciton dynamics. We observe that the interlayer exciton lifetimes in $\text{MoSe}_{\text{2}}$/$\text{WSe}_{\text{2}}$ twisted bilayers (TBLs) change by one order of magnitude when the twist angle is varied from 1$^\circ$ to 3.5$^\circ$. Using a low-energy continuum model, we theoretically separate two leading mechanisms that influence interlayer exciton radiative lifetimes. The shift to indirect transitions in the momentum space with an increasing twist angle and the energy modulation from the moiré potential both have a significant impact on interlayer exciton lifetimes. We further predict distinct temperature dependence of interlayer exciton lifetimes in TBLs with different twist angles, which is partially validated by experiments. While many recent studies have highlighted how the twist angle in a vdW TBL can be used to engineer the ground states and quantum phases due to many-body interaction, our studies explore its role in controlling the dynamics of optically excited states, thus, expanding the conceptual applications of "twistronics".

Submitted 26 January, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

Journal ref: Phys. Rev. Lett. 126, 047401 (2021)

arXiv:2006.02243 [pdf, other]

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Authors: Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

Abstract: In value-based reinforcement learning (RL), unlike in supervised learning, the agent faces not a single, stationary, approximation problem, but a sequence of value prediction problems. Each time the policy improves, the nature of the problem changes, shifting both the distribution of states and their values. In this paper we take a novel perspective, arguing that the value prediction problems faced by an RL agent should not be addressed in isolation, but rather as a single, holistic, prediction problem. An RL algorithm generates a sequence of policies that, at least approximately, improve towards the optimal policy. We explicitly characterize the associated sequence of value functions and call it the value-improvement path. Our main idea is to approximate the value-improvement path holistically, rather than to solely track the value function of the current policy. Specifically, we discuss the impact that this holistic view of RL has on representation learning. We demonstrate that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements. We use this insight to better understand existing approaches to auxiliary tasks and to propose new ones. To test our hypothesis empirically, we augmented a standard deep RL agent with an auxiliary task of learning the value-improvement path. In a study of Atari 2600 games, the augmented agent achieved approximately double the mean and median performance of the baseline agent.

Submitted 4 January, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: AAAI-21

arXiv:2004.14080 [pdf, other]

Modeling Long Context for Task-Oriented Dialogue State Generation

Authors: Jun Quan, Deyi Xiong

Abstract: Based on the recently proposed transferable dialogue state generator (TRADE) that predicts dialogue states from utterance-concatenated dialogue context, we propose a multi-task learning model with a simple yet effective utterance tagging technique and a bidirectional language model as an auxiliary task for task-oriented dialogue state generation. By enabling the model to learn a better representation of the long dialogue context, our approaches attempt to solve the problem that the performance of the baseline significantly drops when the input dialogue context sequence is long. In our experiments, our proposed model achieves a 7.03% relative improvement over the baseline, establishing a new state-of-the-art joint goal accuracy of 52.04% on the MultiWOZ 2.0 dataset.

Submitted 29 April, 2020; originally announced April 2020.

Comments: ACL 2020

arXiv:2003.01817 [pdf]

doi 10.1073/pnas.2004106117

Unveiling Defect-Mediated Carrier Dynamics in Monolayer Semiconductors by Spatiotemporal Microwave Imaging

Authors: Zhaodong Chu, Chun-Yuan Wang, Jiamin Quan, Chenhui Zhang, Chao Lei, Ali Han, Xuejian Ma, Hao-Ling Tang, Dishan Abeysinghe, Matthew Staab, Xixiang Zhang, Allan H. MacDonald, Vincent Tung, Xiaoqin Li, Chih-Kang Shih, Keji Lai

Abstract: The optoelectronic properties of atomically thin transition-metal dichalcogenides are strongly correlated with the presence of defects in the materials, which are not necessarily detrimental for certain applications. For instance, defects can lead to an enhanced photoconduction, a complicated process involving charge generation and recombination in the time domain and carrier transport in the spatial domain. Here, we report the simultaneous spatial and temporal photoconductivity imaging in two types of WS2 monolayers by laser-illuminated microwave impedance microscopy. The diffusion length and carrier lifetime were directly extracted from the spatial profile and temporal relaxation of microwave signals respectively. Time-resolved experiments indicate that the critical process for photo-excited carriers is the escape of holes from trap states, which prolongs the apparent lifetime of mobile electrons in the conduction band. As a result, counterintuitively, the photoconductivity is stronger in CVD samples than exfoliated monolayers with a lower defect density. Our work reveals the intrinsic time and length scales of electrical response to photo-excitation in van der Waals materials, which is essential for their applications in novel optoelectronic devices.

Submitted 3 March, 2020; originally announced March 2020.

Comments: 21 pages, 4 figures

arXiv:1912.11101 [pdf]

doi 10.1126/sciadv.aba8866

Moiré Potential Impedes Interlayer Exciton Diffusion in van der Waals Heterostructures

Authors: Junho Choi, Wei-Ting Hsu, Li-Syuan Lu, Liuyang Sun, Hui-Yu Cheng, Ming-Hao Lee, Jiamin Quan, Kha Tran, Chun-Yuan Wang, Matthew Staab, Kayleigh Jones, Takashi Taniguchi, Kenji Watanabe, Ming-Wen Chu, Shangjr Gwo, Suenne Kim, Chih-Kang Shih, Xiaoqin Li, Wen-Hao Chang

Abstract: The properties of van der Waals (vdW) heterostructures are drastically altered by a tunable moiré superlattice arising from periodic variations of atomic alignment between the layers. Exciton diffusion represents an important channel of energy transport in semiconducting transition metal dichalcogenides (TMDs). While early studies performed on TMD heterobilayers have suggested that carriers and excitons exhibit long diffusion lengths, a rich variety of scenarios can exist. In a moiré crystal with a large supercell size and deep potential, interlayer excitons may be completely localized. As the moiré period reduces at a larger twist angle, excitons can tunnel between supercells and diffuse over a longer lifetime. The diffusion length should be the longest in commensurate heterostructures where the moiré superlattice is completely absent. In this study, we experimentally demonstrate that the moiré potential impedes interlayer exciton diffusion by comparing a number of WSe2/MoSe2 heterostructures prepared with chemical vapor deposition and mechanical stacking with accurately controlled twist angles. Our results provide critical guidance to developing 'twistronic' devices that explore the moiré superlattice to engineer material properties.

Submitted 23 December, 2019; originally announced December 2019.

Journal ref: Science Advances, 2020

arXiv:1912.02478 [pdf, ps, other]

Effective Data Augmentation Approaches to End-to-End Task-Oriented Dialogue

Authors: Jun Quan, Deyi Xiong

Abstract: The training of task-oriented dialogue systems is often confronted with the lack of annotated data. In contrast to previous work which augments training data through expensive crowd-sourcing efforts, we propose four different automatic approaches to data augmentation at both the word and sentence level for end-to-end task-oriented dialogue and conduct an empirical study on their impact. Experimental results on the CamRest676 and KVRET datasets demonstrate that each of the four data augmentation approaches is able to obtain a significant improvement over a strong baseline in terms of Success F1 score and that the ensemble of the four approaches achieves the state-of-the-art results in the two datasets. In-depth analyses further confirm that our methods adequately increase the diversity of user utterances, which enables the end-to-end model to learn features robustly.

Submitted 5 December, 2019; originally announced December 2019.

Comments: accepted by IALP 2019

arXiv:1909.12086 [pdf, ps, other]

GECOR: An End-to-End Generative Ellipsis and Co-reference Resolution Model for Task-Oriented Dialogue

Authors: Jun Quan, Deyi Xiong, Bonnie Webber, Changjian Hu

Abstract: Ellipsis and co-reference are common and ubiquitous especially in multi-turn dialogues. In this paper, we treat the resolution of ellipsis and co-reference in dialogue as a problem of generating omitted or referred expressions from the dialogue context. We therefore propose a unified end-to-end Generative Ellipsis and CO-reference Resolution model (GECOR) in the context of dialogue. The model can generate a new pragmatically complete user utterance by alternating the generation and copy mode for each user utterance. A multi-task learning framework is further proposed to integrate the GECOR into an end-to-end task-oriented dialogue. In order to train both the GECOR and the multi-task learning framework, we manually construct a new dataset on the basis of the public dataset CamRest676 with both ellipsis and co-reference annotation. On this dataset, intrinsic evaluations on the resolution of ellipsis and co-reference show that the GECOR model significantly outperforms the sequence-to-sequence (seq2seq) baseline model in terms of EM, BLEU and F1 while extrinsic evaluations on the downstream dialogue task demonstrate that our multi-task learning framework with GECOR achieves a higher success rate of task completion than TSCP, a state-of-the-art end-to-end task-oriented dialogue model.

Submitted 26 September, 2019; originally announced September 2019.

Comments: accepted to appear at EMNLP 2019

arXiv:1909.04142 [pdf, other]

DaTscan SPECT Image Classification for Parkinson's Disease

Authors: Justin Quan, Lin Xu, Rene Xu, Tyrael Tong, Jean Su

Abstract: Parkinson's Disease (PD) is a neurodegenerative disease that currently does not have a cure. In order to facilitate disease management and reduce the speed of symptom progression, early diagnosis is essential. The current clinical, diagnostic approach is to have radiologists perform human visual analysis of the degeneration of dopaminergic neurons in the substantia nigra region of the brain. Clinically, dopamine levels are monitored through observing dopamine transporter (DaT) activity. One method of DaT activity analysis is performed with the injection of an Iodine-123 fluoropropyl (123I-FP-CIT) tracer combined with single photon emission computerized tomography (SPECT) imaging. The tracer illustrates the region of interest in the resulting DaTscan SPECT images. Human visual analysis is slow and vulnerable to subjectivity between radiologists, so the goal was to develop an introductory implementation of a deep convolutional neural network that can objectively and accurately classify DaTscan SPECT images as Parkinson's Disease or normal. This study illustrates the approach of using a deep convolutional neural network and evaluates its performance on DaTscan SPECT image classification. The data used in this study was obtained through a database provided by the Parkinson's Progression Markers Initiative (PPMI). The deep neural network in this study utilizes the InceptionV3 architecture, 1st runner up in the 2015 ImageNet Large Scale Visual Recognition Competition (ILSVRC), as a base model. A custom, binary classifier block was added on top of this base. In order to account for the small dataset size, a ten fold cross validation was implemented to evaluate the model's performance.

Submitted 9 September, 2019; originally announced September 2019.

arXiv:1907.03687 [pdf, other]

General non-linear Bellman equations

Authors: Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto

Abstract: We consider a general class of non-linear Bellman equations. These open up a design space of algorithms that have interesting properties, which has two potential advantages. First, we can perhaps better model natural phenomena. For instance, hyperbolic discounting has been proposed as a mathematical model that matches human and animal data well, and can therefore be used to explain preference orderings. We present a different mathematical model that matches the same data, but that makes very different predictions under other circumstances. Second, the larger design space can perhaps lead to algorithms that perform better, similar to how discount factors are often used in practice even when the true objective is undiscounted. We show that many of the resulting Bellman operators still converge to a fixed point, and therefore that the resulting algorithms are reasonable and inherit many beneficial properties of their linear counterparts.

Submitted 8 July, 2019; originally announced July 2019.

arXiv:1901.10964 [pdf, other]

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Authors: André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos

Abstract: The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic assumptions underlying the original formulation of SFs & GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SFs & GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SFs & GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.

Submitted 30 January, 2019; originally announced January 2019.

Comments: Published at ICML 2018

arXiv:1812.07626 [pdf, other]

Universal Successor Features Approximators

Authors: Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul

Abstract: The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common forms is universal value function approximators (UVFAs). Another way to generalise to new tasks is to exploit structure in the RL problem itself. Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function, which is made possible through successor features (SFs). Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI. We discuss the challenges involved in training a USFA and its generalisation properties, and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person perspective three-dimensional environment.

Submitted 18 December, 2018; originally announced December 2018.

arXiv:1805.11593 [pdf, other]

Observe and Look Further: Achieving Consistent Performance on Atari

Authors: Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Abstract: Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games. We identify three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently. In this paper, we propose an algorithm that addresses each of these challenges and is able to learn human-level policies on nearly all Atari games. A new transformed Bellman operator allows our algorithm to process rewards of varying densities and scales; an auxiliary temporal consistency loss allows us to train stably using a discount factor of $γ= 0.999$ (instead of $γ= 0.99$) extending the effective planning horizon by an order of magnitude; and we ease the exploration problem by using human demonstrations that guide the agent towards rewarding states. When tested on a set of 42 Atari games, our algorithm exceeds the performance of an average human on 40 games using a common set of hyper parameters. Furthermore, it is the first deep RL algorithm to solve the first level of Montezuma's Revenge.

Submitted 29 May, 2018; originally announced May 2018.

arXiv:1803.00933 [pdf, other]

Distributed Prioritized Experience Replay

Authors: Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver

Abstract: We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.

Submitted 2 March, 2018; originally announced March 2018.

Comments: Accepted to International Conference on Learning Representations 2018

arXiv:1802.08294 [pdf, other]

Unicorn: Continual Learning with a Universal, Off-policy Agent

Authors: Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul

Abstract: Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.

Submitted 3 July, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

arXiv:1711.10329 [pdf]

doi 10.1364/OE.26.002965

Superconducting nanowire single photon detection system for space applications

Authors: Lixing You, Jia Quan, Yong Wang, Yuexue Ma, Xiaoyan Yang, Yanjie Liu, Hao Li, Jianguo Li, Juan Wang, Jingtao Liang, Zhen Wang, Xiaoming Xie

Abstract: Superconducting nanowire single photon detectors (SNSPDs) have advanced various frontier scientific and technological fields such as quantum key distribution and deep space communications. However, limited by available cooling technology, all past experimental demonstrations have had ground-based applications. In this work we demonstrate a SNSPD system using a hybrid cryocooler compatible with space applications. With a minimum operational temperature of 2.8 K, this SNSPD system presents a maximum system detection efficiency of over 50% and a timing jitter of 48 ps, which paves the way for various space applications.

Submitted 28 November, 2017; originally announced November 2017.

Comments: 7 pages, 3 figures, 1 table, submitted to OE

Journal ref: Optics Express 26(3): 2965-2971. (2018)

arXiv:1708.04782 [pdf, other]

StarCraft II: A New Challenge for Reinforcement Learning

Authors: Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing

Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

Submitted 16 August, 2017; originally announced August 2017.

Comments: Collaboration between DeepMind & Blizzard. 20 pages, 9 figures, 2 tables

arXiv:1707.04175 [pdf, other]

Distral: Robust Multitask Reinforcement Learning

Authors: Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu

Abstract: Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.

Submitted 13 July, 2017; originally announced July 2017.

arXiv:1704.03732 [pdf, ps, other]

Deep Q-learning from Demonstrations

Authors: Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

Abstract: Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator's actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it takes PDD DQN 83 million steps to catch up to DQfD's performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.

Submitted 22 November, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

Comments: Published at AAAI 2018. Previously on arxiv as "Learning from Demonstrations for Real World Reinforcement Learning"

arXiv:1701.03726 [pdf, ps, other]

doi 10.1080/10652469.2022.2097671

Some Evaluations of Parametric Euler Type Sums of Harmonic Numbers

Authors: Junjie Quan, Ce Xu, Xixi Zhang

Abstract: We establish some identities of Euler related sums. By using these identities, we discuss the closed form representations of sums of harmonic numbers and reciprocal parametric binomial coefficients through parametric harmonic numbers, shifted harmonic numbers and Riemann zeta function with positive integer arguments. In particular we investigate products of quadratic and cubic harmonic numbers and reciprocal parametric binomial coefficients. Some illustrative special cases as well as immediate consequences of the main results are also considered.

Submitted 27 July, 2022; v1 submitted 3 January, 2017; originally announced January 2017.

arXiv:1612.00796 [pdf, other]

doi 10.1073/pnas.1611835114

Overcoming catastrophic forgetting in neural networks

Authors: James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell

Abstract: The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST hand written digit dataset and by learning several Atari 2600 games sequentially.

Submitted 25 January, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

arXiv:1603.09574 [pdf, ps, other]

Near-Optimal Hybrid Analog and Digital Precoding for Downlink mmWave Massive MIMO Systems

Authors: Linglong Dai, Xinyu Gao, Jinguo Quan, Shuangfeng Han, Chih-Lin I

Abstract: Millimeter wave (mmWave) massive MIMO can achieve orders of magnitude increase in spectral and energy efficiency, and it usually exploits the hybrid analog and digital precoding to overcome the serious signal attenuation induced by mmWave frequencies. However, most hybrid precoding schemes focus on the full-array structure, which involves a high complexity. In this paper, we propose a near-optimal iterative hybrid precoding scheme based on the more realistic subarray structure with low complexity. We first decompose the complicated capacity optimization problem into a series of sub-problems that are easier to handle by considering each antenna array one by one. Then we optimize the achievable capacity of each antenna array from the first one to the last one by utilizing the idea of successive interference cancelation (SIC), which is realized in an iterative procedure that is easy to parallelize. It is shown that the proposed hybrid precoding scheme can achieve better performance than other recently proposed hybrid precoding schemes, while it also enjoys an acceptable computational complexity.

Submitted 16 July, 2015; originally announced March 2016.

Showing 1–50 of 53 results for author: Quan, J