Showing 1–50 of 101 results for author: Du, N

  1. arXiv:2408.15227  [pdf, other]

    hep-ex

    Axion Dark Matter eXperiment around 3.3 μeV with Dine-Fischler-Srednicki-Zhitnitsky Discovery Ability

    Authors: C. Bartram, C. Boutan, T. Braine, J. H. Buckley, T. J. Caligiure, G. Carosi, A. S. Chou, C. Cisneros, John Clarke, E. J. Daw, N. Du, L. D. Duffy, T. A. Dyson, C. Gaikwad, J. R. Gleason, C. Goodman, M. Goryachev, M. Guzzetti, C. Hanretty, E. Hartman, A. T. Hipp, J. Hoffman, M. Hollister, R. Khatiwada, S. Knirck, et al. (24 additional authors not shown)

    Abstract: We report the results of a QCD axion dark matter search with discovery ability for Dine-Fischler-Srednicki-Zhitnitsky (DFSZ) axions using an axion haloscope. Sub-Kelvin noise temperatures are reached with an ultra-low-noise Josephson parametric amplifier cooled by a dilution refrigerator. This work excludes (with a 90% confidence level) DFSZ axions with masses between 3.27 and 3.34 μeV, assuming a…

    Submitted 27 August, 2024; originally announced August 2024.

  2. arXiv:2408.05752  [pdf, other]

    cs.CV

    RTF-Q: Unsupervised domain adaptation based retraining-free quantization network

    Authors: Nanyang Du, Chen Tang, Yuan Meng, Zhi Wang

    Abstract: Performing unsupervised domain adaptation on resource-constrained edge devices is a significant task. Although existing research allows edge devices to use subnets with different computational budgets for inference, they often require expensive pre-training and do not consider the issues of parameter precision redundancy in the model, which is not conducive to the deployment of the model on edge d…

    Submitted 11 August, 2024; originally announced August 2024.

  3. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.19371  [pdf, other]

    cs.LG

    Deep State-Space Generative Model For Correlated Time-to-Event Predictions

    Authors: Yuan Xue, Denny Zhou, Nan Du, Andrew M. Dai, Zhen Xu, Kun Zhang, Claire Cui

    Abstract: Capturing the inter-dependencies among multiple types of clinically-critical events is critical not only to accurate future event prediction, but also to better treatment planning. In this work, we propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events (e.g., kidney failure, mortality) by explicitly modeling the temporal d…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  5. arXiv:2407.19359  [pdf, other]

    cs.LG cs.AI

    Learning to Select the Best Forecasting Tasks for Clinical Outcome Prediction

    Authors: Yuan Xue, Nan Du, Anne Mottram, Martin Seneviratne, Andrew M. Dai

    Abstract: We propose to meta-learn a self-supervised patient trajectory forecast learning rule by meta-training on a meta-objective that directly optimizes the utility of the patient representation over the subsequent clinical outcome prediction. This meta-objective directly targets the usefulness of a representation generated from unlabeled clinical measurement forecast for later supervised tasks. The…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2020

  6. arXiv:2406.01928  [pdf, other]

    cs.RO

    History-Aware Planning for Risk-free Autonomous Navigation on Unknown Uneven Terrain

    Authors: Yinchuan Wang, Nianfei Du, Yongsen Qin, Xiang Zhang, Rui Song, Chaoqun Wang

    Abstract: It is challenging for the mobile robot to achieve autonomous and mapless navigation in the unknown environment with uneven terrain. In this study, we present a layered and systematic pipeline. At the local level, we maintain a tree structure that is dynamically extended with the navigation. This structure unifies the planning with the terrain identification. Besides, it contributes to explicitly i…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  7. arXiv:2405.15277  [pdf]

    cond-mat.mtrl-sci

    Inducing ferroelectricity in NH$_4$I and NH$_4$Br via partial replacement of protons by deuterons

    Authors: Miao Miao Zhao, Lei Meng, Yi Yang Xu, Na Du, Fei Yen

    Abstract: While all of the polymorphs of NH$_4$I and NH$_4$Br are non-polar, a reversible electric polarization is established in the ordered $γ$ phases of (NH$_4$)$_{0.73}$(ND$_4$)$_{0.27}$I and (NH$_4$)$_{0.84}$(ND$_4$)$_{0.16}$Br (where D is $^2$H) via $dc$ electric fields. The presence of two groups of orbital magnetic moments appears to be responsible for the asymmetric lattice distortions. Our finding…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures

    Journal ref: J. Phys. Chem. C 127, 20951-20955 (2023)

  8. arXiv:2405.15052  [pdf, other]

    cs.LG cs.AI

    Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training

    Authors: Xianzhi Du, Tom Gunter, Xiang Kong, Mark Lee, Zirui Wang, Aonan Zhang, Nan Du, Ruoming Pang

    Abstract: Mixture-of-Experts (MoE) enjoys performance gains by increasing model capacity while keeping computation cost constant. When comparing MoE to dense models, prior work typically adopts the following setting: 1) use FLOPs or activated parameters as a measure of model complexity; 2) train all models to the same number of tokens. We argue that this setting favors MoE as FLOPs and activated parameters do…

    Submitted 28 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 8 pages

  9. arXiv:2405.13640  [pdf, other]

    cs.CL cs.AI cs.LG

    Knowledge Graph Reasoning with Self-supervised Reinforcement Learning

    Authors: Ying Ma, Owen Burns, Mingqiu Wang, Gang Li, Nan Du, Laurent El Shafey, Liqiang Wang, Izhak Shafran, Hagen Soltau

    Abstract: Reinforcement learning (RL) is an effective method of finding reasoning pathways in incomplete knowledge graphs (KGs). To overcome the challenges of a large action space, a self-supervised pre-training method is proposed to warm up the policy network before the RL training stage. To alleviate the distributional mismatch issue in general self-supervised RL (SSRL), in our supervised learning (SL) st…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 17 pages, 11 figures

  10. arXiv:2404.10642  [pdf, other]

    cs.CL cs.LG

    Self-playing Adversarial Language Game Enhances LLM Reasoning

    Authors: Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du

    Abstract: We explore the self-play training procedure of large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's u…

    Submitted 23 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Preprint

  11. arXiv:2404.02012  [pdf]

    cond-mat.mtrl-sci physics.chem-ph

    Determining the chemical composition of diamagnetic mixed solids via measurements of the magnetic susceptibility

    Authors: Miao Miao Zhao, Yang Yang, Na Du, Yu Ying Zhu, Peng Ren, Fei Yen

    Abstract: Mixed solid compounds are employed in a vast array of applications, so an accurate determination of their chemical compositions is of crucial importance. All current characterization methods require specially treated samples, so the availability of a more practical method with similar accuracy should alleviate the quantification process. In this work, we show how the doping concentration $δ$ (or iso…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Main article: 19 pages, 12 Figures; Supplementary Information: 7 pages, 9 Tables and 4 Figures

  12. arXiv:2403.15468  [pdf, other]

    eess.SP

    Human Detection in Realistic Through-the-Wall Environments using Raw Radar ADC Data and Parametric Neural Networks

    Authors: Wei Wang, Naike Du, Yuchao Guo, Chao Sun, Jingyang Liu, Rencheng Song, Xiuzhu Ye

    Abstract: The radar signal processing algorithm is one of the core components in through-wall radar human detection technology. Traditional algorithms (e.g., DFT and matched filtering) struggle to adaptively handle low signal-to-noise ratio echo signals in challenging and dynamic real-world through-wall application environments, which becomes a major bottleneck in the system. In this paper, we introduce an…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 11 pages, 13 figures

  13. arXiv:2403.09611  [pdf, other]

    cs.CV cs.CL cs.LG

    MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

    Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, et al. (7 additional authors not shown)

    Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la…

    Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  14. arXiv:2402.16696  [pdf, other]

    cs.CL

    Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models

    Authors: Anchun Gui, Jian Li, Yong Dai, Nan Du, Han Xiao

    Abstract: Tool-augmented large language models (LLMs) are attracting widespread attention when accessing up-to-date knowledge and alleviating hallucination issues. Nowadays, advanced closed-source LLMs (e.g., ChatGPT) have demonstrated surprising tool-usage capabilities through prompting and in-context learning techniques. To empower the capabilities of open-source LLMs (e.g., LLaMA) in manipulating tools,…

    Submitted 28 August, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 20 pages, 18 figures

  15. arXiv:2402.15572  [pdf, other]

    cs.AI cs.CV cs.RO

    Improving Explainable Object-induced Model through Uncertainty for Automated Vehicles

    Authors: Shihong Ling, Yue Wan, Xiaowei Jia, Na Du

    Abstract: The rapid evolution of automated vehicles (AVs) has the potential to provide safer, more efficient, and comfortable travel options. However, these systems face challenges regarding reliability in complex driving scenarios. Recent explainable AV architectures neglect crucial information related to inherent uncertainties while providing explanations for actions. To overcome such challenges, our stud…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: In Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24), March 11–14, 2024, Boulder, CO, USA. ACM, New York, NY, USA, 9 pages

  16. arXiv:2402.02101  [pdf, other]

    cs.CL cs.AI

    Are Large Language Models Good Prompt Optimizers?

    Authors: Ruotian Ma, Xiaolei Wang, Xin Zhou, Jian Li, Nan Du, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: LLM-based Automatic Prompt Optimization, which typically utilizes LLMs as Prompt Optimizers to self-reflect and refine prompts, has shown promising performance in recent studies. Despite the success, the underlying mechanism of this approach remains unexplored, and the true effectiveness of LLMs as Prompt Optimizers requires further validation. In this work, we conducted a comprehensive study to u…

    Submitted 3 February, 2024; originally announced February 2024.

  17. arXiv:2312.17504  [pdf]

    physics.app-ph

    Improving the Imaging Performance of Microwave Imaging Systems by Exploiting Virtual Antennas

    Authors: Xinhui Zhang, Naike Du, Jing Wang, Andrea Massa, Xiuzhu Ye

    Abstract: Starting from the observation that the correlation coefficient defined by the scattered field data tested by two adjacent antennas decreases with the noise, it turns out that the imaging performance can be improved by adding non-redundant scattered field information through more measuring antennas. However, adding more measuring antennas faces practical challenges such as the limited antenna space,…

    Submitted 5 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: The paper has been submitted to T-MTT (IEEE Transactions on Microwave Theory and Techniques) on January 5, 2024

  18. arXiv:2312.16668  [pdf, other]

    hep-ex astro-ph.CO physics.ins-det

    Axion Dark Matter eXperiment: Run 1A Analysis Details

    Authors: C. Boutan, B. H. LaRoque, E. Lentz, N. S. Oblath, M. S. Taubman, J. Tedeschi, J. Yang, A. M. Jones, T. Braine, N. Crisosto, L. J. Rosenberg, G. Rybka, D. Will, D. Zhang, S. Kimes, R. Ottens, C. Bartram, D. Bowring, R. Cervantes, A. S. Chou, S. Knirck, D. V. Mitchell, A. Sonnenschein, W. Wester, R. Khatiwada, et al. (28 additional authors not shown)

    Abstract: The ADMX collaboration gathered data for its Run 1A axion dark matter search from January to June 2017, scanning with an axion haloscope over the frequency range 645-680 MHz (2.66-2.81 μeV in axion mass) at DFSZ sensitivity. The resulting axion search found no axion-like signals comprising all the dark matter in the form of a virialized galactic halo over the entire frequency range, implying lower…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 27 pages, 19 figures, accepted for publication in PRD

  19. arXiv:2312.07401  [pdf, other]

    cs.AI

    On Diversified Preferences of Large Language Model Alignment

    Authors: Dun Zeng, Yong Dai, Pengyu Cheng, Longyue Wang, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu

    Abstract: Aligning large language models (LLMs) with human preferences has been recognized as the key to improving LLMs' interaction quality. However, in this pluralistic world, human preferences can be diversified due to annotators' different tastes, which hinders the effectiveness of LLM alignment methods. This paper presents the first quantitative analysis of commonly used human feedback datasets to inve…

    Submitted 17 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: preprint

  20. arXiv:2312.06302  [pdf]

    physics.app-ph

    Non-iterative Methods in Inhomogeneous Background Inverse Scattering Imaging Problem Assisted by Swin Transformer Network

    Authors: Naike Du, Tiantian Yin, Jing Wang, Rencheng Song, Kuiwen Xu, Bingyuan Liang, Sheng Sun, Xiuzhu Ye

    Abstract: A deep learning-assisted inversion method is proposed to solve the inhomogeneous background imaging problem. Three non-iterative methods, namely the distorted-Born (DB) major current coefficients method, the DB modified Born approximation method, and the DB connection method, are introduced to address the inhomogeneous background inverse scattering problem. These methods retain the multiple scatte…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: We submitted this paper to TGRS (IEEE Transactions on Geoscience and Remote Sensing) on 29 January 2023 and resubmitted it on 12 July 2023

  21. arXiv:2312.01170  [pdf, other]

    cs.CR

    Power-balanced Memristive Cryptographic Implementation Against Side Channel Attacks

    Authors: Ziang Chen, Li-Wei Chen, Xianyue Zhao, Kefeng Li, Heidemarie Schmidt, Ilia Polian, Nan Du

    Abstract: Memristors, as emerging nano-devices, offer promising performance and exhibit rich electrical dynamic behavior. Having already found success in applications such as neuromorphic and in-memory computing, researchers are now exploring their potential for cryptographic implementations. In this study, we present a novel power-balanced hiding strategy utilizing memristor groups to conceal power consump…

    Submitted 2 December, 2023; originally announced December 2023.

  22. arXiv:2311.15436  [pdf, other]

    cs.CL

    Learning to Skip for Language Modeling

    Authors: Dewen Zeng, Nan Du, Tao Wang, Yuanzhong Xu, Tao Lei, Zhifeng Chen, Claire Cui

    Abstract: Overparameterized large-scale language models have impressive generalization performance of in-context few-shot learning. However, most language models allocate the same amount of parameters or computation to each token, disregarding the complexity or importance of the input data. We argue that in language model pretraining, a variable amount of computation should be assigned to different tokens,…

    Submitted 26 November, 2023; originally announced November 2023.

  23. arXiv:2311.08045  [pdf, other]

    cs.CL cs.AI cs.LG

    Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

    Authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li

    Abstract: Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, continuously updating LLMs for alignment raises a distribution gap between model-generated samples and human-annotated responses, hindering training effectiveness. To mi…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted to ACL 2024 Findings

  24. arXiv:2311.07748  [pdf, other]

    astro-ph.CO

    Non-Virialized Axion Search Sensitive to Doppler Effects in the Milky Way Halo

    Authors: C. Bartram, T. Braine, R. Cervantes, N. Crisosto, N. Du, C. Goodman, M. Guzzetti, C. Hanretty, S. Lee, G. Leum, L. J. Rosenberg, G. Rybka, J. Sinnis, D. Zhang, M. H. Awida, D. Bowring, A. S. Chou, M. Hollister, S. Knirck, A. Sonnenschein, W. Wester, R. Khatiwada, J. Brodsky, G. Carosi, L. D. Duffy, et al. (31 additional authors not shown)

    Abstract: The Axion Dark Matter eXperiment (ADMX) has previously excluded Dine-Fischler-Srednicki-Zhitnitsky (DFSZ) axions between 680-790 MHz under the assumption that the dark matter is described by the isothermal halo model. However, the precise nature of the velocity distribution of dark matter is still unknown, and alternative models have been proposed. We report the results of a non-virialized axion se…

    Submitted 13 November, 2023; originally announced November 2023.

  25. TDPP: Two-Dimensional Permutation-Based Protection of Memristive Deep Neural Networks

    Authors: Minhui Zou, Zhenhua Zhu, Tzofnat Greenberg-Toledo, Orian Leitersdorf, Jiang Li, Junlong Zhou, Yu Wang, Nan Du, Shahar Kvatinsky

    Abstract: The execution of deep neural network (DNN) algorithms suffers from significant bottlenecks due to the separation of the processing and memory units in traditional computer systems. Emerging memristive computing systems introduce an in situ approach that overcomes this bottleneck. The non-volatility of memristive devices, however, may expose the DNN weights stored in memristive crossbars to potenti…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 14 pages, 11 figures

  26. arXiv:2309.03126  [pdf, other]

    cs.CL

    Everyone Deserves A Reward: Learning Customized Human Preferences

    Authors: Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du

    Abstract: Reward models (RMs) are essential for aligning large language models (LLMs) with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diversified human preferences with respect to different religions, politics, cultures, etc. Moreover, each individual can have their unique preferences on various topics. Neglecting the diversity of human preferenc…

    Submitted 15 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  27. arXiv:2308.13191  [pdf, other]

    cs.CL cs.AI

    Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

    Authors: Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du

    Abstract: Although dominant in natural language processing, transformer-based models remain challenged by the task of long-sequence processing, because the computational cost of self-attention operations in transformers swells quadratically with the input sequence length. To alleviate the complexity of long-sequence processing, we propose a simple framework to enable the off-the-shelf pre-trained transformer…

    Submitted 5 July, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: ACL 2024

  28. arXiv:2306.00008  [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  29. arXiv:2305.14705  [pdf, other]

    cs.CL

    Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

    Authors: Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou

    Abstract: Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we…

    Submitted 5 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Preprint

  30. arXiv:2305.12281  [pdf, other]

    cs.CL cs.LG

    Lifelong Language Pretraining with Distribution-Specialized Experts

    Authors: Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, Claire Cui

    Abstract: Pretraining on a large-scale corpus has become a standard method to build general language models (LMs). Adapting a model to new data distributions targeting different downstream tasks poses significant challenges. Naive fine-tuning may incur catastrophic forgetting when the over-parameterized LMs overfit the new data but fail to preserve the pretrained features. Lifelong learning (LLL) aims to en…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  31. arXiv:2305.10429  [pdf, other]

    cs.CL cs.LG

    DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

    Authors: Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

    Abstract: The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of do…

    Submitted 20 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  32. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  33. arXiv:2304.04947  [pdf, other]

    cs.CL

    Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

    Authors: Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang

    Abstract: We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-w…

    Submitted 26 November, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: NeurIPS camera ready version

  34. arXiv:2303.07116  [pdf, ps, other]

    hep-ph hep-ex physics.ins-det

    Low Frequency (100-600 MHz) Searches with Axion Cavity Haloscopes

    Authors: S. Chakrabarty, J. R. Gleason, Y. Han, A. T. Hipp, M. Solano, P. Sikivie, N. S. Sullivan, D. B. Tanner, M. Goryachev, E. Hartman, B. T. McAllister, A. Quiskamp, C. Thomson, M. E. Tobar, M. H. Awida, A. S. Chou, M. Hollister, S. Knirck, A. Sonnenschein, W. Wester, T. Braine, M. Guzzetti, C. Hanretty, G. Leum, L. J. Rosenberg, et al. (22 additional authors not shown)

    Abstract: We investigate reentrant and dielectric loaded cavities for the purpose of extending the range of axion cavity haloscopes to lower masses, below the range where the Axion Dark Matter eXperiment (ADMX) has already searched. Reentrant and dielectric loaded cavities were simulated numerically to calculate and optimize their form factors and quality factors. A prototype reentrant cavity was built and…

    Submitted 28 March, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: 33 pages, 24 figures

  35. Search for a dark-matter induced Cosmic Axion Background with ADMX

    Authors: ADMX Collaboration, T. Nitta, T. Braine, N. Du, M. Guzzetti, C. Hanretty, G. Leum, L. J. Rosenberg, G. Rybka, J. Sinnis, John Clarke, I. Siddiqi, M. H. Awida, A. S. Chou, M. Hollister, S. Knirck, A. Sonnenschein, W. Wester, J. R. Gleason, A. T. Hipp, P. Sikivie, N. S. Sullivan, D. B. Tanner, R. Khatiwada, G. Carosi, et al. (23 additional authors not shown)

    Abstract: We report the first result of a direct search for a Cosmic ${\it axion}$ Background (C$a$B), a relativistic background of axions that is not dark matter, performed with the axion haloscope, the Axion Dark Matter eXperiment (ADMX). Conventional haloscope analyses search for a signal with a narrow bandwidth, as predicted for dark matter, whereas the C$a$B will be broad. We introduce a novel analys…

    Submitted 3 October, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: 9 pages, 4 figures

    Journal ref: Phys. Rev. Lett. 131, 101002 (2023)

  36. arXiv:2302.08917  [pdf, other]

    cs.CL cs.LG

    Massively Multilingual Shallow Fusion with Large Language Models

    Authors: Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman

    Abstract: While large language models (LLMs) have made impressive progress in natural language processing, it remains unclear how to utilize them in improving automatic speech recognition (ASR). In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages. We push the limits of the multilingual LM to cover up to 84 languages by scaling up using a mixtur…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE ICASSP 2023

  37. arXiv:2301.09297  [pdf, other]

    q-fin.MF

    Model Based Reinforcement Learning with Non-Gaussian Environment Dynamics and its Application to Portfolio Optimization

    Authors: Huifang Huang, Ting Gao, Pengbo Li, Jin Guo, Peng Zhang, Nan Du

    Abstract: With the fast development of quantitative portfolio optimization in financial engineering, lots of AI-based algorithmic trading strategies have demonstrated promising results, among which reinforcement learning begins to manifest competitive advantages. However, the environment from real financial markets is complex and hard to be fully simulated, considering the observation of abrupt transitions,…

    Submitted 9 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: text overlap with arXiv:2205.15056

  38. arXiv:2212.09347  [pdf, other]

    cs.CR cs.ET

    Review of security techniques for memristor computing systems

    Authors: Minhui Zou, Nan Du, Shahar Kvatinsky

    Abstract: Neural network (NN) algorithms have become the dominant tool in visual object recognition, natural language processing, and robotics. To enhance the computational efficiency of these algorithms, in comparison to the traditional von Neumann computing architectures, researchers have been focusing on memristor computing systems. A major drawback when using memristor computing systems today is that, in…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 15 pages, 5 figures

    Journal ref: Front. Electron. Mater, 19 December 2022, Sec. Semiconducting Materials and Devices

  39. arXiv:2210.03629

    cs.CL cs.AI cs.LG

    ReAct: Synergizing Reasoning and Acting in Language Models

    Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao

    Abstract: While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific acti…

    Submitted 9 March, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: v3 is the ICLR camera ready version with some typos fixed. Project site with code: https://react-lm.github.io

  40. arXiv:2210.03465

    cs.ET cond-mat.mes-hall cs.CR physics.comp-ph

    Physics inspired compact modelling of BiFeO$_3$ based memristors for hardware security applications

    Authors: Sahitya Yarragolla, Nan Du, Torben Hemke, Xianyue Zhao, Ziang Chen, Ilia Polian, Thomas Mussenbrock

    Abstract: With the advent of the Internet of Things, nanoelectronic devices or memristors have been the subject of significant interest for use as new hardware security primitives. Among the several available memristors, BiFeO$_3$ (BFO)-based electroforming-free memristors have attracted considerable attention due to their excellent properties, such as long retention time, self-rectification, intrinsi…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 13 pages and 8 figures

  41. arXiv:2208.11799

    physics.acc-ph

    Multi-mode Analysis of Surface Losses in a Superconducting Microwave Resonator in High Magnetic Fields

    Authors: T. Braine, G. Rybka, A. A. Baker, J. Brodsky, G. Carosi, N. Du, N. Woollett, S. Knirck, M. Jones

    Abstract: This paper reports on a surface impedance measurement of a niobium titanium superconducting radio frequency (SRF) cavity in a magnetic field (up to $10\,{\rm T}$). A novel method is employed to decompose the surface resistance contributions of the cylindrical cavity end caps and walls using measurements from multiple $TM$ cavity modes. The results confirm that quality factor degradation of a NbTi…

    Submitted 24 August, 2022; originally announced August 2022.

  42. arXiv:2205.01930

    eess.SY

    Explainable Anomaly Detection for Industrial Control System Cybersecurity

    Authors: Do Thu Ha, Nguyen Xuan Hoang, Nguyen Viet Hoang, Nguyen Huu Du, Truong Thu Huong, Kim Phuc Tran

    Abstract: Industrial Control Systems (ICSs) are becoming more and more important in managing the operation of many important systems in smart manufacturing, such as power stations, water supply systems, and manufacturing sites. While massive digital data can be a driving force for system performance, data security has raised serious concerns. Anomaly detection, therefore, is essential for preventing network…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Copyright © 2022, IFAC (International Federation of Automatic Control)

  43. arXiv:2204.02311

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin , et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  44. arXiv:2203.14923

    hep-ex astro-ph.CO hep-ph physics.ins-det

    Axion Dark Matter

    Authors: C. B. Adams, N. Aggarwal, A. Agrawal, R. Balafendiev, C. Bartram, M. Baryakhtar, H. Bekker, P. Belov, K. K. Berggren, A. Berlin, C. Boutan, D. Bowring, D. Budker, A. Caldwell, P. Carenza, G. Carosi, R. Cervantes, S. S. Chakrabarty, S. Chaudhuri, T. Y. Chen, S. Cheong, A. Chou, R. T. Co, J. Conrad, D. Croon , et al. (130 additional authors not shown)

    Abstract: Axions are well-motivated dark matter candidates with simple cosmological production mechanisms. They were originally introduced to solve the strong CP problem, but also arise in a wide range of extensions to the Standard Model. This Snowmass white paper summarizes axion phenomenology and outlines next-generation laboratory experiments proposed to detect axion dark matter. There are vibrant synerg…

    Submitted 29 March, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: restore and expand author list

  45. arXiv:2203.14915

    hep-ex astro-ph.CO hep-ph physics.ins-det quant-ph

    New Horizons: Scalar and Vector Ultralight Dark Matter

    Authors: D. Antypas, A. Banerjee, C. Bartram, M. Baryakhtar, J. Betz, J. J. Bollinger, C. Boutan, D. Bowring, D. Budker, D. Carney, G. Carosi, S. Chaudhuri, S. Cheong, A. Chou, M. D. Chowdhury, R. T. Co, J. R. Crespo López-Urrutia, M. Demarteau, N. DePorzio, A. V. Derbin, T. Deshpande, M. D. Chowdhury, L. Di Luzio, A. Diaz-Morcillo, J. M. Doyle , et al. (104 additional authors not shown)

    Abstract: The last decade has seen unprecedented effort in dark matter model building at all mass scales coupled with the design of numerous new detection strategies. Transformative advances in quantum technologies have led to a plethora of new high-precision quantum sensors and dark matter detection strategies for ultralight ($<10\,$eV) bosonic dark matter that can be described by an oscillating classical,…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Snowmass 2021 White Paper

  46. arXiv:2202.09544

    physics.geo-ph

    Multi-task unscented Kalman inversion (MUKI): a derivative-free joint inversion framework and its application to joint inversion of geophysical data

    Authors: Longlong Wang, Yun Chen, Youshan Liu, Nanqiao Du, Wei Li, Junliu Suwen

    Abstract: In the geophysical joint inversion, the gradient and Bayesian Markov Chain Monte Carlo (MCMC) sampling-based methods are widely used owing to their fast convergences or global optimality. However, these methods either require the computation of gradients and easily fall into local optimal solutions, or cost much time to carry out the millions of forward calculations in a huge sampling space. Diffe…

    Submitted 3 August, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: 13 pages, 4 figures

  47. arXiv:2202.09368

    cs.LG cs.AI

    Mixture-of-Experts with Expert Choice Routing

    Authors: Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew Dai, Zhifeng Chen, Quoc Le, James Laudon

    Abstract: Sparsely-activated Mixture-of-experts (MoE) models allow the number of parameters to greatly increase while keeping the amount of computation for a given token or a given sample unchanged. However, a poor expert routing strategy (e.g. one resulting in load imbalance) can cause certain experts to be under-trained, leading to an expert being under or over-specialized. Prior work allocates a fixed nu…

    Submitted 13 October, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

  48. arXiv:2202.08906

    cs.CL cs.LG

    ST-MoE: Designing Stable and Transferable Sparse Expert Models

    Authors: Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

    Abstract: Scale has opened new frontiers in natural language processing -- but at a high cost. In response, Mixture-of-Experts (MoE) and Switch Transformers have been proposed as an energy efficient path to even larger and more capable language models. But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine…

    Submitted 29 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: 25 pages main text, 39 pages overall

  49. Stochastic epidemic SIR models with hidden states

    Authors: Nguyen Du, Alexandru Hening, Nhu Nguyen, George Yin

    Abstract: This paper focuses on and analyzes realistic SIR models that take stochasticity into account. The proposed systems are applicable to most incidence rates that are used in the literature including the bilinear incidence rate, the Beddington-DeAngelis incidence rate, and a Holling type II functional response. Given that many diseases can lead to asymptomatic infections, we look at a system of stocha…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 27 pages, 3 figures

    MSC Class: 34C12; 60H10; 92D25

    Journal ref: Nonlinear Analysis: Hybrid Systems Volume 49, August 2023, 101368

  50. arXiv:2112.06905

    cs.CL

    GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

    Authors: Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu , et al. (2 additional authors not shown)

    Abstract: Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GL…

    Submitted 1 August, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to ICML 2022