
Showing 101–150 of 3,111 results for author: Huang, S

  1. arXiv:2405.13152  [pdf, other]

    cs.CV cs.AI

    Enhancing Interaction Modeling with Agent Selection and Physical Methods for Trajectory Prediction

    Authors: Shiji Huang, Lei Ye, Min Chen, Wenhai Luo, Chenqi Xu, Deyuan Liang, Dihong Wang

    Abstract: In this study, we address the limitations inherent in most existing vehicle trajectory prediction methodologies that indiscriminately incorporate all agents within a predetermined proximity when accounting for inter-agent interactions. These approaches commonly employ attention-based architecture or graph neural networks for encoding interactions, which introduces three challenges: (i) The indiscr…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: code: https://github.com/kkk00714/ASPILin

  2. arXiv:2405.12809  [pdf, other]

    hep-ex

    Precision measurement of the branching fraction of $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (604 additional authors not shown)

    Abstract: Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$. The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with sig…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: to be submitted to PRD

  3. arXiv:2405.12784  [pdf, other]

    cs.CV

    Generalize Polyp Segmentation via Inpainting across Diverse Backgrounds and Pseudo-Mask Refinement

    Authors: Jiajian Ma, Fangqi Lu, Silin Huang, Song Wu, Zhen Li

    Abstract: Inpainting lesions within different normal backgrounds is a potential method of addressing the generalization problem, which is crucial for polyp segmentation models. However, seamlessly introducing polyps into complex endoscopic environments while simultaneously generating accurate pseudo-masks remains a challenge for current inpainting methods. To address these issues, we first leverage the pre-…

    Submitted 21 May, 2024; originally announced May 2024.

  4. arXiv:2405.12706  [pdf, other]

    cs.IR

    Disentangled Representation with Cross Experts Covariance Loss for Multi-Domain Recommendation

    Authors: Zhutian Lin, Junwei Pan, Haibin Yu, Xi Xiao, Ximei Wang, Zhixiang Feng, Shifeng Wen, Shudong Huang, Lei Xiao, Jie Jiang

    Abstract: Multi-domain learning (MDL) has emerged as a prominent research area aimed at enhancing the quality of personalized services. The key challenge in MDL lies in striking a balance between learning commonalities across domains while preserving the distinct characteristics of each domain. However, this gives rise to a challenging dilemma. On one hand, a model needs to leverage domain-specific modules,…

    Submitted 21 May, 2024; originally announced May 2024.

  5. arXiv:2405.12667  [pdf, other]

    eess.SY

    Spatial Mode Multiplexing for Fiber-Coupled IM/DD Optical Wireless Links with Misalignment

    Authors: Jinzhe Che, Shenjie Huang, Majid Safari

    Abstract: Optical wireless communication (OWC) emerges as a pivotal solution for achieving terabit-level aggregate throughput in next-generation wireless networks. With the mature high-speed transceivers and advanced (de)multiplexing techniques designed for fiber optics, fiber-coupled OWC can be seamlessly integrated into existing ultra-high-speed networks such as data centres. In particular, OWC leveraging…

    Submitted 25 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: 13 pages, 15 figures

  6. arXiv:2405.12130  [pdf, other]

    cs.CL cs.LG

    MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

    Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

    Abstract: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Work in Progress

  7. arXiv:2405.11585  [pdf, other]

    hep-ex

    Improved measurement of the branching fraction of $h_{c}\rightarrowγη^\prime/η$ and search for $h_{c}\rightarrowγπ^0$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (645 additional authors not shown)

    Abstract: The processes $h_c\rightarrowγP(P = η^\prime,~η,~π^{0})$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the…

    Submitted 19 May, 2024; originally announced May 2024.

  8. arXiv:2405.11442  [pdf, other]

    cs.CV

    Unifying 3D Vision-Language Understanding via Promptable Queries

    Authors: Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, Qing Li

    Abstract: A unified model for 3D vision-language (3D-VL) understanding is expected to take various scene representations and perform a wide range of tasks in a 3D scene. However, a considerable gap exists between existing methods and such a unified model, due to the independent application of representation and insufficient exploration of 3D multi-task training. In this paper, we introduce PQ3D, a unified m…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Project page: https://pq3d.github.io

  9. arXiv:2405.11338  [pdf]

    cs.CV cs.AI

    EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

    Authors: Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He

    Abstract: Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separa…

    Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 21 pages, 2 figures, 4 tables

  10. arXiv:2405.11286  [pdf, other]

    cs.CV

    Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion

    Authors: Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao

    Abstract: In recent years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas like film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, with integrating these two aspects proving to be a persistent challenge. A…

    Submitted 18 May, 2024; originally announced May 2024.

  11. arXiv:2405.11106  [pdf, other]

    cs.MA cs.AI cs.CL cs.LG cs.RO

    LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

    Authors: Chuanneng Sun, Songjun Huang, Dario Pompili

    Abstract: In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing, among others. Although research on LLM-as-an-agent has shown that LLM can be applied to Reinforcement Learning (RL) and achieve decent results, the extension of LLM-based RL to Multi-Agent System (MAS) is not trivial, as many aspects…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, 1 table, submitted to IEEE RA-L

  12. arXiv:2405.10895  [pdf, other]

    astro-ph.HE astro-ph.GA

    The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl

    Authors: Zheyu Lin, Ning Jiang, Tinggui Wang, Xu Kong, Dongyue Li, Han He, Yibo Wang, Jiazheng Zhu, Wentao Li, Ji-an Jiang, Avinash Singh, Rishabh Singh Teja, D. K. Sahu, Chichuan Jin, Keiichi Maeda, Shifeng Huang

    Abstract: The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidally disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from m…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures, submitted to ApJ Letters on 2024 Apr 27

  13. arXiv:2405.10879  [pdf, other]

    cs.CV

    One registration is worth two segmentations

    Authors: Shiqi Huang, Tingfa Xu, Ziyi Shen, Shaheer Ullah Saeed, Wen Yan, Dean Barratt, Yipeng Hu

    Abstract: The goal of image registration is to establish spatial correspondence between two or more images, traditionally through dense displacement fields (DDFs) or parametric transformations (e.g., rigid, affine, and splines). Rethinking the existing paradigms of achieving alignment via spatial transformations, we uncover an alternative but more intuitive correspondence representation: a set of correspond…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted by MICCAI2024

  14. Occupancy-SLAM: Simultaneously Optimizing Robot Poses and Continuous Occupancy Map

    Authors: Liang Zhao, Yingyu Wang, Shoudong Huang

    Abstract: In this paper, we propose an optimization-based SLAM approach to simultaneously optimize the robot trajectory and the occupancy map using 2D laser scans (and odometry) information. The key novelty is that the robot poses and the occupancy map are optimized together, which is significantly different from existing occupancy mapping strategies where the robot poses need to be obtained first before th…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by Robotics: Science and Systems 2022

    Journal ref: Robotics: Science and Systems 2022

  15. arXiv:2405.10632  [pdf, other]

    cs.CY cs.AI cs.HC

    Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

    Authors: Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung

    Abstract: Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of hum…

    Submitted 12 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: revised figure

  16. arXiv:2405.10098  [pdf, other]

    cs.NE

    When Large Language Model Meets Optimization

    Authors: Sen Huang, Kaixiang Yang, Sheng Qi, Rui Wang

    Abstract: Optimization algorithms and large language models (LLMs) enhance decision-making in dynamic environments by integrating artificial intelligence with traditional techniques. LLMs, with extensive domain knowledge, facilitate intelligent modeling and strategic decision-making in optimization, while optimization algorithms refine LLM architectures and output quality. This synergy offers novel approach…

    Submitted 16 May, 2024; originally announced May 2024.

  17. arXiv:2405.09839  [pdf, other]

    cs.LG

    Advances in Robust Federated Learning: Heterogeneity Considerations

    Authors: Chuan Chen, Tianchi Liao, Xiaojun Deng, Zihou Wu, Sheng Huang, Zibin Zheng

    Abstract: In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first…

    Submitted 16 May, 2024; originally announced May 2024.

  18. arXiv:2405.09611  [pdf, other]

    cond-mat.str-el hep-th math-ph

    Fermionic quantum criticality through the lens of topological holography

    Authors: Sheng-Jie Huang

    Abstract: We utilize the topological holographic framework to characterize and gain insights into the nature of quantum critical points and gapless phases in fermionic quantum systems. Topological holography is a general framework that describes the generalized global symmetry and the symmetry charges of a local quantum system in terms of a slab of a topological order, termed as the symmetry topological fie…

    Submitted 18 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: 33 pages, 7 figures, 6 tables; v2: minor changes, references added

  19. arXiv:2405.09066  [pdf, other]

    hep-ex

    Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko , et al. (559 additional authors not shown)

    Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 14 pages, 7 figures

  20. arXiv:2405.08749  [pdf, other]

    nucl-th hep-ph nucl-ex

    Longitudinal Structure of Quark-Gluon Plasma Unveiled Through Nuclear Deformations

    Authors: Chunjian Zhang, Shengli Huang, Jiangyong Jia

    Abstract: The study of quark-gluon plasma (QGP) is hindered by our limited understanding of its initial conditions, particularly its longitudinal structure. We propose a novel approach that entails analyzing collisions involving nuclei of similar masses but different deformations. This strategy allows us to vary the initial conditions and collective expansion of the QGP, while minimizing the influence of no…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures

  21. arXiv:2405.08541  [pdf, other]

    physics.ins-det

    A Determination of the Local Gravitational Acceleration for the Tsinghua Tabletop Kibble Balance

    Authors: Weibo Liu, Nanjia Li, Yongchao Ma, Ruo Hu, Shuqing Wu, Wei Zhao, Songling Huang, Shisong Li

    Abstract: The Kibble balance requires a measurement of the local gravitational acceleration, $g$, with a typical relative measurement uncertainty of $10^{-9}$. In this paper, the determination of $g$ for the Tsinghua tabletop Kibble balance is presented. A polynomial fitting method is proposed for blind transfers of the absolute gravitational acceleration using relative gravimeters, showing agreement with t…

    Submitted 20 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: 11 figures, submitted to IEEE Trans. Instrum. Meas

  22. arXiv:2405.07741  [pdf, other]

    hep-ex

    Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko , et al. (635 additional authors not shown)

    Abstract: Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures

  23. arXiv:2405.07469  [pdf, other]

    quant-ph physics.optics

    Phase coding semi-quantum key distribution system based on the Single-state protocol

    Authors: Qincheng Hou, Siying Huang, Naida Mo, Jindong Wang, Zhengjun Wei, Yafei Yu, Tianming Zhao, Zhiming Zhang

    Abstract: Semi-quantum key distribution (SQKD) allows sharing random keys between a quantum user and a classical user. However, implementing classical user operations is challenging, posing a hurdle to achieving the Single-state protocol. By using the "selective modulation" method, the feasibility of SQKD is verified in principle. The proposal of the selective modulation method enables the realization of ot…

    Submitted 13 May, 2024; originally announced May 2024.

  24. arXiv:2405.06690  [pdf, other]

    q-bio.BM cs.CL cs.LG

    DrugLLM: Open Large Language Model for Few-shot Molecule Generation

    Authors: Xianggen Liu, Yan Guo, Haoran Li, Jin Liu, Shudong Huang, Bowen Ke, Jiancheng Lv

    Abstract: Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages in biology and chemistry. For example, they are struggling to capture the relationship between molecule structure and pharmacochemical properties. Consequen…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 17 pages, 3 figures

  25. arXiv:2405.06393  [pdf, other]

    hep-ex

    Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

    Abstract: The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the…

    Submitted 10 May, 2024; originally announced May 2024.

  26. arXiv:2405.06217  [pdf, other]

    cs.CV cs.MM

    DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding

    Authors: Ting Liu, Xuyang Liu, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu

    Abstract: Visual grounding (VG) is a challenging task to localize an object in an image based on a textual description. Recent surge in the scale of VG models has substantially improved performance, but also introduced a significant burden on computational costs during fine-tuning. In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer the pre-trained vision-…

    Submitted 8 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024 (Oral)

  27. arXiv:2405.05254  [pdf, other]

    cs.CL

    You Only Cache Once: Decoder-Decoder Architectures for Language Models

    Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

    Abstract: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO onl…

    Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  28. arXiv:2405.04768  [pdf, other]

    cond-mat.mtrl-sci

    Circularly polarized light irradiated ferromagnetic MnBi$_2$Te$_4$: the long-sought ideal Weyl semimetal

    Authors: Shuai Fan, Shengpu Huang, Zhuo Chen, Fangyang Zhan, Xian-Yong Ding, Da-Shuai Ma, Rui Wang

    Abstract: The interaction between light and non-trivial energy band topology allows for the precise manipulation of topological quantum states, which has attracted intensive interest in condensed matter physics. In this work, using first-principles calculations, we studied the topological transition of ferromagnetic (FM) MnBi$_2$Te$_4$ upon irradiation with circularly polarized light (CPL). We revealed that…

    Submitted 7 May, 2024; originally announced May 2024.

  29. arXiv:2405.04101  [pdf, other]

    cs.LG cs.AI

    Continual Learning in the Presence of Repetition

    Authors: Hamed Hemati, Lorenzo Pellegrini, Xiaotian Duan, Zixuan Zhao, Fangfang Xia, Marc Masana, Benedikt Tscheschner, Eduardo Veas, Yuxiang Zheng, Shiji Zhao, Shao-Yuan Li, Sheng-Jun Huang, Vincenzo Lomonaco, Gido M. van de Ven

    Abstract: Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the st…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Preprint; Challenge Report of the 4th Workshop on Continual Learning in Computer Vision at CVPR

  30. arXiv:2405.03908  [pdf, other]

    cs.DC cs.DS

    Deterministic Expander Routing: Faster and More Versatile

    Authors: Yi-Jun Chang, Shang-En Huang, Hsin-Hao Su

    Abstract: We consider the expander routing problem formulated by Ghaffari, Kuhn, and Su (PODC 2017), where the goal is to route all the tokens to their destinations given that each vertex is the source and the destination of at most $\deg(v)$ tokens. They developed $\textit{randomized algorithms}$ that solve this problem in $\text{poly}(φ^{-1}) \cdot 2^{O(\sqrt{\log n \log \log n})}$ rounds in the…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to PODC 2024

  31. arXiv:2405.02425  [pdf, other]

    cs.RO cs.AI

    Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning

    Authors: Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess

    Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-b…

    Submitted 3 May, 2024; originally announced May 2024.

  32. arXiv:2405.01345  [pdf, other]

    cs.CL

    The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights

    Authors: Wenhao Zhu, Shujian Huang, Fei Yuan, Cheng Chen, Jiajun Chen, Alexandra Birch

    Abstract: Bridging the significant gap between large language model's English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment approach leverages the model's English expertise to improve multilingual performance with minimum usage of expensive, error-prone translation. In t…

    Submitted 29 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  33. arXiv:2404.18491  [pdf, other]

    cond-mat.dis-nn

    Emergent Non-Abelian Thouless Pumping Induced by the Quasiperiodic Disorder

    Authors: Sen Huang, Yan-Qing Zhu, Zhi Li

    Abstract: We investigate the non-Abelian Thouless pumping in a disorder tunable Lieb chain with degenerate flat bands. The results reveal that quasiperiodic disorder will cause a topological phase transition from the trivial (without non-Abelian Thouless pumping) to the non-trivial (with non-Abelian Thouless pumping) phase. The underlying mechanism is that the monopole originally outside the topological region…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  34. arXiv:2404.17521  [pdf, other]

    cs.RO cs.CV

    Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

    Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task repre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Project website and open-source code: https://xiaoyao-li.github.io/research/ag2manip

  35. arXiv:2404.17025  [pdf, other]

    cs.HC

    How Does Conversation Length Impact User's Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots

    Authors: Shih-Hong Huang, Ya-Fang Lin, Zeyu He, Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

    Abstract: Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary…

    Submitted 25 April, 2024; originally announced April 2024.

  36. arXiv:2404.16666  [pdf, other]

    cs.CV

    PhyRecon: Physically Plausible Neural Scene Reconstruction

    Authors: Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Neural implicit representations have gained popularity in multi-view 3D reconstruction. However, most previous work struggles to yield physically plausible results, limiting their utility in domains requiring rigorous physical accuracy, such as embodied AI and robotics. This lack of plausibility stems from the absence of physics modeling in existing methods and their inability to recover intricate…

    Submitted 2 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: project page: https://phyrecon.github.io/. arXiv admin note: text overlap with arXiv:2303.08605 by other authors

  37. arXiv:2404.16306  [pdf, other]

    cs.CV

    TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

    Authors: Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks

    Abstract: Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  38. arXiv:2404.15584  [pdf]

    eess.SY

    Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty

    Authors: Rui Wang, Xiaoqing Bai, Shengquan Huang, Shoupu Wei

    Abstract: As power systems become more complex and uncertain, low-voltage distribution networks face numerous challenges, including three-phase imbalances caused by asymmetrical loads and distributed energy resources. We propose a robust stochastic optimization (RSO) based optimal power flow (OPF) control method for three-phase, four-wire low-voltage distribution networks that consider uncertainty to addres…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: systems optimization, robust optimization, local control

  39. arXiv:2404.15121  [pdf, other]

    cs.GR cs.AI cs.CV

    Taming Diffusion Probabilistic Models for Character Control

    Authors: Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

    Abstract: We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's his…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGGRAPH 2024 (Conference Track). Project page and source codes: https://aiganimation.github.io/CAMDM/

  40. arXiv:2404.15100  [pdf, other]

    cs.CV cs.MM

    Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

    Authors: Xun Wu, Shaohan Huang, Furu Wei

    Abstract: Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited…

    Submitted 23 April, 2024; originally announced April 2024.

  41. arXiv:2404.15045  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-Head Mixture-of-Experts

    Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei

    Abstract: Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation, where only a small subset of experts are activated for optimization. (2) Lacking fine-grained analytical capabilities for multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts…

    Submitted 23 April, 2024; originally announced April 2024.

  42. arXiv:2404.14986  [pdf, other]

    cs.LG cs.AI

    $\texttt{MiniMol}$: A Parameter-Efficient Foundation Model for Molecular Learning

    Authors: Kerstin Kläser, Błażej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, Andrew Fitzgibbon

    Abstract: In biological tasks, data is rarely plentiful as it is generated from hard-to-gather measurements. Therefore, pre-training foundation models on large quantities of available data and then transferring them to low-data downstream tasks is a promising direction. However, how to design effective foundation models for molecular learning remains an open question, with existing approaches typically focusing on m…

    Submitted 23 April, 2024; originally announced April 2024.

  43. Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

    Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be…

    Submitted 13 July, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 19 pages, 10 figures

    Journal ref: Phys. Rev. D 110, 012006 (2024)

  44. arXiv:2404.13677  [pdf, other]

    cs.CV eess.IV

    A Dataset and Model for Realistic License Plate Deblurring

    Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu

    Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we int…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  45. arXiv:2404.13628  [pdf, other]

    cs.CL cs.LG cs.MM

    Mixture of LoRA Experts

    Authors: Xun Wu, Shaohan Huang, Furu Wei

    Abstract: LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models to cater to a diverse array of downstream tasks, showcasing notable effectiveness and efficiency, thereby solidifying its position as one of the most prevalent fine-tuning techniques. Due to the modular nature of LoRA's plug-and-play plugins, researchers have delved into the amalgamation of multiple LoRAs to empow…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 17 pages, 11 figures

  46. arXiv:2404.13086  [pdf]

    cond-mat.mtrl-sci

    Band Structure Engineering in Highly Crystalline Organic Semiconductors

    Authors: Shu-Jen Wang, Sebastian Hutsch, Felix Talnack, Marielle Deconinck, Shiyu Huang, Zongbao Zhang, Hans Kleemann, Yana Vaynzof, Stefan C. B. Mannsfeld, Frank Ortmann, Karl Leo

    Abstract: Blending of semiconductors for controlling the energy levels (band structure engineering) is an important technique, in particular, for optoelectronic applications. The underlying physics is the delocalized Bloch states, which average over the potential landscape of the blend. For organic semiconductors, it has been shown that two quite different effects, the dielectric constant and electrostatic…

    Submitted 17 April, 2024; originally announced April 2024.

  47. arXiv:2404.12768  [pdf, other]

    cs.CV cs.AI cs.GR

    MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

    Authors: Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

    Abstract: Accurately estimating scene lighting is critical for applications such as mixed reality. Existing works estimate illumination by generating illumination maps or regressing illumination parameters. However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-freq…

    Submitted 19 April, 2024; originally announced April 2024.

  48. arXiv:2404.11996  [pdf, other]

    cs.AI

    DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

    Authors: Songtao Huang, Hongjin Song, Tianqi Jiang, Akbar Telikani, Jun Shen, Qingguo Zhou, Binbin Yong, Qiang Wu

    Abstract: Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have gained colossal success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address these challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel…

    Submitted 18 April, 2024; originally announced April 2024.

  49. arXiv:2404.11994  [pdf, other]

    quant-ph

    Image Compression and Reconstruction Based on Quantum Network

    Authors: Xun Ji, Qin Liu, Shan Huang, Andi Chen, Shengjun Wu

    Abstract: Quantum network is an emerging type of network structure that leverages the principles of quantum mechanics to transmit and process information. Compared with classical data reconstruction algorithms, quantum networks make image reconstruction more efficient and accurate. They can also process more complex image information using fewer bits and faster parallel computing capabilities. Therefore, th…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures

    ACM Class: I.4

  50. arXiv:2404.11865  [pdf, other]

    cs.CV

    From Image to Video, what do we need in multimodal LLMs?

    Authors: Suyuan Huang, Haoxin Zhang, Yan Gao, Yao Hu, Zengchang Qin

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated profound capabilities in understanding multimodal information, ranging from Image LLMs to the more complex Video LLMs. Numerous studies have illustrated their exceptional cross-modal comprehension. Recently, integrating video foundation models with large language models to build a comprehensive video understanding system has been proposed…

    Submitted 17 April, 2024; originally announced April 2024.