Showing 1–50 of 21,041 results for author: Wang, Y

  1. arXiv:2408.16767  [pdf, other]

    cs.CV cs.AI cs.GR

    ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

    Authors: Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan

    Abstract: Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Project page: https://liuff19.github.io/ReconX

  2. arXiv:2408.16760  [pdf, other]

    cs.CV

    OmniRe: Omni Urban Scene Reconstruction

    Authors: Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang

    Abstract: We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipelin…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: See the project page for code, video results and demos: https://ziyc.github.io/omnire/

  3. arXiv:2408.16751  [pdf, other]

    cs.CL cs.LG stat.ML

    A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models

    Authors: Yi-Lin Tuan, William Yang Wang

    Abstract: Beyond maximum likelihood estimation (MLE), the standard objective of a language model (LM) that optimizes good examples probabilities, many studies have explored ways that also penalize bad examples for enhancing the quality of output distribution, including unlikelihood training, exponential maximizing average treatment effect (ExMATE), and direct preference optimization (DPO). To systematically…

    Submitted 29 August, 2024; originally announced August 2024.

  4. arXiv:2408.16646  [pdf, other]

    hep-ex

    Study of the rare decay $J/ψ\to μ^+μ^-μ^+μ^-$

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1096 additional authors not shown)

    Abstract: The rare electromagnetic $J/ψ\to μ^+μ^-μ^+μ^-$ decay is observed with a significance greatly exceeding the discovery threshold, using proton-proton collision data collected by the LHCb experiment during 2016-2018 at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of $5.4\,\text{fb}^{-1}$. The rate of this decay is measured relative to that of the $J/ψ\to μ^+μ^-$ mode.…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3453 (LHCb public pages)

    Report number: LHCb-PAPER-2024-016, CERN-EP-2024-201

  5. arXiv:2408.16615  [pdf, ps, other]

    cond-mat.str-el

    Topological flat bands in hyperbolic lattices

    Authors: Dong-Hao Guan, Lu Qi, Yuan Zhou, Ai-Lei He, Yi-Fei Wang

    Abstract: Topological flat bands (TFBs) provide a promising platform to investigate intriguing fractionalization phenomena, such as the fractional Chern insulators (FCIs). Most of TFB models are established in two-dimensional Euclidean lattices with zero curvature. In this work, we systematically explore TFBs in a class of two-dimensional non-Euclidean lattices with constant negative curvature, i.e.,…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 7 pages, 5 figures, comments are welcome

  6. arXiv:2408.16564  [pdf, other]

    cs.MM cs.SD eess.AS

    Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing

    Authors: Qianhui Liu, Jiadong Wang, Yang Wang, Xin Yang, Gang Pan, Haizhou Li

    Abstract: Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multi…

    Submitted 29 August, 2024; originally announced August 2024.

  7. arXiv:2408.16530  [pdf, other]

    cs.CV

    A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions

    Authors: Yu Wang, Shaohua Wang, Yicheng Li, Mingchun Liu

    Abstract: In recent years, 3D object perception has become a crucial component in the development of autonomous driving systems, providing essential environmental awareness. However, as perception tasks in autonomous driving evolve, their variants have increased, leading to diverse insights from industry and academia. Currently, there is a lack of comprehensive surveys that collect and summarize these perce…

    Submitted 27 August, 2024; originally announced August 2024.

  8. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16498  [pdf, other]

    cs.SE

    A Survey on Evaluating Large Language Models in Code Generation Tasks

    Authors: Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

    Abstract: This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applicatio…

    Submitted 29 August, 2024; originally announced August 2024.

  10. Enhancing Sound Source Localization via False Negative Elimination

    Authors: Zengjie Song, Jiangshe Zhang, Yuxi Wang, Junsong Fan, Zhaoxiang Zhang

    Abstract: Sound source localization aims to localize objects emitting the sound in visual scenes. Recent works obtaining impressive results typically rely on contrastive learning. However, the common practice of randomly sampling negatives in prior arts can lead to the false negative issue, where the sounds semantically similar to visual instance are sampled as negatives and incorrectly pushed away from the…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.13412

  11. arXiv:2408.16434  [pdf]

    physics.flu-dyn

    Physical Similarity of Fluid Flow in Bimodal Porous Media: Part 1 -- Basic Model and Solution Characteristics

    Authors: Yuhe Wang, Yating Wang

    Abstract: Fluid flow through bimodal porous media, characterized by a distinct separation in pore size distribution, is critical in various scientific and engineering applications, including groundwater management, oil and gas production, and carbon sequestration. This note delves into the physical similarity of fluid flow within such media, bridging the gap between microscale phenomena and macroscale obser…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 7 pages, 2 figures

  12. arXiv:2408.16431  [pdf, other]

    cs.CV

    Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS

    Authors: Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang

    Abstract: Video object segmentation (VOS) is a crucial task in computer vision, but current VOS methods struggle with complex scenes and prolonged object motions. To address these challenges, the MOSE dataset aims to enhance object recognition and differentiation in complex environments, while the LVOS dataset focuses on segmenting objects exhibiting long-term, intricate movements. This report introduces a…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 1st Place Solution for 6th LSVOS VOS Track. arXiv admin note: substantial text overlap with arXiv:2406.04600

  13. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  14. arXiv:2408.16308  [pdf, other]

    cs.SI

    AdaMotif: Graph Simplification via Adaptive Motif Design

    Authors: Hong Zhou, Peifeng Lai, Zhida Sun, Xiangyuan Chen, Yang Chen, Huisi Wu, Yong Wang

    Abstract: With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures vi…

    Submitted 29 August, 2024; originally announced August 2024.

  15. arXiv:2408.16279  [pdf, ps, other]

    hep-ex

    Model-independent determination of the strong-phase difference between $D^0$ and $\bar{D}^0 \to π^+π^-π^+π^-$ decays

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (647 additional authors not shown)

    Abstract: Measurements of the strong-phase difference between $D^0$ and $\bar{D}^0\toπ^+π^-π^+π^-$ are performed in bins of phase space. The study exploits a sample of quantum-correlated $D\bar{D}$ mesons collected by the BESIII experiment in $e^+e^-$ collisions at a center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 2.93 fb$^{-1}$. Here, $D$ denotes a neutral charm meson in a…

    Submitted 29 August, 2024; originally announced August 2024.

  16. arXiv:2408.16266  [pdf, other]

    cs.CV

    Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation

    Authors: Yanghao Wang, Long Chen

    Abstract: Data Augmentation (DA), i.e., synthesizing faithful and diverse samples to expand the original training set, is a prevalent and effective strategy to improve various visual recognition tasks. With the powerful image generation ability, diffusion-based DA has shown strong performance gains on different benchmarks. In this paper, we analyze today's diffusion-based DA methods, and argue that they cann…

    Submitted 29 August, 2024; originally announced August 2024.

  17. arXiv:2408.16258  [pdf, other]

    cs.GR cs.CV

    Advancing Architectural Floorplan Design with Geometry-enhanced Graph Diffusion

    Authors: Sizhe Hu, Wenming Wu, Yuntao Wang, Benzhu Xu, Liping Zheng

    Abstract: Automating architectural floorplan design is vital for housing and interior design, offering a faster, cost-effective alternative to manual sketches by architects. However, existing methods, including rule-based and learning-based approaches, face challenges in design complexity and constrained generation with extensive post-processing, and tend to obvious geometric inconsistencies such as misalig…

    Submitted 29 August, 2024; originally announced August 2024.

  18. arXiv:2408.16244  [pdf, other]

    quant-ph

    Quantum Advantage via Efficient Post-processing on Qudit Shadow tomography

    Authors: Yu Wang

    Abstract: Efficiently computing the trace of the product of exponential-scale matrices $A$ and $B$ presents a significant challenge in classical computation, particularly when $A$ is a $d$-dimensional positive Hermitian matrix with trace 1, and $B$ is a Hermitian matrix with a bounded norm. This computation traditionally requires $O(d^2)$ time complexity. We explore leveraging quantum advantage to perform t…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: The initial version, open to any comments

  19. arXiv:2408.16201  [pdf, other]

    cs.CV cs.LG

    Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products

    Authors: Jiayu Liu, Shancong Mou, Nathan Gaw, Yinan Wang

    Abstract: Anomaly detection is a long-standing challenge in manufacturing systems. Traditionally, anomaly detection has relied on human inspectors. However, 3D point clouds have gained attention due to their robustness to environmental factors and their ability to represent geometric data. Existing 3D anomaly detection methods generally fall into two categories. One compares scanned 3D point clouds with des…

    Submitted 28 August, 2024; originally announced August 2024.

  20. arXiv:2408.16197  [pdf, other]

    eess.SY

    Economic Optimal Power Management of Second-Life Battery Energy Storage Systems

    Authors: Amir Farakhor, Di Wu, Pingen Chen, Junmin Wang, Yebin Wang, Huazhen Fang

    Abstract: Second-life battery energy storage systems (SL-BESS) are an economical means of long-duration grid energy storage. They utilize retired battery packs from electric vehicles to store and provide electrical energy at the utility scale. However, they pose critical challenges in achieving optimal utilization and extending their remaining useful life. These complications primarily result from the const…

    Submitted 28 August, 2024; originally announced August 2024.

  21. arXiv:2408.16192  [pdf]

    cond-mat.mes-hall

    Molecular-Scale Insights into the Heterogeneous Interactions Between an m-Terphenyl Isocyanide Ligand and Noble Metal Nanoparticles

    Authors: Liya Bi, Yufei Wang, Zhe Wang, Alexandria Do, Alexander Fuqua, Krista P. Balto, Yanning Zhang, Joshua S. Figueroa, Tod A. Pascal, Andrea R. Tao, Shaowei Li

    Abstract: The structural and chemical properties of metal nanoparticles are often dictated by their interactions with molecular ligand shells. These interactions are highly material-specific and can vary significantly even among elements within the same group or materials with similar crystal structure. Precise characterization of ligand-metal interactions is crucial for the rational design of ligands and t…

    Submitted 28 August, 2024; originally announced August 2024.

  22. arXiv:2408.16170  [pdf, other]

    cs.DB cs.LG

    CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases

    Authors: Yannis Chronis, Yawen Wang, Yu Gan, Sami Abu-El-Haija, Chelsea Lin, Carsten Binnig, Fatma Özcan

    Abstract: Cardinality estimation is crucial for enabling high query performance in relational databases. Recently learned cardinality estimation models have been proposed to improve accuracy but there is no systematic benchmark or datasets which allows researchers to evaluate the progress made by new learned approaches and even systematically develop new learned approaches. In this paper, we are releasing a…

    Submitted 28 August, 2024; originally announced August 2024.

  23. arXiv:2408.15902  [pdf, ps, other]

    cond-mat.supr-con

    Growth of (Cu,C)Ba$_{2}$Ca$_{2}$Cu$_3$O$_{9+δ}$ thin films on flexible Hastelloy tapes

    Authors: Meng-Jun Ou, Yuecong Liu, Yi Wang, Hai-Hu Wen

    Abstract: The applications of superconducting cable or magnet require that the superconductors are made into wires or tapes. For cuprate superconductors, this is a big challenge because of the strong flux motion induced by high anisotropy, very short coherence length and strong thermal fluctuation, etc. One of the ways is to fabricate superconducting films on flexible metallic tapes with oxide buffer layers…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures

  24. arXiv:2408.15777  [pdf, other]

    cs.CV

    A Survey on Facial Expression Recognition of Static and Dynamic Emotions

    Authors: Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

    Abstract: Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new…

    Submitted 28 August, 2024; originally announced August 2024.

  25. arXiv:2408.15772  [pdf, other]

    cs.IT

    220 GHz Urban Microcell Channel Measurement and Characterization on a University Campus

    Authors: Yuanbo Li, Yiqin Wang, Yejian Lyu, Ziming Yu, Chong Han

    Abstract: Owning abundant bandwidth resources, the Terahertz (THz) band (0.1-10 THz) is envisioned as a key technology to realize ultra-high-speed communications in 6G and beyond wireless networks. To realize reliable THz communications in urban microcell (UMi) environments, propagation analysis and channel characterization are still insufficient. In this paper, channel measurement campaigns are conducted i…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 1 table

  26. arXiv:2408.15588  [pdf, other]

    physics.flu-dyn

    Opposition control applied to turbulent wings

    Authors: Yuning Wang, Marco Atzori, Ricardo Vinuesa

    Abstract: We conducted high-resolution large-eddy simulations (LESs) to explore the effects of opposition control (OC) on turbulent boundary layers (TBLs) over a wing at a chord-based Reynolds number (${Re}_c$) of 200,000. Two scenarios were studied: flow over the suction sides of the NACA0012 wing section at a $0^{\circ}$ angle of attack, and the NACA4412 wing section at a $5^{\circ}$ angle of attack, repr…

    Submitted 28 August, 2024; originally announced August 2024.

    MSC Class: 76-00

  27. arXiv:2408.15576  [pdf, other]

    quant-ph

    Quantum Assemblage Tomography

    Authors: Luis Villegas-Aguilar, Yuanlong Wang, Alex Pepper, Travis J. Baker, Geoff J. Pryde, Sergei Slussarenko, Nora Tischler, Howard M. Wiseman

    Abstract: A central requirement in asymmetric quantum nonlocality protocols, such as quantum steering, is the precise reconstruction of state assemblages -- statistical ensembles of quantum states correlated with remote classical signals. Here we introduce a generalized loss model for assemblage tomography that uses conical optimization techniques combined with maximum likelihood estimation. Using an eviden…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 5 + 4 pages, 3 + 3 figures (Main + Supplemental Material)

  28. arXiv:2408.15568  [pdf, other]

    cs.AR

    Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing

    Authors: Ruilong Wu, Yisu Wang, Dirk Kutscher

    Abstract: This study explores strategies for academic researchers to optimize computational resources within limited budgets, focusing on building small, efficient computing clusters. It delves into the comparative costs of purchasing versus renting servers, guided by market research and economic theories on tiered pricing. The paper offers detailed insights into the selection and assembly of hardware compo…

    Submitted 28 August, 2024; originally announced August 2024.

  29. arXiv:2408.15542  [pdf, other]

    cs.CV cs.AI cs.MM

    Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

    Authors: Jiajun Liu, Yibing Wang, Hanghang Ma, Xiaoping Wu, Xiaoqi Ma, Xiaoming Wei, Jianbin Jiao, Enhua Wu, Jie Hu

    Abstract: Rapid advancements have been made in extending Large Language Models (LLMs) to Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data remains a challenging endeavor, especially for long videos. Due to insufficient access to large-scale high-quality video data and the excessive compression of visual features, current methods exhibit limitations in effectively proce…

    Submitted 28 August, 2024; originally announced August 2024.

  30. arXiv:2408.15518  [pdf, other]

    cs.CL

    Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

    Authors: Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang

    Abstract: This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin employs a compact 0.5B parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input len…

    Submitted 28 August, 2024; originally announced August 2024.

  31. arXiv:2408.15484  [pdf, other]

    cs.CV

    NAS-BNN: Neural Architecture Search for Binary Neural Networks

    Authors: Zhihao Lin, Yongtao Wang, Jinhe Zhang, Xiaojie Chu, Haibin Ling

    Abstract: Binary Neural Networks (BNNs) have gained extensive attention for their superior inferencing efficiency and compression ratio compared to traditional full-precision networks. However, due to the unique characteristics of BNNs, designing a powerful binary architecture is challenging and often requires significant manpower. A promising solution is to utilize Neural Architecture Search (NAS) to assis…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 23 pages

  32. arXiv:2408.15431  [pdf, other]

    cond-mat.soft cond-mat.stat-mech physics.bio-ph

    Integer Topological Defects Reveal Effective Forces in Active Nematics

    Authors: Zihui Zhao, Yisong Yao, He Li, Yongfeng Zhao, Yujia Wang, Hepeng Zhang, Hugues Chaté, Masaki Sano

    Abstract: Cell layers are often categorized as contractile or extensile active nematics but recent experiments on neural progenitor cells with induced $+1$ topological defects challenge this classification. In a bottom-up approach, we first study a relevant particle-level model and then analyze a continuous theory derived from it. We show that both model and theory account qualitatively for the main experim…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 7 pages, 5 figures, plus Supplemental Information

  33. arXiv:2408.15299  [pdf, other]

    q-bio.BM cs.AI cs.LG

    TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering

    Authors: Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Liò, Yu Guang Wang

    Abstract: The structural similarities between protein sequences and natural languages have led to parallel advancements in deep learning across both domains. While large language models (LLMs) have achieved much progress in the domain of natural language processing, their potential in protein engineering remains largely unexplored. Previous approaches have equipped LLMs with protein understanding capabiliti…

    Submitted 27 August, 2024; originally announced August 2024.

  34. arXiv:2408.15270  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    SkillMimic: Learning Reusable Basketball Skills from Demonstrations

    Authors: Yinhuai Wang, Qihan Zhao, Runyi Yu, Ailing Zeng, Jing Lin, Zhengyi Luo, Hok Wai Tsui, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan

    Abstract: Mastering basketball skills such as diverse layups and dribbling involves complex interactions with the ball and requires real-time adjustments. Traditional reinforcement learning methods for interaction skills rely on labor-intensive, manually designed rewards that do not generalize well across different skills. Inspired by how humans learn from demonstrations, we propose SkillMimic, a data-drive…

    Submitted 12 August, 2024; originally announced August 2024.

  35. arXiv:2408.15176  [pdf, other]

    cs.SD cs.CL eess.AS

    Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

    Authors: Longshen Ou, Jingwei Zhao, Ziyu Wang, Gus Xia, Ye Wang

    Abstract: Large language models have shown significant capabilities across various domains, including symbolic music generation. However, leveraging these pre-trained models for controllable music arrangement tasks, each requiring different forms of musical information as control, remains a novel challenge. In this paper, we propose a unified sequence-to-sequence framework that enables the fine-tuning of a…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Submitted to AAAI 2025

  36. arXiv:2408.15115  [pdf, other]

    physics.comp-ph physics.flu-dyn

    A novel numerical framework for three-dimensional fully resolved simulation of freely falling particles of arbitrary shape

    Authors: Taraprasad Bhowmick, Jonas Latt, Yong Wang, Gholamhossein Bagheri

    Abstract: This article introduces a novel numerical framework designed to model the interplay between free-falling particles and their surrounding fluid in situations of high particle to fluid density ratio, typically exhibited by atmospheric particles. This method is designed to complement experimental studies in vertical wind tunnels to improve the understanding of the aerodynamic behavior of small atmosp…

    Submitted 27 August, 2024; originally announced August 2024.

  37. arXiv:2408.14961  [pdf, other]

    cs.CV cs.AI

    CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task

    Authors: Lingyun Huang, Jianxu Mao, Yaonan Wang, Junfei Yi, Ziming Tao

    Abstract: In recent years, the rapid expansion of model sizes has led to large-scale pre-trained models demonstrating remarkable capabilities. Consequently, there has been a trend towards increasing the scale of models. However, this trend introduces significant challenges, including substantial computational costs of training and transfer to downstream tasks. To address these issues, Parameter-Efficient Fi…

    Submitted 27 August, 2024; originally announced August 2024.

  38. arXiv:2408.14866  [pdf, other]

    cs.CL cs.CR cs.LG

    Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models

    Authors: Hongfu Liu, Yuxi Xie, Ye Wang, Michael Shieh

    Abstract: Large Language Models (LLMs) face safety concerns due to potential misuse by malicious users. Recent red-teaming efforts have identified adversarial suffixes capable of jailbreaking LLMs using the gradient-based search algorithm Greedy Coordinate Gradient (GCG). However, GCG struggles with computational inefficiency, limiting further investigations regarding suffix transferability and scalabili…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 11 pages, 4 figures

  39. arXiv:2408.14812  [pdf, other]

    cs.CV

    HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling

    Authors: Yubin Wang, Xinyang Jiang, De Cheng, Wenli Sun, Dongsheng Li, Cairong Zhao

    Abstract: Prompt learning has become a prevalent strategy for adapting vision-language foundation models (VLMs) such as CLIP to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored the potential of using category-related descriptions to enhance prompt effectiveness. However, conventional descriptions lack explicit structured information necessary to represent th…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2312.06323

  40. Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning

    Authors: Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu

    Abstract: Current data compression methods, such as sparsification in Federated Averaging (FedAvg), effectively enhance the communication efficiency of Federated Learning (FL). However, these methods encounter challenges such as the straggler problem and diminished model performance due to heterogeneous bandwidth and non-IID (Independently and Identically Distributed) data. To address these issues, we intro…

    Submitted 26 August, 2024; originally announced August 2024.

  41. arXiv:2408.14472  [pdf, other]

    cs.RO cs.AI eess.SY

    Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

    Authors: Xinyang Gu, Yen-Jen Wang, Xiang Zhu, Chengming Shi, Yanjiang Guo, Yichen Liu, Jianyu Chen

    Abstract: Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinfor…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. (Best Paper Award Finalist)

  42. arXiv:2408.14354  [pdf, other]

    cs.SE cs.AI cs.CL

    SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

    Authors: Daoguang Zan, Zhirong Huang, Ailun Yu, Shaoxin Lin, Yifan Shi, Wei Liu, Dong Chen, Zongshuai Qi, Hao Yu, Lei Yu, Dezhi Ran, Muhan Zeng, Bo Shen, Pan Bian, Guangtai Liang, Bei Guan, Pengjie Huang, Tao Xie, Yongji Wang, Qianxiang Wang

    Abstract: GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large language models (LLMs), but has so far only focused on Python version. However, supporting more programming languages is also important, as there is a strong demand in…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This work is in progress

  43. arXiv:2408.14254  [pdf, other]

    q-bio.NC cs.LG

    Integrated Brain Connectivity Analysis with fMRI, DTI, and sMRI Powered by Interpretable Graph Neural Networks

    Authors: Gang Qu, Ziyu Zhou, Vince D. Calhoun, Aiying Zhang, Yu-Ping Wang

    Abstract: Multimodal neuroimaging modeling has become a widely used approach but confronts considerable challenges due to heterogeneity, which encompasses variability in data types, scales, and formats across modalities. This variability necessitates the deployment of advanced computational methods to integrate and interpret these diverse datasets within a cohesive analytical framework. In our research, we…

    Submitted 26 August, 2024; originally announced August 2024.

  44. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  45. arXiv:2408.14071  [pdf, other]

    physics.ins-det hep-ex

    Benchmarking the design of the cryogenics system for the underground argon in DarkSide-20k

    Authors: DarkSide-20k Collaboration: F. Acerbi, P. Adhikari, P. Agnes, I. Ahmad, S. Albergo, I. F. M. Albuquerque, T. Alexander, A. K. Alton, P. Amaudruz, M. Angiolilli, E. Aprile, R. Ardito, M. Atzori Corona, D. J. Auty, M. Ave, I. C. Avetisov, O. Azzolini, H. O. Back, Z. Balmforth, A. Barrado Olmedo, P. Barrillon, G. Batignani, P. Bhowmick, et al. (294 additional authors not shown)

    Abstract: DarkSide-20k (DS-20k) is a dark matter detection experiment under construction at the Laboratori Nazionali del Gran Sasso (LNGS) in Italy. It utilises ~100 t of low radioactivity argon from an underground source (UAr) in its inner detector, with half serving as target in a dual-phase time projection chamber (TPC). The UAr cryogenics system must maintain stable thermodynamic conditions throughout t…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 45 pages, 24 figures

  46. arXiv:2408.14070  [pdf]

    physics.chem-ph cond-mat.dis-nn physics.bio-ph physics.comp-ph

    RiD-kit: Software package designed to do enhanced sampling using reinforced dynamics

    Authors: Jiahao Fan, Yanze Wang, Dongdong Wang, Linfeng Zhang

    Abstract: Developing an efficient method to accelerate the speed of molecular dynamics is a central theme in the field of molecular simulation. One category among the methods are collective-variable-based methods, which rely on predefined collective variables (CVs). The difficulty of selecting a few important CVs hinders the methods to be applied to large systems easily. Here we present a CV-based enhanced…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 43 pages, 4 figures

  47. arXiv:2408.14047  [pdf]

    cs.CV

    Alleviating Class Imbalance in Semi-supervised Multi-organ Segmentation via Balanced Subclass Regularization

    Authors: Zhenghao Feng, Lu Wen, Binyu Yan, Jiaqi Cui, Yan Wang

    Abstract: Semi-supervised learning (SSL) has shown notable potential in relieving the heavy demand of dense prediction tasks on large-scale well-annotated datasets, especially for the challenging multi-organ segmentation (MoS). However, the prevailing class-imbalance problem in MoS, caused by the substantial variations in organ size, exacerbates the learning difficulty of the SSL network. To alleviate this…

    Submitted 26 August, 2024; originally announced August 2024.

  48. arXiv:2408.14022  [pdf, other]

    cs.DS

    An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph Discovery

    Authors: Xiaojia Xu, Haoyu Liu, Xiaowei Lv, Yongcai Wang, Deying Li

    Abstract: Detecting locally, non-overlapping, near-clique densest subgraphs is a crucial problem for community search in social networks. As a vertex may be involved in multiple overlapped local cliques, detecting locally densest sub-structures considering h-clique density, i.e., locally h-clique densest subgraph (LhCDS) attracts great interests. This paper investigates the LhCDS detection problem and propo…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by SIGMOD 2025

  49. arXiv:2408.13981  [pdf]

    cs.CV

    ARANet: Attention-based Residual Adversarial Network with Deep Supervision for Radiotherapy Dose Prediction of Cervical Cancer

    Authors: Lu Wen, Wenxia Yin, Zhenghao Feng, Xi Wu, Deng Xiong, Yan Wang

    Abstract: Radiation therapy is the mainstay treatment for cervical cancer, and its ultimate goal is to ensure the planning target volume (PTV) reaches the prescribed dose while reducing dose deposition of organs-at-risk (OARs) as much as possible. To achieve these clinical requirements, the medical physicist needs to manually tweak the radiotherapy plan repeatedly in a trial-and-error manner until finding th…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM)

  50. arXiv:2408.13980  [pdf, other]

    cs.CV

    FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

    Authors: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu

    Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emer…

    Submitted 25 August, 2024; originally announced August 2024.