
Showing 1–50 of 1,243 results for author: Yan, Z

  1. arXiv:2408.17065

    cs.CV

    Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

    Authors: Zhiyuan Yan, Yandan Zhao, Shen Chen, Xinghe Fu, Taiping Yao, Shouhong Ding, Li Yuan

    Abstract: Three key challenges hinder the development of current deepfake video detection: (1) Temporal features can be complex and diverse: how can we identify general temporal artifacts to enhance model generalization? (2) Spatiotemporal models often lean heavily on one type of artifact and ignore the other: how can we ensure balanced learning from both? (3) Videos are naturally resource-intensive: how ca…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.17052

    cs.CV

    Can We Leave Deepfake Data Behind in Training Deepfake Detector?

    Authors: Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li

    Abstract: The generalization ability of deepfake detectors is vital for their applications in real-world scenarios. One effective solution to enhance this ability is to train the models with manually-blended data, which we termed "blendfake", encouraging models to learn generic forgery artifacts like blending boundary. Interestingly, current SoTA methods utilize blendfake without incorporating any deepfake…

    Submitted 30 August, 2024; originally announced August 2024.

  3. arXiv:2408.16853

    cs.IT eess.SP

    RIS-Aided Backscattering Tag-to-Tag Networks: Performance Analysis

    Authors: Masoud Kaveh, Farshad Rostami Ghadi, Zheng Yan, Riku Jantti

    Abstract: Backscattering tag-to-tag networks (BTTNs) represent a passive radio frequency identification (RFID) system that enables direct communication between tags within an external radio frequency (RF) field. However, low spectral efficiency and short-range communication capabilities, along with the ultra-low power nature of the tags, create significant challenges for reliable and practical applications…

    Submitted 29 August, 2024; originally announced August 2024.

  4. arXiv:2408.15487

    math.CO

    A strong structural stability of $C_{2k+1}$-free graphs

    Authors: Zilong Yan, Yuejian Peng

    Abstract: Füredi and Gunderson showed that $ex(n, C_{2k+1})$ is achieved only on $K_{\lfloor\frac{n}{2}\rfloor, \lceil\frac{n}{2}\rceil}$ if $n\ge 4k-2$. It is natural to study how far a $C_{2k+1}$-free graph is from being bipartite. Let $T^*(r, n)$ be obtained by adding a suspension $K_{r}$ with $1$ suspension point to $K_{\lfloor\frac{n-r+1}{2}\rfloor, \lceil\frac{n-r+1}{2}\rceil}$. We show that for integ…

    Submitted 27 August, 2024; originally announced August 2024.

  5. arXiv:2408.12301

    gr-qc hep-th

    Compact star in noninteger power model of $f(R)$ gravity

    Authors: Yong-Xiang Cui, Zu Yan, Kota Numajiri, Taishi Katsuragawa, Shin'ichi Nojiri

    Abstract: We investigate compact stars in the noninteger power (NIP) model of $f(R)$ gravity theory, which includes the higher-curvature correction to the Einstein-Hilbert action. The mass-radius relation of the compact stars in the NIP model predicts large deviations from those in general relativity in the low-mass region, potentially allowing us to test the NIP model by future astrophysical observatio…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 35 pages, 27 figures

    Report number: KEK-TH-2646, KEK-Cosmo-0354

  6. arXiv:2408.08228

    eess.IV cs.CV

    Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective

    Authors: Zixuan Pan, Jun Xia, Zheyu Yan, Guoyue Xu, Yawen Wu, Zhenge Jia, Jianxu Chen, Yiyu Shi

    Abstract: Reconstruction-based methods, particularly those leveraging autoencoders, have been widely adopted to perform anomaly detection in brain MRI. While most existing works try to improve detection accuracy by proposing new model structures or algorithms, we tackle the problem through image quality assessment, an underexplored perspective in the field. We propose a fusion quality loss function that com…

    Submitted 15 August, 2024; originally announced August 2024.

  7. arXiv:2408.07197

    cond-mat.mtrl-sci physics.optics

    Hybrid Magnonics with Localized Spoof Surface Plasmon Polaritons

    Authors: Yuzan Xiong, Andrew Christy, Zixin Yan, Amin Pishehvar, Muntasir Mahdi, Junming Wu, James F. Cahoon, Binbin Yang, Michael C. Hamilton, Xufeng Zhang, Wei Zhang

    Abstract: Hybrid magnonic systems have emerged as a promising direction for information propagation with preserved coherence. Due to high tunability of magnons, their interactions with microwave photons can be engineered to probe novel phenomena based on strong photon-magnon coupling. Improving the photon-magnon coupling strength can be done by tuning the structure of microwave resonators to better interact…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 13 pages, 13 figures

  8. arXiv:2408.06779

    cs.CV

    ED$^4$: Explicit Data-level Debiasing for Deepfake Detection

    Authors: Jikang Cheng, Ying Zhang, Qin Zou, Zhiyuan Yan, Chao Liang, Zhongyuan Wang, Chen Li

    Abstract: Learning intrinsic bias from limited data has been considered the main reason for the failure of deepfake detection with generalizability. Apart from the discovered content and specific-forgery bias, we reveal a novel spatial bias, where detectors inertly anticipate observing structural forgery clues appearing at the image center, which can also lead to the poor generalization of existing methods. We pr…

    Submitted 13 August, 2024; originally announced August 2024.

  9. arXiv:2408.06550

    cs.HC

    Stretch or Vibrate? Rendering Spatial Information of Static and Moving Objects in VR via Haptic Feedback for Blind People

    Authors: Jiasheng Li, Zining Zhang, Zeyu Yan, Yuhang Zhao, Huaishu Peng

    Abstract: Perceiving spatial information of a virtual object (e.g., direction, distance) is critical yet challenging for blind users seeking an immersive virtual reality experience. To facilitate VR accessibility for blind users, in this paper, we investigate the effectiveness of two types of haptic cues--vibrotactile and skin-stretch cues--in conveying the spatial information of a virtual object when appli…

    Submitted 12 August, 2024; originally announced August 2024.

  10. arXiv:2408.03286

    cs.CV

    Biomedical SAM 2: Segment Anything in Biomedical Images and Videos

    Authors: Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

    Abstract: Medical image segmentation and video object segmentation are essential for diagnosing and analyzing diseases by identifying and measuring biological structures. Recent advances in the natural domain have been driven by foundation models like the Segment Anything Model 2 (SAM-2). To explore the performance of SAM-2 in biomedical applications, we designed three evaluation pipelines for single-frame 2D i…

    Submitted 17 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  11. JetUnit: Rendering Diverse Force Feedback in Virtual Reality Using Water Jets

    Authors: Zining Zhang, Jiasheng Li, Zeyu Yan, Jun Nishida, Huaishu Peng

    Abstract: We propose JetUnit, a water-based VR haptic system designed to produce force feedback with a wide spectrum of intensities and frequencies through water jets. The key challenge in designing this system lies in optimizing parameters to enable the haptic device to generate force feedback that closely replicates the most intense force produced by direct water jets while ensuring the user remains dry.…

    Submitted 6 August, 2024; originally announced August 2024.

    Journal ref: ACM UIST 2024

  12. arXiv:2408.01960

    cs.CV cs.AI

    AnomalySD: Few-Shot Multi-Class Anomaly Detection with Stable Diffusion Model

    Authors: Zhenyu Yan, Qingqing Fang, Wenxi Lv, Qinliang Su

    Abstract: Anomaly detection is a critical task in industrial manufacturing, aiming to identify defective parts of products. Most industrial anomaly detection methods assume the availability of sufficient normal data for training. This assumption may not hold true due to the cost of labeling or data privacy policies. Additionally, mainstream methods require training bespoke models for different objects, whic…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures

  13. arXiv:2408.01618

    cond-mat.mtrl-sci

    Magnetic order-dependent giant tunneling magnetoresistance and electroresistance in van der Waals antiferromagnetic-multiferroic tunnel junctions

    Authors: Zhi Yan, Dan Qiao, Wentian Lu, Xinlong Dong, Xiaohong Xu

    Abstract: Antiferromagnetic spintronics exhibits ultra-high operational speed and stability in a magnetic field, holding promise for the realization of next-generation ultra-high-speed magnetic storage. However, the electronic transport properties of antiferromagnetic-multiferroic tunnel junction (AMFTJ) devices remain largely unexplored theoretically. Here, we design an antiferromagnet/ferroe…

    Submitted 2 August, 2024; originally announced August 2024.

  14. arXiv:2408.01607

    cs.CV cs.LG

    Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives

    Authors: Lei Ma, Ziyun Yan, Mengmeng Li, Tao Liu, Liqin Tan, Xuan Wang, Weiqiang He, Ruikun Wang, Guangjun He, Heng Lu, Thomas Blaschke

    Abstract: Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential remains largely unexplored. In this article, as OBIA usage becomes more widespread, we conducted a comprehensive review and expansion of its task subdomains, with or withou…

    Submitted 2 August, 2024; originally announced August 2024.

  15. arXiv:2408.01246

    cs.CR

    MapComp: A Secure View-based Collaborative Analytics Framework for Join-Group-Aggregation

    Authors: Xinyu Peng, Feng Han, Li Peng, Weiran Liu, Zheng Yan, Kai Kang, Xinyuan Zhang, Guoxing Wei, Jianling Sun, Jinfei Liu

    Abstract: This paper introduces MapComp, a novel view-based framework to facilitate join-group-aggregation (JGA) queries for collaborative analytics. Through a specially crafted materialized view for join and a novel design of group-aggregation (GA) protocols, MapComp removes duplicated join workload and expedites subsequent GA, improving the efficiency of JGA query execution. To support continuous data updates…

    Submitted 15 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages

  16. arXiv:2408.01077

    cs.CV

    PhysMamba: State Space Duality Model for Remote Physiological Measurement

    Authors: Zhixin Yan, Yan Zhong, Hongbin Xu, Wenjun Zhang, Lin Shu, Hongbin Xu, Wenxiong Kang

    Abstract: Remote Photoplethysmography (rPPG) is a non-contact technique for extracting physiological signals from facial videos, used in applications like emotion monitoring, medical assistance, and anti-face spoofing. Unlike controlled laboratory settings, real-world environments often contain motion artifacts and noise, affecting the performance of existing rPPG methods. To address this, we propose PhysMa…

    Submitted 17 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  17. arXiv:2407.21783

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  18. arXiv:2407.21415

    quant-ph

    In situ Qubit Frequency Tuning Circuit for Scalable Superconducting Quantum Computing: Scheme and Experiment

    Authors: Lei Jiang, Yu Xu, Shaowei Li, Zhiguang Yan, Ming Gong, Tao Rong, Chenyin Sun, Tianzuo Sun, Tao Jiang, Hui Deng, Chen Zha, Jin Lin, Fusheng Chen, Qingling Zhu, Yangsen Ye, Hao Rong, Kai Yan, Sirui Cao, Yuan Li, Shaojun Guo, Haoran Qian, Yisen Hu, Yulin Wu, Yuhuai Li, Gang Wu, et al. (8 additional authors not shown)

    Abstract: Frequency-tunable qubits play a significant role in scalable superconducting quantum processors. State-of-the-art room-temperature electronics for tuning qubit frequency suffer from scalability limits, such as heating and the linear growth of control cables. Here we propose a scalable scheme to tune the qubit frequency by using an in situ superconducting circuit, which is based on radio fre…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures

  19. arXiv:2407.20262

    eess.SP

    A Neural-Network-Embedded Equivalent Circuit Model for Lithium-ion Battery State Estimation

    Authors: Zelin Guo, Yiyan Li, Zheng Yan, Mo-Yuen Chow

    Abstract: Equivalent Circuit Model (ECM) has been widely used in battery modeling and state estimation because of its simplicity, stability and interpretability. However, ECM may generate large estimation errors in extreme working conditions such as freezing environment temperature and complex charging/discharging behaviors, in which scenarios the electrochemical characteristics of the battery become extremely complex and…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages

  20. arXiv:2407.18866

    hep-th gr-qc

    A Comment on Deriving the Gibbons-Hawking-York Term From the String Worldsheet

    Authors: Amr Ahmadain, Vasudev Shyam, Zihan Yan

    Abstract: In this note, we show that the noncovariant metric boundary term obtained from the nonlinear sigma model worldsheet derivation of the bulk off-shell sphere partition function is closely related to the Einstein boundary term in the Gamma-Gamma noncovariant action. In fact, when expressed in terms of the trace of the extrinsic curvature tensor, we illustrate that this boundary term has one-half the…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages

  21. arXiv:2407.16260

    cs.CV

    DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

    Authors: Zizheng Yan, Jiapeng Zhou, Fanpeng Meng, Yushuang Wu, Lingteng Qiu, Zisheng Ye, Shuguang Cui, Guanying Chen, Xiaoguang Han

    Abstract: Text-to-3D generation has recently seen significant progress. To enhance its practicality in real-world applications, it is crucial to generate multiple independent objects with interactions, similar to layer-compositing in 2D image editing. However, existing text-to-3D methods struggle with this task, as they are designed to generate either non-independent objects or independent objects lacking s…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://chester256.github.io/dreamdissector

  22. arXiv:2407.14796

    cs.CV cs.AI

    PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates

    Authors: Junjie Shi, Caozhi Shang, Zhaobin Sun, Li Yu, Xin Yang, Zengqiang Yan

    Abstract: Incomplete multi-modal image segmentation is a fundamental task in medical imaging to refine deployment efficiency when only partial modalities are available. However, the common practice that complete-modality data is visible during model training is far from realistic, as modalities can have imbalanced missing rates in clinical scenarios. In this paper, we, for the first time, formulate such a c…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  23. arXiv:2407.14769

    cs.HC

    A Two-Phase Visualization System for Continuous Human-AI Collaboration in Sequelae Analysis and Modeling

    Authors: Yang Ouyang, Chenyang Zhang, He Wang, Tianle Ma, Chang Jiang, Yuheng Yan, Zuoqin Yan, Xiaojuan Ma, Chuhan Shi, Quan Li

    Abstract: In healthcare, AI techniques are widely used for tasks like risk assessment and anomaly detection. Despite AI's potential as a valuable assistant, its role in complex medical data analysis often oversimplifies human-AI collaboration dynamics. To address this, we collaborated with a local hospital, engaging six physicians and one data scientist in a formative study. From this collaboration, we prop…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  24. arXiv:2407.13691

    eess.SP

    Unsupervised and Interpretable Synthesizing for Electrical Time Series Based on Information Maximizing Generative Adversarial Nets

    Authors: Zhenghao Zhou, Yiyan Li, Runlong Liu, Zheng Yan, Mo-Yuen Chow

    Abstract: Generating synthetic data has become a popular alternative solution to deal with the difficulties in accessing and sharing field measurement data in power systems. However, to make the generation results controllable, existing methods (e.g. Conditional Generative Adversarial Nets, cGAN) require a labeled dataset to train the model, which is demanding in practice because many field measurement data l…

    Submitted 18 July, 2024; originally announced July 2024.

  25. arXiv:2407.13338

    cs.CV

    Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM

    Authors: Baicheng Li, Zike Yan, Dong Wu, Hanqing Jiang, Hongbin Zha

    Abstract: Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention due to the expressive representation power and the innovative paradigm of continual learning. However, deploying such a system within a dynamic environment has not been well-studied. Such challenges are intractable even for conventional algorithms since observations from different vie…

    Submitted 18 July, 2024; originally announced July 2024.

  26. arXiv:2407.12446

    cs.CV

    Non-parametric regularization for class imbalance federated medical image classification

    Authors: Jeffry Wicaksana, Zengqiang Yan, Kwang-Ting Cheng

    Abstract: Limited training data and severe class imbalance pose significant challenges to developing clinically robust deep learning models. Federated learning (FL) addresses the former by enabling different medical clients to collaboratively train a deep model without sharing privacy-sensitive data. However, class imbalance worsens due to variation in inter-client class distribution. We propose federated l…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.00738

  27. arXiv:2407.12441

    nlin.PS math-ph physics.comp-ph physics.optics quant-ph

    Dynamics of discrete solitons in the fractional discrete nonlinear Schrödinger equation with the quasi-Riesz derivative

    Authors: Ming Zhong, Boris A. Malomed, Zhenya Yan

    Abstract: We elaborate a fractional discrete nonlinear Schrödinger (FDNLS) equation based on an appropriately modified definition of the Riesz fractional derivative, which is characterized by its Lévy index (LI). This FDNLS equation represents a novel discrete system, in which the nearest-neighbor coupling is combined with long-range interactions that decay as the inverse square of the separation between l…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures (to be published in Phys. Rev. E, 2024)

  28. arXiv:2407.12280

    math.DG math.CO

    Juhl type formulas for curved Ovsienko--Redou operators

    Authors: Shane Chern, Zetian Yan

    Abstract: We prove Juhl type formulas for the curved Ovsienko--Redou operators and their linear analogues, which indicate the associated formal self-adjointness, thereby confirming two conjectures of Case, Lin, and Yuan. We also offer an extension of Juhl's original formula for the GJMS operators.

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 37 pages. Comments are welcome

    MSC Class: Primary 58J70; Secondary 53A40; 33C20

  29. arXiv:2407.10166

    cond-mat.mes-hall

    A general theory for infernal points in non-Hermitian systems

    Authors: Shu-Xuan Wang, Zhongbo Yan

    Abstract: The coalescence of eigenstates is a unique phenomenon in non-Hermitian systems. Remarkably, it has been noticed in some non-Hermitian systems under open boundary conditions that the whole set of eigenstates can coalesce to only a few eigenstates. In the parameter space, the point at which such a coalescence of macroscopic eigenstates occurs is dubbed an infernal point. In this paper, based on th…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 7+9 pages, 2+3 figures

  30. arXiv:2407.08421

    astro-ph.HE

    X-ray spectral and timing evolution during the 2018 outburst of MAXI J1820+070

    Authors: YaXing Li, Zhen Yan, ChenXu Gao, Wenfei Yu

    Abstract: We made use of high-cadence observations from $Insight$-HXMT and $NICER$ to scrutinize the spectral and timing evolution during the 2018 outburst of the black hole X-ray binary (BHXRB) MAXI J1820+070. Its hardness-intensity diagram (HID) displays a "q"-like track including all the spectral states, along a unique loop in the hard state. The tracks observed in the HID are anticipated in the evolu…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures, submitted to MNRAS

  31. arXiv:2407.05407

    cs.SD cs.AI eess.AS

    CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

    Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

    Abstract: Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role…

    Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

  32. arXiv:2407.04942

    cs.RO cs.LG

    FOSP: Fine-tuning Offline Safe Policy through World Models

    Authors: Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang

    Abstract: Model-based Reinforcement Learning (RL) has shown its high training efficiency and capability of handling high-dimensional tasks. Regarding safety issues, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still pose safety challenges due to the online exploration in real-world deployment. To a…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 21 pages

  33. arXiv:2407.04242

    cs.CV

    Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

    Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

    Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to coarse-grained spatial features and separated temporal dependency learning, and struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information in fine-grained scales is extremely challenging due to learning difficulties in long-range dependen…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024. This is the submitted manuscript and the preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections

  34. arXiv:2407.04051

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  35. arXiv:2407.03699

    cs.CV

    Generalized Robust Fundus Photography-based Vision Loss Estimation for High Myopia

    Authors: Zipei Yan, Zhile Liang, Zhengji Liu, Shuai Wang, Rachel Ka-Man Chun, Jizhou Li, Chea-su Kee, Dong Liang

    Abstract: High myopia significantly increases the risk of irreversible vision loss. Traditional perimetry-based visual field (VF) assessment provides systematic quantification of visual loss but it is subjective and time-consuming. Consequently, machine learning models utilizing fundus photographs to estimate VF have emerged as promising alternatives. However, due to the high variability and the limited ava…

    Submitted 17 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024, code: https://github.com/yanzipei/VF_RED

  36. arXiv:2407.02280

    cs.CV cs.AI

    FedIA: Federated Medical Image Segmentation with Heterogeneous Annotation Completeness

    Authors: Yangyang Xiang, Nannan Wu, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Federated learning has emerged as a compelling paradigm for medical image segmentation, particularly in light of increasing privacy concerns. However, most of the existing research relies on relatively stringent assumptions regarding the uniformity and completeness of annotations across clients. Contrary to this, this paper highlights a prevalent challenge in medical practice: incomplete annotatio…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Early accepted by MICCAI 2024

  37. arXiv:2406.18995

    cs.LG cs.AI

    FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity

    Authors: Zhaobin Sun, Nannan Wu, Junjie Shi, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, constrained by the level…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI 2024

  38. arXiv:2406.18361

    cs.CV cs.AI eess.IV

    Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process

    Authors: Tianyu Lin, Zhiguang Chen, Zhonghao Yan, Weijiang Yu, Fudan Zheng

    Abstract: Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also necessitate a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first laten…

    Submitted 9 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at MICCAI 2024. Code and citation info see https://github.com/lin-tianyu/Stable-Diffusion-Seg

  39. arXiv:2406.16168

    cs.LG

    An All-MLP Sequence Modeling Architecture That Excels at Copying

    Authors: Chenwei Cui, Zehao Yan, Gedeon Muhawenayo, Hannah Kerner

    Abstract: Recent work demonstrated Transformers' ability to efficiently copy strings of exponential sizes, distinguishing them from other architectures. We present the Causal Relation Network (CausalRN), an all-MLP sequence modeling architecture that can match Transformers on the copying task. Extending Relation Networks (RNs), we implemented key innovations to support autoregressive sequence modeling while…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024 Next Generation of Sequence Modeling Architectures Workshop

  40. arXiv:2406.15994

    astro-ph.HE astro-ph.CO

    The delayed radio emission in the black hole X-ray binary MAXI J1348$-$630

    Authors: Bei You, Shuai-kang Yang, Zhen Yan, Xinwu Cao, Andrzej A. Zdziarski

    Abstract: We explore the coupling between the accretion flow and the jet in black hole X-ray binary (BHXRB) MAXI J1348-630 by analyzing the X-ray and radio observations during its 2019 outburst. We measure the time delay between the radio and Comptonization fluxes with the interpolated cross-correlation function. For the first time, we find that the radio emission lags behind the X-ray Comptonization emissi…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures, Accepted for publication in ApJ Letters

  41. arXiv:2406.13495

    cs.CV

    DF40: Toward Next-Generation Deepfake Detection

    Authors: Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding, Yunsheng Wu

    Abstract: We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a "golden compass"…

    Submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.13275  [pdf, other]

    cs.SD cs.CL eess.AS

    Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

    Authors: Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

    Abstract: Automated audio captioning (AAC) is an audio-to-text task that describes audio content in natural language. Recently, advances in large language models (LLMs), together with improvements in training approaches for audio encoders, have opened up possibilities for improving AAC. Thus, we explore enhancing AAC from three aspects: 1) a pre-trained audio encoder via consistent ensemble distillation (CED)…

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  43. arXiv:2406.12477  [pdf, other]

    astro-ph.HE

    An atypical low-frequency QPO detected in the hard state of MAXI J1348-630 with Insight-HXMT

    Authors: Xin-Lei Wang, Zhen Yan, Fu-Guo Xie, Jun-Feng Wang, Ren-Yi Ma

    Abstract: Based on the Insight-HXMT archival data, we have detected a new atypical low-frequency quasi-periodic oscillation (LFQPO) in the black hole X-ray binary MAXI J1348-630. The new LFQPO is detected in all three instruments of Insight-HXMT with a combined significance of 3–5σ, covering a wide energy range of 1–100 keV. The fractional root-mean-square (RMS) amplitude seems to decrease with energy. It…

    Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: 20 pages, 6 figures. Accepted by ApJ

  44. arXiv:2406.11495  [pdf, other]

    cs.RO cs.AI

    Online Context Learning for Socially-compliant Navigation

    Authors: Iaroslav Okunevich, Alexandre Lombard, Tomas Krajnik, Yassine Ruichek, Zhi Yan

    Abstract: Robot social navigation needs to adapt to different human factors and environmental contexts. However, since these factors and contexts are difficult to predict and cannot be exhaustively enumerated, traditional learning-based methods have difficulty in ensuring the social attributes of robots in long-term and cross-environment deployments. This letter introduces an online context learning method…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures, 1 table, 1 algorithm

  45. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma-ray emission from dark matter annihilation or decay in 16 dwarf spheroidal galaxies within the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  46. arXiv:2406.08563  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall cond-mat.quant-gas cond-mat.supr-con quant-ph

    Field-sensitive dislocation bound states in two-dimensional $d$-wave altermagnets

    Authors: Di Zhu, Dongling Liu, Zheng-Yang Zhuang, Zhigang Wu, Zhongbo Yan

    Abstract: When a two-dimensional $d$-wave altermagnet is grown on a substrate, the interplay of momentum-dependent spin splittings arising from altermagnetism and Rashba spin-orbit coupling gives rise to a nodal band structure with band degeneracies enforced by a $C_{4z}\mathcal{T}$ symmetry. If we break the $C_{4z}\mathcal{T}$ symmetry by an exchange field, the band degeneracies are found to be immediately…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures

  47. arXiv:2406.07487  [pdf, other]

    cs.CV

    GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

    Authors: Hang Yao, Ming Liu, Haolin Wang, Zhicun Yin, Zifei Yan, Xiaopeng Hong, Wangmeng Zuo

    Abstract: Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Since they are trained only on normal data, diffusion models tend to reconstruct normal counterparts of test images with some noise added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, the difficulty of reconstructing images with dif…

    Submitted 2 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ECCV 2024, code and models: https://github.com/hyao1/GLAD. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  48. arXiv:2406.07012  [pdf, other]

    cs.SD cs.CL eess.AS

    Bridging Language Gaps in Audio-Text Retrieval

    Authors: Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang

    Abstract: Audio-text retrieval is a challenging task, requiring the search for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE), using a multi…

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  49. arXiv:2406.06992  [pdf, other]

    cs.SD eess.AS

    Scaling up masked audio encoder learning for general audio classification

    Authors: Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang, Bin Wang

    Abstract: Despite progress in audio classification, a generalization gap remains between speech and other sound domains, such as environmental sounds and music. Models trained for speech tasks often fail to perform well on environmental or musical audio tasks, and vice versa. While self-supervised (SSL) audio representations offer an alternative, there has been limited exploration of scaling both model and…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  50. arXiv:2406.06544  [pdf, other]

    cs.AR cs.AI

    TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

    Authors: Yifan Qin, Zheyu Yan, Zixuan Pan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NV…

    Submitted 21 August, 2024; v1 submitted 8 May, 2024; originally announced June 2024.

    Comments: 9 pages, accepted to IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)