
Showing 1–50 of 99 results for author: Song, T

Searching in archive cs.
  1. arXiv:2408.14734  [pdf]

    cs.LG math-ph math.NA

    General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

    Authors: Sen Wang, Peizhi Zhao, Qinglong Ma, Tao Song

    Abstract: Physics-Informed Neural Networks (PINNs) have become a promising research direction in the field of solving Partial Differential Equations (PDEs). Dealing with singular perturbation problems continues to be a difficult challenge in the field of PINN. The solution of singular perturbation problems often exhibits sharp boundary layers and steep gradients, and traditional PINN cannot achieve approxim…

    Submitted 26 August, 2024; originally announced August 2024.

  2. arXiv:2408.13378  [pdf, other]

    cs.AI cs.CL cs.IR cs.LG q-bio.QM

    DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning

    Authors: Yoshitaka Inoue, Tianci Song, Tianfan Fu

    Abstract: Drug repurposing offers a promising avenue for accelerating drug development by identifying new therapeutic potentials of existing drugs. In this paper, we propose a multi-agent framework to enhance the drug repurposing process using state-of-the-art machine learning techniques and knowledge integration. Our framework comprises several specialized agents: an AI Agent trains robust drug-target inte…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 18 pages, 1 figure

  3. OFL-W3: A One-shot Federated Learning System on Web 3.0

    Authors: Linshan Jiang, Moming Duan, Bingsheng He, Yulin Sun, Peishen Yan, Yang Hua, Tao Song

    Abstract: Federated Learning (FL) addresses the challenges posed by data silos, which arise from privacy, security regulations, and ownership concerns. Despite these barriers, FL enables these isolated data repositories to participate in collaborative learning without compromising privacy or security. Concurrently, the advancement of blockchain technology and decentralized applications (DApps) within Web 3.…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: VLDB 24 demo paper

  4. arXiv:2407.15389  [pdf, other]

    cs.LG cs.CR cs.DC

    Poisoning with A Pill: Circumventing Detection in Federated Learning

    Authors: Hanxi Guo, Hao Wang, Tao Song, Tianhang Zheng, Yang Hua, Haibing Guan, Xiangyu Zhang

    Abstract: Without direct access to the client's data, federated learning (FL) is well-known for its unique strength in data privacy protection among existing distributed machine learning techniques. However, its distributive and iterative nature makes FL inherently vulnerable to various poisoning attacks. To counteract these threats, extensive defenses have been proposed to filter out malicious clients, usi…

    Submitted 22 July, 2024; originally announced July 2024.

  5. arXiv:2407.01614  [pdf, other]

    cs.LG cs.AI

    Enhancing Stability for Large Models Training in Constrained Bandwidth Networks

    Authors: Yun Dai, Tejas Dharamsi, Byron Hsu, Tao Song, Hamed Firooz

    Abstract: Training extremely large language models with billions of parameters is a computationally intensive task that pushes the limits of current data parallel training systems. While techniques like ZeRO++ have enabled efficient distributed training of such giant models on inexpensive low-bandwidth clusters, they can suffer from convergence issues due to potential race conditions in the hierarchical par…

    Submitted 31 July, 2024; v1 submitted 27 June, 2024; originally announced July 2024.

  6. arXiv:2406.07572  [pdf, ps, other]

    cs.AI cs.CE cs.LG

    Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

    Authors: Tao Song, Yuwei Fan, Chenlong Feng, Keyu Song, Chao Liu, Dongxiang Jiang

    Abstract: This study explores the application of large language models (LLMs) with callable tools in energy and power engineering domain, focusing on gas path analysis of gas turbines. We developed a dual-agent tool-calling process to integrate expert knowledge, predefined tools, and LLM reasoning. We evaluated various LLMs, including LLama3, Qwen1.5 and GPT. Smaller models struggled with tool usage and par…

    Submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2406.05472  [pdf, other]

    cs.CR eess.SY

    A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

    Abstract: Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures, Submitted to IEEE Transactions on Information Forensics and Security

  8. arXiv:2405.19931  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

    Authors: Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan

    Abstract: Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  9. arXiv:2405.18784  [pdf, other]

    cs.CV

    LP-3DGS: Learning to Prune 3D Gaussian Splatting

    Authors: Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset prunin…

    Submitted 29 May, 2024; originally announced May 2024.

  10. arXiv:2405.09200  [pdf, ps, other]

    cs.IT

    Performance Analysis of RIS-aided MISO Systems with EMI and Channel Aging

    Authors: Taoyu Song, Enyu Shi, Yu Lu, Yiyang Zhu, Jiayi Zhang, Bo Ai

    Abstract: In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided multiple-input single-output (MISO) system in the presence of electromagnetic interference (EMI) and channel aging with a Rician fading channel model between the base station (BS) and user equipment (UE). Specifically, we derive the closed-form expression for downlink spectral efficiency (SE) with maximum ratio transmis…

    Submitted 15 May, 2024; originally announced May 2024.

  11. arXiv:2405.06277  [pdf, other]

    cs.CV

    Learning A Spiking Neural Network for Efficient Image Deraining

    Authors: Tianyu Song, Guiyue Jin, Pengpeng Li, Kui Jiang, Xiang Chen, Jiyu Jin

    Abstract: Recently, spiking neural networks (SNNs) have demonstrated substantial potential in computer vision tasks. In this paper, we present an Efficient Spiking Deraining Network, called ESDNet. Our work is motivated by the observation that rain pixel values will lead to a more pronounced intensity of spike signals in SNNs. However, directly applying deep SNNs to image deraining task still remains a sign…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024

  12. arXiv:2404.19192  [pdf, other]

    cs.CL cs.AI

    Mix of Experts Language Model for Named Entity Recognition

    Authors: Xinwei Chen, Kun Li, Tianyou Song, Jiangjian Guo

    Abstract: Named Entity Recognition (NER) is an essential steppingstone in the field of natural language processing. Although promising performance has been achieved by various distantly supervised models, we argue that distant supervision inevitably introduces incomplete and noisy annotations, which may mislead the model training process. To address this issue, we propose a robust NER model named BOND-MoE b…

    Submitted 29 April, 2024; originally announced April 2024.

  13. arXiv:2404.14617  [pdf, other]

    cs.AR

    TDRAM: Tag-enhanced DRAM for Efficient Caching

    Authors: Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael Miller, Taeksang Song, Thomas Vogelsang, Steven Woo, Jason Lowe-Power

    Abstract: As SRAM-based caches are hitting a scaling wall, manufacturers are integrating DRAM-based caches into system designs to continue increasing cache sizes. While DRAM caches can improve the performance of memory systems, existing DRAM cache designs suffer from high miss penalties, wasted data movement, and interference between misses and demand requests. In this paper, we propose TDRAM, a novel DRAM…

    Submitted 22 April, 2024; originally announced April 2024.

  14. arXiv:2404.13903  [pdf, other]

    cs.CV

    Accelerating Image Generation with Sub-path Linear Approximation Model

    Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi…

    Submitted 21 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.09405  [pdf, other]

    cs.CL cs.AI

    Few-shot Name Entity Recognition on StackOverflow

    Authors: Xinwei Chen, Kun Li, Tianyou Song, Jiangjian Guo

    Abstract: StackOverflow, with its vast question repository and limited labeled examples, raises an annotation challenge for us. We address this gap by proposing RoBERTa+MAML, a few-shot named entity recognition (NER) method leveraging meta-learning. Our approach, evaluated on the StackOverflow NER corpus (27 entity types), achieves a 5% F1 score improvement over the baseline. We improved the results further…

    Submitted 27 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 5 pages

  16. arXiv:2404.01230  [pdf, other]

    cs.CL

    LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

    Authors: Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

    Abstract: This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly. Strategic reasoning is distinguished by its focus on the dynamic and uncertain nature of interact…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  17. arXiv:2403.13254  [pdf, other]

    cs.SD eess.AS

    Onset and offset weighted loss function for sound event detection

    Authors: Tao Song

    Abstract: In a typical sound event detection (SED) system, the existence of a sound event is detected at a frame level, and consecutive frames with the same event detected are combined as one sound event. The median filter is applied as a post-processing step to remove detection errors as much as possible. However, detection errors occurring around the onset and offset of a sound event are beyond the capaci…

    Submitted 19 March, 2024; originally announced March 2024.

  18. arXiv:2403.13252  [pdf, other]

    cs.SD eess.AS

    Frequency-aware convolution for sound event detection

    Authors: Tao Song

    Abstract: In sound event detection (SED), convolution neural networks (CNNs) are widely used to extract time-frequency patterns from the input spectrogram. However, features extracted by CNN can be insensitive to the shift of time-frequency patterns along the frequency axis. To address this issue, frequency dynamic convolution (FDY) has been proposed, which applies different kernels to different frequency c…

    Submitted 19 March, 2024; originally announced March 2024.

  19. arXiv:2403.11162  [pdf, other]

    cs.CV cs.AI cs.CR cs.CY cs.LG

    CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion

    Authors: Xiaoyu Wu, Yang Hua, Chumeng Liang, Jiaru Zhang, Hao Wang, Tao Song, Haibing Guan

    Abstract: Diffusion Models (DMs) have evolved into advanced image generation tools, especially for few-shot generation where a pretrained model is fine-tuned on a small set of images to capture a specific style or object. Despite their success, concerns exist about potential copyright violations stemming from the use of unauthorized data in this process. In response, we present Contrasting Gradient Inversio…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  20. arXiv:2403.03742  [pdf, other]

    cs.HC

    Mitigating Ageism through Virtual Reality: Intergenerational Collaborative Escape Room Design

    Authors: Ruotong Zou, Shuyu Yin, Tianqi Song, Peinuan Qin, Yi-Chieh Lee

    Abstract: As virtual reality (VR) becomes more popular for intergenerational collaboration, there is still a significant gap in research regarding understanding the potential for reducing ageism. Our study aims to address this gap by analyzing ageism levels before and after VR escape room collaborative experiences. We recruited 28 participants to collaborate with an older player in a challenging VR escape r…

    Submitted 6 March, 2024; originally announced March 2024.

  21. arXiv:2403.00193  [pdf]

    cs.NI cs.CR

    Structural Resilience and Connectivity of the IPv6 Internet: An AS-level Topology Examination

    Authors: Bin Yuan, Tianbo Song

    Abstract: The study utilizes a comprehensive dataset informed by IPv6 routing information to provide statistics, degree distribution, joint degree distribution, and clustering analysis of the IPv6 Internet's structure and resilience. The dataset includes 17,232 unique ASes and 10,000 unique IPv6 prefixes. Analysis reveals an interconnected network with an average path length of approximately 3 hops, suggesti…

    Submitted 29 February, 2024; originally announced March 2024.

  22. arXiv:2403.00190  [pdf]

    cs.SI cs.AI

    Identification of important nodes in the information propagation network based on the artificial intelligence method

    Authors: Bin Yuan, Tianbo Song, Jerry Yao

    Abstract: This study presents an integrated approach for identifying key nodes in information propagation networks using advanced artificial intelligence methods. We introduce a novel technique that combines the Decision-making Trial and Evaluation Laboratory (DEMATEL) method with the Global Structure Model (GSM), creating a synergistic model that effectively captures both local and global influences within…

    Submitted 29 February, 2024; originally announced March 2024.

  23. arXiv:2402.18923  [pdf, other]

    cs.CL cs.SD eess.AS

    Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition

    Authors: Jeehyun Lee, Yerin Choi, Tae-Jin Song, Myoung-Wan Koo

    Abstract: Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose task design, labeling strategy, and a speech recognition model with an in…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024

  24. arXiv:2401.12994  [pdf, other]

    cs.CL

    Automated Scoring of Clinical Patient Notes using Advanced NLP and Pseudo Labeling

    Authors: Jingyu Xu, Yifeng Jiang, Bin Yuan, Shulin Li, Tianbo Song

    Abstract: Clinical patient notes are critical for documenting patient interactions, diagnoses, and treatment plans in medical practice. Ensuring accurate evaluation of these notes is essential for medical education and certification. However, manual evaluation is complex and time-consuming, often resulting in variability and resource-intensive assessments. To tackle these challenges, this research introduce…

    Submitted 18 January, 2024; originally announced January 2024.

  25. arXiv:2401.09699  [pdf, other]

    cs.CL cs.AI

    Curriculum Recommendations Using Transformer Base Model with InfoNCE Loss And Language Switching Method

    Authors: Xiaonan Xu, Bin Yuan, Yongyao Mo, Tianbo Song, Shulin Li

    Abstract: The Curriculum Recommendations paradigm is dedicated to fostering learning equality within the ever-evolving realms of educational technology and curriculum development. In acknowledging the inherent obstacles posed by existing methodologies, such as content conflicts and disruptions from language translation, this paradigm aims to confront and overcome these challenges. Notably, it addresses cont…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 4pages, 2 figures, ICAICA2023

    MSC Class: 68T50

  26. arXiv:2312.17507  [pdf, other]

    cs.RO

    Actuator-Constrained Reinforcement Learning for High-Speed Quadrupedal Locomotion

    Authors: Young-Ha Shin, Tae-Gyu Song, Gwanghyeon Ji, Hae-Won Park

    Abstract: This paper presents a method for achieving high-speed running of a quadruped robot by considering the actuator torque-speed operating region in reinforcement learning. The physical properties and constraints of the actuator are included in the training process to reduce state transitions that are infeasible in the real world due to motor torque-speed limitations. The gait reward is designed to dis…

    Submitted 29 December, 2023; originally announced December 2023.

  27. arXiv:2312.12484  [pdf, other]

    cs.CR cs.DC cs.LG

    SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks

    Authors: Peishen Yan, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Ningxin Hu, Mohammad R. Haghighat, Haibing Guan

    Abstract: Federated Learning (FL) is becoming a popular paradigm for leveraging distributed data and preserving data privacy. However, due to the distributed characteristic, FL systems are vulnerable to Byzantine attacks that compromised clients attack the global model by uploading malicious model updates. With the development of layer-level and parameter-level fine-grained attacks, the attacks' stealthines…

    Submitted 18 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by ECCV2024

  28. Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions

    Authors: Yifan Liu, Tiecheng Song, Chengye Xian, Ruiyuan Chen, Yi Zhao, Rui Li, Tan Guo

    Abstract: Crater detection can provide valuable information for humans to explore the topography and understand the history of extraterrestrial planets. Due to the significantly varying scenario distributions, existing detection models trained on known labelled crater datasets are hardly effective when applied to new unlabelled planets. To address this issue, we propose a two-stage adaptive network (TAN) fo…

    Submitted 10 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Journal ref: Liu, Y.; Song, T.; Xian, C.; Chen, R.; Zhao, Y.; Li, R.; Guo, T. Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions. Remote Sens. 2024, 16, 2024

  29. arXiv:2312.04992  [pdf, ps, other]

    cs.LG cs.DC

    PFLlib: Personalized Federated Learning Algorithm Library

    Authors: Jianqing Zhang, Yang Liu, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Jian Cao

    Abstract: Amid the ongoing advancements in Federated Learning (FL), a machine learning paradigm that allows collaborative learning with data privacy protection, personalized FL (pFL) has gained significant prominence as a research direction within the FL domain. Whereas traditional FL (tFL) focuses on jointly learning a global model, pFL aims to achieve a balance between the global and personalized objectiv…

    Submitted 8 December, 2023; originally announced December 2023.

  30. arXiv:2311.14975  [pdf, other]

    cs.LG cs.DC

    Eliminating Domain Bias for Federated Learning in Representation Space

    Authors: Jianqing Zhang, Yang Hua, Jian Cao, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan

    Abstract: Recently, federated learning (FL) is popular for its privacy-preserving and collaborative learning abilities. However, under statistically heterogeneous scenarios, we observe that biased data domains on clients cause a representation bias phenomenon and further degenerate generic representations during local training, i.e., the representation degeneration phenomenon. To address these issues, we pr…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: Accepted by NeurIPS 2023, 24 pages

  31. LDConv: Linear deformable convolution for improving convolutional neural networks

    Authors: Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie Zhou, Liming Zhang

    Abstract: Neural networks based on convolutional operations have achieved remarkable results in the field of deep learning, but there are two inherent flaws in standard convolutional operations. On the one hand, the convolution operation is confined to a local window, so it cannot capture information from other locations, and its sampled shapes is fixed. On the other hand, the size of the convolutional kern…

    Submitted 22 July, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 14 pages, 9 figures

    Journal ref: Image and Vision Computing, 105190 (2024)

  32. arXiv:2311.08024  [pdf, other]

    eess.IV cs.CV cs.LG

    MD-IQA: Learning Multi-scale Distributed Image Quality Assessment with Semi Supervised Learning for Low Dose CT

    Authors: Tao Song, Ruizhi Hou, Lisong Dai, Lei Xiang

    Abstract: Image quality assessment (IQA) plays a critical role in optimizing radiation dose and developing novel medical imaging techniques in computed tomography (CT). Traditional IQA methods relying on hand-crafted features have limitations in summarizing the subjective perceptual experience of image quality. Recent deep learning-based approaches have demonstrated strong modeling capabilities and potentia…

    Submitted 14 November, 2023; originally announced November 2023.

  33. arXiv:2311.05462  [pdf, other]

    cs.CR eess.SY

    ChatGPT and Other Large Language Models for Cybersecurity of Smart Grid Applications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

    Abstract: Cybersecurity breaches targeting electrical substations constitute a significant threat to the integrity of the power grid, necessitating comprehensive defense and mitigation strategies. Any anomaly in information and communication technology (ICT) should be detected for secure communications between devices in digital substations. This paper proposes large language models (LLM), e.g., ChatGPT, fo…

    Submitted 25 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, Accepted, 2024 IEEE Power & Energy Society General Meeting (PESGM), Seattle, WA, USA

  34. arXiv:2310.19202  [pdf]

    q-bio.QM cs.LG eess.SP

    Improved Motor Imagery Classification Using Adaptive Spatial Filters Based on Particle Swarm Optimization Algorithm

    Authors: Xiong Xiong, Ying Wang, Tianyuan Song, Jinguo Huang, Guixia Kang

    Abstract: As a typical self-paced brain-computer interface (BCI) system, the motor imagery (MI) BCI has been widely applied in fields such as robot control, stroke rehabilitation, and assistance for patients with stroke or spinal cord injury. Many studies have focused on the traditional spatial filters obtained through the common spatial pattern (CSP) method. However, the CSP method can only obtain fixed sp…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 25 pages, 8 figures

  35. arXiv:2310.16402  [pdf, other]

    cs.CV cs.CL

    Video Referring Expression Comprehension via Transformer with Content-conditioned Query

    Authors: Ji Jiang, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou

    Abstract: Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend that this naive query design is not ideal given the open-world nature of video REC brought by text supervision. With numerous potential semantic ca…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to ACM International Conference on Multimedia Workshop (ACM MM), 2023. arXiv admin note: substantial text overlap with arXiv:2210.02953

  36. arXiv:2310.16354  [pdf]

    cs.AR

    RAMPART: RowHammer Mitigation and Repair for Server Memory Systems

    Authors: Steven C. Woo, Wendy Elsasser, Mike Hamburg, Eric Linstadt, Michael R. Miller, Taeksang Song, James Tringali

    Abstract: RowHammer attacks are a growing security and reliability concern for DRAMs and computer systems as they can induce many bit errors that overwhelm error detection and correction capabilities. System-level solutions are needed as process technology and circuit improvements alone are unlikely to provide complete protection against RowHammer attacks in the future. This paper introduces RAMPART, a nove…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 16 pages, 13 figures. A version of this paper will appear in the Proceedings of MEMSYS23

    ACM Class: B.3.1; B.3.4

  37. arXiv:2309.11715   

    cs.CV eess.IV

    Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal

    Authors: Xiao Feng Zhang, Tian Yi Song, Jia Wei Yao

    Abstract: Segment Anything (SAM), an advanced universal image segmentation model trained on an expansive visual dataset, has set a new benchmark in image segmentation and computer vision. However, it faced challenges when it came to distinguishing between shadows and their backgrounds. To address this, we developed Deshadow-Anything, considering the generalization of large-scale datasets, and we performed F…

    Submitted 2 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: it needs revised

  38. arXiv:2309.09553  [pdf, other]

    cs.CV cs.AI cs.MM

    Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis

    Authors: Tianyi Song, Jiuxin Cao, Kun Wang, Bo Liu, Xiaofeng Zhang

    Abstract: The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current captions as conditions for generating the current frame. However, this method treats each historical frame and caption as the same contribution. It conne…

    Submitted 6 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by 2024 International Conference on Acoustics, Speech and Signal Processing(ICASSP 2024)

  39. arXiv:2308.10279  [pdf, other]

    cs.LG cs.CR cs.DC

    GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning

    Authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Jian Cao, Haibing Guan

    Abstract: Federated Learning (FL) is popular for its privacy-preserving and collaborative learning capabilities. Recently, personalized FL (pFL) has received attention for its ability to address statistical heterogeneity and achieve personalization in FL. However, from the perspective of feature extraction, most existing pFL methods only focus on extracting global or personalized feature information during…

    Submitted 13 October, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023

  40. arXiv:2308.06945  [pdf, other]

    cs.CV cs.LG

    Semantic-aware Network for Aerial-to-Ground Image Synthesis

    Authors: Jinhyun Jang, Taeyong Song, Kwanghoon Sohn

    Abstract: Aerial-to-ground image synthesis is an emerging and challenging problem that aims to synthesize a ground image from an aerial image. Due to the highly different layout and object representation between the aerial and ground images, existing approaches usually fail to transfer the components of the aerial scene into the ground scene. In this paper, we propose a novel framework to explore the challe…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: ICIP 2021. Code is available at https://github.com/jinhyunj/SANet

  41. arXiv:2307.11339  [pdf, other]

    cs.DC

    Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs

    Authors: Zinuo Cai, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Haibing Guan

    Abstract: Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct responsibilities to handle serving requests, i.e. general-purpose CPUs for input preprocessing and domain-specific GPUs for forward computation. Recurrent neural…

    Submitted 21 July, 2023; originally announced July 2023.

  42. FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy

    Authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan

    Abstract: Recently, personalized federated learning (pFL) has attracted increasing attention in privacy protection, collaborative learning, and tackling statistical heterogeneity among clients, e.g., hospitals, mobile smartphones, etc. Most existing pFL methods focus on exploiting the global information and personalized information in the client-level model parameters while neglecting that data is the sourc…

    Submitted 28 October, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: Accepted by KDD 2023

  43. arXiv:2307.00969  [pdf, other]

    cs.NI eess.SY

    High Altitude Platform Stations: the New Network Energy Efficiency Enabler in the 6G Era

    Authors: Tailai Song, David Lopez, Michela Meo, Nicola Piovesan, Daniela Renga

    Abstract: The rapidly evolving communication landscape, with the advent of 6G technology, brings new challenges to the design and operation of wireless networks. One of the key concerns is the energy efficiency of the Radio Access Network (RAN), as the exponential growth in wireless traffic demands increasingly higher energy consumption. In this paper, we assess the potential of integrating a High Altitude…

    Submitted 3 July, 2023; originally announced July 2023.

  44. arXiv:2306.08798  [pdf, other]

    cs.CL stat.ML

    MPSA-DenseNet: A novel deep learning model for English accent classification

    Authors: Tianyu Song, Linh Thi Hoai Nguyen, Ton Viet Ta

    Abstract: This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, that combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English-speaking regions (Britain, the United States, Scotland) and nonnative English…

    Submitted 14 June, 2023; originally announced June 2023.

  45. arXiv:2305.17860  [pdf, other]

    cs.SD eess.AS

    Speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition

    Authors: Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang

    Abstract: In recent years, the joint training of speech enhancement front-end and automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods only use enhanced speech as input for the backend. However, it is difficult for speech enhancement systems to directly separate speech from input due to the diverse types of noise with d…

    Submitted 30 May, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  46. arXiv:2305.16789  [pdf, other]

    cs.LG cs.CV eess.SP

    Modulate Your Spectrum in Self-Supervised Learning

    Authors: Xi Weng, Yunhao Ni, Tengwei Song, Jie Luo, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, Lei Huang

    Abstract: Whitening loss offers a theoretical guarantee against feature collapse in self-supervised learning (SSL) with joint embedding architectures. Typically, it involves a hard whitening approach, transforming the embedding and applying loss to the whitened output. In this work, we introduce Spectral Transformation (ST), a framework to modulate the spectrum of embedding and to seek functions beyond…

    Submitted 21 January, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at ICLR 2024. The code is available at https://github.com/winci-ai/intl

  47. arXiv:2305.15896  [pdf, other]

    cs.CV

    MixFormerV2: Efficient Fully Transformer Tracking

    Authors: Yutao Cui, Tianhui Song, Gangshan Wu, Limin Wang

    Abstract: Transformer-based trackers have achieved strong accuracy on the standard benchmarks. However, their efficiency remains an obstacle to practical deployment on both GPU and CPU platforms. In this paper, to overcome this issue, we propose a fully transformer tracking framework, coined as MixFormerV2, without any dense convolutional operation and complex score prediction module. Our key design…

    Submitted 7 February, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  48. arXiv:2305.07004  [pdf, other]

    cs.CL

    Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

    Authors: Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei

    Abstract: Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages. In this work, we introduce a simple yet effective method, called cross-lingual-thought prompting (XLT), to systematically improve the multilingual capability of LLMs. Specifically, XLT is a generic template prompt that stimulates cross-lingual and logi…

    Submitted 22 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023 Findings

  49. arXiv:2304.08103  [pdf, other]

    cs.CL cs.HC

    Low-code LLM: Graphical User Interface over Large Language Models

    Authors: Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

    Abstract: Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. Through visual interaction with a graphica…

    Submitted 1 April, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: Accepted as a Demo Track paper at NAACL 2024

  50. arXiv:2304.03198  [pdf, ps, other]

    cs.CV

    RFAConv: Innovating Spatial Attention and Standard Convolutional Operation

    Authors: Xin Zhang, Chen Liu, Degang Yang, Tingting Song, Yichen Ye, Ke Li, Yingze Song

    Abstract: Spatial attention has been widely used to improve the performance of convolutional neural networks. However, it has certain limitations. In this paper, we propose a new perspective on the effectiveness of spatial attention, which is that the spatial attention mechanism essentially solves the problem of convolutional kernel parameter sharing. However, the information contained in the attention map…

    Submitted 28 March, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: 12 pages, 11 figures