Showing 1–50 of 231 results for author: Tang, L

  1. PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

    Authors: Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong

    Abstract: Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emission…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2407.08517  [pdf, other]

    cs.CV

    Generalized Low-Rank Matrix Completion Model with Overlapping Group Error Representation

    Authors: Wenjing Lu, Zhuang Fang, Liang Wu, Liming Tang, Hanxin Liu

    Abstract: The low-rank matrix completion (LRMC) technology has achieved remarkable results in low-level visual tasks. There is an underlying assumption that the real-world matrix data is low-rank in LRMC. However, the real matrix data does not satisfy the strict low-rank property, which undoubtedly presents serious challenges for the above-mentioned matrix recovery methods. Fortunately, there are feasible sc…

    Submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.08498  [pdf, other]

    cs.CV eess.IV

    ERD: Exponential Retinex decomposition based on weak space and hybrid nonconvex regularization and its denoising application

    Authors: Wenjing Lu, Liang Wu, Liming Tang, Zhuang Fang

    Abstract: The Retinex theory models the image as a product of illumination and reflection components, which has received extensive attention and is widely used in image enhancement, segmentation and color restoration. However, it has been rarely used in additive noise removal due to the inclusion of both multiplication and addition operations in the Retinex noisy image modeling. In this paper, we propose an…

    Submitted 11 July, 2024; originally announced July 2024.

  4. arXiv:2407.05342  [pdf, other]

    cs.CV

    Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

    Authors: Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia

    Abstract: This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in the pre-trained VLM…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.01178  [pdf, other]

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled…

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  6. Learning Retrieval Augmentation for Personalized Dialogue Generation

    Authors: Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, Lilian Tang

    Abstract: Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting in current personalized dialogue datasets, typically composed of merely four to five sentences, may not offer comprehensive descriptions of the perso…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP-2023

  7. arXiv:2406.18187  [pdf, other]

    cs.CL cs.AI cs.LG

    Selective Prompting Tuning for Personalized Conversations with LLMs

    Authors: Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang

    Abstract: In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 findings

  8. arXiv:2406.11160  [pdf, other]

    cs.AI

    Context Graph

    Authors: Chengjin Xu, Muzhi Li, Cehao Yang, Xuhui Jiang, Lumingyuan Tang, Yiyan Qi, Jian Guo

    Abstract: Knowledge Graphs (KGs) are foundational structures in many AI applications, representing entities and their interrelations through triples. However, triple-based KGs lack the contextual information of relational knowledge, like temporal dynamics and provenance details, which are crucial for comprehensive knowledge representation and effective reasoning. Instead, Context Graphs (CGs) expan…

    Submitted 27 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.11138  [pdf, other]

    cs.CV cs.AI

    Diffusion Models in Low-Level Vision: A Survey

    Authors: Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li

    Abstract: Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity. This ensures the generation of visually compellin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages, 23 figures, 4 tables

  10. arXiv:2406.07966  [pdf, other]

    cs.CV

    Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

    Authors: Chengyu Fang, Chunming He, Fengyang Xiao, Yulun Zhang, Longxiang Tang, Yuelin Zhang, Kai Li, Xiu Li

    Abstract: Real-world Image Dehazing (RID) aims to alleviate haze-induced degradation in real-world settings. This task remains challenging due to the complexities in accurately modeling real haze distributions and the scarcity of paired real-world data. To address these challenges, we first introduce a cooperative unfolding network that jointly models atmospheric scattering and image scenes, effectively int…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, 6 tables

  11. arXiv:2406.02495  [pdf, other]

    cs.CV

    GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

    Authors: Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang

    Abstract: Combining the signed distance function (SDF) and differentiable volume rendering has emerged as a powerful paradigm for surface reconstruction from multi-view images without 3D supervision. However, current methods are impeded by requiring long-time per-scene optimizations and cannot generalize to new scenes. In this paper, we present GenS, an end-to-end generalizable neural surface reconstruction…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2023 Accepted

  12. arXiv:2406.01069  [pdf, other]

    cs.CV

    UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

    Authors: Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li

    Abstract: Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for hu…

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2405.20560  [pdf, other]

    cs.DC

    Collaborative Resource Management and Workloads Scheduling in Cloud-Assisted Mobile Edge Computing across Timescales

    Authors: Lujie Tang, Minxian Xu, Chengzhong Xu, Kejiang Ye

    Abstract: Due to the limited resource capacity of edge servers and the high purchase costs of edge resources, service providers are facing the new challenge of how to take full advantage of the constrained edge resources for Internet of Things (IoT) service hosting and task scheduling to maximize system performance. In this paper, we study the joint optimization problem on service placement, resource provis…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 11 pages, 10 figures

    Journal ref: IEEE ICWS 2024

  14. arXiv:2405.14480  [pdf, other]

    cs.CV

    Scalable Visual State Space Model with Fractal Scanning

    Authors: Lv Tang, HaoKe Xiao, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li

    Abstract: Foundational models have significantly advanced in natural language processing (NLP) and computer vision (CV), with the Transformer architecture becoming a standard backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher resolution images. To address this challenge, State Space Models (SSMs) like Mamba have emerged as efficient alternativ…

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: This paper is a work in progress

  15. arXiv:2405.08965  [pdf, other]

    cs.PL cs.AI

    LLMs are Meaning-Typed Code Constructs

    Authors: Jason Mars, Yiping Kang, Jayanaka Dantanarayana, Chandra Irugalbandara, Kugesan Sivasothynathan, Lingjia Tang

    Abstract: Programming with Generative AI (GenAI) models is a type of Neurosymbolic programming and has seen tremendous adoption across many domains. However, leveraging GenAI models in code today can be complex, counter-intuitive and often require specialized frameworks, leading to increased complexity. This is because it is currently unclear as to the right abstractions through which we should marry GenAI…

    Submitted 14 May, 2024; originally announced May 2024.

  16. arXiv:2405.00256  [pdf, other]

    cs.CV

    ASAM: Boosting Segment Anything Model with Adversarial Tuning

    Authors: Bo Li, Haoke Xiao, Lv Tang

    Abstract: In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This paper is accepted by CVPR2024

  17. arXiv:2404.14693  [pdf, other]

    cs.CR cs.CV eess.IV

    Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition

    Authors: Yunming Zhang, Dengpan Ye, Sipeng Shen, Caiyun Xie, Ziyi Liu, Jiacheng Deng, Long Tang

    Abstract: The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere with the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a…

    Submitted 22 April, 2024; originally announced April 2024.

  18. arXiv:2404.12241  [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  19. arXiv:2404.10774  [pdf, other]

    cs.CL cs.AI

    MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

    Authors: Liyan Tang, Philippe Laban, Greg Durrett

    Abstract: Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of "fact-checking" are based on verifying each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many call…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: LLM-AggreFact benchmark, MiniCheck models, data generation code at https://github.com/Liyan06/MiniCheck

  20. arXiv:2403.18760  [pdf, other]

    cs.RO

    MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model

    Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song

    Abstract: In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which…

    Submitted 1 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  21. arXiv:2403.16387  [pdf, other]

    cs.CV

    Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

    Authors: Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

    Abstract: Image fusion aims to combine information from different source images to create a comprehensively representative image. Existing fusion methods are typically helpless in dealing with degradations in low-quality source images and non-interactive to multiple subjective and objective needs. To solve them, we introduce a novel approach that leverages semantic text guidance image fusion model for degra…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  22. arXiv:2403.14974  [pdf, other]

    cs.CV

    AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies

    Authors: Rui Wang, Dengpan Ye, Long Tang, Yunming Zhang, Jiacheng Deng

    Abstract: With the continuous improvements of deepfake methods, forgery messages have transitioned from single-modality to multi-modal fusion, posing new challenges for existing forgery detection algorithms. In this paper, we propose AVT2-DWF, the Audio-Visual dual Transformers grounded in Dynamic Weight Fusion, which aims to amplify both intra- and cross-modal forgery cues, thereby enhancing detection capa…

    Submitted 22 March, 2024; originally announced March 2024.

  23. arXiv:2403.11448  [pdf, other]

    cs.CV

    Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

    Authors: Linyu Tang, Lei Zhang

    Abstract: Numerous studies have demonstrated the susceptibility of deep neural networks (DNNs) to subtle adversarial perturbations, prompting the development of many advanced adversarial defense methods aimed at mitigating adversarial attacks. Current defense strategies usually train DNNs for a specific adversarial attack method and can achieve good robustness in defense against this type of adversarial att…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  24. arXiv:2402.19009  [pdf, other]

    cs.LG cs.AI

    Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

    Authors: Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu

    Abstract: The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, and…

    Submitted 5 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: ICML 2024 camera-ready. Code is available at https://github.com/guangyliu/EDDPM

  25. arXiv:2402.15537  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Evaluating the Performance of ChatGPT for Spam Email Detection

    Authors: Shijing Si, Yuwei Wu, Le Tang, Yugui Zhang, Jedrek Wosik

    Abstract: Email continues to be a pivotal and extensively utilized communication medium within professional and commercial domains. Nonetheless, the prevalence of spam emails poses a significant challenge for users, disrupting their daily routines and diminishing productivity. Consequently, accurately identifying and filtering spam based on content has become crucial for cybersecurity. Recent advancements i…

    Submitted 19 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures

  26. Sketching AI Concepts with Capabilities and Examples: AI Innovation in the Intensive Care Unit

    Authors: Nur Yildirim, Susanna Zlotnikov, Deniz Sayar, Jeremy M. Kahn, Leigh A. Bukowski, Sher Shah Amin, Kathryn A. Riman, Billie S. Davis, John S. Minturn, Andrew J. King, Dan Ricketts, Lu Tang, Venkatesh Sivaraman, Adam Perer, Sarah M. Preum, James McCann, John Zimmerman

    Abstract: Advances in artificial intelligence (AI) have enabled unprecedented capabilities, yet innovation teams struggle when envisioning AI concepts. Data science teams think of innovations users do not want, while domain experts think of innovations that cannot be built. A lack of effective ideation seems to be a breakdown point. How might multidisciplinary teams identify buildable and desirable use case…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: to appear at CHI 2024

  27. arXiv:2402.13249  [pdf, other]

    cs.CL cs.AI

    TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

    Authors: Liyan Tang, Igor Shalyminov, Amy Wing-mei Wong, Jon Burnsky, Jake W. Vincent, Yu'an Yang, Siffi Singh, Song Feng, Hwanjun Song, Hang Su, Lijia Sun, Yi Zhang, Saab Mansour, Kathleen McKeown

    Abstract: Single document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. We provide binary sentence-le…

    Submitted 31 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: NAACL 2024; Linguistic annotations available at https://github.com/amazon-science/tofueval

  28. arXiv:2402.03009  [pdf, other]

    cs.CL cs.AI

    UniMem: Towards a Unified View of Long-Context Large Language Models

    Authors: Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce U…

    Submitted 5 February, 2024; originally announced February 2024.

  29. arXiv:2402.02164  [pdf]

    cs.AI q-bio.BM

    TSIS with A Comparative Study on Linear Molecular Representation

    Authors: Juan-Ni Wu, Tong Wang, Li-Juan Tang, Hai-Long Wu, Ru-Qin Yu

    Abstract: Encoding is the carrier of information. AI models possess basic capabilities in syntax, semantics, and reasoning, but these capabilities are sensitive to specific inputs. In this study, we introduce an encoding algorithm, TSIS (Simplified TSID), to the t-SMILES family as a fragment-based linear molecular representation. TSID has been demonstrated to significantly outperform classical SMILES, DeepS…

    Submitted 26 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  30. arXiv:2402.01467  [pdf, other]

    eess.SY cs.AI cs.CE cs.NE q-bio.NC

    Brain-Like Replay Naturally Emerges in Reinforcement Learning Agents

    Authors: Jiyi Wang, Likai Tang, Huimiao Chen, Sen Song

    Abstract: Can replay, as a widely observed neural activity pattern in brain regions, particularly in the hippocampus and neocortex, emerge in an artificial agent? If yes, does it contribute to the tasks? In this work, without heavy dependence on complex assumptions, we discover naturally emergent replay under task-optimized paradigm using a recurrent neural network-based reinforcement learning model, which…

    Submitted 2 February, 2024; originally announced February 2024.

  31. arXiv:2401.08627  [pdf, other]

    cond-mat.dis-nn cond-mat.mtrl-sci cs.LG

    Predicting and Interpreting Energy Barriers of Metallic Glasses with Graph Neural Networks

    Authors: Haoyu Li, Shichang Zhang, Longwen Tang, Mathieu Bauchy, Yizhou Sun

    Abstract: Metallic Glasses (MGs) are widely used materials that are stronger than steel while being shapeable as plastic. While understanding the structure-property relationship of MGs remains a challenge in materials science, studying their energy barriers (EBs) as an intermediary step shows promise. In this work, we utilize Graph Neural Networks (GNNs) to model MGs and study EBs. We contribute a new datas…

    Submitted 20 June, 2024; v1 submitted 7 December, 2023; originally announced January 2024.

    Comments: Accepted at ICML 2024

  32. arXiv:2401.07123  [pdf, other]

    cs.HC cs.CL

    One Agent Too Many: User Perspectives on Approaches to Multi-agent Conversational AI

    Authors: Christopher Clarke, Karthik Krishnamurthy, Walter Talamonti, Yiping Kang, Lingjia Tang, Jason Mars

    Abstract: Conversational agents have been gaining increasing popularity in recent years. Influenced by the widespread adoption of task-oriented agents such as Apple Siri and Amazon Alexa, these agents are being deployed into various applications to enhance user experience. Although these agents promote "ask me anything" functionality, they are typically built to focus on a single or finite set of expertise.…

    Submitted 13 January, 2024; originally announced January 2024.

  33. arXiv:2401.05794  [pdf, ps, other]

    cs.LG cs.DM math.CO

    Bounds on the price of feedback for mistake-bounded online learning

    Authors: Jesse Geneson, Linus Tang

    Abstract: We improve several worst-case bounds for various online learning scenarios from (Auer and Long, Machine Learning, 1999). In particular, we sharpen an upper bound for delayed ambiguous reinforcement learning by a factor of 2 and an upper bound for learning compositions of families of functions by a factor of 2.41. We also improve a lower bound from the same paper for learning compositions of $k$ fa…

    Submitted 17 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  34. arXiv:2401.05707  [pdf, other]

    cs.CL

    CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer

    Authors: Zhen Tao, Dinghao Xi, Zhiyu Li, Liumin Tang, Wei Xu

    Abstract: Text style transfer is increasingly prominent in online entertainment and social media. However, existing research mainly concentrates on style transfer within individual English sentences, while ignoring the complexity of long Chinese texts, which limits the wider applicability of style transfer in digital media realm. To bridge this gap, we propose a Chinese Article-style Transfer framework (CAT…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 9 pages

  35. arXiv:2401.01842  [pdf, ps, other]

    cs.LG

    Wasserstein Nonnegative Tensor Factorization with Manifold Regularization

    Authors: Jianyu Wang, Linruize Tang

    Abstract: Nonnegative tensor factorization (NTF) has become an important tool for feature extraction and part-based representation with preserved intrinsic structure information from nonnegative high-order data. However, the original NTF methods utilize Euclidean or Kullback-Leibler divergence as the loss function which treats each feature equally leading to the neglect of the side-information of features.…

    Submitted 3 January, 2024; originally announced January 2024.

  36. arXiv:2312.15304  [pdf, other]

    cs.CL cs.AI

    Exploring the Capabilities of ChatGPT in Ancient Chinese Translation and Person Name Recognition

    Authors: Shijing Si, Siqing Zhou, Le Tang, Xiaoqing Cheng, Yugui Zhang

    Abstract: ChatGPT's proficiency in handling modern standard languages suggests potential for its use in understanding ancient Chinese. This paper explores ChatGPT's capabilities on ancient Chinese via two tasks: translating ancient Chinese to modern Chinese and recognizing ancient Chinese names. A comparison of ChatGPT's output with human translations serves to evaluate its comprehension of ancient Chinese.…

    Submitted 23 February, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Technical report

  37. arXiv:2312.14972  [pdf, other]

    cs.SE cs.AI cs.LG

    Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production

    Authors: Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Jayanaka Dantanarayana, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars

    Abstract: Many companies use large language models (LLMs) offered as a service, like OpenAI's GPT-4, to create AI-enabled product experiences. Along with the benefits of ease-of-use and shortened time-to-solution, this reliance on proprietary services has downsides in model control, performance reliability, uptime predictability, and cost. At the same time, a flurry of open-source small language models (SLM…

    Submitted 16 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Updated title, Revised content

    Journal ref: ISPASS-2024: 2024 IEEE International Symposium on Performance Analysis of Systems and Software

  38. arXiv:2312.09128  [pdf, other]

    cs.CV

    Tokenize Anything via Prompting

    Authors: Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan

    Abstract: We present a unified, promptable model capable of simultaneously segmenting, recognizing, and captioning anything. Unlike SAM, we aim to build a versatile region representation in the wild via visual prompting. To achieve this, we train a generalizable model with massive segmentation masks, e.g., SA-1B masks, and semantic priors from a pre-trained CLIP model with 5 billion parameters. Specifically,…

    Submitted 17 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: code, model, and demo: https://github.com/baaivision/tokenize-anything

  39. arXiv:2312.06550  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM360: Towards Fully Transparent Open-Source LLMs

    Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

    Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder prog…

    Submitted 11 December, 2023; originally announced December 2023.

  40. arXiv:2311.12582  [pdf, other]

    eess.IV cs.AI cs.CV

    Echocardiogram Foundation Model -- Application 1: Estimating Ejection Fraction

    Authors: Adil Dahlan, Cyril Zakka, Abhinav Kumar, Laura Tang, Rohan Shad, Robyn Fong, William Hiesinger

    Abstract: Cardiovascular diseases stand as the primary global cause of mortality. Among the various imaging techniques available for visualising the heart and evaluating its function, echocardiograms emerge as the preferred choice due to their safety and low cost. Quantifying cardiac function based on echocardiograms is very laborious, time-consuming and subject to high interoperator variability. In this wo…

    Submitted 21 November, 2023; originally announced November 2023.

  41. arXiv:2311.11638  [pdf, other]

    cs.CV

    Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model

    Authors: Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, Sina Farsiu

    Abstract: Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. To ta…

    Submitted 9 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 20 pages, 11 figures, 11 tables

  42. arXiv:2311.11273  [pdf, other]

    cs.CV

    Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens

    Authors: Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li

    Abstract: Large Vision-Language Model (LVLM) has seen burgeoning development and increasing attention recently. In this paper, we propose a novel framework, camo-perceptive vision-language framework (CPVLF), to explore whether LVLM can generalize to the challenging camouflaged object detection (COD) scenario in a training-free manner. During the process of generalization, we find that due to hallucination i…

    Submitted 19 November, 2023; originally announced November 2023.

  43. arXiv:2311.08086  [pdf]

    cs.AI

    CPSOR-GCN: A Vehicle Trajectory Prediction Method Powered by Emotion and Cognitive Theory

    Authors: L. Tang, Y. Li, J. Yuan, A. Fu, J. Sun

    Abstract: Active safety systems on vehicles often face problems with false alarms. Most active safety systems predict the driver's trajectory with the assumption that the driver is always in a normal emotion, and then infer risks. However, the driver's trajectory uncertainty increases under abnormal emotions. This paper proposes a new trajectory prediction model: CPSOR-GCN, which predicts vehicle trajectori…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 15 pages, 31 figures, submitted to IEEE Transactions on Intelligent Vehicles

  44. arXiv:2311.05152  [pdf, other]

    cs.LG cs.AI cs.CV cs.MM

    Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks

    Authors: Haoyi Duan, Yan Xia, Mingze Zhou, Li Tang, Jieming Zhu, Zhou Zhao

    Abstract: In recent years, the deployment of large-scale pre-trained models in audio-visual downstream tasks has yielded remarkable outcomes. However, these models, primarily trained on single-modality unconstrained datasets, still encounter challenges in feature extraction for multi-modal tasks, leading to suboptimal performance. This limitation arises due to the introduction of irrelevant modality-specifi…

    Submitted 20 December, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023

  45. arXiv:2311.00559  [pdf, ps, other]

    cs.LG

    Learning to optimize by multi-gradient for multi-objective optimization

    Authors: Linxi Yang, Xinmin Yang, Liping Tang

    Abstract: The development of artificial intelligence (AI) for science has led to the emergence of learning-based research paradigms, necessitating a compelling reevaluation of the design of multi-objective optimization (MOO) methods. The new generation MOO methods should be rooted in automated learning rather than manual design. In this paper, we introduce a new automatic learning paradigm for optimizing MO…

    Submitted 1 November, 2023; originally announced November 2023.

  46. arXiv:2310.16540  [pdf, other]

    cs.CV

    Dual Defense: Adversarial, Traceable, and Invisible Robust Watermarking against Face Swapping

    Authors: Yunming Zhang, Dengpan Ye, Caiyun Xie, Long Tang, Chuanxi Chen, Ziyi Liu, Jiacheng Deng

    Abstract: The malicious applications of deep forgery, represented by face swapping, have introduced security threats such as misinformation dissemination and identity fraud. While some research has proposed the use of robust watermarking methods to trace the copyright of facial images for post-event traceability, these methods cannot effectively prevent the generation of forgeries at the source and curb the…

    Submitted 25 October, 2023; originally announced October 2023.

  47. arXiv:2310.13574  [pdf, other]

    eess.IV cs.CV cs.LG

    Progressive Dual Priori Network for Generalized Breast Tumor Segmentation

    Authors: Li Wang, Lihui Wang, Zixiang Kuai, Lei Tang, Yingfeng Ou, Chen Ye, Yuemin Zhu

    Abstract: To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regi…

    Submitted 16 June, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 14 pages, 12 figures

    Journal ref: IEEE Journal of Biomedical and Health Informatics, 2024

  48. arXiv:2310.10912  [pdf, other]

    cs.CV

    Towards Training-free Open-world Segmentation via Image Prompt Foundation Models

    Authors: Lv Tang, Peng-Tao Jiang, Hao-Ke Xiao, Bo Li

    Abstract: The realm of computer vision has witnessed a paradigm shift with the advent of foundational models, mirroring the transformative influence of large language models in the domain of natural language processing. This paper delves into the exploration of open-world segmentation, presenting a novel approach called Image Prompt Segmentation (IPSeg) that harnesses the power of vision foundational models…

    Submitted 25 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted by IJCV 2024

  49. arXiv:2310.04835  [pdf, other]

    cs.AI

    On the Evolution of Knowledge Graphs: A Survey and Perspective

    Authors: Xuhui Jiang, Chengjin Xu, Yinghan Shen, Xun Sun, Lumingyuan Tang, Saizhuo Wang, Zhongwu Chen, Yuanzhuo Wang, Jian Guo

    Abstract: Knowledge graphs (KGs) are structured representations of diversified knowledge. They are widely used in various intelligent applications. In this article, we provide a comprehensive survey on the evolution of various types of knowledge graphs (i.e., static KGs, dynamic KGs, temporal KGs, and event KGs) and techniques for knowledge extraction and reasoning. Furthermore, we introduce the practical a…

    Submitted 10 October, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

  50. arXiv:2310.04724  [pdf, other]

    cs.CV cs.LG

    Activate and Reject: Towards Safe Domain Generalization under Category Shift

    Authors: Chaoqi Chen, Luyao Tang, Leitian Tao, Hong-Yu Zhou, Yue Huang, Xiaoguang Han, Yizhou Yu

    Abstract: Albeit the notable performance on in-domain test points, it is non-trivial for deep neural networks to attain satisfactory accuracy when deploying in the open world, where novel domains and object classes often occur. In this paper, we study a practical problem of Domain Generalization under Category Shift (DGCS), which aims to simultaneously detect unknown-class samples and classify known-class s…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: ICCV 2023