
Showing 1–50 of 355 results for author: Ye, C

  1. arXiv:2407.11638  [pdf, other]

    cs.CL cs.IR

    A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

    Authors: He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

    Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluat…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11555  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Guided Generation of Minority Samples Using Diffusion Models

    Authors: Soobin Um, Jong Chul Ye

    Abstract: We present a novel approach for generating minority samples that live on low-density regions of a data manifold. Our framework is built upon diffusion models, leveraging the principle of guided sampling that incorporates an arbitrary energy-based guidance during inference time. The key defining feature of our sampler lies in its self-contained nature, i.e., implementable solely with a pretra…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  3. arXiv:2407.11435  [pdf, other]

    q-bio.GN cs.LG stat.ML

    Genomic Language Models: Opportunities and Challenges

    Authors: Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song

    Abstract: Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to signif…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Review article; 25 pages, 3 figures, 1 table

    MSC Class: 92-08; 92B20; 68T50; 68T07

  4. arXiv:2407.11244  [pdf, other]

    cs.LG

    (Deep) Generative Geodesics

    Authors: Beomsu Kim, Michael Puthawala, Jong Chul Ye, Emanuele Sansone

    Abstract: In this work, we propose to study the global geometrical properties of generative models. We introduce a new Riemannian metric to assess the similarity between any two data points. Importantly, our metric is agnostic to the parametrization of the generative model and requires only the evaluation of its data likelihood. Moreover, the metric leads to the conceptual definition of generative distances…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures

  5. arXiv:2407.10641  [pdf, other]

    cs.CV cs.LG

    Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems

    Authors: Hyungjin Chung, Jong Chul Ye

    Abstract: Recent inverse problem solvers that leverage generative diffusion priors have garnered significant attention due to their exceptional quality. However, adaptation of the prior is necessary when there exists a discrepancy between the training and testing distributions. In this work, we propose deep diffusion image prior (DDIP), which generalizes the recent adaptation method of SCD by introducing a…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, 25 pages, 8 figures

  6. arXiv:2407.09651  [pdf, other]

    cs.DS cs.CC

    Fine-Grained Optimality of Partially Dynamic Shortest Paths and More

    Authors: Barna Saha, Virginia Vassilevska Williams, Yinzhan Xu, Christopher Ye

    Abstract: Single Source Shortest Paths ($\textrm{SSSP}$) is among the most well-studied problems in computer science. In the incremental (resp. decremental) setting, the goal is to maintain distances from a fixed source in a graph undergoing edge insertions (resp. deletions). A long line of research culminated in a near-optimal deterministic $(1 + \varepsilon)$-approximate data structure with…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 54 pages, 4 figures, abstract shortened to meet arXiv requirements

    ACM Class: F.2.2; F.1.3

  7. arXiv:2407.01231  [pdf, other]

    cs.CL cs.AI

    MIRAI: Evaluating LLM Agents for Event Forecasting

    Authors: Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

    Abstract: Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interests have been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 66 pages, 8 figures, 6 tables; Website: https://mirai-llm.github.io/

  8. arXiv:2406.16864  [pdf, other]

    cs.CV cs.AI cs.GR

    StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

    Authors: Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

    Abstract: This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, conflicting with the deterministic nature of the Image2Normal task, and costly ensembling step, which slows down the e…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: HF Demo: hf.co/Stable-X, Video: https://www.youtube.com/watch?v=sylXTxG_U2U

  9. arXiv:2406.16074  [pdf, other]

    eess.IV cs.CV

    CAVM: Conditional Autoregressive Vision Model for Contrast-Enhanced Brain Tumor MRI Synthesis

    Authors: Lujun Gui, Chuyang Ye, Tianyi Yan

    Abstract: Contrast-enhanced magnetic resonance imaging (MRI) is pivotal in the pipeline of brain tumor segmentation and analysis. Gadolinium-based contrast agents, as the most commonly used contrast agents, are expensive and may have potential side effects, and it is desired to obtain contrast-enhanced brain tumor MRI scans without the actual use of contrast agents. Deep learning methods have been applied t…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: The work has been accepted by MICCAI 2024

  10. arXiv:2406.15804  [pdf, other]

    cs.DC

    Split Federated Learning Empowered Vehicular Edge Intelligence: Adaptive Parellel Design and Future Directions

    Authors: Xianke Qiang, Zheng Chang, Chaoxiong Ye, Timo Hamalainen, Geyong Min

    Abstract: To realize ubiquitous intelligence of future vehicular networks, artificial intelligence (AI) is critical since it can mine knowledge from vehicular data to improve the quality of many AI driven vehicular services. By combining AI techniques with vehicular networks, Vehicular Edge Intelligence (VEI) can utilize the computing, storage, and communication resources of vehicles to train the AI models.…

    Submitted 27 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  11. arXiv:2406.09923  [pdf, other]

    cs.CL cs.AI cs.LG

    CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

    Authors: Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S Chang, Wei Wang

    Abstract: The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophis…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://clibench.github.io

  12. arXiv:2406.08070  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

    Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.05413  [pdf, other]

    cs.LG cs.AI cs.CV cs.MM

    Discover Your Neighbors: Advanced Stable Test-Time Adaptation in Dynamic World

    Authors: Qinting Jiang, Chuyang Ye, Dongyan Wei, Yuan Xue, Jingyan Jiang, Zhi Wang

    Abstract: Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for multimedia applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. This work provides a new perspective on analyzing batch n…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages

  14. arXiv:2406.02642  [pdf, other]

    cs.LG cs.AI

    E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

    Authors: Zhou Yang, Zhaochun Ren, Chenglong Ye, Yufeng Wang, Haizhou Sun, Chao Chen, Xiaofei Zhu, Yunbing Wu, Xiangwen Liao

    Abstract: In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performanc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures, 5 tables

  15. arXiv:2406.02628  [pdf, ps, other]

    stat.ML cs.CC cs.DS cs.LG

    Replicability in High Dimensional Statistics

    Authors: Max Hopkins, Russell Impagliazzo, Daniel Kane, Sihan Liu, Christopher Ye

    Abstract: The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for $1$-dimensional tasks including statistical queries. In this work, we study the comput…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 119 pages

    ACM Class: F.2.0

  16. arXiv:2406.02472  [pdf, other]

    cs.CL

    Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

    Authors: Zhihan Zhang, Yixin Cao, Chenchen Ye, Yunshan Ma, Lizi Liao, Tat-Seng Chua

    Abstract: The digital landscape is rapidly evolving with an ever-increasing volume of online news, emphasizing the need for swift and precise analysis of complex events. We refer to the complex events composed of many news articles over an extended period as Temporal Complex Event (TCE). This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event c…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  17. arXiv:2406.02100  [pdf, other]

    cs.CL

    Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data

    Authors: Haolong Li, Yu Ma, Yinqi Zhang, Chen Ye, Jie Chen

    Abstract: Large Language Models (LLMs) have shown excellent performance in language understanding, text generation, code synthesis, and many other tasks, while they still struggle in complex multi-step reasoning problems, such as mathematical reasoning. In this paper, through a newly proposed arithmetical puzzle problem, we show that the model can perform well on multi-step reasoning tasks via fine-tuning o…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Findings of ACL 2024

  18. arXiv:2405.20389  [pdf, other]

    astro-ph.IM cs.AI cs.HC cs.IR

    Designing an Evaluation Framework for Large Language Models in Astronomy Research

    Authors: John F. Wu, Alina Hyk, Kiera McCormick, Christine Ye, Simone Astarita, Elina Baral, Jo Ciuca, Jesse Cranney, Anjalie Field, Kartheik Iyer, Philipp Koehn, Jenn Kotler, Sandor Kruk, Michelle Ntampaka, Charles O'Neill, Joshua E. G. Peek, Sanjib Sharma, Mikaeel Yunus

    Abstract: Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy rese…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures. Code available at https://github.com/jsalt2024-evaluating-llms-for-astronomy/astro-arxiv-bot

  19. arXiv:2405.17829  [pdf, other]

    cs.LG cs.AI

    LDMol: Text-Conditioned Molecule Diffusion Model Leveraging Chemically Informative Latent Space

    Authors: Jinho Chang, Jong Chul Ye

    Abstract: With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques using conditional diffusion models. However, due to the fundamental nature of a molecule, which carries highly entangled correlations within a small number of atoms and bonds, it becomes difficult for a model to connect raw data with the conditions when the co…

    Submitted 28 May, 2024; originally announced May 2024.

  20. arXiv:2405.17720  [pdf, other]

    cs.CV cs.AI cs.LG

    MindFormer: A Transformer Architecture for Multi-Subject Brain Decoding via fMRI

    Authors: Inhwa Han, Jaayeon Lee, Jong Chul Ye

    Abstract: Research efforts to understand neural signals have been ongoing for many years, with visual decoding from fMRI signals attracting considerable attention. Particularly, the advent of image diffusion models has advanced the reconstruction of images from fMRI data significantly. However, existing approaches often introduce inter- and intra- subject variations in the reconstructed images, which can co…

    Submitted 27 May, 2024; originally announced May 2024.

  21. arXiv:2405.16823  [pdf, other]

    cs.CV cs.AI

    Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

    Authors: Gihyun Kwon, Jangho Park, Jong Chul Ye

    Abstract: While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing method of single image editing with self attention injection and video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approache…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://unifyediting.github.io/

  22. arXiv:2405.11191  [pdf, other]

    cs.DB cs.LG

    Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines

    Authors: Chaokun Chang, Eric Lo, Chunxiao Ye

    Abstract: Machine learning inference pipelines commonly encountered in data science and industries often require real-time responsiveness due to their user-facing nature. However, meeting this requirement becomes particularly challenging when certain input features require aggregating a large volume of data online. Recent literature on interpretable machine learning reveals that most machine learning models…

    Submitted 18 May, 2024; originally announced May 2024.

  23. arXiv:2405.10246  [pdf, other]

    eess.IV cs.CV

    A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts

    Authors: Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthew, Chuyang Ye, Wenjia Bai

    Abstract: Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation…

    Submitted 16 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: The work has been early accepted by MICCAI 2024

  24. arXiv:2405.08036  [pdf, other]

    cs.LG cs.AI

    POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

    Authors: Chang Huang, Junqiao Zhao, Shatong Zhu, Hongtu Zhou, Chen Ye, Tiantian Feng, Changjun Jiang

    Abstract: Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting…

    Submitted 15 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: change reference format

  25. arXiv:2405.01316  [pdf, other]

    cs.RO

    LOG-LIO2: A LiDAR-Inertial Odometry with Efficient Uncertainty Analysis

    Authors: Kai Huang, Junqiao Zhao, Jiaye Lin, Zhongyang Zhu, Shuangfu Song, Chen Ye, Tiantian Feng

    Abstract: Uncertainty in LiDAR measurements, stemming from factors such as range sensing, is crucial for LIO (LiDAR-Inertial Odometry) systems as it affects the accurate weighting in the loss function. While recent LIO systems address uncertainty related to range sensing, the impact of incident angle on uncertainty is often overlooked by the community. Moreover, the existing uncertainty propagation methods…

    Submitted 2 May, 2024; originally announced May 2024.

  26. arXiv:2404.10518  [pdf, other]

    cs.CV

    MobileNetV4 -- Universal Models for the Mobile Ecosystem

    Authors: Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard

    Abstract: We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB,…

    Submitted 16 April, 2024; originally announced April 2024.

  27. arXiv:2404.04517  [pdf, other]

    cs.CV cs.AI

    Latent-based Diffusion Model for Long-tailed Recognition

    Authors: Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li

    Abstract: Long-tailed imbalance distribution is a common issue in practical computer vision applications. Previous works proposed methods to address this problem, which can be categorized into several classes: re-sampling, re-weighting, transfer learning, and feature augmentation. In recent years, diffusion models have shown an impressive generation ability in many sub-problems of deep computer vision. Howe…

    Submitted 23 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures. Accepted by L3DIVU-CVPR2024

  28. arXiv:2404.03913  [pdf, other]

    cs.CV cs.AI cs.LG

    Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

    Authors: Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

    Abstract: While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with t…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  29. arXiv:2404.02928  [pdf, other]

    cs.CR cs.AI

    Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

    Authors: Jiachen Ma, Anda Cao, Zhiqing Xiao, Jie Zhang, Chao Ye, Junbo Zhao

    Abstract: Text-to-Image (T2I) models have received widespread attention due to their remarkable generation capabilities. However, concerns have been raised about the ethical implications of the models in generating Not Safe for Work (NSFW) images because NSFW images may cause discomfort to people or be used for illegal purposes. To mitigate the generation of such images, T2I models deploy various types of s…

    Submitted 2 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  30. arXiv:2403.19632  [pdf, other]

    cs.CV

    GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond

    Authors: Chongjie Ye, Yinyu Nie, Jiahao Chang, Yuantao Chen, Yihao Zhi, Xiaoguang Han

    Abstract: We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdo…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/GAP-LAB-CUHK-SZ/gaustudio

  31. arXiv:2403.15249  [pdf, other]

    cs.CV cs.AI cs.LG

    Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

    Authors: Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee, Jong Chul Ye

    Abstract: The evolution of diffusion models has greatly impacted video generation and understanding. Particularly, text-to-video diffusion models (VDMs) have significantly facilitated the customization of input video with target appearance, motion, etc. Despite these advances, challenges persist in accurately distilling motion information from video frames. While existing works leverage the consecutive fram…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://geonyeong-park.github.io/spectral-motion-alignment/

  32. arXiv:2403.14830  [pdf, other]

    stat.ML cs.LG

    Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

    Authors: Zeya Wang, Chenglong Ye

    Abstract: Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensio…

    Submitted 21 March, 2024; originally announced March 2024.

  33. arXiv:2403.14183  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation

    Authors: Kwanyoung Kim, Yujin Oh, Jong Chul Ye

    Abstract: The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, leveraging pre-trained CLIP knowledge to closely align text embeddings with pixel embeddings still has limitations in existing approaches. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed…

    Submitted 11 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ECCV 2024; 23 pages, 8 tables, 8 figures; Project Page: https://cubeyoung.github.io/OTSeg_project/

  34. arXiv:2403.13551  [pdf, other]

    cs.CV cs.LG

    Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing

    Authors: Hangeol Chang, Jinho Chang, Jong Chul Ye

    Abstract: Despite recent advancements in text-to-image diffusion models facilitating various image editing techniques, complex text prompts often lead to an oversight of some requests due to a bottleneck in processing text information. To tackle this challenge, we present Ground-A-Score, a simple yet powerful model-agnostic image editing method by incorporating grounding during score distillation. This appr…

    Submitted 20 March, 2024; originally announced March 2024.

  35. arXiv:2403.12510  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Consistency Trajectory Models for Image Manipulation

    Authors: Beomsu Kim, Jaemin Kim, Jeongsol Kim, Jong Chul Ye

    Abstract: Diffusion-based generative models excel in unconditional generation, as well as on applied tasks such as image editing and restoration. The success of diffusion models lies in the iterative nature of diffusion: diffusion breaks down the complex process of mapping noise to data into a sequence of simple denoising tasks. Moreover, we are able to exert fine-grained control over the generation process…

    Submitted 19 March, 2024; originally announced March 2024.

  36. arXiv:2403.12002  [pdf, other]

    cs.CV cs.AI

    DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

    Authors: Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video…

    Submitted 15 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024, Project page: https://hyeonho99.github.io/dreammotion/

  37. arXiv:2403.11415  [pdf, other]

    cs.CV cs.AI cs.LG

    DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation

    Authors: Jeongsol Kim, Geon Yeong Park, Jong Chul Ye

    Abstract: Reverse sampling and score-distillation have emerged as main workhorses in recent years for image manipulation using latent diffusion models (LDMs). While reverse diffusion sampling often requires adjustments of LDM architecture or feature engineering, score distillation offers a simple yet powerful model-agnostic approach, but it is often prone to mode-collapsing. To address these limitations and…

    Submitted 17 March, 2024; originally announced March 2024.

  38. arXiv:2403.07883  [pdf, other]

    cs.CV cs.AI

    Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection

    Authors: Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao Ye, Chenliang Li, Ming Yan, Shikun Zhang, Songhang Huang, Fei Huang

    Abstract: Vision Transformers (ViTs) have become increasingly popular in large-scale Vision and Language Pre-training (VLP) models. Although previous VLP research has demonstrated the efficacy of ViTs, these efforts still struggle with computational inefficiencies caused by lengthy visual sequences. To address this challenge, we introduce an efficient VLP approach called TRIPS, which stands for Text-Relevan…

    Submitted 11 January, 2024; originally announced March 2024.

  39. arXiv:2403.06275  [pdf, other]

    cs.CV cs.AI cs.LG physics.med-ph

    UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation

    Authors: Kwanyoung Kim, Jaa-Yeon Lee, Jong Chul Ye

    Abstract: Nakagami imaging holds promise for visualizing and quantifying tissue scattering in ultrasound waves, with potential applications in tumor diagnosis and fat fraction estimation which are challenging to discern by conventional ultrasound B-mode images. Existing methods struggle with optimal window size selection and suffer from estimator instability, leading to degraded resolution images. To addres…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  40. arXiv:2403.01433  [pdf, other]

    cs.CE q-bio.NC

    BrainMass: Advancing Brain Network Analysis for Diagnosis with Large-scale Self-Supervised Learning

    Authors: Yanwu Yang, Chenfei Ye, Guinan Su, Ziyao Zhang, Zhikai Chang, Hairui Chen, Piu Chan, Yue Yu, Ting Ma

    Abstract: Foundation models pretrained on large-scale datasets via self-supervised learning demonstrate exceptional versatility across various tasks. Due to the heterogeneity and hard-to-collect medical data, this approach is especially beneficial for medical image analysis and neuroscience research, as it streamlines broad downstream tasks without the need for numerous costly annotations. However, there ha…

    Submitted 3 March, 2024; originally announced March 2024.

  41. arXiv:2403.00426  [pdf, other]

    cs.CV

    Deep Learning Computed Tomography based on the Defrise and Clack Algorithm

    Authors: Chengze Ye, Linda-Sophie Schneider, Yipeng Sun, Andreas Maier

    Abstract: This study presents a novel approach for reconstructing cone beam computed tomography (CBCT) for specific orbits using known operator learning. Unlike traditional methods, this technique employs a filtered backprojection type (FBP-type) algorithm, which integrates a unique, adaptive filtering process. This process involves a series of operations, including weightings, differentiations, the 2D Rado…

    Submitted 1 March, 2024; originally announced March 2024.

  42. arXiv:2402.08991  [pdf, ps, other]

    stat.ML cs.LG

    Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

    Authors: Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang

    Abstract: This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-square regression is often employed for value function estimation. However, these techniques cannot be directly applied to mod…

    Submitted 14 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  43. arXiv:2402.08601  [pdf, other]

    cs.CV

    Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing

    Authors: Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jong Chul Ye

    Abstract: Text-guided non-rigid editing involves complex edits for input images, such as changing motion or compositions within their surroundings. Since it requires manipulating the input structure, existing methods often struggle with preserving object identity and background, particularly when combined with Stable Diffusion. In this work, we propose a training-free approach for non-rigid editing with Sta…

    Submitted 14 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  44. arXiv:2402.07443  [pdf, other]

    cs.LG cs.CC cs.DS cs.IT

    The I/O Complexity of Attention, or How Optimal is Flash Attention?

    Authors: Barna Saha, Christopher Ye

    Abstract: Self-attention is at the heart of the popular Transformer architecture, yet suffers from quadratic time and memory complexity. The breakthrough FlashAttention algorithm revealed I/O complexity as the true bottleneck in scaling Transformers. Given two levels of memory hierarchy, a fast cache (e.g. GPU on-chip SRAM) and a slow memory (e.g. GPU high-bandwidth memory), the I/O complexity measures the…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 24 pages, 3 figures

  45. arXiv:2402.07314  [pdf, other]

    cs.LG stat.ML

    Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

    Authors: Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang

    Abstract: We study Reinforcement Learning from Human Feedback (RLHF) under a general preference oracle. In particular, we do not assume that there exists a reward function and the preference signal is drawn from the Bradley-Terry model as most of the prior works do. We consider a standard mathematical formulation, the reverse-KL regularized minimax game between two LLMs for RLHF under general preference ora…

    Submitted 25 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: RLHF, Preference Learning, Alignment for LLMs

  46. arXiv:2402.03046  [pdf, other]

    cs.LG

    Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

    Authors: Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, et al. (8 additional authors not shown)

    Abstract: In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, i…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Under review

  47. arXiv:2402.02407  [pdf, other]

    cs.LG cs.CV cs.NE

    Defining Neural Network Architecture through Polytope Structures of Dataset

    Authors: Sangmin Lee, Abbas Mammadov, Jong Chul Ye

    Abstract: Current theoretical and empirical research in neural networks suggests that complex datasets require large network architectures for thorough classification, yet the precise nature of this relationship remains unclear. This paper tackles this issue by defining upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question. We also delve in…

    Submitted 30 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  48. arXiv:2402.01509  [pdf, other]

    eess.IV cs.CV cs.LG

    Advancing Brain Tumor Inpainting with Generative Models

    Authors: Ruizhi Zhu, Xinru Zhang, Haowen Pang, Chundan Xu, Chuyang Ye

    Abstract: Synthesizing healthy brain scans from diseased brain scans offers a potential solution to address the limitations of general-purpose algorithms, such as tissue segmentation and brain extraction algorithms, which may not effectively handle diseased images. We consider this a 3D inpainting task and investigate the adaptation of 2D inpainting methods to meet the requirements of 3D magnetic resonance…

    Submitted 2 February, 2024; originally announced February 2024.

  49. arXiv:2401.03412  [pdf, other]

    cs.RO

    N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping

    Authors: Shuangfu Song, Junqiao Zhao, Kai Huang, Jiaye Lin, Chen Ye, Tiantian Feng

    Abstract: Accurate and dense mapping in large-scale environments is essential for various robot applications. Recently, implicit neural signed distance fields (SDFs) have shown promising advances in this task. However, most existing approaches employ projective distances from range data as SDF supervision, introducing approximation errors and thus degrading the mapping quality. To address this problem, we i…

    Submitted 29 April, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: 8 pages, 10 figures. Accepted by RAL2024

  50. arXiv:2312.12418  [pdf, other]

    cs.CV

    LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

    Authors: Haolin Liu, Chongjie Ye, Yinyu Nie, Yingfan He, Xiaoguang Han

    Abstract: Instance shape reconstruction from a 3D scene involves recovering the full geometries of multiple objects at the semantic instance level. Many methods leverage data-driven learning due to the intricacies of scene complexity and significant indoor occlusions. Training these methods often requires a large-scale, high-quality dataset with aligned and paired shape annotations with real-world scans. Ex…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: homepage: https://gap-lab-cuhk-sz.github.io/LASA/