Showing 1–50 of 539 results for author: Ren, Y

Searching in archive cs.
  1. arXiv:2407.13108  [pdf, other]

    cs.CV

    UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt

    Authors: Xin Li, Bingchen Li, Yeying Jin, Cuiling Lan, Hanxin Zhu, Yulin Ren, Zhibo Chen

    Abstract: Compressed Image Super-resolution (CSR) aims to simultaneously super-resolve the compressed images and tackle the challenging hybrid distortions caused by compression. However, existing works on CSR usually focus on a single compression codec, i.e., JPEG, ignoring the diverse traditional or learning-based codecs in practical applications, e.g., HEVC, VVC, HIFIC, etc. In this work, we propose…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2407.10833  [pdf, other]

    eess.IV cs.CV

    MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration

    Authors: Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: We present MoE-DiffIR, an innovative universal compressed image restoration (CIR) method with task-customized diffusion priors. This intends to handle two pivotal challenges in the existing CIR methods: (i) lacking adaptability and universality for different image codecs, e.g., JPEG and WebP; (ii) poor texture generation capability, particularly at low bitrates. Specifically, our MoE-DiffIR develo…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  3. arXiv:2407.10490  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning Dynamics of LLM Finetuning

    Authors: Yi Ren, Danica J. Sutherland

    Abstract: Learning dynamics, which describe how the learning of specific training examples influences the model's prediction of other examples, give us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during finetuning, by analyzing the step-wise decomposition and accumulated influence among different responses. Our framework a…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 32 pages

  4. arXiv:2407.09833  [pdf, other]

    cs.CV

    LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment

    Authors: Yiming Ren, Xiao Han, Yichen Yao, Xiaoxiao Long, Yujing Sun, Yuexin Ma

    Abstract: LiDAR-based human motion capture has garnered significant interest in recent years for its practicability in large-scale and unconstrained environments. However, most methods rely on cleanly segmented human point clouds as input; the accuracy and smoothness of their motion results are compromised when faced with noisy data, rendering them unsuitable for practical applications. To address these lim…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  5. arXiv:2407.09697  [pdf, other]

    cs.CV

    Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

    Authors: Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu

    Abstract: Range-View(RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for the occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentat…

    Submitted 12 July, 2024; originally announced July 2024.

  6. arXiv:2407.08239  [pdf, other]

    cs.SD cs.LG eess.AS

    An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio

    Authors: Siding Zeng, Jiangyan Yi, Jianhua Tao, Yujie Chen, Shan Liang, Yong Ren, Xiaohui Zhang

    Abstract: When the task of locating manipulation regions in partially-fake audio (PFA) involves cross-domain datasets, the performance of deep learning models drops significantly due to the shift between the source and target domains. To address this issue, existing approaches often employ data augmentation before training. However, they overlook the characteristics in the target domain that are absent in sourc…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.07464  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Video-to-Audio Generation with Hidden Alignment

    Authors: Manjie Xu, Chenxing Li, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu

    Abstract: Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techni…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: https://sites.google.com/view/vta-ldm

  8. arXiv:2407.06516  [pdf, other]

    cs.CV

    VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

    Authors: Yibo Liu, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Bingbing Liu, Yang Liu, Jinjun Shan

    Abstract: Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot address this problem well because they learn generation merely from image RGB information without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leads to their poor zero-shot prediction capability to handle real-world obser…

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.04216  [pdf, other]

    cs.RO

    Safe MPC Alignment with Human Directional Feedback

    Authors: Zhixian Xie, Wenlong Zhang, Yi Ren, Zhaoran Wang, George J. Pappas, Wanxin Jin

    Abstract: In safety-critical robot planning or control, manually specifying safety constraints or learning them from demonstrations can be challenging. In this paper, we propose a certifiable alignment method for a robot to learn a safety constraint in its model predictive control (MPC) policy with human online directional feedback. To our knowledge, it is the first method to learn safety constraints from h…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 18 pages, submission to T-RO

  10. arXiv:2407.03000  [pdf, other]

    cs.CL cs.CV

    VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values

    Authors: Zhe Hu, Yixiao Ren, Jing Li, Yu Yin

    Abstract: This paper introduces VIVA, a benchmark for VIsion-grounded decision-making driven by human VAlues. While most large vision-language models (VLMs) focus on physical-level skills, our work is the first to examine their multimodal capabilities in leveraging human values to make decisions under a vision-depicted situation. VIVA contains 1,062 images depicting diverse real-world situations and the man…

    Submitted 3 July, 2024; originally announced July 2024.

  11. arXiv:2407.02839  [pdf, other]

    cs.IR cs.AI

    CRUISE on Quantum Computing for Feature Selection in Recommender Systems

    Authors: Jiayang Niu, Jie Li, Ke Deng, Yongli Ren

    Abstract: Using Quantum Computers to solve problems in Recommender Systems that classical computers cannot address is a worthwhile research topic. In this paper, we use Quantum Annealers to address the feature selection problem in recommendation algorithms. This feature selection problem is a Quadratic Unconstrained Binary Optimization (QUBO) problem. By incorporating Counterfactual Analysis, we significantl…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by QuantumCLEF 2024

  12. arXiv:2407.02598  [pdf, other]

    cs.CV cs.AI

    AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

    Authors: Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu

    Abstract: Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatti…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  13. arXiv:2407.00167  [pdf, other]

    cs.CL cs.AI cs.ET cs.HC cs.SI

    Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

    Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

    Abstract: In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted for the AI Applications in Public Health and Social Services workshop at the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

  14. arXiv:2407.00072  [pdf, other]

    cs.IR cs.CL

    Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

    Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

    Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. This framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. Each stage contributes to narrowing…

    Submitted 11 July, 2024; v1 submitted 21 June, 2024; originally announced July 2024.

  15. arXiv:2406.16786  [pdf, other]

    cs.CE

    Generalized and high-efficiency arbitrary-positioned buffer for smoothed particle hydrodynamics

    Authors: Shuoguo Zhang, Yu Fan, Yaru Ren, Bin Qian, Xiangyu Hu

    Abstract: This paper develops an arbitrary-positioned buffer for the smoothed particle hydrodynamics (SPH) method, whose generality and high efficiency are achieved through two techniques. First, with the local coordinate system established at each arbitrary-positioned in-/outlet, particle positions in the global coordinate system are transformed into those in it via coordinate transformation. Since one loc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 34 pages and 17 figures

  16. arXiv:2406.11066  [pdf, other]

    cs.CV

    Parameter Blending for Multi-Camera Harmonization for Automotive Surround View Systems

    Authors: Yuzhuo Ren, Yining Deng, David Pajak, Robin Jenkin, Niranjan Avadhanam, Varsha Hedau

    Abstract: In a surround view system, the image color and tone captured by multiple cameras can be different due to cameras applying auto white balance (AWB) and global tone mapping (GTM) individually for each camera. The color and brightness along the stitched seam location may look discontinuous among multiple cameras, which impacts overall stitched image visual quality. To improve the color transition between adj…

    Submitted 16 June, 2024; originally announced June 2024.

  17. arXiv:2406.10391  [pdf, other]

    q-bio.QM cs.LG

    BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

    Authors: Yuchen Ren, Zhiyuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan, Dong Yuan, Wanli Ouyang, Xihui Liu

    Abstract: RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we i…

    Submitted 14 June, 2024; originally announced June 2024.

  18. arXiv:2406.10093  [pdf, other]

    cs.RO cs.LG

    BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation

    Authors: Dongjie Yu, Hang Xu, Yizhou Chen, Yi Ren, Jia Pan

    Abstract: Bimanual manipulation tasks typically involve multiple stages which require efficient interactions between two arms, posing step-wise and stage-wise challenges for imitation learning systems. Specifically, failure and delay of one step will broadcast through time, hinder success and efficiency of each sub-stage task, and thereby overall task performance. Although recent works have made strides in…

    Submitted 14 June, 2024; originally announced June 2024.

  19. arXiv:2406.07230  [pdf, other]

    cs.CV cs.AI

    Needle In A Multimodal Haystack

    Authors: Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang

    Abstract: With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored. In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capab…

    Submitted 11 June, 2024; originally announced June 2024.

  20. arXiv:2406.04840  [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  21. arXiv:2406.04214  [pdf, other]

    cs.CL

    ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    Authors: Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

    Abstract: Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies. This development underscores the urgent need for evaluating value orientations and understanding of LLMs to ensure their responsible integration into public-facing applications. This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientati…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  22. arXiv:2406.04115  [pdf, other]

    cs.CV cs.GR

    Global Parameterization-based Texture Space Optimization

    Authors: Wei Chen, Yuxue Ren, Na Lei, Zhongxuan Luo, Xianfeng Gu

    Abstract: Texture mapping is a common technology in the area of computer graphics; it maps the 3D surface space onto the 2D texture space. However, the loose texture space will reduce the efficiency of data storage and GPU memory addressing in the rendering process. Many of the existing methods focus on repacking given textures, but they still suffer from high computational cost and hardly produce a wholly…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint submitted to Comput. Math. Math. Phys

  23. arXiv:2406.02329  [pdf, other]

    cs.CL cs.LG

    On Affine Homotopy between Language Encoders

    Authors: Robin SM Chan, Reda Boumasmoud, Anej Svete, Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Mennatallah El-Assady, Ryan Cotterell

    Abstract: Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that a faithful measure of similarity needs to be intrinsic, that is, task-independent, yet still be informative of extrinsic similarity -- the…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages

  24. arXiv:2405.20337  [pdf, other]

    cs.CV cs.AI

    OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

    Authors: Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

    Abstract: Understanding the evolution of 3D scenes is important for effective autonomous driving. While conventional methods model scene development with the motion of individual instances, world models emerge as a generative framework to describe the general scene dynamics. However, most existing methods adopt an autoregressive framework to perform next-token prediction, which suffers from inefficiency in mo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/wzzheng/OccSora

  25. arXiv:2405.16405  [pdf, other]

    cs.LG cs.AI

    Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level

    Authors: Runlin Lei, Yuwei Hu, Yuchen Ren, Zhewei Wei

    Abstract: Graph Neural Networks (GNNs) excel across various applications but remain vulnerable to adversarial attacks, particularly Graph Injection Attacks (GIAs), which inject malicious nodes into the original graph and pose realistic threats. Text-attributed graphs (TAGs), where nodes are associated with textual features, are crucial due to their prevalence in real-world applications and are commonly used…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 29 pages

  26. arXiv:2405.15986  [pdf, ps, other]

    cs.LG cs.DC math.NA stat.ML

    Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity

    Authors: Haoxuan Chen, Yinuo Ren, Lexing Ying, Grant M. Rotskoff

    Abstract: Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique~\cite{shih2024parallel}, we propose to divide the sampling proce…

    Submitted 24 May, 2024; originally announced May 2024.

  27. arXiv:2405.13238  [pdf]

    cs.IR cs.LG

    Enhancing User Interest based on Stream Clustering and Memory Networks in Large-Scale Recommender Systems

    Authors: Peng Liu, Nian Wang, Cong Xu, Ming Zhao, Bin Wang, Yi Ren

    Abstract: Recommender Systems (RSs) provide personalized recommendation services based on user interest and are widely used on various platforms. However, many users have sparse interests due to a lack of consumption behaviors, which leads to poor recommendation results for them. This problem is widespread in large-scale RSs and is particularly difficult to address. To solve this problem, we pro…

    Submitted 26 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  28. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  29. arXiv:2405.04909  [pdf, other]

    cs.CV cs.AI

    Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

    Authors: Zhengxing Lan, Hongbo Li, Lingshan Liu, Bo Fan, Yisheng Lv, Yilong Ren, Zhiyong Cui

    Abstract: Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Though existing notable efforts have resulted in impressive performance improvements, a gap persists in scene cognition and understanding of the complex traffic semantics. This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) without explici…

    Submitted 8 May, 2024; originally announced May 2024.

  30. arXiv:2405.00435  [pdf, other]

    cs.HC

    CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model

    Authors: Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen

    Abstract: The integration of new technology with cultural studies enhances our understanding of cultural heritage but often struggles to connect with diverse audiences. It is challenging to align personal interpretations with the intended meanings across different cultures. Our study investigates the important factors in appreciating art from a cross-cultural perspective. We explore the application of Large…

    Submitted 1 May, 2024; originally announced May 2024.

  31. arXiv:2404.17607  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG cs.SI

    Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

    Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Caleb Henry, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

    Abstract: The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countr…

    Submitted 25 April, 2024; originally announced April 2024.

  32. arXiv:2404.17589  [pdf]

    cs.IR cs.LG

    An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

    Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

    Abstract: As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However,…

    Submitted 6 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  33. arXiv:2404.16064  [pdf]

    cs.HC cs.LG cs.LO

    Transparent AI: Developing an Explainable Interface for Predicting Postoperative Complications

    Authors: Yuanfang Ren, Chirayu Tripathi, Ziyuan Guan, Ruilin Zhu, Victoria Hougha, Yingbo Ma, Zhenhong Hu, Jeremy Balch, Tyler J. Loftus, Parisa Rashidi, Benjamin Shickel, Tezcan Ozrazgat-Baslanti, Azra Bihorac

    Abstract: Given the sheer volume of surgical procedures and the significant rate of postoperative fatalities, assessing and managing surgical complications has become a critical public health concern. Existing artificial intelligence (AI) tools for risk surveillance and diagnosis often lack adequate interpretability, fairness, and reproducibility. To address this, we proposed an Explainable AI (XAI) framewo…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 32 pages, 7 figures, 4 supplement figures and 1 supplement table

  34. arXiv:2404.14409  [pdf, other]

    cs.CV

    CrossScore: Towards Multi-View Image Evaluation and Scoring

    Authors: Zirui Wang, Wenjing Bian, Omkar Parkhi, Yuheng Ren, Victor Adrian Prisacariu

    Abstract: We introduce a novel cross-reference image quality assessment method that effectively fills the gap in the image assessment landscape, complementing the array of established evaluation schemes -- ranging from full-reference metrics like SSIM, no-reference metrics such as NIQE, to general-reference metrics including FID, and Multi-modal-reference metrics, e.g., CLIPScore. Utilising a neural network…

    Submitted 14 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted at ECCV 2024. Project page: https://crossscore.active.vision

  35. arXiv:2404.13686  [pdf, other]

    cs.CV

    Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

    Authors: Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao

    Abstract: Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degra…

    Submitted 22 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Project Page: https://hyper-sd.github.io/

  36. arXiv:2404.07493  [pdf, other]

    cs.LG cs.AI

    Characterizing the Influence of Topology on Graph Learning Tasks

    Authors: Kailong Wu, Yule Xie, Jiaxin Ding, Yuxiang Ren, Luoyi Fu, Xinbing Wang, Chenghu Zhou

    Abstract: Graph neural networks (GNNs) have achieved remarkable success in a wide range of tasks by encoding features combined with topology to create effective representations. However, how graph topology influences the performance of learning models on downstream tasks has not yet been well understood. In this paper, we propose a metric, TopoInf, which…

    Submitted 11 April, 2024; originally announced April 2024.

  37. arXiv:2404.06723  [pdf, other]

    cs.LG cs.CL

    Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision

    Authors: Yingbo Ma, Suraj Kolla, Zhenhong Hu, Dhruv Kaliraman, Victoria Nolan, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Jeremy A. Balch, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, Benjamin Shickel

    Abstract: Modern electronic health records (EHRs) hold immense promise in tracking personalized patient health trajectories through sequential deep learning, owing to their extensive breadth, scale, and temporal granularity. Nonetheless, how to effectively leverage multiple modalities from EHRs poses significant challenges, given their complex characteristics such as high dimensionality, multimodality, sparsi…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, 3 figures. arXiv admin note: text overlap with arXiv:2403.04012

  38. arXiv:2404.06641  [pdf]

    cs.LG cs.AI cs.CY

    Federated learning model for predicting major postoperative complications

    Authors: Yonggi Park, Yuanfang Ren, Benjamin Shickel, Ziyuan Guan, Ayush Patela, Yingbo Ma, Zhenhong Hu, Tyler J. Loftus, Parisa Rashidi, Tezcan Ozrazgat-Baslanti, Azra Bihorac

    Abstract: Background: The accurate prediction of postoperative complication risk using Electronic Health Records (EHR) and artificial intelligence shows great potential. Training a robust artificial intelligence model typically requires large-scale and diverse datasets. In reality, collecting medical data often encounters challenges surrounding privacy protection. Methods: This retrospective cohort study in…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 57 pages, 2 figures, 3 tables, 2 supplemental figures, 8 supplemental tables

  39. arXiv:2404.06418  [pdf, other]

    cs.LG cs.AI

    Studying the Impact of Latent Representations in Implicit Neural Networks for Scientific Continuous Field Reconstruction

    Authors: Wei Xu, Derek Freeman DeSantis, Xihaier Luo, Avish Parmar, Klaus Tan, Balu Nadiga, Yihui Ren, Shinjae Yoo

    Abstract: Learning a continuous and reliable representation of physical fields from sparse sampling is challenging and it affects diverse scientific disciplines. In a recent work, we presented a novel model called MMGN (Multiplicative and Modulated Gabor Network) with implicit neural networks. In this work, we design additional studies leveraging explainability methods to complement the previous experiments a…

    Submitted 9 April, 2024; originally announced April 2024.

  40. arXiv:2404.05976  [pdf, other]

    cs.LG eess.SY stat.ME

    A Cyber Manufacturing IoT System for Adaptive Machine Learning Model Deployment by Interactive Causality Enabled Self-Labeling

    Authors: Yutian Ren, Yuqi He, Xuyin Zhang, Aaron Yen, G. P. Li

    Abstract: Machine Learning (ML) has been demonstrated to improve productivity in many manufacturing applications. To host these ML applications, several software and Industrial Internet of Things (IIoT) systems have been proposed for manufacturing applications to deploy ML applications and provide real-time intelligence. Recently, an interactive causality enabled self-labeling method has been proposed to ad…

    Submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.05809  [pdf, other]

    cs.LG cs.AI stat.ME

    Self-Labeling in Multivariate Causality and Quantification for Adaptive Machine Learning

    Authors: Yutian Ren, Aaron Haohua Yen, G. P. Li

    Abstract: Adaptive machine learning (ML) aims to allow ML models to adapt to ever-changing environments with potential concept drift after model deployment. Traditionally, adaptive ML requires a new dataset to be manually labeled to tailor deployed models to altered data distributions. Recently, an interactive causality based self-labeling method was proposed to autonomously associate causally related data…

    Submitted 8 April, 2024; originally announced April 2024.

  42. arXiv:2404.05595  [pdf, other]

    cs.CV

    UniFL: Improve Stable Diffusion via Unified Feedback Learning

    Authors: Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Min Zheng, Lean Fu, Guanbin Li

    Abstract: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solutio…

    Submitted 22 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  43. arXiv:2404.04860  [pdf, other]

    cs.CV

    ByteEdit: Boost, Comply and Accelerate Generative Image Editing

    Authors: Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

    Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we p…

    Submitted 7 April, 2024; originally announced April 2024.

  44. arXiv:2404.04286  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Model Evolution: An Iterated Learning Perspective

    Authors: Yi Ren, Shangmin Guo, Linlu Qiu, Bailin Wang, Danica J. Sutherland

    Abstract: With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase. Notably, recent advancements in multi-round self-improving methods allow LLMs to generate new examples for training subsequent models. At the same time, multi-agent LLM systems, involving automated interactions among agents, are also increasing in pr…

    Submitted 3 April, 2024; originally announced April 2024.

  45. arXiv:2404.01240  [pdf, other]

    cs.SE cs.CL cs.CV cs.HC

    AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding

    Authors: Safwat Ali Khan, Wenyu Wang, Yiran Ren, Bin Zhu, Jiangfan Shi, Alyssa McGowan, Wing Lam, Kevin Moran

    Abstract: Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Published at 17th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2024, 12 pages

  46. arXiv:2404.00891  [pdf, other]

    cs.CV cs.RO

    Marrying NeRF with Feature Matching for One-step Pose Estimation

    Authors: Ronghan Chen, Yang Cong, Yu Ren

    Abstract: Given the image collection of an object, we aim at building a real-time image-based pose estimation method, which requires neither its CAD model nor hours of object-specific training. Recent NeRF-based methods provide a promising solution by directly optimizing the pose from pixel loss between rendered and target images. However, during inference, they require long converging time, and suffer from…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: ICRA, 2024. Video https://www.youtube.com/watch?v=70fgUobOFWo

  47. arXiv:2403.20031  [pdf, other]

    cs.CV

    A Unified Framework for Human-centric Point Cloud Video Understanding

    Authors: Yiteng Xu, Kecheng Ye, Xiao Han, Yiming Ren, Xinge Zhu, Yuexin Ma

    Abstract: Human-centric Point Cloud Video Understanding (PVU) is an emerging field focused on extracting and interpreting human-related features from sequences of human point clouds, further advancing downstream human-centric tasks and applications. Previous works usually focus on tackling one specific task and rely on huge labeled data, which has poor generalization capability. Considering that human has s…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  48. arXiv:2403.18762  [pdf, other]

    cs.CV cs.AI cs.RO

    ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

    Authors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen

    Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 11 figures, conference

  49. arXiv:2403.13307  [pdf, other]

    cs.CV

    LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment

    Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma

    Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descript…

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  50. arXiv:2403.12601  [pdf, other]

    cs.CL

    LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models

    Authors: Chuang Liu, Renren Jin, Yuqi Ren, Deyi Xiong

    Abstract: Chinese Large Language Models (LLMs) have recently demonstrated impressive capabilities across various NLP benchmarks and real-world applications. However, the existing benchmarks for comprehensively evaluating these LLMs are still insufficient, particularly in terms of measuring knowledge that LLMs capture. Current datasets collect questions from Chinese examinations across different subjects and…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024