Showing 1–8 of 8 results for author: Huang, O

Searching in archive cs.
  1. arXiv:2408.09739

    cs.CV

    TraDiffusion: Trajectory-Based Training-Free Image Generation

    Authors: Mingrui Wu, Oucheng Huang, Jiayi Ji, Jiale Li, Xinyue Cai, Huafeng Kuang, Jianzhuang Liu, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this work, we propose a training-free, trajectory-based controllable T2I approach, termed TraDiffusion. This novel method allows users to effortlessly guide image generation via mouse trajectories. To achieve precise control, we design a distance awareness energy function to effectively guide latent variables, ensuring that the focus of generation is within the areas defined by the trajectory.…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: The code: https://github.com/och-mac/TraDiffusion

  2. arXiv:2407.21534

    cs.CV

    ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

    Authors: Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. We observe the relationship between text prompt tokens and visual tokens in MLLMs, where attention layers model the connection between them. Our approach involves adjusting visual tokens from the MLP output during inference, controlli…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/mrwu-mac/ControlMLLM

  3. arXiv:2406.16449

    cs.CV

    Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

    Authors: Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

    Abstract: The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which is essential for visual comprehension. In this work, we introduce R-Benc…

    Submitted 18 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML 2024; Project Page: https://github.com/mrwu-mac/R-Bench

  4. arXiv:2311.02802

    cs.CL cs.AI

    Incorporating Worker Perspectives into MTurk Annotation Practices for NLP

    Authors: Olivia Huang, Eve Fleisig, Dan Klein

    Abstract: Current practices regarding data collection for natural language processing on Amazon Mechanical Turk (MTurk) often rely on a combination of studies on data quality and heuristics shared among NLP researchers. However, without considering the perspectives of MTurk workers, these approaches are susceptible to issues regarding workers' rights and poor response quality. We conducted a critical litera…

    Submitted 15 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

  5. arXiv:2212.00802

    cs.LG

    An Introduction to Kernel and Operator Learning Methods for Homogenization by Self-consistent Clustering Analysis

    Authors: Owen Huang, Sourav Saha, Jiachen Guo, Wing Kam Liu

    Abstract: Recent advances in operator learning theory have improved our knowledge about learning maps between infinite dimensional spaces. However, for large-scale engineering problems such as concurrent multiscale simulation for mechanical properties, the training cost for the current operator learning methods is very high. The article presents a thorough analysis on the mathematical underpinnings of the o…

    Submitted 30 November, 2022; originally announced December 2022.

  6. arXiv:2205.12602

    cs.CV

    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

    Authors: Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

    Abstract: This paper presents Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flat…

    Submitted 25 May, 2022; originally announced May 2022.

  7. arXiv:2201.11345

    cs.CV cs.AI

    Exploring Global Diversity and Local Context for Video Summarization

    Authors: Yingchao Pan, Ouhan Huang, Qinghao Ye, Zhongjin Li, Wenjiang Wang, Guodun Li, Yuxing Chen

    Abstract: Video summarization aims to automatically generate a diverse and concise summary which is useful in large-scale video processing. Most of the methods tend to adopt self-attention mechanism across video frames, which fails to model the diversity of video frames. To alleviate this problem, we revisit the pairwise similarity measurement in self-attention mechanism and find that the existing inner-pro…

    Submitted 27 March, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Accepted by IEEE Access

  8. arXiv:1908.05782

    eess.IV cs.CV stat.ML

    MimickNet, Matching Clinical Post-Processing Under Realistic Black-Box Constraints

    Authors: Ouwen Huang, Will Long, Nick Bottenus, Gregg E. Trahey, Sina Farsiu, Mark L. Palmeri

    Abstract: Image post-processing is used in clinical-grade ultrasound scanners to improve image quality (e.g., reduce speckle noise and enhance contrast). These post-processing techniques vary across manufacturers and are generally kept proprietary, which presents a challenge for researchers looking to match current clinical-grade workflows. We introduce a deep learning framework, MimickNet, that transforms…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: This work has been submitted to the IEEE Transactions on Medical Imaging on July 1st, 2019 for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible