
Showing 1–50 of 184 results for author: Luo, M

Searching in archive cs.
  1. arXiv:2407.04242  [pdf, other]

    cs.CV

    Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

    Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

    Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to coarse-grained spatial features and separate temporal dependency learning, and struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information at fine-grained scales is extremely challenging due to learning difficulties in long-range dependen…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024. This is the submitted manuscript and the preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections

  2. arXiv:2406.19593  [pdf, other]

    cs.CL cs.CV

    SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

    Authors: Xin Su, Man Luo, Kris W Pan, Tien Pei Chou, Vasudev Lal, Phillip Howard

    Abstract: Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for conte…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.07754  [pdf, other]

    cs.CV

    HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

    Authors: Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman

    Abstract: We study the problem of precisely swapping objects in videos, with a focus on those interacted with by hands, given one user-provided reference object image. Despite the great advancements that diffusion models have made in video editing recently, these models often fall short in handling the intricacies of hand-object interactions (HOI), failing to produce realistic edits -- especially when objec…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project website: https://vision.cs.utexas.edu/projects/HOI-Swap/

  4. arXiv:2406.02930  [pdf, other]

    cs.CV

    P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images

    Authors: Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji

    Abstract: Extracting building contours from remote sensing imagery is a significant challenge due to buildings' complex and diverse shapes, occlusions, and noise. Existing methods often struggle with irregular contours, rounded corners, and redundant points, necessitating extensive post-processing to produce regular polygonal building contours. To address these challenges, we introduce a novel, streamlined…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2405.16402  [pdf, other]

    cs.CL cs.AI

    Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions

    Authors: Man Luo, Christopher J. Warren, Lu Cheng, Haidar M. Abdul-Muhsin, Imon Banerjee

    Abstract: The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support through the development of empathetic, patient-facing chatbots. This study investigates an intriguing question: can ChatGPT respond with a greater degree of empathy than physicians typically offer? To answer this question, we collect a de-identifi…

    Submitted 25 May, 2024; originally announced May 2024.

  6. arXiv:2405.16273  [pdf, other]

    cs.CV

    M³GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation

    Authors: Mingshuang Luo, Ruibing Hou, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan

    Abstract: This paper presents M³GPT, an advanced Multimodal, Multitask framework for Motion comprehension and generation. M³GPT operates on three fundamental principles. The first focuses on creating a unified representation space for various motion-relevant modalities. We employ discrete vector quantization for multimodal control and generation signals, such as text,…

    Submitted 29 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 18 pages, 6 figures

  7. arXiv:2404.19007  [pdf, other]

    cs.CL cs.AI cs.CY

    How Did We Get Here? Summarizing Conversation Dynamics

    Authors: Yilun Hua, Nicholas Chernogor, Yuzhe Gu, Seoyeon Julie Jeong, Miranda Luo, Cristian Danescu-Niculescu-Mizil

    Abstract: Throughout a conversation, the way participants interact with each other is in constant flux: their tones may change, they may resort to different strategies to convey their points, or they might alter their interaction patterns. An understanding of these dynamics can complement that of the actual facts and opinions discussed, offering a more holistic view of the trajectory of the conversation: ho…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: To appear in the Proceedings of NAACL 2024. Data available in ConvoKit https://convokit.cornell.edu/

  8. arXiv:2404.18928  [pdf, other]

    cs.CV cs.AI cs.CL cs.GR cs.LG

    Stylus: Automatic Adapter Selection for Diffusion Models

    Authors: Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica

    Abstract: Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high-fidelity, custom images at reduced costs. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters, most of which are highly customized with insufficient descriptions. This paper explores the problem of matching the prom…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project Website: https://stylus-diffusion.github.io

  9. arXiv:2404.15522  [pdf, other]

    cs.CL cs.AI

    LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

    Authors: Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra, Chitta Baral

    Abstract: Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But can they really "reason" over natural language? This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied. However, the crucial skill pertaining to 'logi…

    Submitted 6 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted at ACL (Main) 2024 | First version available at https://openreview.net/forum?id=7NR2ZVzZxx

  10. arXiv:2403.16422  [pdf, other]

    cs.CV cs.AI

    Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

    Authors: Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo

    Abstract: Over the past few years, Text-to-Image (T2I) generation approaches based on diffusion models have gained significant attention. However, vanilla diffusion models often suffer from spelling inaccuracies in the text displayed within the generated images. The capability to generate visual text is crucial, offering both academic interest and a wide range of practical applications. To produce accurate…

    Submitted 25 March, 2024; originally announced March 2024.

  11. arXiv:2403.13480  [pdf, other]

    cs.CV cs.IR cs.MM

    A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels

    Authors: Haochen Han, Minnan Luo, Huan Liu, Fang Nan

    Abstract: Cross-modal retrieval (CMR) aims to establish interaction between different modalities, among which supervised CMR is emerging due to its flexibility in learning semantic category discrimination. Despite the remarkable performance of previous supervised CMR methods, much of their success can be attributed to the well-annotated data. However, even for unimodal data, precise annotation is expensive…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  12. arXiv:2403.06351  [pdf, other]

    cs.CV

    Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

    Authors: Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman

    Abstract: We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective. To this end, we propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation, which explicitly…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 22 pages

  13. arXiv:2403.05265  [pdf, other]

    cs.AI

    MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts

    Authors: Zinan Zeng, Sen Ye, Zijian Cai, Heng Wang, Yuhan Liu, Haokai Zhang, Minnan Luo

    Abstract: Online movie review websites are valuable for information and discussion about movies. However, the large number of spoiler reviews detracts from the movie-watching experience, making spoiler detection an important task. Previous methods simply focus on reviews' text content, ignoring the heterogeneity of information on the platform. For instance, the metadata and the corresponding user's information of a…

    Submitted 13 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  14. arXiv:2403.05105  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

    Authors: Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang

    Abstract: Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs). Undoubtedly, such semantically irrelevant data will significantly harm the cross-modal retrieval performance. Previous efforts tend to mitigate this proble…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  15. arXiv:2402.16911  [pdf, other]

    cs.LG cs.CV

    Trustworthy Personalized Bayesian Federated Learning via Posterior Fine-Tune

    Authors: Mengen Luo, Chi Xu, Ercan Engin Kuruoglu

    Abstract: Performance degradation owing to data heterogeneity and low output interpretability are the most significant challenges faced by federated learning in practical applications. Personalized federated learning diverges from traditional approaches, as it no longer seeks to train a single model, but instead tailors a unique personalized model for each client. However, previous work focused only on pers…

    Submitted 25 February, 2024; originally announced February 2024.

  16. arXiv:2402.16907  [pdf, other]

    eess.IV cs.CV cs.LG

    Diffusion Posterior Proximal Sampling for Image Restoration

    Authors: Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Ji-Zhe Zhou, Hu Chen, Jiancheng Lv

    Abstract: Diffusion models have demonstrated remarkable efficacy in generating high-quality samples. Existing diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors, yet they still preserve elements inherited from the unconditional generation paradigm. These strategies initiate the denoising process with pure white noise and incorporate random noise at each…

    Submitted 24 February, 2024; originally announced February 2024.

  17. arXiv:2402.16091  [pdf, other]

    cs.LG cs.AI

    Bayesian Neural Network For Personalized Federated Learning Parameter Selection

    Authors: Mengen Luo, Ercan Engin Kuruoglu

    Abstract: Federated learning's poor performance in the presence of heterogeneous data remains one of the most pressing issues in the field. Personalized federated learning departs from the conventional paradigm in which all clients employ the same model, instead striving to discover an individualized model for each client to address the heterogeneity in the data. One such approach involves personalizing…

    Submitted 25 February, 2024; originally announced February 2024.

  18. arXiv:2402.13851  [pdf, other]

    cs.CV

    VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models

    Authors: Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao

    Abstract: Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting pois…

    Submitted 21 February, 2024; originally announced February 2024.

  19. arXiv:2402.11566  [pdf, other]

    cs.CV

    Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training

    Authors: Huayi Zhou, Mukun Luo, Fei Jiang, Yue Ding, Hongtao Lu

    Abstract: 2D human pose estimation (HPE) is a basic visual problem. However, its supervised learning requires massive keypoint labels, which are labor-intensive to collect. Thus, we aim at boosting a pose estimator by excavating extra unlabeled data with semi-supervised learning (SSL). Most previous SSHPE methods are consistency-based and strive to maintain consistent outputs for differently augmented in…

    Submitted 7 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 14 pages. Semi-Supervised 2D Human Pose Estimation

  20. arXiv:2402.10426  [pdf, other]

    cs.CL

    DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection

    Authors: Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, Minnan Luo

    Abstract: Challenges in factuality and hallucination prevent large language models from being directly employed off-the-shelf for judging the veracity of news articles, where factual accuracy is paramount. In this work, we propose DELL that identifies three key stages in misinformation detection where LLMs could be incorporated as part of the pipeline: 1) LLMs could generate news reactions to repr…

    Submitted 4 July, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  21. arXiv:2402.00371  [pdf, other]

    cs.CL

    What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

    Authors: Shangbin Feng, Herun Wan, Ningnan Wang, Zhaoxuan Tan, Minnan Luo, Yulia Tsvetkov

    Abstract: Social media bot detection has always been an arms race between advancements in machine learning bot detectors and adversarial bot strategies to evade detection. In this work, we bring the arms race to the next level by investigating the opportunities and risks of state-of-the-art large language models (LLMs) in social bot detection. To investigate the opportunities, we design novel LLM-based bot…

    Submitted 4 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  22. arXiv:2401.11624  [pdf, other]

    cs.CL cs.AI cs.IR

    In-context Learning with Retrieved Demonstrations for Language Models: A Survey

    Authors: Man Luo, Xin Xu, Yue Liu, Panupong Pasupat, Mehran Kazemi

    Abstract: Language models, especially pre-trained large language models, have showcased remarkable abilities as few-shot in-context learners (ICL), adept at adapting to new tasks with just a few demonstrations in the input context. However, the model's ability to perform ICL is sensitive to the choice of the few-shot demonstrations. Instead of using a fixed set of demonstrations, one recent development is t…

    Submitted 23 March, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  23. arXiv:2401.08180  [pdf, other]

    physics.optics cs.ET physics.app-ph

    Control-free and efficient integrated photonic neural networks via hardware-aware training and pruning

    Authors: Tengji Xu, Weipeng Zhang, Jiawei Zhang, Zeyu Luo, Qiarong Xiao, Benshan Wang, Mingcheng Luo, Xingyuan Xu, Bhavin J. Shastri, Paul R. Prucnal, Chaoran Huang

    Abstract: Integrated photonic neural networks (PNNs) are at the forefront of AI computing, leveraging light's unique properties, such as large bandwidth, low latency, and potentially low power consumption. Nevertheless, the integrated optical components within PNNs are inherently sensitive to external disturbances and thermal interference, which can detrimentally affect computing accuracy and reliability…

    Submitted 7 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 21 pages, 6 figures

  24. arXiv:2401.06442  [pdf, other]

    cs.CV

    RotationDrag: Point-based Image Editing with Rotated Diffusion Features

    Authors: Minxing Luo, Wentao Cheng, Jian Yang

    Abstract: Precise and user-friendly manipulation of image content while preserving image fidelity has always been crucial to the field of image editing. Thanks to the power of generative models, recent point-based image editing methods allow users to interactively change the image content with high generalizability by clicking several control points. But the above-mentioned editing process is usually base…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Code is released at https://github.com/Tony-Lowe/RotationDrag

  25. arXiv:2312.16478  [pdf, other]

    cs.LG

    Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, Jingdong Wang

    Abstract: Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice. Recently, to alleviate expensive data collection, co-occurring pairs from the Internet are automatically harvested for training. However, they inevitably include mismatched pairs, i.e., noisy correspondences, undermining supervision reliability and degrading performance. Current methods leverage deep ne…

    Submitted 27 December, 2023; originally announced December 2023.

  26. arXiv:2312.02226  [pdf, other]

    cs.CV

    Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

    Authors: Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang

    Abstract: Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within any arbitrary set of categories. Existing methods typically adapt pretrained image-text models to the video domain, capitalizing on their inherent strengths in generalization. A common thread among such methods is the augmentation of visual embeddings with temporal in…

    Submitted 3 December, 2023; originally announced December 2023.

  27. arXiv:2312.01195  [pdf, other]

    cs.CR cs.SE

    AIM: Automatic Interrupt Modeling for Dynamic Firmware Analysis

    Authors: Bo Feng, Meng Luo, Changming Liu, Long Lu, Engin Kirda

    Abstract: The security of microcontrollers, which drive modern IoT and embedded devices, continues to raise major concerns. Within a microcontroller (MCU), the firmware is a monolithic piece of software that contains the whole software stack, whereas a variety of peripherals represent the hardware. As MCU firmware contains vulnerabilities, it is ideal to test firmware with off-the-shelf software testing tec…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: This paper was accepted to IEEE Transactions on Dependable and Secure Computing on Oct 12, 2023

  28. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  29. arXiv:2311.04760  [pdf, other]

    cs.IR cs.LG

    Towards Open-world Cross-Domain Sequential Recommendation: A Model-Agnostic Contrastive Denoising Approach

    Authors: Wujiang Xu, Xuying Ning, Wenfang Lin, Mingming Ha, Qiongxu Ma, Qianqiao Liang, Xuewen Tao, Linxun Chen, Bing Han, Minnan Luo

    Abstract: Cross-domain sequential recommendation (CDSR) aims to address the data sparsity problems that exist in traditional sequential recommendation (SR) systems. The existing approaches aim to design a specific cross-domain unit that can transfer and propagate information across multiple domains by relying on overlapping users with abundant behaviors. However, in real-world recommender systems, CDSR sc…

    Submitted 5 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

  30. arXiv:2311.01686  [pdf, other]

    cs.CV cs.LG

    Disentangled Representation Learning with Transmitted Information Bottleneck

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, Jingdong Wang, Qinghua Zheng

    Abstract: Encoding only the task-related information from the raw data, i.e., disentangled representation learning, can greatly contribute to the robustness and generalizability of models. Although significant advances have been made by regularizing the information in representations with information theory, two major challenges remain: 1) the representation compression inevitably leads to performance drop;…

    Submitted 2 November, 2023; originally announced November 2023.

  31. arXiv:2311.00483  [pdf, other]

    eess.IV cs.CV

    DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation

    Authors: Xiaohua Jiang, Yihao Guo, Jian Huang, Yuting Wu, Meiyi Luo, Zhaoyang Xu, Qianni Zhang, Xingru Huang, Hong He, Shaowei Jiang, Jing Ye, Mang Xiao

    Abstract: The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite their significance, the effective segmentation and instantaneous three-dimensional reconstruction are significantly impeded by the paucity of representative samples in…

    Submitted 19 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 36 pages, 16 figures, 7 tables

    MSC Class: 68; 92
    ACM Class: I.4; J.3

  32. arXiv:2311.00197  [pdf, other]

    cs.RO

    Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations

    Authors: Ryan Dorosh, Justin Allen, Zixuan He, Christopher Ninatanta, Jack Coleman, Jack Spieker, Ethan Tuck, Jordan Kurtz, Qin Zhang, Matthew D. Whiting, Jiecai Luo, Manoj Karkee, Ming Luo

    Abstract: Tree fruit growers around the world are facing labor shortages for critical operations, including harvest and pruning. There is a great interest in developing robotic solutions for these labor-intensive tasks, but current efforts have been prohibitively costly, slow, or require a reconfiguration of the orchard in order to function. In this paper, we introduce an alternative approach to robotics us…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: International Conference on Intelligent Robots and Systems (IROS) 2023

  33. arXiv:2310.19293  [pdf, other]

    eess.IV cs.CV

    FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound

    Authors: Chaoyu Chen, Xin Yang, Yuhao Huang, Wenlong Shi, Yan Cao, Mingyuan Luo, Xindi Hu, Lei Zhue, Lequan Yu, Kejuan Yue, Yuanji Zhang, Yi Xiong, Dong Ni, Weijun Huang

    Abstract: Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: 16 pages, 11 figures, accepted by Medical Image Analysis (2023)

  34. arXiv:2310.16376  [pdf, other]

    cs.LG cs.AI

    GADY: Unsupervised Anomaly Detection on Dynamic Graphs

    Authors: Shiqi Lou, Qingyue Zhang, Shujie Yang, Yuyang Tian, Zhaoxuan Tan, Minnan Luo

    Abstract: Anomaly detection on dynamic graphs refers to detecting entities whose behaviors obviously deviate from the norms observed within graphs and their temporal information. This field has drawn increasing attention due to its application in finance, network security, social networks, and more. However, existing methods face two challenges: dynamic structure constructing challenge - difficulties in cap…

    Submitted 25 October, 2023; originally announced October 2023.

  35. arXiv:2310.13822  [pdf, other]

    cs.LG

    Adversarial Attacks on Fairness of Graph Neural Networks

    Authors: Binchi Zhang, Yushun Dong, Chen Chen, Yada Zhu, Minnan Luo, Jundong Li

    Abstract: Fairness-aware graph neural networks (GNNs) have gained a surge of attention as they can reduce the bias of predictions on any demographic group (e.g., female) in graph-based applications. Although these methods greatly improve the algorithmic fairness of GNNs, the fairness can be easily corrupted by carefully designed adversarial attacks. In this paper, we investigate the problem of adversarial a…

    Submitted 2 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  36. arXiv:2310.00836  [pdf, other]

    cs.CL cs.AI

    Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

    Authors: Man Luo, Shrinidhi Kumbhar, Ming shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral

    Abstract: Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non-trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal Knowledge R…

    Submitted 30 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: Work in progress

  37. Perceptual Tone Mapping Model for High Dynamic Range Imaging

    Authors: Imran Mehmood, Xinye Shi, M. Usman Khan, Ming Ronnier Luo

    Abstract: One of the key challenges in tone mapping is to preserve the perceptual quality of high dynamic range (HDR) images when mapping them to standard dynamic range (SDR) displays. Traditional tone mapping operators (TMOs) compress the luminance of HDR images without considering the surround and display conditions, leading to suboptimal results. Current research addresses this challenge by incorporat…

    Submitted 29 September, 2023; originally announced September 2023.

  38. arXiv:2309.15479  [pdf, other]

    cs.LG cs.DS

    Fast Locality Sensitive Hashing with Theoretical Guarantee

    Authors: Zongyuan Tan, Hongya Wang, Bo Xu, Minjie Luo, Ming Du

    Abstract: Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks. The cost of hashing is proportional to data dimensions, and thus is often the performance bottleneck when dimensionality is high and the number of hash functions involved is large. Surprisingly, however, little work has been done to improve the efficiency of LSH computation. In this paper…

    Submitted 27 September, 2023; originally announced September 2023.

  39. arXiv:2309.11125  [pdf, other]

    cs.CV

    PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

    Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Jingdong Wang

    Abstract: Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task. 2) the potential for collaboration between tw…

    Submitted 13 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

  40. arXiv:2309.09235  [pdf, other]

    quant-ph cs.LG

    Provable learning of quantum states with graphical models

    Authors: Liming Zhao, Naixu Guo, Ming-Xing Luo, Patrick Rebentrost

    Abstract: The complete learning of an n-qubit quantum state requires a number of samples exponential in n. Several works consider subclasses of quantum states that can be learned in polynomial sample complexity such as stabilizer states or high-temperature Gibbs states. Other works consider a weaker sense of learning, such as PAC learning and shadow tomography. In this work, we consider learning states that are c…

    Submitted 17 September, 2023; originally announced September 2023.

  41. arXiv:2309.05257  [pdf, other]

    cs.CV

    FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal Consistent Transformer for 3D Object Detection

    Authors: Chunyong Hu, Hang Zheng, Kun Li, Jianyun Xu, Weibo Mao, Maochun Luo, Lingxuan Wang, Mingxia Chen, Qihao Peng, Kaixuan Liu, Yiru Zhao, Peihan Hao, Minzhe Liu, Kaicheng Yu

    Abstract: Multi-sensor modal fusion has demonstrated strong advantages in 3D object detection tasks. However, existing methods that fuse multi-modal features require transforming features into the bird's eye view space and may lose certain information along the Z-axis, thus leading to inferior performance. To this end, we propose a novel end-to-end multi-modal fusion transformer-based framework, dubbed FusionForme…

    Submitted 8 October, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

  42. arXiv:2308.15063  [pdf, other]

    cs.CV cs.AI

    Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification

    Authors: Haichao Shi, Mandi Luo, Xiao-Yu Zhang, Ran He

    Abstract: Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance. Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities, which still leave two problems underexplored: information redundancy and modality complementarity. To this end, properly eliminating th…

    Submitted 29 August, 2023; originally announced August 2023.

  43. arXiv:2308.10156  [pdf, other]

    cs.CV cs.AI

    SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

    Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

    Abstract: Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls. In contrast, Layout-to-Image (L2I) generation, aiming to generate realistic and complex scene images from user-specified layouts, has risen to prominence. However, existing methods transform layout information into tokens or RGB images for co…

    Submitted 13 March, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted to AAAI 2024

    Journal ref: 38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024

  44. arXiv:2308.08147  [pdf, other]

    cs.CL

    MDDial: A Multi-turn Differential Diagnosis Dialogue Dataset with Reliability Evaluation

    Authors: Srija Macherla, Man Luo, Mihir Parmar, Chitta Baral

    Abstract: Dialogue systems for Automatic Differential Diagnosis (ADD) have a wide range of real-life applications. These dialogue systems are promising for providing easy access and reducing medical costs. Building end-to-end ADD dialogue systems requires dialogue training datasets. However, to the best of our knowledge, there is no publicly available ADD dialogue dataset in English (although non-English da…

    Submitted 16 August, 2023; originally announced August 2023.

  45. arXiv:2307.16617  [pdf, other]

    cs.CV

    FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

    Authors: Zhijian Huang, Sihao Lin, Guiyu Liu, Mukun Luo, Chaoqiang Ye, Hang Xu, Xiaojun Chang, Xiaodan Liang

    Abstract: Multi-modality fusion and multi-task learning are becoming trendy in 3D autonomous driving scenarios, considering robust prediction and computation budget. However, naively extending the existing framework to the domain of multi-modality multi-task learning remains ineffective and even harmful due to the notorious modality bias and task conflict. Previous works manually coordinate the learning fr…

    Submitted 31 July, 2023; originally announced July 2023.

  46. arXiv:2307.12070  [pdf, other]

    cs.CV

    Fast and Stable Diffusion Inverse Solver with History Gradient Update

    Authors: Linchao He, Hongyu Yan, Mengting Luo, Hongjie Wu, Kunming Luo, Wang Wang, Wenchao Du, Hu Chen, Hongyu Yang, Yi Zhang, Jiancheng Lv

    Abstract: Diffusion models have recently been recognised as efficient inverse problem solvers due to their ability to produce high-quality reconstruction results without relying on pairwise data training. Existing diffusion-based solvers use a Gradient Descent strategy to obtain an optimal sample solution. However, these solvers only calculate the current gradient and have not utilized any history information…

    Submitted 11 March, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: 17 pages, 7 figures. Provision of theoretical proofs to demonstrate the convergence of the methods

  47. arXiv:2307.04091  [pdf, other]

    cs.CV

    CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

    Authors: Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen

    Abstract: 2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems. 2D-to-3D fusion methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D fusio…

    Submitted 9 July, 2023; originally announced July 2023.

  48. arXiv:2307.00777  [pdf, ps, other]

    cs.LG cs.AI

    GA-DRL: Graph Neural Network-Augmented Deep Reinforcement Learning for DAG Task Scheduling over Dynamic Vehicular Clouds

    Authors: Zhang Liu, Lianfen Huang, Zhibin Gao, Manman Luo, Seyyedali Hosseinalipour, Huaiyu Dai

    Abstract: Vehicular clouds (VCs) are modern platforms for processing of computation-intensive tasks over vehicles. Such tasks are often represented as directed acyclic graphs (DAGs) consisting of interdependent vertices/subtasks and directed edges. In this paper, we propose a graph neural network-augmented deep reinforcement learning scheme (GA-DRL) for scheduling DAG tasks over dynamic VCs. In doing so, we…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 15 pages, 12 figures, regular journal

  49. arXiv:2306.17408  [pdf, other]

    cs.AI cs.CL cs.SI

    LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection

    Authors: Zijian Cai, Zhaoxuan Tan, Zhenyu Lei, Zifeng Zhu, Hongrui Wang, Qinghua Zheng, Minnan Luo

    Abstract: As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on the neighbor users multi-hop away from the targets, and fetching neighbors is time-con…

    Submitted 3 January, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: 11 pages, 7 figures

  50. arXiv:2306.16197  [pdf, other]

    cs.CV eess.IV

    Multi-IMU with Online Self-Consistency for Freehand 3D Ultrasound Reconstruction

    Authors: Mingyuan Luo, Xin Yang, Zhongnuo Yan, Junyu Li, Yuanji Zhang, Jiongquan Chen, Xindi Hu, Jikuan Qian, Jun Cheng, Dong Ni

    Abstract: Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone.…

    Submitted 18 July, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by MICCAI-2023