Showing 1–50 of 394 results for author: Gao, H

  1. arXiv:2407.13771  [pdf, other]

    cs.CV

    Training-Free Model Merging for Multi-target Domain Adaptation

    Authors: Wenyi Li, Huan-ang Gao, Mingju Gao, Beiwen Tian, Rong Zhi, Hao Zhao

    Abstract: In this paper, we study multi-target domain adaptation of scene understanding models. While previous methods achieved commendable results through inter-domain consistency losses, they often assumed unrealistic simultaneous access to images from all target domains, overlooking constraints such as data transfer bandwidth limitations and data privacy concerns. Given these challenges, we pose the ques…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.13764  [pdf, other]

    cs.CV

    Shape of Motion: 4D Reconstruction from a Single Video

    Authors: Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, Angjoo Kanazawa

    Abstract: Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-seq…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.10956  [pdf, other]

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  4. arXiv:2407.08950  [pdf, other]

    cs.CV eess.IV

    Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration

    Authors: Hu Gao, Depeng Dang

    Abstract: Image restoration aims to recover high-quality images from their corrupted counterparts. Many existing methods primarily focus on the spatial domain, neglecting the understanding of frequency variations and ignoring the impact of implicit noise in skip connections. In this paper, we introduce a multi-scale frequency selection network (MSFSNet) that seamlessly integrates spatial and frequency domai…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.20106

  5. Multi-objective Learning to Rank by Model Distillation

    Authors: Jie Tang, Huiji Gao, Liwei He, Sanjeev Katariya

    Abstract: In online marketplaces, search ranking's objective is not only purchase or conversion (primary objective) but also purchase outcomes (secondary objectives), e.g., order cancellation (or return), review rating, customer service inquiries, and long-term platform growth. Multi-objective learning to rank has been widely studied to balance primary and secondary objectives. But traditional approache…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.06770  [pdf, other]

    cs.RO

    Pretraining-finetuning Framework for Efficient Co-design: A Case Study on Quadruped Robot Parkour

    Authors: Ci Chen, Jiyu Yu, Haojian Lu, Hongbo Gao, Rong Xiong, Yue Wang

    Abstract: In nature, animals with exceptional locomotion abilities, such as cougars, often possess asymmetric fore and hind legs, with their powerful hind legs acting as reservoirs of energy for leaps. This observation inspired us: could optimizing the leg length of quadruped robots endow them with similar locomotive capabilities? In this paper, we propose an approach that co-optimizes the mechanical structur…

    Submitted 9 July, 2024; originally announced July 2024.

  7. arXiv:2407.06250  [pdf, other]

    cs.CV

    FairDiff: Fair Segmentation with Point-Image Diffusion

    Authors: Wenyi Li, Haoran Xu, Guiyu Zhang, Huan-ang Gao, Mingju Gao, Mengyu Wang, Hao Zhao

    Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy: enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack pai…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  8. arXiv:2407.04573  [pdf, other]

    cs.IR cs.CL

    VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models

    Authors: Hang Gao, Yongfeng Zhang

    Abstract: Vector retrieval algorithms are vital for semantic queries in the evolving landscape of Large Language Models (LLMs). Retrieving vectors that simultaneously meet criteria for both similarity and diversity significantly enhances the capabilities of LLM-based agents. Despite the widespread use of the Maximal Marginal Relevance (MMR) in retrieval scenarios with relevance and diversity requirements, f…

    Submitted 5 July, 2024; originally announced July 2024.

  9. arXiv:2407.02264  [pdf, other]

    cs.CV cs.SD eess.AS

    SOAF: Scene Occlusion-aware Neural Acoustic Field

    Authors: Huiyu Gao, Jiahao Ma, David Ahmedt-Aristizabal, Chuong Nguyen, Miaomiao Liu

    Abstract: This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion, on sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach…

    Submitted 2 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  10. arXiv:2407.00091  [pdf, other]

    cs.IR cs.HC cs.LG

    Learning to Rank for Maps at Airbnb

    Authors: Malay Haldar, Hongwei Zhang, Kedar Bellare, Sherry Chen, Soumyadip Banerjee, Xiaotang Wang, Mustafa Abdool, Huiji Gao, Pavan Tapadia, Liwei He, Sanjeev Katariya

    Abstract: As a two-sided marketplace, Airbnb brings together hosts who own listings for rent with prospective guests from around the globe. Results from a guest's search for listings are displayed primarily through two interfaces: (1) as a list of rectangular cards that contain on them the listing image, price, rating, and other details, referred to as list-results (2) as oval pins on a map showing the list…

    Submitted 25 June, 2024; originally announced July 2024.

  11. arXiv:2406.18071  [pdf, other]

    cs.SE

    Documenting Ethical Considerations in Open Source AI Models

    Authors: Haoyu Gao, Mansooreh Zahedi, Christoph Treude, Sarita Rosenstock, Marc Cheong

    Abstract: Background: The development of AI-enabled software heavily depends on AI model documentation, such as model cards, due to different domain expertise between software engineers and model developers. From an ethical standpoint, AI model documentation conveys critical information on ethical considerations along with mitigation strategies for downstream developers to ensure the delivery of ethically c…

    Submitted 2 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: This paper is accepted by 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM'24)

  12. arXiv:2406.17297  [pdf, other]

    cs.CV cs.AI

    Towards Open-set Camera 3D Object Detection

    Authors: Zhuolin He, Xinrun Li, Heng Gao, Jiachen Tang, Shoumeng Qiu, Wenfu Wang, Lvjian Lu, Xuchong Qiu, Xiangyang Xue, Jian Pu

    Abstract: Traditional camera 3D object detectors are typically trained to recognize a predefined set of known object classes. In real-world scenarios, these detectors may encounter unknown objects outside the training categories and fail to identify them correctly. To address this gap, we present OS-Det3D (Open-set Camera 3D Object Detection), a two-stage training framework enhancing the ability of camera 3…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.16525  [pdf, other]

    stat.ML cs.LG

    OAML: Outlier Aware Metric Learning for OOD Detection Enhancement

    Authors: Heng Gao, Zhuolin He, Shoumeng Qiu, Jian Pu

    Abstract: Out-of-distribution (OOD) detection methods have been developed to identify objects that a model has not seen during training. The Outlier Exposure (OE) methods use auxiliary datasets to train OOD detectors directly. However, the collection and learning of representative OOD samples may pose challenges. To tackle these issues, we propose the Outlier Aware Metric Learning (OAML) framework. The main…

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.16001  [pdf, other]

    cs.CV

    Learning Accurate and Enriched Features for Stereo Image Super-Resolution

    Authors: Hu Gao, Depeng Dang

    Abstract: Stereo image super-resolution (stereoSR) aims to enhance the quality of super-resolution results by incorporating complementary information from an alternative view. Although current methods have shown significant advancements, they typically operate on representations at full resolution to preserve spatial details, facing challenges in accurately capturing contextual information. Simultaneously,…

    Submitted 22 June, 2024; originally announced June 2024.

  15. arXiv:2406.13233  [pdf, other]

    cs.AI

    AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

    Authors: Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

    Abstract: Mixture of experts (MoE) has become the standard for constructing production-level large language models (LLMs) due to its promise to boost model capacity without causing significant overheads. Nevertheless, existing MoE methods usually enforce a constant top-k routing for all tokens, which is arguably restrictive because various tokens (e.g., "<EOS>" vs. "apple") may require various numbers of ex…

    Submitted 19 June, 2024; originally announced June 2024.

  16. arXiv:2406.12784  [pdf, other]

    cs.CL

    UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions

    Authors: Xunzhi Wang, Zhuowei Zhang, Qiongyu Li, Gaonan Chen, Mengting Hu, Zhiyu li, Bitong Luo, Hang Gao, Zhixin Han, Haotian Wang

    Abstract: The rapid development of large language models (LLMs) has shown promising practical results. However, their low interpretability often leads to errors in unforeseen circumstances, limiting their utility. Many works have focused on creating comprehensive evaluation systems, but previous benchmarks have primarily assessed problem-solving abilities while neglecting the response's uncertainty, which m…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Under review

  17. arXiv:2406.12468  [pdf, other]

    cs.CL cs.AI

    Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities

    Authors: Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei, Hongcheng Gao, Yilong Xu, Xueqi Cheng

    Abstract: The parametric knowledge memorized by large language models (LLMs) becomes outdated quickly. In-context editing (ICE) is currently the most effective method for updating the knowledge of LLMs. Recent advancements involve enhancing ICE by modifying the decoding strategy, obviating the need for altering internal model structures or adjusting external prompts. However, this enhancement operates acros…

    Submitted 18 June, 2024; originally announced June 2024.

  18. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.11721  [pdf, other]

    cs.CL cs.AI cs.LG

    Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

    Authors: Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Huan-ang Gao, Huimin Chen, Zhiyuan Liu, Maosong Sun

    Abstract: Understanding alignment techniques begins with comprehending zero-shot generalization brought by instruction tuning, but little of the mechanism has been understood. Existing work has largely been confined to the task level, without considering that tasks are artificially defined and, to LLMs, merely consist of tokens and representations. This line of research has been limited to examining transfe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 33 pages, 14 figures

  20. arXiv:2406.10236  [pdf, other]

    eess.IV cs.AI

    Lightening Anything in Medical Images

    Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, Jingyi Xu, Lipeng Ma, Yatian Yang, Pinghong Zhou

    Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 23 pages, 6 figures

  21. arXiv:2406.08709  [pdf, other]

    cs.LG stat.ME

    Introducing Diminutive Causal Structure into Graph Representation Learning

    Authors: Hang Gao, Peng Qiao, Yifan Jin, Fengge Wu, Jiangmeng Li, Changwen Zheng

    Abstract: When engaging in end-to-end graph representation learning with Graph Neural Networks (GNNs), the intricate causal relationships and rules inherent in graph data pose a formidable challenge for the model in accurately capturing authentic data relationships. A proposed mitigating strategy involves the direct integration of rules or relationships corresponding to the graph data into the model. Howeve…

    Submitted 12 June, 2024; originally announced June 2024.

  22. arXiv:2406.08380  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by pro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  23. arXiv:2406.07365  [pdf, other]

    cs.CL cs.AI

    BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction

    Authors: Yinhao Bai, Yalan Xie, Xiaoyi Liu, Yuhua Zhao, Zhixin Han, Mengting Hu, Hang Gao, Renhong Cheng

    Abstract: Aspect sentiment quad prediction (ASQP) aims to predict four aspect-based elements, including aspect term, opinion term, aspect category, and sentiment polarity. In practice, unseen aspects, due to distinct data distribution, impose many challenges for a trained neural model. Motivated by this, this work formulates ASQP into the few-shot scenario, which aims for fast adaptation in real application…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Main Conference

  24. arXiv:2406.03873  [pdf, other]

    cs.LG cs.AI cs.CV

    Quantum Implicit Neural Representations

    Authors: Jiaming Zhao, Wenbo Qiao, Peng Zhang, Hui Gao

    Abstract: Implicit neural representations have emerged as a powerful paradigm to represent signals such as images and sounds. This approach aims to utilize neural networks to parameterize the implicit function of the signal. However, when representing implicit functions, traditional neural networks such as ReLU-based multilayer perceptrons face challenges in accurately modeling high-frequency components of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: This paper was accepted by ICML 2024

  25. arXiv:2406.03250  [pdf, other]

    cs.CV cs.AI

    Prompt-based Visual Alignment for Zero-shot Policy Transfer

    Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, QiCheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen

    Abstract: Overfitting has become one of the main obstacles to applications of reinforcement learning (RL). Existing methods do not provide explicit semantic constraints for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains. Besides, abundant data from multiple domains are needed. To address these issue…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ICML 2024

  26. arXiv:2406.02348  [pdf]

    cs.LG

    AMOSL: Adaptive Modality-wise Structure Learning in Multi-view Graph Neural Networks For Enhanced Unified Representation

    Authors: Peiyu Liang, Hongchang Gao, Xubin He

    Abstract: While Multi-view Graph Neural Networks (MVGNNs) excel at leveraging diverse modalities for learning object representation, existing methods assume identical local topology structures across modalities, overlooking real-world discrepancies. This causes MVGNNs to struggle with modality fusion and representation denoising. To address these issues, we propose adaptive modality-wise structure learning (AM…

    Submitted 4 June, 2024; originally announced June 2024.

    Journal ref: 13th International Conference on Soft Computing, Artificial Intelligence and Applications (SAI 2024)

  27. arXiv:2405.19909  [pdf, other]

    cs.LG cs.AI cs.RO

    Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

    Authors: Tenglong Liu, Yang Li, Yixing Lan, Hao Gao, Wei Pan, Xin Xu

    Abstract: In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from the issue of unnecessary conservativeness, hampering policy improvement. This occurs due to the indiscriminate use of all actions from the behavior policy that genera…

    Submitted 15 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 19 pages

  28. arXiv:2405.17264  [pdf, other]

    cs.CL cs.LG

    On the Noise Robustness of In-Context Learning for Text Generation

    Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei

    Abstract: Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significan…

    Submitted 27 May, 2024; originally announced May 2024.

  29. arXiv:2405.16488  [pdf]

    cs.CV

    Partial train and isolate, mitigate backdoor attack

    Authors: Yong Li, Han Gao

    Abstract: Neural networks are widely known to be vulnerable to backdoor attacks, a method that poisons a portion of the training data to make the target model perform well on normal data sets, while outputting attacker-specified or random categories on the poisoned samples. Backdoor attacks are full of threats. Poisoned samples are becoming more and more similar to corresponding normal samples, and even the…

    Submitted 6 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: 9 pages, 2 figures

  30. arXiv:2405.13859  [pdf, other]

    cs.CV

    QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input

    Authors: Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang

    Abstract: Existing deep learning methods have made significant progress in gait recognition. Typically, appearance-based models binarize inputs into silhouette sequences. However, mainstream quantization methods prioritize minimizing task loss over quantization error, which is detrimental to gait recognition with binarized inputs. Minor variations in silhouette sequences can be diminished in the network's i…

    Submitted 22 May, 2024; originally announced May 2024.

  31. arXiv:2405.12519  [pdf, other]

    cs.LG cs.AI q-bio.QM

    MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation

    Authors: Zhaoning Yu, Hongyang Gao

    Abstract: Graph Neural Networks (GNNs) have shown remarkable success in molecular tasks, yet their interpretability remains challenging. Traditional model-level explanation methods like XGNN and GNNInterpreter often fail to identify valid substructures like rings, leading to questionable interpretability. This limitation stems from XGNN's atom-by-atom approach and GNNInterpreter's reliance on average graph…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.08419

  32. arXiv:2405.11468  [pdf, other]

    cs.CV

    Emphasizing Crucial Features for Efficient Image Restoration

    Authors: Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, Depeng Dang

    Abstract: Image restoration is a challenging ill-posed problem that estimates a latent sharp image from its degraded counterpart. Although existing methods have achieved promising performance by designing novel module architectures, they ignore the fact that different regions in a corrupted image undergo varying degrees of degradation. In this paper, we propose an efficient and effective framework to…

    Submitted 19 May, 2024; originally announced May 2024.

  33. arXiv:2405.10800  [pdf, other]

    cs.LG

    Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting

    Authors: Zheng Dong, Renhe Jiang, Haotian Gao, Hangchen Liu, Jinliang Deng, Qingsong Wen, Xuan Song

    Abstract: Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heter…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD'24 Research Track

  34. arXiv:2405.10550  [pdf, other]

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad…

    Submitted 17 May, 2024; originally announced May 2024.

  35. arXiv:2405.10300  [pdf, other]

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o…

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  36. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  37. Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids

    Authors: Junchen Liu, Wenbo Hu, Zhuo Yang, Jianteng Chen, Guoliang Wang, Xiaoxue Chen, Yantong Cai, Huan-ang Gao, Hao Zhao

    Abstract: Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotrop…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024, Project page: https://junchenliu77.github.io/Rip-NeRF , Code: https://github.com/JunchenLiu77/Rip-NeRF

  38. arXiv:2405.02077  [pdf, other]

    cs.CV

    MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

    Authors: Hongyu Qu, Rui Yan, Xiangbo Shu, Hailiang Gao, Peng Huang, Guo-Sen Xie

    Abstract: Recent few-shot action recognition (FSAR) methods typically perform semantic matching on learned discriminative features to achieve promising performance. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, etc.) feature alignment, which ignores that human actions with the same semantics may appear at different velocities. To this end, we develop a novel Multi-Velocit…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  39. arXiv:2405.00056  [pdf, other]

    eess.SY cs.GT

    Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation

    Authors: Yousef Emami, Hao Gao, Kai Li, Luis Almeida, Eduardo Tovar, Zhu Han

    Abstract: Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize age of information (AoI) of sensory data, where balancing the trade-off between the UAVs…

    Submitted 2 May, 2024; v1 submitted 24 April, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2312.09953

    MSC Class: 00; ACM Class: C.2

  40. arXiv:2404.19557  [pdf, other]

    stat.ML cs.LG

    Neural Dynamic Data Valuation

    Authors: Zhangyong Liang, Huanhuan Gao, Ji Zhang

    Abstract: Data constitute the foundational component of the data economy and its marketplaces. Efficient and fair data valuation has emerged as a topic of significant interest. Many approaches based on marginal contribution have shown promising results in various downstream tasks. However, they are well known to be computationally expensive as they require training a large number of utility functions, whic…

    Submitted 12 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: 43 pages, 19 figures

  41. arXiv:2404.19247  [pdf, ps, other]

    cs.LG cs.CV

    Improved AutoEncoder with LSTM module and KL divergence

    Authors: Wei Huang, Bingyang Zhang, Kaituo Zhang, Hua Gao, Rongchun Wan

    Abstract: The task of anomaly detection is to separate anomalous data from normal data in the dataset. Models such as deep convolutional autoencoder (CAE) network and deep supporting vector data description (SVDD) model have been universally employed and have demonstrated significant success in detecting anomalies. However, the over-reconstruction ability of CAE network for anomalous data can easily lead to…

    Submitted 30 April, 2024; originally announced April 2024.

  42. arXiv:2404.18043  [pdf, ps, other]

    cs.CL cs.IR cs.LG

    Utilizing Large Language Models for Information Extraction from Real Estate Transactions

    Authors: Yu Zhao, Haoxiang Gao

    Abstract: Real estate sales contracts contain crucial information for property transactions, but manual extraction of data can be time-consuming and error-prone. This paper explores the application of large language models, specifically transformer-based architectures, for automated information extraction from real estate contracts. We discuss challenges, techniques, and future directions in leveraging thes…

    Submitted 27 April, 2024; originally announced April 2024.

  43. arXiv:2404.16323  [pdf, other

    cs.CV

    DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction

    Authors: Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Lei Zhang

    Abstract: In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utilizes an encoder-decoder framework which generates 3D Gaussians in decoder with the guidance of depth-aware image features from encoder. In particular, we introduce the use of deformable transformer, all… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  44. arXiv:2404.15435  [pdf, other

    cs.HC

    Introduction to Eye Tracking: A Hands-On Tutorial for Students and Practitioners

    Authors: Enkelejda Kasneci, Hong Gao, Suleyman Ozdel, Virmarie Maquiling, Enkeleda Thaqi, Carrie Lau, Yao Rong, Gjergji Kasneci, Efe Bozkir

    Abstract: Eye-tracking technology is widely used in various application areas such as psychology, neuroscience, marketing, and human-computer interaction, as it is a valuable tool for understanding how people process information and interact with their environment. This tutorial provides a comprehensive introduction to eye tracking, from the basics of eye anatomy and physiology to the principles and applica… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  45. SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction

    Authors: Wenfeng Zhang, Xin Li, Anqi Li, Xiaoting Huang, Ti Wang, Honglei Gao

    Abstract: Traffic flow prediction is an essential task in constructing smart cities and is a typical Multivariate Time Series (MTS) Problem. Recent research has abandoned Gated Recurrent Units (GRU) and utilized dilated convolutions or temporal slicing for feature extraction, and they have the following drawbacks: (1) Dilated convolutions fail to capture the features of adjacent time steps, resulting in the… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 7 pages, 6 figures, conference

  46. arXiv:2404.09982  [pdf, other

    cs.CL

    Memory Sharing for Large Language Model based Agents

    Authors: Hang Gao, Yongfeng Zhang

    Abstract: The adaptation of Large Language Model (LLM)-based agents to execute tasks via natural language prompts represents a significant advancement, notably eliminating the need for explicit retraining or fine tuning, but such agents are constrained by the comprehensiveness and diversity of the provided examples, leading to outputs that often diverge significantly from expected results, especially when it comes to t… ▽ More

    Submitted 5 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  47. arXiv:2404.07443  [pdf

    physics.optics cs.ET cs.LG

    1-bit Quantized On-chip Hybrid Diffraction Neural Network Enabled by Authentic All-optical Fully-connected Architecture

    Authors: Yu Shao, Haiqi Gao, Yipeng Chen, Yujie liu, Junren Wen, Haidong He, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

    Abstract: Optical Diffraction Neural Networks (DNNs), a subset of Optical Neural Networks (ONNs), show promise in mirroring the prowess of electronic networks. This study introduces the Hybrid Diffraction Neural Network (HDNN), a novel architecture that incorporates matrix multiplication into DNNs, synergizing the benefits of conventional ONNs with those of DNNs to surmount the modulation limitations inhere… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  48. arXiv:2404.03634  [pdf, other

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Guyue Zhou, Yixin Zhu, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation with two-finger grippers is challenged by objects lacking distinct graspable features. Traditional pre-grasping methods, which typically involve repositioning objects or utilizing external aids like table edges, are limited in their adaptability across different object categories and environments. To overcome these limitations, we introduce PreAfford, a novel pre-grasping plan… ▽ More

    Submitted 4 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  49. arXiv:2404.01273  [pdf, other

    cs.LG cs.CL stat.ME

    TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

    Authors: Yue Wang, Tianfan Fu, Yinlong Xu, Zihan Ma, Hongxia Xu, Yingzhou Lu, Bang Du, Honghao Gao, Jian Wu

    Abstract: Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants and can span several years to complete, with a high probability of failure during the process. Recently, there has been a burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significa… ▽ More

    Submitted 28 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  50. arXiv:2403.20106  [pdf, other

    cs.CV cs.LG

    Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring

    Authors: Hu Gao, Depeng Dang

    Abstract: Image deblurring aims to restore a high-quality image from its corresponding blurred version. The emergence of CNNs and Transformers has enabled significant progress. However, these methods often face the dilemma between eliminating long-range degradation perturbations and maintaining computational efficiency. While the selective state space model (SSM) shows promise in modeling long-range dependencies wi… ▽ More

    Submitted 5 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.