Showing 1–50 of 672 results for author: Huang, T

Searching in archive cs.
  1. arXiv:2407.13609  [pdf, other]

    cs.CV cs.AI

    Training-free Composite Scene Generation for Layout-to-Image Synthesis

    Authors: Jiaqi Liu, Tao Huang, Chang Xu

    Abstract: Recent breakthroughs in text-to-image diffusion models have significantly advanced the generation of high-fidelity, photo-realistic images from textual descriptions. Yet, these models often struggle with interpreting spatial arrangements from text, hindering their ability to produce images with precise spatial configurations. To bridge this gap, layout-to-image generation has emerged as a promisin…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.13248  [pdf, other]

    cs.CL

    Are Large Language Models Capable of Generating Human-Level Narratives?

    Authors: Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng

    Abstract: This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. By leveraging expert and automatic annotations, we uncover significant discre…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.11004  [pdf, other]

    cs.CL cs.AI cs.LG

    The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators

    Authors: Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala

    Abstract: Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a…

    Submitted 25 June, 2024; originally announced July 2024.

  4. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.10603  [pdf, other]

    eess.AS cs.CL cs.SD

    Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

    Authors: Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee

    Abstract: Recent advances in automatic speech recognition (ASR) often rely on large speech foundation models for generating high-quality transcriptions. However, these models can be impractical due to limited computing resources. The situation is even more severe in terms of more realistic or difficult scenarios, such as code-switching ASR (CS-ASR). To address this, we present a framework for developing mor…

    Submitted 15 July, 2024; originally announced July 2024.

  6. arXiv:2407.10062  [pdf, other]

    cs.CV

    SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion

    Authors: Jiyuan Zhang, Kang Chen, Shiyan Chen, Yajing Zheng, Tiejun Huang, Zhaofei Yu

    Abstract: Novel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-wor…

    Submitted 13 July, 2024; originally announced July 2024.

  7. arXiv:2407.09808  [pdf, other]

    cs.NI

    SeqBalance: Congestion-Aware Load Balancing with no Reordering for RoCE

    Authors: Huimin Luo, Jiao Zhang, Mingxuan Yu, Yongchen Pan, Tian Pan, Tao Huang

    Abstract: Remote Direct Memory Access (RDMA) is widely used in data center networks because of its high performance. However, due to the characteristics of RDMA's retransmission strategy and the traffic mode of AI training, current load balancing schemes for data center networks are unsuitable for RDMA. In this paper, we propose SeqBalance, a load balancing framework designed for RDMA. SeqBalance implements…

    Submitted 13 July, 2024; originally announced July 2024.

  8. arXiv:2407.09486  [pdf, other]

    cs.DC cs.AI

    ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving

    Authors: Tao Huang, Pengfei Chen, Kyoka Gong, Jocky Hawk, Zachary Bright, Wenxin Xie, Kecheng Huang, Zhi Ji

    Abstract: With the increasing popularity of large language model (LLM) backend systems, it has become common and necessary to deploy stable serverless LLM serving on multi-GPU clusters with autoscaling. However, challenges remain because the diversity and co-location of applications in multi-GPU clusters lead to low service quality and GPU utilization. To address them, we build ENOVA, a deployment, mo…

    Submitted 17 May, 2024; originally announced July 2024.

  9. arXiv:2407.08206  [pdf]

    cs.CL

    System Report for CCL24-Eval Task 7: Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation

    Authors: Jingshen Zhang, Xiangyu Yang, Xinkai Su, Xinglu Chen, Tianyou Huang, Xinying Qiu

    Abstract: This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types pe…

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.06483  [pdf, other]

    cs.LG cs.CL

    Composable Interventions for Language Models

    Authors: Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

    Abstract: Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventi…

    Submitted 8 July, 2024; originally announced July 2024.

  11. arXiv:2407.03771  [pdf, other]

    cs.CV

    SpikeGS: Reconstruct 3D scene via fast-moving bio-inspired sensors

    Authors: Yijia Guo, Liwen Hu, Lei Ma, Tiejun Huang

    Abstract: 3D Gaussian Splatting (3DGS) demonstrates unparalleled performance in 3D scene reconstruction. However, 3DGS relies heavily on sharp images. Fulfilling this requirement can be challenging in real-world scenarios, especially when the camera moves fast, which severely limits the application of 3DGS. To address these challenges, we propose Spike Gaussian Splatting (SpikeGS), the first fra…

    Submitted 4 July, 2024; originally announced July 2024.

  12. arXiv:2407.02783  [pdf, ps, other]

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  13. arXiv:2406.18992  [pdf, other]

    cs.CV cs.AI cs.LG

    Semi-supervised Concept Bottleneck Models

    Authors: Lijie Hu, Tianhao Huang, Huanyi Xie, Chenyang Ren, Zhengyu Hu, Lu Yu, Di Wang

    Abstract: Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 17 pages

  14. arXiv:2406.18485  [pdf, other]

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha…

    Submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.16986  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

    Authors: Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

    Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce…

    Submitted 23 June, 2024; originally announced June 2024.

  16. arXiv:2406.12787  [pdf, other]

    cs.CL cs.HC

    Generating Educational Materials with Different Levels of Readability using LLMs

    Authors: Chieh-Yang Huang, Jing Wei, Ting-Hao 'Kenneth' Huang

    Abstract: This study introduces the leveled-text generation task, aiming to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting signific…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: In2Writing 2024

  17. arXiv:2406.12243  [pdf, other]

    cs.IR cs.AI

    CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework

    Authors: Shaohuang Wang, Lun Wang, Yunhan Bu, Tianwei Huang

    Abstract: Large Language Models (LLMs) have achieved remarkable progress in language understanding and generation. Custom LLMs leveraging textual features have been applied to recommendation systems, demonstrating improvements across various recommendation scenarios. However, most existing methods perform untrained recommendation based on pre-trained knowledge (e.g., movie recommendation), and the auto-regr…

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.09484  [pdf, other]

    cs.CV cs.CR

    Is Diffusion Model Safe? Severe Data Leakage via Gradient-Guided Diffusion Model

    Authors: Jiayang Meng, Tao Huang, Hong Chen, Cuiping Li

    Abstract: Gradient leakage has been identified as a potential source of privacy breaches in modern image processing systems, where the adversary can completely reconstruct the training images from leaked gradients. However, existing methods are restricted to reconstructing low-resolution images where data leakage risks of image processing systems are not sufficiently explored. In this paper, by exploiting d…

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.08477  [pdf, other]

    cs.IR

    Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens

    Authors: Ting-Ji Huang, Jia-Qi Yang, Chunxu Shen, Kai-Qi Liu, De-Chuan Zhan, Han-Jia Ye

    Abstract: Characterizing users and items through vector representations is crucial for various tasks in recommender systems. Recent approaches attempt to apply Large Language Models (LLMs) in recommendation through a question and answer format, where real users and items (e.g., Item No.2024) are represented with in-vocabulary tokens (e.g., "item", "20", "24"). However, since LLMs are typically pretrained on…

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.06954  [pdf, ps, other]

    cs.LG math.OC

    Distributional MIPLIB: a Multi-Domain Library for Advancing ML-Guided MILP Methods

    Authors: Weimin Huang, Taoan Huang, Aaron M Ferber, Bistra Dilkina

    Abstract: Mixed Integer Linear Programming (MILP) is a fundamental tool for modeling combinatorial optimization problems. Recently, a growing body of research has used machine learning to accelerate MILP solving. Despite the increasing popularity of this approach, there is a lack of a common repository that provides distributions of similar MILP instances across different domains, at different hardness leve…

    Submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.06934  [pdf]

    cs.CY cs.ET cs.NI cs.SI

    Decentralized Social Networks and the Future of Free Speech Online

    Authors: Tao Huang

    Abstract: Decentralized social networks like Mastodon and BlueSky are trending topics that have drawn much attention and discussion in recent years. By devolving powers from the central node to the end users, decentralized social networks aim to cure existing pathologies on the centralized platforms and have been viewed by many as the future of the Internet. This article critically and systematically assess…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 25 pages

  22. arXiv:2406.05774  [pdf, other]

    cs.CV

    VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction

    Authors: Hanlin Chen, Fangyin Wei, Chen Li, Tianxin Huang, Yunsong Wang, Gim Hee Lee

    Abstract: Although 3D Gaussian Splatting has been widely studied because of its realistic and efficient novel-view synthesis, it is still challenging to extract a high-quality surface from the point-based representation. Previous works improve the surface by incorporating geometric priors from the off-the-shelf normal estimator. However, there are two main limitations: 1) Supervising normal rendered from 3D…

    Submitted 9 June, 2024; originally announced June 2024.

  23. arXiv:2406.04264  [pdf, other]

    cs.CV cs.AI cs.CL

    MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

    Authors: Junjie Zhou, Yan Shu, Bo Zhao, Boya Wu, Shitao Xiao, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu

    Abstract: The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To addres…

    Submitted 19 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  24. arXiv:2406.01476  [pdf, other]

    cs.CV

    DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

    Authors: Tianyu Huang, Yihan Zeng, Hui Li, Wangmeng Zuo, Rynson W. H. Lau

    Abstract: Dynamic 3D interaction has witnessed great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former one requires assigning precise physical properties to the target object, otherwise the…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Technical report. Codes are released at: https://github.com/tyhuang0428/DreamPhysics

  25. arXiv:2406.00956  [pdf, other]

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  26. arXiv:2406.00383  [pdf, other]

    cs.CV

    SpikeMM: Flexi-Magnification of High-Speed Micro-Motions

    Authors: Baoyue Zhang, Yajing Zheng, Shiyan Chen, Jiyuan Zhang, Kang Chen, Zhaofei Yu, Tiejun Huang

    Abstract: The amplification of high-speed micro-motions holds significant promise, with applications spanning fault detection in fast-paced industrial environments to refining precision in medical procedures. However, conventional motion magnification algorithms often encounter challenges in high-speed scenarios due to low sampling rates or motion blur. In recent years, spike cameras have emerged as a super…

    Submitted 1 June, 2024; originally announced June 2024.

  27. arXiv:2405.20694  [pdf, other]

    cs.NE

    Robust Stable Spiking Neural Networks

    Authors: Jianhao Ding, Zhiyu Pan, Yujia Liu, Zhaofei Yu, Tiejun Huang

    Abstract: Spiking neural networks (SNNs) are gaining popularity in deep learning due to their low energy budget on neuromorphic hardware. However, they still lack sufficient robustness for safety-critical applications such as autonomous driving. Many studies have been conducted to defend SNNs from the threat of adversarial attacks. This paper aims to uncover the robustness of SNN…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  28. arXiv:2405.20355  [pdf, other]

    cs.NE cs.CR cs.CV cs.LG

    Enhancing Adversarial Robustness in SNNs with Sparse Gradients

    Authors: Yujia Liu, Tong Bu, Jianhao Ding, Zecheng Hao, Tiejun Huang, Zhaofei Yu

    Abstract: Spiking Neural Networks (SNNs) have attracted great attention for their energy-efficient operations and biologically inspired structures, offering potential advantages over Artificial Neural Networks (ANNs) in terms of energy efficiency and interpretability. Nonetheless, similar to ANNs, the robustness of SNNs remains a challenge, especially when facing adversarial attacks. Existing techniques, wh…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  29. arXiv:2405.19012  [pdf, other]

    cs.AI

    Implicit Neural Image Field for Biological Microscopy Image Compression

    Authors: Gaole Dai, Cheng-Ching Tseng, Qingpo Wuwu, Rongyu Zhang, Shaokang Wang, Ming Lu, Tiejun Huang, Yu Zhou, Ali Ata Tuz, Matthias Gunzer, Jianxu Chen, Shanghang Zhang

    Abstract: The rapid pace of innovation in biological microscopy imaging has led to large images, putting pressure on data storage and impeding efficient sharing, management, and visualization. This necessitates the development of efficient compression solutions. Traditional CODEC methods struggle to adapt to the diverse bioimaging data and often suffer from sub-optimal compression. In this study, we propose…

    Submitted 29 May, 2024; originally announced May 2024.

  30. arXiv:2405.18641  [pdf, other]

    cs.LG

    Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

    Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu

    Abstract: Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-broken effect can be mitigated by separating states in the fine-tuning stage to optimize the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimizatio…

    Submitted 26 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  31. arXiv:2405.17958  [pdf, other]

    cs.CV

    FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

    Authors: Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee

    Abstract: Empowering 3D Gaussian Splatting with generalization ability is appealing. However, existing generalizable 3D Gaussian Splatting methods are largely confined to narrow-range interpolation between stereo images due to their heavy backbones, thus lacking the ability to accurately localize 3D Gaussian and support free-view synthesis across wide view range. In this paper, we present a novel framework…

    Submitted 9 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  32. arXiv:2405.16790  [pdf, other]

    cs.CV

    SCSim: A Realistic Spike Cameras Simulator

    Authors: Liwen Hu, Lei Ma, Yijia Guo, Tiejun Huang

    Abstract: Spike cameras, with their exceptional temporal resolution, are revolutionizing high-speed visual applications. Large-scale synthetic datasets have significantly accelerated the development of these cameras, particularly in reconstruction and optical flow. However, current synthetic datasets for spike cameras lack sophistication. Addressing this gap, we introduce SCSim, a novel and more realistic s…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME2024. arXiv admin note: substantial text overlap with arXiv:2304.03129

  33. arXiv:2405.14830  [pdf, other]

    hep-lat cond-mat.dis-nn cond-mat.str-el cs.LG hep-th

    Deep learning lattice gauge theories

    Authors: Anuj Apte, Anthony Ashmore, Clay Cordova, Tzu-Chen Huang

    Abstract: Monte Carlo methods have led to profound insights into the strong-coupling behaviour of lattice gauge theories and produced remarkable results such as first-principles computations of hadron masses. Despite tremendous progress over the last four decades, fundamental challenges such as the sign problem and the inability to simulate real-time dynamics remain. Neural network quantum states have emerg…

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.14156  [pdf, other]

    cs.CV

    Unveiling the Tapestry of Consistency in Large Vision-Language Models

    Authors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

    Abstract: Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs an…

    Submitted 7 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: This project is available at https://github.com/foundation-multimodal-models/ConBench

  35. Physics-based Scene Layout Generation from Human Motion

    Authors: Jianan Li, Tao Huang, Qingxu Zhu, Tien-Tsin Wong

    Abstract: Creating scenes for captured motions that achieve realistic human-scene interaction is crucial for 3D animation in movies or video games. As character motion is often captured in a blue-screened studio without real furniture or objects in place, there may be a discrepancy between the planned motion and the captured one. This gives rise to the need for automatic scene layout generation to relieve t…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH conference

  36. arXiv:2405.11884  [pdf, other]

    cs.LG cs.DC

    Vertical Federated Learning Hybrid Local Pre-training

    Authors: Wenguo Li, Xinling Guo, Xu Jiao, Tiancheng Huang, Xiaoran Yan, Yao Yang

    Abstract: Vertical Federated Learning (VFL), which has a broad range of real-world applications, has received much attention in both academia and industry. Enterprises aspire to exploit more valuable features of the same users from diverse departments to boost their model prediction skills. VFL addresses this demand and concurrently secures individual parties from exposing their raw data. However, conventio…

    Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  37. arXiv:2405.10565  [pdf, other]

    cs.GR

    Real-time Level-of-Detail Strand-based Hair Rendering

    Authors: Tao Huang, Yang Zhou, Daqi Lin, Junqiu Zhu, Ling-Qi Yan, Kui Wu

    Abstract: Strand-based hair rendering has become increasingly popular in production for its realistic appearance. However, the prevailing level-of-detail solution employing hair cards for distant hair models introduces a significant discontinuity in dynamics and appearance during the transition from strands to cards. We introduce an innovative real-time framework for strand-based hair rendering that ensures…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 10 figures, 1 performance plot

    ACM Class: I.3.5; I.3.3

  38. arXiv:2405.10128  [pdf, other]

    cs.CL cs.AI

    Red Teaming Language Models for Contradictory Dialogues

    Authors: Xiaofei Wen, Bangzheng Li, Tenghao Huang, Muhao Chen

    Abstract: Most language models currently available are prone to self-contradiction during dialogues. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. This task is inspired by research on context faithfulness and dialogue comprehension, which have demonstrated that the detection and understand…

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 18 pages, 5 figures

  39. arXiv:2405.06693  [pdf, other]

    q-bio.BM cs.LG

    SurfPro: Functional Protein Design Based on Continuous Surface

    Authors: Zhenqiao Song, Tinglin Huang, Lei Li, Wengong Jin

    Abstract: How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models…

    Submitted 17 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  40. arXiv:2405.05164  [pdf, other]

    cs.CV

    ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion

    Authors: Bing Zhu, Zixin He, Weiyi Xiong, Guanhua Ding, Jianan Liu, Tao Huang, Wei Chen, Wei Xiang

    Abstract: Millimeter wave (mmWave) radar is a non-intrusive, privacy-preserving, relatively convenient and inexpensive device, which has been demonstrated to be applicable in place of RGB cameras in human indoor pose estimation tasks. However, mmWave radar relies on the collection of reflected signals from the target, and the information contained in the radar signals is difficult to exploit fully. This has been a long…

    Submitted 28 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  41. arXiv:2405.01115  [pdf]

    cs.RO eess.SY

    A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

    Authors: Hongliang Zhang, Yilan Zhou, Lei Wang, Tengchao Huang

    Abstract: Initial alignment is one of the key technologies in strapdown inertial navigation systems (SINS), providing initial state information for vehicle attitude and navigation. In some situations, such as the attitude heading reference system, the position is not necessarily required or even available, so self-alignment that does not rely on any external aid becomes necessary. This study pres…

    Submitted 2 May, 2024; originally announced May 2024.

  42. arXiv:2404.18253  [pdf, other]

    cs.CV cs.LG

    Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment

    Authors: Tengjun Huang

    Abstract: With the rise of Visual and Language Pretraining (VLP), an increasing number of downstream tasks are adopting the paradigm of pretraining followed by fine-tuning. Although this paradigm has demonstrated potential in various multimodal downstream tasks, its implementation in the remote sensing domain encounters some obstacles. Specifically, the tendency for same-modality embeddings to cluster toget…

    Submitted 28 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted by the Twelfth International Conference on Learning Representations (ICLR) Workshop

  43. arXiv:2404.18214  [pdf, other]

    cs.IR cs.AI cs.HC

    Contrastive Learning Method for Sequential Recommendation based on Multi-Intention Disentanglement

    Authors: Zeyu Hu, Yuzhi Xiao, Tao Huang, Xuanrong Huo

    Abstract: Sequential recommendation is one of the important branches of recommender systems, aiming to recommend personalized items for the future through the analysis and prediction of users' ordered historical interactive behaviors. However, along with the growth of the user volume and the increasingly rich behavioral information, how to understand and disentangle the user's interactive multi-int…

    Submitted 8 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  44. arXiv:2404.18105  [pdf, other]

    cs.RO eess.SP

    Tightly-Coupled VLP/INS Integrated Navigation by Inclination Estimation and Blockage Handling

    Authors: Xiao Sun, Yuan Zhuang, Xiansheng Yang, Jianzhu Huai, Tianming Huang, Daquan Feng

    Abstract: Visible Light Positioning (VLP) has emerged as a promising technology capable of delivering indoor localization with high accuracy. In VLP systems that use Photodiodes (PDs) as light receivers, the Received Signal Strength (RSS) is affected by the incidence angle of light, making the inclination of PDs a critical parameter in the positioning model. Currently, most studies assume the inclination to…

    Submitted 28 April, 2024; originally announced April 2024.

  45. arXiv:2404.17147  [pdf, other]

    cs.CV cs.LG

    On the Federated Learning Framework for Cooperative Perception

    Authors: Zhenrong Zhang, Jianan Liu, Xi Zhou, Tao Huang, Qing-Long Han, Jingxin Liu, Hongbin Liu

    Abstract: Cooperative perception is essential to enhance the efficiency and safety of future transportation systems, requiring extensive data sharing among vehicles on the road, which raises significant privacy concerns. Federated learning offers a promising solution by enabling data privacy-preserving collaborative enhancements in perception, decision-making, and planning among connected and autonomous veh…

    Submitted 26 April, 2024; originally announced April 2024.

  46. arXiv:2404.17025  [pdf, other]

    cs.HC

    How Does Conversation Length Impact User's Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots

    Authors: Shih-Hong Huang, Ya-Fang Lin, Zeyu He, Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

    Abstract: Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary…

    Submitted 25 April, 2024; originally announced April 2024.

  47. arXiv:2404.16645  [pdf, other]

    cs.CL cs.AI

    Tele-FLM Technical Report

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a…

    Submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.16386  [pdf, other]

    cs.CV

    Promoting CNNs with Cross-Architecture Knowledge Distillation for Efficient Monocular Depth Estimation

    Authors: Zhimeng Zheng, Tao Huang, Gongsheng Li, Zuyi Wang

    Abstract: Recently, the performance of monocular depth estimation (MDE) has been significantly boosted with the integration of transformer models. However, transformer models are usually computationally expensive, and their effectiveness in lightweight models is limited compared to convolutions. This limitation hinders their deployment on resource-limited devices. In this paper, we propose a cross-arc…

    Submitted 25 April, 2024; originally announced April 2024.

  49. arXiv:2404.12867  [pdf, other]

    cs.CV cs.RO

    FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving

    Authors: Xingtai Gui, Tengteng Huang, Haonan Shao, Haotian Yao, Chi Zhang

    Abstract: Future instance prediction from a Bird's Eye View (BEV) perspective is a vital component in autonomous driving, which involves future instance segmentation and instance motion prediction. Existing methods usually rely on a redundant and complex pipeline which requires multiple auxiliary outputs and post-processing procedures. Moreover, estimated errors on each of the auxiliary predictions will…

    Submitted 19 April, 2024; originally announced April 2024.

  50. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred.