Showing 1–50 of 148 results for author: Jia, F

  1. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  2. arXiv:2408.07605  [pdf, other]

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  3. arXiv:2408.05705  [pdf, other]

    eess.IV cs.AI cs.CV

    TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature Scaling

    Authors: Ruiquan Ge, Xiao Yu, Yifei Chen, Fan Jia, Shenghao Zhu, Guanyu Zhou, Yiyu Huang, Chenyan Zhang, Dong Zeng, Changmiao Wang, Qiegen Liu, Shanzhou Niu

    Abstract: Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named as TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures

  4. arXiv:2408.05208  [pdf, other]

    hep-th cond-mat.str-el gr-qc math-ph

    Holographic thermal correlators and quasinormal modes from semiclassical Virasoro blocks

    Authors: Hewei Frederic Jia, Mukund Rangamani

    Abstract: Motivated by its relevance for thermal correlators in strongly coupled holographic CFTs, we refine and further develop a recent exact analytic approach to black hole perturbation problem, based on the semiclassical Virasoro blocks, or equivalently via AGT relation, the Nekrasov partition functions in the Nekrasov-Shatashvili limit. Focusing on asymptotically $\text{AdS}_5$ black hole backgrounds,…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 69 pages, 3 figures

  5. arXiv:2408.04545  [pdf, other]

    cs.GT

    Balancing Efficiency with Equality: Auction Design with Group Fairness Concerns

    Authors: Fengjuan Jia, Mengxiao Zhang, Jiamou Liu, Bakh Khoussainov

    Abstract: The issue of fairness in AI arises from discriminatory practices in applications like job recommendations and risk assessments, emphasising the need for algorithms that do not discriminate based on group characteristics. This concern is also pertinent to auctions, commonly used for resource allocation, which necessitate fairness considerations. Our study examines auctions with groups distinguished…

    Submitted 9 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  6. arXiv:2408.00988  [pdf, other]

    physics.atom-ph

    Measurement of microwave polarization using two polarization orthogonal local microwave electric fields in a Rydberg atom-based mixer

    Authors: Weibo Yin, Jianan Zhang, Fengdong Jia, Yuhan Wang, Yuxiang Wang, Jianhai Hao, Yue Cui, Ya Liu, Zhiping Zhong

    Abstract: We propose and demonstrate a novel method for measuring the polarization direction of a microwave electric field in a single measurement using a Rydberg atom-based mixer with two orthogonally polarized local microwave electric fields. Furthermore, introducing a weak static magnetic field enables the utilization of the Zeeman effect and exploitation of polarization asymmetry. This distinction allow…

    Submitted 1 August, 2024; originally announced August 2024.

  7. arXiv:2407.19219  [pdf, other]

    astro-ph.SR astro-ph.GA

    Primeval very low-mass stars and brown dwarfs -- VIII. The first age benchmark L subdwarf, a wide companion to a halo white dwarf

    Authors: Z. H. Zhang, R. Raddi, A. J. Burgasser, S. L. Casewell, R. L. Smart, M. C. Galvez-Ortiz, H. R. A. Jones, S. Baig, N. Lodieu, B. Gauza, Ya. V. Pavlenko, Y. F. Jiao, Z. K. Zhao, S. Y. Zhou, D. J. Pinfield

    Abstract: We report the discovery of five white dwarf + ultracool dwarf systems identified as common proper motion wide binaries in the Gaia Catalogue of Nearby Stars. The discoveries include a white dwarf + L subdwarf binary, VVV 1256-62AB, a gravitationally bound system located 75.6(+1.9/-1.8) pc away with a projected separation of 1375(+35/-33) au. The primary is a cool DC white dwarf with a hydrogen dom…

    Submitted 17 August, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: 15 pages, 12 figures

  8. arXiv:2407.17337  [pdf, ps, other]

    cond-mat.supr-con

    Raman Spectroscopic Study on Bi2Rh3Se2: Two-dimensional-Ising Charge Density Wave and Quantum Fluctuations

    Authors: Fei Jiao, Yonghui Zhou, Shuyang Wang, Chao An, Xuliang Chen, Ying Zhou, Min Zhang, Liang Cao, Xigang Luo, Yimin Xiong, Zhaorong Yang

    Abstract: The ternary chalcogenide Bi2Rh3Se2 was found to be a charge density wave (CDW) superconductor with a 2*2 periodicity. The key questions regarding the underlying mechanism of CDW state and its interplay with lattice and electronic properties remains to be explored. Here, based on the systematic Raman scattering investigations on single crystalline Bi2Rh3Se2, we observed the fingerprinting feature o…

    Submitted 24 July, 2024; originally announced July 2024.

  9. arXiv:2407.15719  [pdf, other]

    cs.CV cs.AI

    GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI

    Authors: Zhaojie Fang, Shenghao Zhu, Yifei Chen, Binfeng Zou, Fan Jia, Linwei Qiu, Chang Liu, Yiyu Huang, Xiang Feng, Feiwei Qin, Changmiao Wang, Yeru Wang, Jin Fan, Changbiao Chu, Wan-Zhen Wu, Hu Zhao

    Abstract: Alzheimer's Disease (AD) is an irreversible neurodegenerative disorder that often progresses from Mild Cognitive Impairment (MCI), leading to memory loss and significantly impacting patients' lives. Clinical trials indicate that early targeted interventions for MCI patients can potentially slow or halt the development and progression of AD. Previous research has shown that accurate medical classif…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 35 pages, 4 figures

  10. arXiv:2407.04368  [pdf, other]

    cs.CL cs.SD eess.AS

    Romanization Encoding For Multilingual ASR

    Authors: Wen Ding, Fei Jia, Hainan Xu, Yu Xi, Junjie Lai, Boris Ginsburg

    Abstract: We introduce romanization encoding for script-heavy languages to optimize multilingual and code-switching Automatic Speech Recognition (ASR) systems. By adopting romanization encoding alongside a balanced concatenated tokenizer within a FastConformer-RNNT framework equipped with a Roman2Char module, we significantly reduce vocabulary and output dimensions, enabling larger training batches and redu…

    Submitted 5 July, 2024; originally announced July 2024.

  11. arXiv:2406.15185  [pdf, other]

    physics.soc-ph

    Percolation transition of k-frequent destinations network for urban mobility

    Authors: Weiyu Zhang, Furong Jia, Jianying Wang, Yu Liu, Gezhi Xiu

    Abstract: Urban spatial interactions are a complex aggregation of routine visits and random explorations by individuals. The inherent uncertainty of these random visitations poses significant challenges to understanding urban structures and socioeconomic developments. To capture the core dynamics of urban interaction networks, we analyze the percolation structure of the $k$-most frequented destinations of i…

    Submitted 21 June, 2024; originally announced June 2024.

  12. arXiv:2406.10907  [pdf, other]

    cs.CV

    SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection

    Authors: Lin Liu, Ziying Song, Qiming Xia, Feiyang Jia, Caiyan Jia, Lei Yang, Hongyu Pan

    Abstract: LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications due to its computational efficiency advantages. Existing methods either use the features of a single central voxel as an object proxy, or treat an aggregated cluster of foreground points as an object proxy. However, the former lacks the ability to aggregate contextual information, resulting in insufficie…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.02702

  13. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  14. arXiv:2405.18361  [pdf, other]

    cs.CV

    Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

    Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

    Abstract: Rapid advancements in Autonomous Driving (AD) tasks turned a significant shift toward end-to-end fashion, particularly in the utilization of vision-language models (VLMs) that integrate robust logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, these VLM-based approaches tend to integrate 2D vision tokenizers and a large language model (LLM) for ego-car…

    Submitted 28 May, 2024; originally announced May 2024.

  15. arXiv:2405.16873  [pdf, other]

    cs.CV

    ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection

    Authors: Ziying Song, Feiyang Jia, Hongyu Pan, Yadan Luo, Caiyan Jia, Guoxin Zhang, Lin Liu, Yang Ji, Lei Yang, Li Wang

    Abstract: In the field of 3D object detection tasks, fusing heterogeneous features from LiDAR and camera sensors into a unified Bird's Eye View (BEV) representation is a widely adopted paradigm. However, existing methods are often compromised by imprecise sensor calibration, resulting in feature misalignment in LiDAR-camera BEV fusion. Moreover, such inaccuracies result in errors in depth estimation for the…

    Submitted 5 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  16. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  17. Quasiparticle and Excitonic Structures of Few-layer and Bulk GaSe: Interlayer Coupling, Self-energy, and Electron-hole Interaction

    Authors: Fanhao Jia, Zhao Tang, Greis J. Cruz, Weiwei Gao, Shaowen Xu, Wei Ren, Peihong Zhang

    Abstract: Metal monochalcogenide GaSe is a classic layered semiconductor that has received increasing research interest due to its highly tunable electronic and optical properties for ultrathin electronics applications. Despite intense research efforts, a systematic understanding of the layer-dependent electronic and optical properties of GaSe remains to be established, and there appear significant discrepa…

    Submitted 11 May, 2024; originally announced May 2024.

    Journal ref: Phys. Rev. Applied 21, 054019 (2024)

  18. arXiv:2404.14604  [pdf, other]

    cs.CL

    Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

    Authors: Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang

    Abstract: Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro. Although fine-tuning with intermediate steps (i.e., rationales) elicits some mathematical reasoning skills, the resulting models still fall short in vis…

    Submitted 25 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  19. arXiv:2404.12728  [pdf, other]

    cs.CL

    Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

    Authors: Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty

    Abstract: Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks. Coincidentally, the NLP community has also recently found that self-generating relevant examples in the context…

    Submitted 23 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2404.06654  [pdf, other]

    cs.CL

    RULER: What's the Real Context Size of Your Long-Context Language Models?

    Authors: Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg

    Abstract: The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-con…

    Submitted 6 August, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: COLM 2024; Code is available at https://github.com/hsiehjackson/RULER

  21. arXiv:2404.04295  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

    Authors: Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg

    Abstract: This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that P…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: accepted at the ICASSP 2024 conference

  22. arXiv:2404.00699  [pdf, other]

    cs.CL

    How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library

    Authors: Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty

    Abstract: With the rise of Large Language Models (LLMs) in recent years, abundant new opportunities are emerging, but also new challenges, among which contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars, placing high pres…

    Submitted 20 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, 1 table

  23. arXiv:2403.19438  [pdf, other]

    cs.CV cs.RO

    SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

    Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang

    Abstract: Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely-labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://subjectdrive.github.io/

  24. arXiv:2403.11848  [pdf, other]

    cs.CV

    GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection

    Authors: Ziying Song, Lei Yang, Shaoqing Xu, Lin Liu, Dongyang Xu, Caiyan Jia, Feiyang Jia, Li Wang

    Abstract: Integrating LiDAR and camera information into Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to the inaccurate calibration relationship between LiDAR and the camera sensor. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between…

    Submitted 2 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  25. arXiv:2402.16810  [pdf]

    cs.CL

    OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

    Authors: Fujian Jia, Xin Liu, Lixi Deng, Jiwen Gu, Chunchao Pu, Tunan Bai, Mengjiang Huang, Yuanzhi Lu, Kang Liu

    Abstract: In the past year, there has been a growing trend in applying Large Language Models (LLMs) to the field of medicine, particularly with the advent of advanced language models such as ChatGPT developed by OpenAI. However, there is limited research on LLMs specifically addressing oncology-related queries. The primary aim of this research was to develop a specialized language model that demonstrates im…

    Submitted 26 February, 2024; originally announced February 2024.

  26. arXiv:2402.10176  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

    Authors: Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman

    Abstract: Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limitin…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Data and models are available at https://huggingface.co/collections/nvidia/openmath-65c5619de2ba059be0775014

  27. arXiv:2402.04559  [pdf, other]

    cs.AI cs.CL cs.HC

    Can Large Language Model Agents Simulate Human Trust Behaviors?

    Authors: Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in applications such as social science. However, one fundamental question remains: can LLM agents really simulate human behaviors? In this paper, we focus on one of the most critical behaviors in human interactions, trust, and aim to investigate whether or not LLM agents can simulate human trust be…

    Submitted 10 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: The first two authors contributed equally. Project website: https://www.camel-ai.org/research/agent-trust

  28. arXiv:2402.00658  [pdf, other]

    cs.AI cs.CL

    Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

    Authors: Fangkai Jiao, Chengwei Qin, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation. However, recent studies have raised concerns regarding the hallucination and flaws in their reasoning process. Substantial efforts are being made to improve the reliability and faithfulness of the generated rationales. Some approaches model reasoning a…

    Submitted 15 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures

  29. arXiv:2401.14754  [pdf, other]

    cs.CV

    VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising

    Authors: Yuxiang Hui, Yang Liu, Yaofang Liu, Fan Jia, Jinshan Pan, Raymond Chan, Tieyong Zeng

    Abstract: Video restoration task aims to recover high-quality videos from low-quality observations. This contains various important sub-tasks, such as video denoising, deblurring and low-light enhancement, since video often faces different types of degradation, such as blur, low light, and noise. Even worse, these kinds of degradation could happen simultaneously when taking videos in extreme environments. T…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures

  30. arXiv:2401.09112  [pdf, other]

    cs.CV

    Stream Query Denoising for Vectorized HD Map Construction

    Authors: Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

    Abstract: To enhance perception performance in complex and extensive scenarios within the realm of autonomous driving, there has been a noteworthy focus on temporal modeling, with a particular emphasis on streaming methods. The prevailing trend in streaming models involves the utilization of stream queries for the propagation of temporal information. Despite the prevalence of this approach, the direct appli…

    Submitted 17 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  31. arXiv:2401.06542  [pdf, other]

    cs.CV

    Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook

    Authors: Ziying Song, Lin Liu, Feiyang Jia, Yadan Luo, Guoxin Zhang, Lei Yang, Li Wang, Caiyan Jia

    Abstract: In the realm of modern autonomous driving, the perception system is indispensable for accurately assessing the state of the surrounding environment, thereby enabling informed prediction and planning. The key step to this system is related to 3D object detection that utilizes vehicle-mounted sensors such as LiDAR and cameras to identify the size, the category, and the location of nearby objects. De…

    Submitted 15 August, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  32. arXiv:2401.03907  [pdf, other]

    cs.CV

    RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

    Authors: Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang

    Abstract: Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. With the emergence of visual foundation models (VFMs), opportunities and challenges are presented for im…

    Submitted 23 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  33. arXiv:2401.00496  [pdf, other]

    cs.CV cs.AI cs.LG

    SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

    Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

    Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme…

    Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  34. arXiv:2312.17055  [pdf, other]

    cs.CL

    Improving In-context Learning via Bidirectional Alignment

    Authors: Chengwei Qin, Wenhan Xia, Fangkai Jiao, Chen Chen, Yuchen Hu, Bosheng Ding, Shafiq Joty

    Abstract: Large language models (LLMs) have shown impressive few-shot generalization on many tasks via in-context learning (ICL). Despite their success in showing such emergent abilities, the scale and complexity of larger models also lead to unprecedentedly high computational demands and deployment challenges. In reaction, researchers explore transferring the powerful capabilities of larger models to more…

    Submitted 24 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  35. arXiv:2311.16989  [pdf, other]

    cs.CL

    ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

    Authors: Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty

    Abstract: Upon its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interests in…

    Submitted 15 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: version v4, included latest top-performing open-sourced LLMs

  36. arXiv:2311.16813  [pdf, other]

    cs.CV

    Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated training data. In this paper, we propose Panacea, an innovative approach to generate panoramic and controllable videos in driving scenarios, capable of yielding an unlimited numbers of diverse, annotated samples pivotal for autonomous driving advancements. Panacea addresses two critical challenges: 'Consistency' and 'Contr…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Project page: https://panacea-ad.github.io/

  37. arXiv:2311.13549  [pdf, other]

    cs.CV cs.RO

    ADriver-I: A General World Model for Autonomous Driving

    Authors: Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang

    Abstract: Typically, autonomous driving adopts a modular design, which divides the full stack into perception, prediction, planning and control parts. Though interpretable, such modular design tends to introduce a substantial amount of redundancy. Recently, multimodal large language models (MLLM) and diffusion techniques have demonstrated their superior performance on comprehension and generation ability. I…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Tech Report

  38. arXiv:2311.11865  [pdf, other]

    cs.CV

    VLM-Eval: A General Evaluation on Video Large Language Models

    Authors: Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang

    Abstract: Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent. In this paper, we introduce a unified evaluation that encompasses multiple video tasks, including captioning, question and answering, retrieval, and action recognition. In addition to conventional metrics, we showcase how GPT-based evaluation can match human-like performance in assessin…

    Submitted 20 November, 2023; originally announced November 2023.

  39. arXiv:2311.00447  [pdf, other]

    cs.AI

    On the Opportunities of Green Computing: A Survey

    Authors: You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, et al. (16 additional authors not shown)

    Abstract: Artificial Intelligence (AI) has achieved significant advancements in technology and research with the development over several decades, and is widely used in many areas including computing vision, natural language processing, time-series analysis, speech synthesis, etc. During the age of deep learning, especially with the arise of Large Language Models, a large majority of researchers' attention…

    Submitted 8 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 113 pages, 18 figures

  40. arXiv:2310.16591  [pdf]

    cond-mat.mtrl-sci

    Intrinsic Piezoelectric Anisotropy of Tetragonal ABO3 Perovskites: A High-Throughput Study

    Authors: Fanhao Jia, Shaowen Xu, Shunbo Hu, Jianguo Chen, Yongchen Wang, Yuan Li, Wei Ren, Jinrong Cheng

    Abstract: A comprehensive understanding of the intrinsic piezoelectric anisotropy stemming from diverse chemical and physical factors is a key step for the rational design of highly anisotropic materials. We performed high-throughput calculations on tetragonal ABO3 perovskites to investigate the piezoelectricity and the interplay between lattice, displacement, polarization and elasticity. Among the 123 types o…

    Submitted 25 October, 2023; originally announced October 2023.

  41. arXiv:2310.10942  [pdf, other]

    cs.CV

    UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models

    Authors: Yangyang Guo, Fangkai Jiao, Zhiqi Shen, Liqiang Nie, Mohan Kankanhalli

    Abstract: Teaching Visual Question Answering (VQA) models to refrain from answering unanswerable questions is necessary for building a trustworthy AI system. Existing studies, though they have explored various aspects of VQA, have somewhat ignored this particular attribute. This paper aims to bridge the research gap by contributing a comprehensive dataset, called UNK-VQA. The dataset is specifically designed to ad…

    Submitted 21 August, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted by TPAMI

  42. arXiv:2310.06753  [pdf, other]

    cs.CV

    TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning

    Authors: Dongming Wu, Jiahao Chang, Fan Jia, Yingfei Liu, Tiancai Wang, Jianbing Shen

    Abstract: Topology reasoning aims to comprehensively understand road scenes and present drivable routes in autonomous driving. It requires detecting road centerlines (lanes) and traffic elements, and further reasoning about their topology relationships, i.e., lane-lane topology and lane-traffic topology. In this work, we first show that the topology score relies heavily on detection performance on lane and traffic…

    Submitted 1 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: The 1st solution for 1st OpenLane Topology in Autonomous Driving Challenge. Code is at https://github.com/wudongming97/TopoMLP

  43. arXiv:2310.04948  [pdf, other]

    cs.LG cs.CL

    TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

    Authors: Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, Yan Liu

    Abstract: The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various t…

    Submitted 2 April, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024. Camera Ready Version

  44. arXiv:2309.08978  [pdf, other]

    cs.AI

    Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimizations

    Authors: Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang

    Abstract: The Web is increasingly becoming the primary platform to deliver AI services onto edge devices, making in-browser deep learning (DL) inference more prominent. Nevertheless, the heterogeneity of edge devices, combined with the underdeveloped state of Web hardware acceleration practices, hinders current in-browser inference from achieving its full performance potential on target devices. To address this…

    Submitted 5 July, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted by MobiSys'24

  45. arXiv:2309.06296  [pdf, other]

    hep-th quant-ph

    Holographic Entropy Inequalities and Multipartite Entanglement

    Authors: Sergio Hernández-Cuenca, Veronika E. Hubeny, Frederic Jia

    Abstract: We study holographic entropy inequalities and their structural properties by making use of a judicious grouping of terms into certain multipartite information quantities. This allows us to recast cumbersome entropic expressions into much simpler ones which share interestingly rigid structures. By performing a systematic search over some of these structures, we are able to discover more than 300 no…

    Submitted 10 July, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: v1: 40 pages, 384 inequalities. v2: 1877 inequalities (w/ numbering distinct from v1); new tables in App. C; link to living repository added

    Report number: MIT-CTP/5610

  46. arXiv:2309.04766  [pdf, other]

    cs.CL cs.AI

    SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning

    Authors: Bin Wang, Zhengyuan Liu, Xin Huang, Fangkai Jiao, Yang Ding, AiTi Aw, Nancy F. Chen

    Abstract: We present SeaEval, a benchmark for multilingual foundation models. In addition to characterizing how these models understand and reason with natural language, we also investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we investigate the brittleness of foundation models in the dimensions of semantics and multilinguality. Our analyses…

    Submitted 11 July, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Published at NAACL 2024. Code: https://seaeval.github.io/

  47. arXiv:2308.16839  [pdf, other]

    hep-th cond-mat.stat-mech cond-mat.str-el math-ph nlin.SI

    Twist operator correlators and isomonodromic tau functions from modular Hamiltonians

    Authors: Hewei Frederic Jia

    Abstract: We introduce a novel approach for computing the twist operator correlators (TOC) in two-dimensional conformal field theories (2d CFT) and the closely related isomonodromic tau functions. The method stems from the formal path integral representation of the ground state reduced density matrix in 2d CFT, and exploits properties of the associated modular Hamiltonians. For a class of genus-zero TOC/tau…

    Submitted 14 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 24 pages, 4 figures; comments welcome. v2: improved discussion on extending to more generic monodromy data, fixed minor typos

  48. arXiv:2308.10919  [pdf]

    cond-mat.mtrl-sci

    Effect of Grain Coalescence on Dislocation and Stress Evolution of GaN Films Grown on Nanoscale Patterned Sapphire Substrates

    Authors: Zuojian Pan, Zhizhong Chen, Yiyong Chen, Haodong Zhang, Han Yang, Jingxin Nie, Chuhan Deng, Boyan Dong, Daqi Wang, Yuchen Li, Weihua Chen, Fei Jiao, Xiangning Kang, Chuanyu Jia, Zhiwen Liang, Qi Wang, Guoyi Zhang, Bo Shen

    Abstract: Two types of nucleation layers (NLs), including in-situ low-temperature grown GaN (LT-GaN) and ex-situ sputtered physical vapor deposition AlN (PVD-AlN), are applied on cone-shaped nanoscale patterned sapphire substrate (NPSS). The initial growth process of GaN on these two NLs is comparatively investigated by a series of growth interruptions. The coalescence process of GaN grains is modulated by adj…

    Submitted 21 August, 2023; originally announced August 2023.

  49. arXiv:2308.09616  [pdf, other]

    cs.CV

    Far3D: Expanding the Horizon for Surround-view 3D Object Detection

    Authors: Xiaohui Jiang, Shuailin Li, Yingfei Liu, Shihao Wang, Fan Jia, Tiancai Wang, Lijin Han, Xiangyu Zhang

    Abstract: Recently, 3D object detection from surround-view images has made notable advancements with its low deployment cost. However, most works have primarily focused on close perception range while leaving long-range detection less explored. Expanding existing methods directly to cover long distances poses challenges such as heavy computation costs and unstable convergence. To address these limitations, t…

    Submitted 17 December, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted by AAAI-2024

  50. arXiv:2307.16267  [pdf]

    cond-mat.mtrl-sci

    Efficient InGaN-based Red Light-Emitting Diodes by Modulating Trench Defects

    Authors: Z. Pan, Z. Chen, H. Zhang, H. Yang, Y. Chen, J. Nie, C. Deng, B. Dong, D. Wang, Y. Li, H. Lin, W. Chen, F. Jiao, X. Kang, C. Jia, Z. Liang, Q. Wang, G. Zhang, B. Shen

    Abstract: Trench defects in multi-quantum wells (MQWs) have long been considered flawed structures that severely degrade the internal quantum efficiency of light-emitting diodes (LEDs). In this research, trench defects are innovatively modulated into the structure to enhance the efficiency of red InGaN LEDs. Specifically, dual-color MQWs structures are grown with green MQWs at the bottom and red M…

    Submitted 28 December, 2023; v1 submitted 30 July, 2023; originally announced July 2023.