Showing 1–45 of 45 results for author: Jia, D

  1. arXiv:2407.12117  [pdf, other]

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.10486  [pdf, other]

    cs.AI cs.CL

    IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

    Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. The advent of large language models (LLMs) has shown their impressive capability of textual understanding through large-scale pretraining, which implies great potential for extractive snippet generation. In this paper, we systematically i…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.04416  [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Audio Generation with Visual Enhanced Caption

    Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

    Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low quality and relatively small quantity of training data. In this work, we aim to create a large-scale audio dataset with rich captions for improving audi…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages with 1 appendix

  4. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2406.01210  [pdf, other]

    cs.CV

    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

    Authors: Ding Jia, Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Chang Xu, Xinghao Chen

    Abstract: Cross-modal transformers have demonstrated superiority in various vision tasks by effectively integrating different modalities. This paper first critiques prior token exchange methods, which replace less informative tokens with inter-modal features, and demonstrates that exchange-based methods underperform cross-attention mechanisms, while the computational demand of the latter inevitably restricts its u…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024, code and models are available at https://github.com/JiaDingCN/GeminiFusion

  6. arXiv:2405.14578  [pdf, other]

    cs.LG

    Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

    Authors: Shuaipeng Li, Penghao Zhao, Hailin Zhang, Xingwu Sun, Hao Wu, Dian Jiao, Weiyan Wang, Chengjun Liu, Zheng Fang, Jinbao Xue, Yangyu Tao, Bin Cui, Di Wang

    Abstract: In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require c…

    Submitted 4 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  7. arXiv:2405.13002  [pdf, other]

    cs.CL cs.AI

    DuetRAG: Collaborative Retrieval-Augmented Generation

    Authors: Dian Jiao, Li Cai, Jingsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks. However, contemporary RAG approaches suffer from irrelevant knowledge retrieval issues in complex domain questions (e.g., HotPot QA) due to the lack of corresponding domain knowledge, leading to low-quality generation…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages

  8. arXiv:2404.17806  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  9. arXiv:2404.08886  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM

    Authors: Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea

    Abstract: In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024 Industry Track

  10. arXiv:2404.06674  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  11. arXiv:2404.01063  [pdf, other]

    cs.HC cs.GR

    Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training

    Authors: Donggang Jia, Yunhai Wang, Ivan Viola

    Abstract: 3D modeling of biological structures is an inherently complex process, necessitating both biological and geometric understanding. Additionally, the complexity of user interfaces of 3D modeling tools and the associated steep learning curve further exacerbate the difficulty of authoring a 3D model. In this paper, we introduce a novel framework to address the challenge of using 3D modeling software b…

    Submitted 1 April, 2024; originally announced April 2024.

  12. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  13. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  14. arXiv:2311.15983  [pdf, other]

    cs.LG cs.AI cs.CL

    SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification

    Authors: Difan Jiao, Yilun Liu, Zhenwei Tang, Daniel Matter, Jürgen Pfeffer, Ashton Anderson

    Abstract: Among the many tasks that Large Language Models (LLMs) have revolutionized is text classification. Current text classification paradigms, however, rely solely on the output of the final layer in the LLM, with the rich information contained in internal neurons largely untapped. In this study, we present SPIN: a model-agnostic framework that sparsifies and integrates internal neurons of intermediate…

    Submitted 5 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 17 pages, 7 figures, 12 tables. Code available at https://github.com/difanj0713/SPIN

  15. GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning

    Authors: Yun Zhu, Yaoke Wang, Haizhou Shi, Zhenshuo Zhang, Dian Jiao, Siliang Tang

    Abstract: Graph-structured data is ubiquitous in the world which models complex relationships between objects, enabling various Web applications. Daily influxes of unlabeled graph data on the Web offer immense potential for these applications. Graph self-supervised algorithms have achieved significant success in acquiring generic knowledge from abundant unlabeled graph data. These pre-trained models can be…

    Submitted 11 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted by The Web Conference 2024 (WWW 2024)

    Journal ref: The Web Conference 2024

  16. arXiv:2308.05862  [pdf, other]

    eess.IV cs.AI cs.CV

    Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shoujin Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu, et al. (4 additional authors not shown)

    Abstract: Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations,…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: MICCAI FLARE22: https://flare22.grand-challenge.org/

  17. arXiv:2307.08199  [pdf, other]

    cs.CV

    Unbiased Image Synthesis via Manifold Guidance in Diffusion Models

    Authors: Xingzhe Su, Daixi Jia, Fengge Wu, Junsuo Zhao, Changwen Zheng, Wenwen Qiang

    Abstract: Diffusion Models are a potent class of generative models capable of producing high-quality images. However, they often inadvertently favor certain data attributes, undermining the diversity of generated images. This issue is starkly apparent in skewed datasets like CelebA, where the initial dataset disproportionately favors females over males by 57.9%, a bias amplified in generated data where f…

    Submitted 15 April, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

  18. arXiv:2306.07265  [pdf, other]

    cs.CV

    detrex: Benchmarking Detection Transformers

    Authors: Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

    Abstract: The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight co…

    Submitted 13 June, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: project link: https://github.com/IDEA-Research/detrex

  19. arXiv:2305.13869  [pdf, other]

    physics.acc-ph cs.AI cs.LG eess.SY

    Trend-Based SAC Beam Control Method with Zero-Shot in Superconducting Linear Accelerator

    Authors: Xiaolong Chen, Xin Qi, Chunguang Su, Yuan He, Zhijun Wang, Kunxiang Sun, Chao Jin, Weilong Chen, Shuhui Liu, Xiaoying Zhao, Duanyang Jia, Man Yi

    Abstract: The superconducting linear accelerator is a highly flexible facility for modern scientific discoveries, necessitating weekly reconfiguration and tuning. Accordingly, minimizing setup time proves essential in affording users ample experimental time. We propose a trend-based soft actor-critic (TBSAC) beam control method with strong robustness, allowing the agents to be trained in a simulated en…

    Submitted 25 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  20. arXiv:2304.04083  [pdf, other]

    cs.HC cs.GR

    VOICE: Visual Oracle for Interaction, Conversation, and Explanation

    Authors: Donggang Jia, Alexandra Irger, Lonni Besancon, Ondrej Strnad, Deng Luo, Johanna Bjorklund, Anders Ynnerman, Ivan Viola

    Abstract: We present VOICE, a novel approach to science communication that connects large language models' (LLM) conversational capabilities with interactive exploratory visualization. VOICE introduces several innovative technical contributions that drive our conversational visualization framework. Our foundation is a pack-of-bots that can perform specific tasks, such as assigning tasks, extracting instruct…

    Submitted 22 January, 2024; v1 submitted 8 April, 2023; originally announced April 2023.

  21. arXiv:2303.11899  [pdf, other]

    cs.AI

    Large-Scale Traffic Signal Control Using Constrained Network Partition and Adaptive Deep Reinforcement Learning

    Authors: Hankang Gu, Shangbo Wang, Xiaoguang Ma, Dongyao Jia, Guoqiang Mao, Eng Gee Lim, Cheuk Pong Ryan Wong

    Abstract: Multi-agent Deep Reinforcement Learning (MADRL) based traffic signal control has become a popular research topic in recent years. To alleviate the scalability issue of completely centralized RL techniques and the non-stationarity issue of completely decentralized RL techniques on large-scale traffic networks, some literature utilizes a regional control approach where the whole network is firstly part…

    Submitted 7 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

  22. arXiv:2303.02868  [pdf, other]

    cs.LG cs.DC

    Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

    Authors: Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui

    Abstract: Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models. Many products and services in Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have been opted in to gain the power of pre-trained models. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transfor…

    Submitted 5 March, 2023; originally announced March 2023.

  23. arXiv:2301.05392  [pdf, other]

    cs.CV cs.LG

    Multi-Target Landmark Detection with Incomplete Images via Reinforcement Learning and Shape Prior

    Authors: Kaiwen Wan, Lei Li, Dengqiang Jia, Shangqi Gao, Wei Qian, Yingzhi Wu, Huandong Lin, Xiongzheng Mu, Xin Gao, Sijia Wang, Fuping Wu, Xiahai Zhuang

    Abstract: Medical images are generally acquired with limited field-of-view (FOV), which could lead to incomplete regions of interest (ROI), and thus impose a great challenge on medical image analysis. This is particularly evident for the learning-based multi-target landmark detection, where algorithms could be misled into learning primarily the variation of background due to the varying FOV, failing the dete…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: 29 pages, 13 figures

  24. arXiv:2301.02978  [pdf]

    cs.RO

    Human Following Based on Visual Perception in the Context of Warehouse Logistics

    Authors: Yanbaihui Liu, Haibo Wang, Dongming Jia

    Abstract: Against the background of 5G, the Internet, artificial intelligence technology, and robot technology, warehousing and logistics robot technology has developed rapidly, and products have been widely used. A practical application is to help warehouse personnel pick up or deliver heavy goods at dispersed locations based on dynamic routes. However, programs that can only accept instructions or pre-set by t…

    Submitted 8 January, 2023; originally announced January 2023.

    Comments: Under review at the 2023 5th International Conference on Materials Science, Machine and Energy Engineering (MSMEE 2023)

  25. arXiv:2210.01035  [pdf, other]

    cs.CV cs.LG

    Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

    Authors: Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, Weihong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

    Abstract: Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens. Many advanced approaches have been developed to reduce the total number of tokens in large-scale vision transformers, especially for image classification tasks. Typically, they select a small group of essential tokens acc…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022, camera-ready version, 22 pages, 14 figures

  26. arXiv:2208.03791  [pdf, other]

    cs.CV

    Global Hierarchical Attention for 3D Point Cloud Analysis

    Authors: Dan Jia, Alexander Hermans, Bastian Leibe

    Abstract: We propose a new attention mechanism, called Global Hierarchical Attention (GHA), for 3D point cloud analysis. GHA approximates the regular global dot-product attention via a series of coarsening and interpolation operations over multiple hierarchy levels. The advantage of GHA is two-fold. First, it has linear complexity with respect to the number of points, enabling the processing of large point…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: Accepted to the German Conference on Pattern Recognition (GCPR) 2022

  27. arXiv:2208.02121  [pdf, other]

    cs.RO cs.CV cs.HC

    Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics

    Authors: Diego Paez-Granados, Yujie He, David Gonon, Dan Jia, Bastian Leibe, Kenji Suzuki, Aude Billard

    Abstract: Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that delivers continuous obstacle avoidance and post-contact control evaluated on an autonomous personal mobility vehicle. We propose evaluation me…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: © IEEE. All rights reserved. IEEE-IROS-2022, Oct. 23-27, Kyoto, Japan

    Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2022)

  28. arXiv:2207.13080  [pdf, other]

    cs.CV

    DETRs with Hybrid Matching

    Authors: Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, Weihong Lin, Lei Sun, Chao Zhang, Han Hu

    Abstract: One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections. This end-to-end signature is important for the versatility of DETR, and it has been generalized to broader vision tasks. However, we note that there are few queries assigned as positive sample…

    Submitted 16 May, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: CVPR 2023. The code is available at: https://github.com/HDETR

  29. arXiv:2207.02260  [pdf, other]

    cs.CE

    An A-Phi Formulation Solver in Electromagnetics Based on Discrete Exterior Calculus

    Authors: Boyuan Zhang, Dong-Yeop Na, Dan Jiao, Weng Cho Chew

    Abstract: An efficient numerical solver for the A-Phi formulation in electromagnetics based on the discrete exterior calculus (DEC) is proposed in this paper. The A-Phi formulation is immune to low-frequency breakdown and ideal for broadband and multi-scale analysis. The generalized Lorenz gauge is used in this paper, which decouples the A equation and the Phi equation. The A-Phi formulation is discretized…

    Submitted 5 July, 2022; originally announced July 2022.

  30. arXiv:2204.04666  [pdf, other]

    cs.NE

    Energy-Sensitive Trajectory Design and Restoration Areas Allocation for UAV-Enabled Grassland Restoration

    Authors: Dongbin Jiao, Lingyu Wang, Peng Yang, Weibo Yang, Yu Peng, Zhanhuan Shang, Fengyuan Ren

    Abstract: Grassland restoration is a critical means to safeguard against grassland ecological degradation. To reduce extensive human labor and boost restoration efficiency, the UAV is promising for its fully automatic capability, yet it remains to be exploited. This paper advances this emerging technology by explicitly considering the realistic constraints of the UAV and the grassland degradation while pl…

    Submitted 22 July, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

  31. A Simple Standard for Sharing Ontological Mappings (SSSOM)

    Authors: Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler, et al. (19 additional authors not shown)

    Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, ar…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: Corresponding author: Christopher J. Mungall

  32. arXiv:2110.02565  [pdf, other]

    cs.NI

    A Region-based Collaborative Management Scheme for Dynamic Clustering in Green VANET

    Authors: Bingyi Liu, Zhipeng Fang, Wei Wang, Xun Shao, Wei Wei, Dongyao Jia, Enshu Wang, Shengwu Xiong

    Abstract: Green Vehicular Ad-hoc Network (VANET) is a newly emerged research area which focuses on reducing harmful impacts of vehicular communication equipment on the natural environment. Recent studies have shown that grouping vehicles into clusters for green communications in VANETs can significantly improve networking efficiency and reduce infrastructure costs. As a dynamic network system, maintaining…

    Submitted 6 October, 2021; originally announced October 2021.

  33. arXiv:2107.06780  [pdf, ps, other]

    cs.CV cs.RO

    Person-MinkUNet: 3D Person Detection with LiDAR Point Cloud

    Authors: Dan Jia, Bastian Leibe

    Abstract: In this preliminary work we attempt to apply submanifold sparse convolution to the task of 3D person detection. In particular, we present Person-MinkUNet, a single-stage 3D person detection network based on Minkowski Engine with U-Net architecture. The network achieves a 76.4% average precision (AP) on the JRDB 3D detection benchmark.

    Submitted 3 July, 2021; originally announced July 2021.

    Comments: accepted as an extended abstract in JRDB-ACT Workshop at CVPR21

  34. arXiv:2107.01378  [pdf, other]

    cs.CV

    Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation

    Authors: Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang

    Abstract: In the past few years, transformers have achieved promising performance on various computer vision tasks. Unfortunately, the immense inference overhead of most existing vision transformers prevents them from being deployed on edge devices such as cell phones and smart watches. Knowledge distillation is a widely used paradigm for compressing cumbersome architectures via transferring information…

    Submitted 2 June, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

  35. arXiv:2106.11239  [pdf, other]

    cs.RO cs.CV

    2D vs. 3D LiDAR-based Person Detection on Mobile Robots

    Authors: Dan Jia, Alexander Hermans, Bastian Leibe

    Abstract: Person detection is a crucial task for mobile robots navigating in human-populated environments. LiDAR sensors are promising for this task, thanks to their accurate depth measurements and large field of view. Two types of LiDAR sensors exist: the 2D LiDAR sensors, which scan a single plane, and the 3D LiDAR sensors, which scan multiple planes, thus forming a volume. How do they compare for the tas…

    Submitted 25 July, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Shortened version accepted at the International Conference on Intelligent Robots and Systems (IROS) 2022

  36. arXiv:2105.09548  [pdf, other]

    cs.CV

    A low-rank representation for unsupervised registration of medical images

    Authors: Dengqiang Jia, Shangqi Gao, Qunlong Chen, Xinzhe Luo, Xiahai Zhuang

    Abstract: Registration networks have shown great application potential in medical image analysis. However, supervised training methods have a great demand for large and high-quality labeled datasets, which is time-consuming and sometimes impractical due to data sharing issues. Unsupervised image registration algorithms commonly employ intensity-based similarity measures as loss functions without any manual…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: 11 pages, 3 figures

  37. arXiv:2012.08929  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning-Based Algorithms for Vessel Tracking: A Review

    Authors: Dengqiang Jia, Xiahai Zhuang

    Abstract: Developing efficient vessel-tracking algorithms is crucial for imaging-based diagnosis and treatment of vascular diseases. Vessel tracking aims to solve recognition problems such as key (seed) point detection, centerline extraction, and vascular segmentation. Extensive image-processing techniques have been developed to overcome the problems of vessel tracking that are mainly attributed to the comp…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: 19 pages, 3 figures, 9 tables, accepted by Computerized Medical Imaging and Graphics

  38. arXiv:2012.08890  [pdf, other]

    cs.CV cs.RO

    Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera

    Authors: Dan Jia, Mats Steinweg, Alexander Hermans, Bastian Leibe

    Abstract: Deep learning is the essential building block of state-of-the-art person detectors in 2D range data. However, only a few annotated datasets are available for training and testing these deep networks, potentially limiting their performance when deployed in new environments or with different LiDAR models. We propose a method, which uses bounding boxes from an image-based detector (e.g. Faster R-CNN)…

    Submitted 3 June, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: 2021 IEEE International Conference on Robotics and Automation (ICRA)

  39. arXiv:2004.14079  [pdf, other]

    cs.RO cs.CV

    DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data

    Authors: Dan Jia, Alexander Hermans, Bastian Leibe

    Abstract: Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the nec…

    Submitted 31 July, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  40. arXiv:1811.11329  [pdf, other]

    cs.CV cs.LG cs.RO

    Deep Reinforcement Learning for Autonomous Driving

    Authors: Sen Wang, Daoyuan Jia, Xinshuo Weng

    Abstract: Reinforcement learning has steadily improved and has outperformed humans in many traditional games since the resurgence of deep neural networks. However, these successes are not easily transferred to autonomous driving, because real-world state spaces are extremely complex, action spaces are continuous, and fine control is required. Moreover, autonomous driving vehicles must also keep functional…

    Submitted 19 May, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: no time for further improvement

  41. arXiv:1810.11218  [pdf, ps, other]

    cs.NI

    Optimal Energy-Delay in Energy Harvesting Wireless Sensor Networks with Interference Channel

    Authors: Dongbin Jiao, Liangjun Ke, Shengbo Liu, Felix T. S. Chan

    Abstract: In this work, we investigate the capacity allocation problem in energy harvesting wireless sensor networks (WSNs) with an interference channel. For fixed data and energy topologies, we formulate the optimization problem when the data flow remains constant on all data links and each sensor node harvests energy only once in a time slot. We focus on the optimal data rates, power allocations a…

    Submitted 30 October, 2018; v1 submitted 26 October, 2018; originally announced October 2018.

  42. arXiv:1804.10563  [pdf, other]

    cs.PF

    Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

    Authors: Zhengyu Yang, Danlin Jia, Stratis Ioannidis, Ningfang Mi, Bo Sheng

    Abstract: In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such data processing at scale. Specifically, Spark leverages distributed memory to cache the intermediate results, represented as Resilient Distributed Datasets (RDDs).…

    Submitted 7 May, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

  43. arXiv:1801.01726  [pdf, other]

    cs.CV

    Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption

    Authors: Peilun Li, Xiaodan Liang, Daoyuan Jia, Eric P. Xing

    Abstract: Recent advances in vision tasks (e.g., segmentation) depend heavily on the availability of large-scale real-world image annotations obtained through cumbersome human labor. Moreover, perception performance often drops significantly in new scenarios, due to the poor generalization capability of models trained on limited and biased annotations. In this work, we resort to transferring knowledge from auto…

    Submitted 14 July, 2018; v1 submitted 5 January, 2018; originally announced January 2018.

    Comments: In proceedings of BMVC 2018

  44. arXiv:1704.01397  [pdf, ps, other]

    cs.NI

    Cooperative Relative Positioning of Mobile Users by Fusing IMU Inertial and UWB Ranging Information

    Authors: Ran Liu, Chau Yuen, Tri-Nhut Do, Dewei Jiao, Xiang Liu, U-Xuan Tan

    Abstract: Relative positioning between multiple mobile users is essential for many applications, such as search and rescue in disaster areas or human social interaction. An inertial measurement unit (IMU) is promising for determining the change of position over short periods of time, but it is very sensitive to error accumulation over long-term runs. By equipping the mobile users with a ranging unit, e.g. ultra-wide…

    Submitted 4 April, 2017; originally announced April 2017.

    Comments: accepted by ICRA 2017

  45. arXiv:1611.02776  [pdf, other]

    cs.CV

    Deep Convolutional Neural Network for 6-DOF Image Localization

    Authors: Daoyuan Jia, Yongchi Su, Chunping Li

    Abstract: We present an accurate and robust method for six-degree-of-freedom image localization. Our method has two key points: (1) automatic large-scale photo synthesis and labeling from a point cloud model, and (2) pose estimation with deep convolutional neural network regression. Our model directly regresses 6-DOF camera poses from images, accurately describing where and how each image was captured. We ach…

    Submitted 8 November, 2016; originally announced November 2016.

    Comments: will update soon