Skip to main content

Showing 1–50 of 388 results for author: Tan, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.12857  [pdf, other

    cs.CL cs.DL cs.IR

    Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis

    Authors: Jianxiang Yu, Zichen Ding, Jiaqi Tan, Kangyang Luo, Zhenmin Weng, Chenghua Gong, Long Zeng, Renjing Cui, Chengcheng Han, Qiushi Sun, Zhiyong Wu, Yunshi Lan, Xiang Li

    Abstract: In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated contents are often generic or partial. To address the issues above, we introduce an automated paper reviewing… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2407.10353  [pdf, other

    cs.RO

    UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers

    Authors: Huy Ha, Yihuai Gao, Zipeng Fu, Jie Tan, Shuran Song

    Abstract: We introduce UMI-on-Legs, a new framework that combines real-world and simulation data for quadruped manipulation systems. We scale task-centric data collection in the real world using a hand-held gripper (UMI), providing a cheap way to demonstrate task-relevant manipulation skills without a robot. Simultaneously, we scale robot-centric data in simulation by training whole-body controller for task… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 18 pages, 7 figures, website: https://umi-on-legs.github.io/

    ACM Class: I.2.9

  3. arXiv:2407.09451  [pdf, other

    cs.RO

    Benchmarking Large Neighborhood Search for Multi-Agent Path Finding

    Authors: Jiaqi Tan, Yudong Luo, Jiaoyang Li, Hang Ma

    Abstract: Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. Neighborhood selection strategy is crucial to the success of MAPF-LNS and a flurry of methods have been proposed. However, several pitfalls exist and hinder a… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  4. arXiv:2407.09395  [pdf, other

    cs.IR cs.AI cs.CL

    Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce

    Authors: Zhe Lin, Jiwei Tan, Dan Ou, Xi Chen, Shaowei Yao, Bo Zheng

    Abstract: Text relevance or text matching of query and product is an essential technique for the e-commerce search system to ensure that the displayed products can match the intent of the query. Many studies focus on improving the performance of the relevance model in search system. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these mo… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: KDD'24 accepted paper

  5. arXiv:2407.08974  [pdf, other

    q-bio.QM cs.LG math.GN q-bio.BM

    Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

    Authors: Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia

    Abstract: Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  6. arXiv:2407.07775  [pdf, other

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor… ▽ More

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  7. arXiv:2407.04942  [pdf, other

    cs.RO cs.LG

    FOSP: Fine-tuning Offline Safe Policy through World Models

    Authors: Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang

    Abstract: Model-based Reinforcement Learning (RL) has shown its high training efficiency and capability of handling high-dimensional tasks. Regarding safety issues, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still pose safety challenges due to the online exploration in real-world deployment. To a… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 21 pages

  8. arXiv:2407.02047  [pdf, other

    cs.CV

    CountFormer: Multi-View Crowd Counting Transformer

    Authors: Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

    Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios.In this work, we propose a concise… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted By ECCV2024

  9. arXiv:2406.19030  [pdf, other

    cs.CV

    Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model

    Authors: Jiangtong Tan, Feng Zhao

    Abstract: Image restoration has made marvelous progress with the advent of deep learning. Previous methods usually rely on designing powerful network architecture to elevate performance, however, the natural visual effect of the restored results is limited by color and texture distortions. Besides the visual perceptual quality, the semantic perception recovery is an important but often overlooked perspectiv… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  10. arXiv:2406.18518  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  11. arXiv:2406.16671  [pdf, other

    cs.RO

    STAR: Swarm Technology for Aerial Robotics Research

    Authors: Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, Guillaume Sartoretti

    Abstract: In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges,… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.11263  [pdf, other

    cs.CL cs.AI

    The Fall of ROME: Understanding the Collapse of LLMs in Model Editing

    Authors: Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen

    Abstract: Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that con… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.10290  [pdf, other

    cs.CL cs.AI cs.LG

    MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

    Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understand… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.09801  [pdf, other

    cs.CV

    RaNeuS: Ray-adaptive Neural Surface Reconstruction

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: Our objective is to leverage a differentiable radiance field \eg NeRF to reconstruct detailed 3D surfaces in addition to producing the standard novel view renderings. There have been related methods that perform such tasks, usually by utilizing a signed distance field (SDF). However, the state-of-the-art approaches still fail to correctly reconstruct the small-scale details, such as the leaves, ro… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 3DV 2024, oral. In: Proceedings of the IEEE/CVF International Conference on 3D Vision (2023)

  15. arXiv:2406.09215  [pdf, other

    cs.IR cs.AI

    On Softmax Direct Preference Optimization for Recommendation

    Authors: Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

    Abstract: Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-t… ▽ More

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  16. arXiv:2406.08122  [pdf

    eess.AS cs.SD

    Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

    Authors: Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

    Abstract: It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessi… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication on Interspeech 2024. 5 pages, 3 figures, 5 tables

  17. arXiv:2406.08119  [pdf

    eess.AS cs.SD

    Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

    Authors: Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

    Abstract: This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted for publication on Interspeech 2024. 5 pages, 4 figures, 3 tables

  18. arXiv:2406.07006  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAWImage Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  19. arXiv:2406.02609  [pdf, other

    cs.LG cs.AI

    Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation

    Authors: Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang

    Abstract: Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient ad… ▽ More

    Submitted 12 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.03335 by other authors

  20. arXiv:2405.18187  [pdf, other

    cs.LG

    AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

    Authors: Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

    Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function and why IQL can utilize weighted regression for policy extraction. IDQL reinterprets IQL as an actor-critic method and gets weights of implicit pol… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 3 figures, 4 tables

  21. arXiv:2405.15673  [pdf, other

    cs.LG cs.AI stat.ML

    Consistency of Neural Causal Partial Identification

    Authors: Jiyuan Tan, Jose Blanchet, Vasilis Syrgkanis

    Abstract: Recent progress in Neural Causal Models (NCMs) showcased how identification and partial identification of causal effects can be automatically carried out via training of neural generative models that respect the constraints encoded in a given causal graph [Xia et al. 2022, Balazadeh et al. 2022]. However, formal consistency of these methods has only been proven for the case of discrete variables o… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 37 pages, 8 figures

  22. arXiv:2405.12476  [pdf, other

    cs.CV

    Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding

    Authors: Weizhen Liu, Jiayu Tan, Guangyu Lan, Ao Li, Dongye Li, Le Zhao, Xiaohui Yuan, Nanqing Dong

    Abstract: Accurate phenotypic analysis in aquaculture breeding necessitates the quantification of subtle morphological phenotypes. Existing datasets suffer from limitations such as small scale, limited species coverage, and inadequate annotation of keypoints for measuring refined and complex morphological phenotypes of fish body parts. To address this gap, we introduce FishPhenoKey, a comprehensive dataset… ▽ More

    Submitted 31 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024, Code: https://github.com/WeizhenLiuBioinform/Fish-Phenotype-Detect

  23. arXiv:2405.09052  [pdf, other

    cond-mat.mtrl-sci cs.LG

    Dielectric Tensor Prediction for Inorganic Materials Using Latent Information from Preferred Potential

    Authors: Zetian Mao, Wenwen Li, Jethro Tan

    Abstract: Dielectrics are materials with widespread applications in flash memory, central processing units, photovoltaics, capacitors, etc. However, the availability of public dielectric data remains limited, hindering research and development efforts. Previously, machine learning models focused on predicting dielectric constants as scalars, overlooking the importance of dielectric tensors in understanding… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  24. arXiv:2405.05525  [pdf, other

    cs.CR cs.LG

    Ditto: Quantization-aware Secure Inference of Transformers upon MPC

    Authors: Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

    Abstract: Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: to be published in ICML 2024

  25. arXiv:2405.05355  [pdf, other

    cs.CV cs.RO

    Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images

    Authors: Conner Pulling, Je Hon Tan, Yaoyu Hu, Sebastian Scherer

    Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. The cost volume building process is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed way of distance candidates selection method which enables the use of a very small number of candidates and reduc… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  26. arXiv:2405.00846  [pdf, other

    cs.RO cs.LG

    Gameplay Filters: Safe Robot Walking through Adversarial Imagination

    Authors: Duy P. Nguyen, Kai-Chieh Hsu, Wenhao Yu, Jie Tan, Jaime F. Fisac

    Abstract: Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This… ▽ More

    Submitted 31 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  27. arXiv:2404.16885  [pdf

    cs.CV cs.AI cs.CY cs.LG

    Adapting an Artificial Intelligence Sexually Transmitted Diseases Symptom Checker Tool for Mpox Detection: The HeHealth Experience

    Authors: Rayner Kay Jin Tan, Dilruk Perera, Salomi Arasaratnam, Yudara Kularathne

    Abstract: Artificial Intelligence applications have shown promise in the management of pandemics and have been widely used to assist the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool to screen for sexually transmitted diseases to develop a digital screening test for symptomatic Mpox thr… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 15 pages, 4 figures

  28. arXiv:2404.13267  [pdf, other

    cs.SI

    Demystify Adult Learning: A Social Network and Large Language Model Assisted Approach

    Authors: Fang Liu, Bosheng Ding, Chong Guan, Zhang Wei, Dusit Niyato, Justina Tan

    Abstract: Adult learning is increasingly recognized as a crucial way for personal development and societal progress. It however is challenging, and adult learners face unique challenges such as balancing education with other life responsibilities. Collecting feedback from adult learners is effective in understanding their concerns and improving learning experiences, and social networks provide a rich source… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 6 pages, 3 figures

  29. arXiv:2404.09292  [pdf, other

    cs.CV cs.AI

    Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation

    Authors: Jieyi Tan, Yansheng Li, Sergey A. Bartalev, Bo Dang, Wei Chen, Yongjun Zhang, Liangqi Yuan

    Abstract: Remote sensing semantic segmentation (RSS) is an essential task in Earth Observation missions. Due to data privacy concerns, high-quality remote sensing images with annotations cannot be well shared among institutions, making it difficult to fully utilize RSS data to train a generalized model. Federated Learning (FL), a privacy-preserving collaborative learning technology, is a potential solution.… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 13 pages,9 figures, 4 tables

  30. arXiv:2404.00653  [pdf, other

    cs.CV

    Dual DETRs for Multi-Label Temporal Action Detection

    Authors: Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang

    Abstract: Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task. However, these approaches primarily followed DETR to predict actions at the instance level (i.e., identify each action by its center point), leading… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  31. arXiv:2403.19021  [pdf, other

    cs.IR cs.AI cs.CL cs.LG

    IDGenRec: LLM-RecSys Alignment with Textual ID Learning

    Authors: Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang

    Abstract: Generative recommendation based on Large Language Models (LLMs) have transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using… ▽ More

    Submitted 17 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted in SIGIR 2024

  32. arXiv:2403.18197  [pdf, other

    cs.RO

    LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators

    Authors: Changyi Lin, Xingyu Liu, Yuxiang Yang, Yaru Niu, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots, Ding Zhao

    Abstract: Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In th… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project page: https://linchangyi1.github.io/LocoMan

  33. arXiv:2403.15951  [pdf, other

    cs.CV

    MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping

    Authors: Jiacheng Chen, Yuefan Wu, Jiaqi Tan, Hang Ma, Yasutaka Furukawa

    Abstract: This paper presents a vector HD-mapping algorithm that formulates the mapping as a tracking task and uses a history of memory latents to ensure consistent reconstructions over time. Our method, MapTracker, accumulates a sensor stream into memory buffers of two latent representations: 1) Raster latents in the bird's-eye-view (BEV) space and 2) Vector latents over the road elements (i.e., pedestrian… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Project page: https://map-tracker.github.io

  34. arXiv:2403.15637  [pdf, other

    cs.RO

    CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments

    Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Mohamed Elnoor, Anuj Zore, Brian Ichter, Fei Xia, Jie Tan, Wenhao Yu, Dinesh Manocha

    Abstract: We present ConVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc) of the robot's surroundings, and formulate context-based naviga… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures

  35. arXiv:2403.14877  [pdf, other

    cs.RO

    TEeVTOL: Balancing Energy and Time Efficiency in eVTOL Aircraft Path Planning Across City-Scale Wind Fields

    Authors: Songyang Liu, Shuai Li, Haochen Li, Weizi Li, Jindong Tan

    Abstract: Electric vertical-takeoff and landing (eVTOL) aircraft, recognized for their maneuverability and flexibility, offer a promising alternative to our transportation system. However, the operational effectiveness of these aircraft faces many challenges, such as the delicate balance between energy and time efficiency, stemming from unpredictable environmental factors, including wind fields. Mathematica… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  36. arXiv:2403.12506  [pdf, ps, other

    cs.IT eess.SP

    Sparse Estimation for XL-MIMO with Unified LoS/NLoS Representation

    Authors: Xu Shi, Xuehan Wang, Jingbo Tan, Jintao Wang

    Abstract: Extremely large-scale antenna array (ELAA) is promising as one of the key ingredients for the sixth generation (6G) of wireless communications. The electromagnetic propagation of spherical wavefronts introduces an additional distance-dependent dimension beyond conventional beamspace. In this paper, we first present one concise closed-form channel formulation for extremely large-scale multiple-inpu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: ICC 2024

  37. arXiv:2403.08879  [pdf, other

    cs.LG cs.AI cs.MA math.OC

    Multi-Objective Optimization Using Adaptive Distributed Reinforcement Learning

    Authors: Jing Tan, Ramin Khalili, Holger Karl

    Abstract: The Intelligent Transportation System (ITS) environment is known to be dynamic and distributed, where participants (vehicle users, operators, etc.) have multiple, changing and possibly conflicting objectives. Although Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS applications such as resource management and offloading, most RL algorithms focus on single objectives. In… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  38. arXiv:2403.02901  [pdf, other

    cs.AI

    A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods

    Authors: Hanlei Jin, Yang Zhang, Dan Meng, Jun Wang, Jinghua Tan

    Abstract: Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort required in processing large volumes of text. ATS has drawn considerable interest in both academic and industrial circles. Many studies have been conducted in the past to survey ATS methods; however, they generall… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  39. arXiv:2403.01734  [pdf, other

    cs.RO cs.AI cs.LG

    Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy

    Authors: Chenyang Cao, Zichen Yan, Renhao Lu, Junbo Tan, Xueqian Wang

    Abstract: Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these methods encounter limitations when dealing with diverse constraints in complex environments, such as safety constraints. Some of these approaches prioritize goal… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA24

    MSC Class: 68T40

  40. arXiv:2403.00081  [pdf, other

    cs.CY

    The Constitutions of Web3

    Authors: Joshua Z. Tan, Max Langenkamp, Anna Weichselbraun, Ann Brody, Lucia Korpas

    Abstract: The governance of online communities has been a critical issue since the first USENET groups, and a number of serious constitutions -- declarations of goals, values, and rights -- have emerged since the mid-1990s. More recently, decentralized autonomous organizations (DAOs) have begun to publish their own constitutions, manifestos, and other governance documents. There are two unique aspects to th… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  41. arXiv:2402.15538  [pdf, other

    cs.MA cs.AI

    AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

    Authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese

    Abstract: The booming success of LLMs initiates rapid development in LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise the optimal reasoning strategies and agent architectures. Accordingly, LLM agent research advances from the simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategy; agent architecture also evolves from singl… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: preprint. Library is available at https://github.com/SalesforceAIResearch/AgentLite

  42. arXiv:2402.15506  [pdf, other

    cs.AI cs.CL cs.LG

    AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

    Authors: Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

    Abstract: Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce \textbf{AgentOhana} as a comprehensive solution to address these challenges. \… ▽ More

    Submitted 20 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Add GitHub repo link at \url{https://github.com/SalesforceAIResearch/xLAM} and HuggingFace model link at \url{https://huggingface.co/Salesforce/xLAM-v0.1-r}

  43. arXiv:2402.12052  [pdf, other

    cs.CL

    Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs

    Authors: Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen

    Abstract: The integration of large language models (LLMs) and search engines represents a significant evolution in knowledge acquisition methodologies. However, determining the knowledge that an LLM already possesses and the knowledge that requires the help of a search engine remains an unresolved issue. Most existing methods solve this problem through the results of preliminary answers or reasoning done by… ▽ More

    Submitted 30 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 main conference. Repo: https://github.com/plageon/SlimPLM

  44. arXiv:2402.11450  [pdf, other

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore , et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o… ▽ More

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  45. arXiv:2402.10695  [pdf, other

    cs.LG cs.AI cs.CR

    Unlink to Unlearn: Simplifying Edge Unlearning in GNNs

    Authors: Jiajun Tan, Fei Sun, Ruichen Qiu, Du Su, Huawei Shen

    Abstract: As concerns over data privacy intensify, unlearning in Graph Neural Networks (GNNs) has emerged as a prominent research frontier in academia. This concept is pivotal in enforcing the \textit{right to be forgotten}, which entails the selective removal of specific data from trained GNNs upon user request. Our research focuses on edge unlearning, a process of particular relevance to real-world applic… ▽ More

    Submitted 11 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW 2024 as a Short Research Paper

  46. arXiv:2402.10298  [pdf, ps, other

    cs.DS cs.DM

    Streaming algorithm for balance gain and cost with cardinality constraint on the integer lattice

    Authors: Jingjing Tan

    Abstract: Team formation problem is a very important problem in the labor market, and it is proved to be NP-hard. In this paper, we design an efficient bicriteria streaming algorithms to construct a balance between gain and cost in a team formation problem with cardinality constraint on the integer lattice. To solve this problem, we establish a model for maximizing the difference between a nonnegative norma… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  47. arXiv:2402.03752  [pdf, other

    cs.CV cs.LG

    Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images

    Authors: Jen Hong Tan

    Abstract: Can a lightweight Vision Transformer (ViT) match or exceed the performance of Convolutional Neural Networks (CNNs) like ResNet on small datasets with small image resolutions? This report demonstrates that a pure ViT can indeed achieve superior performance through pre-training, using a masked auto-encoder technique with minimal image scaling. Our experiments on the CIFAR-10 and CIFAR-100 datasets i… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 7 pages, 6 figures

  48. BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies

    Authors: Ratnadira Widyasari, Sheng Qin Sim, Camellia Lok, Haodi Qi, Jack Phan, Qijin Tay, Constance Tan, Fiona Wee, Jodie Ethelda Tan, Yuheng Yieh, Brian Goh, Ferdian Thung, Hong Jin Kang, Thong Hoang, David Lo, Eng Lieh Ouh

    Abstract: The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the a… ▽ More

    Submitted 27 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2020) 1556-1560

  49. arXiv:2401.12433  [pdf, other

    cs.CV

    A Novel Garment Transfer Method Supervised by Distilled Knowledge of Virtual Try-on Model

    Authors: Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Kerui Hu, Jianrong Tan

    Abstract: This paper proposes a novel garment transfer method supervised with knowledge distillation from virtual try-on. Our method first reasons the transfer parsing to provide shape prior to downstream tasks. We employ a multi-phase teaching strategy to supervise the training of the transfer parsing reasoning model, learning the response and feature knowledge from the try-on parsing reasoning model. To c… ▽ More

    Submitted 4 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  50. arXiv:2401.11449  [pdf, other

    eess.SP cs.NI

    Energy Consumption Analysis for Continuous Phase Modulation in Smart-Grid Internet of Things of beyond 5G

    Authors: Hongjian Gao, Yang Lu, Shaoshi Yang, Jingsheng Tan, Longlong Nie, Xinyi Qu

    Abstract: Wireless sensor network (WSN) underpinning the smart-grid Internet of Things (SG-IoT) has been a popular research topic in recent years due to its great potential for enabling a wide range of important applications. However, the energy consumption (EC) characteristic of sensor nodes is a key factor that affects the operational performance (e.g., lifetime of sensors) and the total cost of ownership… ▽ More

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 7 figures, 2 tables

    Journal ref: Sensors, vol. 24, no. 2, pp. 1-14, article number 533, Jan. 2024