Showing 1–50 of 63 results for author: Lan, S

Searching in archive cs.
  1. arXiv:2407.07276

    cs.CV cs.AI

    Exploring Camera Encoder Designs for Autonomous Driving Perception

    Authors: Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez

    Abstract: The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNN) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accur…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2406.12079

    cs.CV cs.AI cs.LG

    Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

    Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

    Abstract: As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pru…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under Review

  3. arXiv:2406.06978

    cs.CV

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Authors: Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez

    Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment…

    Submitted 29 August, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge

  4. arXiv:2405.01533

    cs.CV

    OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

    Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

    Abstract: The advances in multimodal large language models (MLLMs) have led to growing interests in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work propos…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.01990

    cs.CV

    What is Point Supervision Worth in Video Instance Segmentation?

    Authors: Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

    Abstract: Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Our proposed train…

    Submitted 1 April, 2024; originally announced April 2024.

  6. arXiv:2402.12177

    cs.LG cs.AI cs.CL

    Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning

    Authors: Mingtian Zhang, Shawn Lan, Peter Hayes, David Barber

    Abstract: Retrieval Augmented Generation (RAG) has emerged as an effective solution for mitigating hallucinations in Large Language Models (LLMs). The retrieval stage in RAG typically involves a pre-trained embedding model, which converts queries and passages into vectors to capture their semantics. However, a standard pre-trained embedding model may exhibit sub-optimal performance when applied to specific…

    Submitted 12 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  7. arXiv:2402.00892

    cs.SD cs.AI cs.LG eess.AS

    EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

    Authors: Shijia Liao, Shiyi Lan, Arun George Zachariah

    Abstract: The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns. Despite these advancements, the exploration into scaling, especially in the audio generation domain, remains limited, with previous efforts not extending into the high-fidelity (HiFi) 44.1kHz domain and suffering from bot…

    Submitted 30 January, 2024; originally announced February 2024.

  8. arXiv:2401.03844

    cs.CV

    Fully Attentional Networks with Self-emerging Token Labeling

    Authors: Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

    Abstract: Recent studies indicate that Vision Transformers (ViTs) are robust against out-of-distribution scenarios. In particular, the Fully Attentional Network (FAN), a family of ViT backbones, has achieved state-of-the-art robustness. In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. Our method contains a two-stage training framew…

    Submitted 8 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5585-5595

  9. arXiv:2312.13764

    cs.CV cs.CL cs.LG

    A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

    Authors: Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

    Abstract: This paper introduces ProLab, a novel approach using property-level label space for creating strong interpretable segmentation models. Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models. It is based on two core designs. First, we employ Large Language Models (LLMs) and carefully craft…

    Submitted 15 August, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024. Code is available at https://github.com/lambert-x/ProLab

  10. arXiv:2312.03031

    cs.CV

    Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

    Authors: Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

    Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observ…

    Submitted 2 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  11. arXiv:2312.01696

    cs.CV

    BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

    Authors: Zhenxin Li, Shiyi Lan, Jose M. Alvarez, Zuxuan Wu

    Abstract: Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This…

    Submitted 24 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  12. arXiv:2312.00081

    cs.CV

    Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

    Authors: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

    Abstract: Vision language models (VLM) have demonstrated remarkable performance across various downstream tasks. However, understanding fine-grained visual-linguistic concepts, such as attributes and inter-object relationships, remains a significant challenge. While several benchmarks aim to evaluate VLMs in finer granularity, their primary focus remains on the linguistic aspect, neglecting the visual dimen…

    Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  13. arXiv:2311.14671

    cs.CV

    SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

    Authors: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

    Abstract: In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target. The resulting models can be generalized seamlessly to novel segmentation tasks, significantly reducing the labeling and training costs compared with conventional pipelines. However, in-context segmentation is mo…

    Submitted 22 July, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: ECCV-24 camera-ready

  14. arXiv:2311.03695

    cs.LG cs.AI

    Context Shift Reduction for Offline Meta-Reinforcement Learning

    Authors: Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen

    Abstract: Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and fur…

    Submitted 6 November, 2023; originally announced November 2023.

  15. arXiv:2311.01075

    cs.LG

    Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

    Authors: Siming Lan, Rui Zhang, Qi Yi, Jiaming Guo, Shaohui Peng, Yunkai Gao, Fan Wu, Ruizhi Chen, Zidong Du, Xing Hu, Xishan Zhang, Ling Li, Yunji Chen

    Abstract: In the field of multi-task reinforcement learning, the modular principle, which involves specializing functionalities into different modules and combining them appropriately, has been widely adopted as a promising approach to prevent the negative transfer problem, in which performance degrades due to conflicts between tasks. However, most of the existing multi-task RL methods only combine shared mod…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: This paper has been accepted at NeurIPS 2023 as a poster

  16. arXiv:2310.19731

    cs.CV cs.AI cs.LG

    ViR: Towards Efficient Vision Retention Backbones

    Authors: Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

    Abstract: Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large scale training. Although the training parallelism of self-attention mechanism plays an important role in retaining great performance, its quadratic complexity baffles the application of ViTs in many scenarios whic…

    Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Introduction of Vision Retention Networks (ViR) for Efficient Visual Modeling

  17. arXiv:2308.04556

    cs.CV

    FocalFormer3D : Focusing on Hard Instance for 3D Object Detection

    Authors: Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

    Abstract: False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  18. arXiv:2308.03666

    stat.ML cs.LG

    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

    Authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo

    Abstract: As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence…

    Submitted 18 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  19. arXiv:2307.01492

    cs.CV cs.RO

    FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

    Authors: Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

    Abstract: This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which is held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and CVPR 23 Workshop on Vision-Centric Autonomous Driving Workshop. Our proposed solution FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection.…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Outstanding Champion and Innovation Award in the 3D Occupancy Prediction Challenge (CVPR23)

  20. arXiv:2306.07307

    cs.LG cs.AI

    Online Prototype Alignment for Few-shot Policy Transfer

    Authors: Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Ruizhi Chen, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, Yunji Chen

    Abstract: Domain adaptation in reinforcement learning (RL) mainly deals with the changes of observation when transferring the policy to a new environment. Many traditional approaches of domain adaptation in RL manage to learn a mapping function between the source and target domain in explicit or implicit ways. However, they typically require access to abundant data from the target domain. Besides, they ofte…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted at ICML2023

  21. arXiv:2305.18312

    cs.CY cs.AI cs.LG

    Balancing Test Accuracy and Security in Computerized Adaptive Testing

    Authors: Wanyong Feng, Aritra Ghosh, Stephen Sireci, Andrew S. Lan

    Abstract: Computerized adaptive testing (CAT) is a form of personalized testing that accurately measures students' knowledge levels while reducing test length. Bilevel optimization-based CAT (BOBCAT) is a recent framework that learns a data-driven question selection algorithm to effectively reduce test length and improve test accuracy. However, it suffers from high question exposure and test overlap rates,…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: The 24th International Conference on Artificial Intelligence in Education (AIED 2023)

  22. Collaborative Multi-Agent Video Fast-Forwarding

    Authors: Shuyue Lan, Zhilu Wang, Ermin Wei, Amit K. Roy-Chowdhury, Qi Zhu

    Abstract: Multi-agent applications have recently gained significant popularity. In many computer vision tasks, a network of agents, such as a team of robots with cameras, could work collaboratively to perceive the environment for efficient and accurate situation awareness. However, these agents often have limited computation, communication, and storage resources. Thus, reducing resource consumption while st…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: IEEE Transactions on Multimedia, 2023. arXiv admin note: text overlap with arXiv:2008.04437

  23. arXiv:2302.04858

    cs.CV cs.AI cs.CL cs.IR cs.LG

    Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

    Authors: Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

    Abstract: Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained the state-of-the-art results in image-to-text generation. However, these models store all the knowledge within their parameters, thus often requiring enormous model parameters to model the abundant visual concepts and very rich textual descriptions. Additionally, they are inefficient in incorporating ne…

    Submitted 22 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Findings of EMNLP 2023

  24. arXiv:2301.03992

    cs.CV cs.LG cs.MM

    Vision Transformers Are Good Mask Auto-Labelers

    Authors: Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

    Abstract: We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes box-cropped images as inputs and conditionally generates their mask pseudo-labels. We show that Vision Transformers are good mask auto-labelers. Our method significantly reduces the gap between auto-labeling and human annotation regarding…

    Submitted 10 January, 2023; originally announced January 2023.

  25. arXiv:2210.12852

    cs.CV

    1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

    Authors: Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

    Abstract: This report describes the winning solution to the Robust Vision Challenge (RVC) semantic segmentation track at ECCV 2022. Our method adopts the FAN-B-Hybrid model as the encoder and uses SegFormer as the segmentation framework. The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with…

    Submitted 7 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: The Winning Solution to The Robust Vision Challenge 2022 Semantic Segmentation Track

  26. arXiv:2201.08662

    quant-ph cs.PL cs.SE

    A Comprehensive Study of Bug Fixes in Quantum Programs

    Authors: Junjie Luo, Pengzhan Zhao, Zhongtao Miao, Shuhan Lan, Jianjun Zhao

    Abstract: As quantum programming evolves, more and more quantum programming languages are being developed. As a result, debugging and testing quantum programs have become increasingly important. While bug fixing in classical programs has come a long way, there is a lack of research in quantum programs. To this end, this paper presents a comprehensive study on bug fixing in quantum programs. We collect and i…

    Submitted 21 January, 2022; originally announced January 2022.

  27. arXiv:2111.15668

    cs.CV

    AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

    Authors: Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

    Abstract: Built on top of self-attention mechanisms, vision transformers have demonstrated remarkable performance on a variety of vision tasks recently. While achieving excellent performance, they still require relatively intensive computational cost that scales up drastically as the numbers of patches, self-attention heads and transformer blocks increase. In this paper, we argue that due to the large varia…

    Submitted 30 November, 2021; originally announced November 2021.

  28. arXiv:2111.00901

    cs.LG

    Click-Based Student Performance Prediction: A Clustering Guided Meta-Learning Approach

    Authors: Yun-Wei Chu, Elizabeth Tenorio, Laura Cruz, Kerrie Douglas, Andrew S. Lan, Christopher G. Brinton

    Abstract: We study the problem of predicting student knowledge acquisition in online courses from clickstream behavior. Motivated by the proliferation of eLearning lecture delivery, we specifically focus on student in-video activity in lecture videos, which consist of content and in-video quizzes. Our methodology for predicting in-video quiz performance is based on three key ideas we develop. First, we mod…

    Submitted 15 November, 2021; v1 submitted 28 October, 2021; originally announced November 2021.

    Comments: 10 pages, IEEE BigData 2021

  29. arXiv:2109.04546

    cs.CL

    Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints

    Authors: Zichao Wang, Andrew S. Lan, Richard G. Baraniuk

    Abstract: We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. Existing approaches are prone to generating MWPs that are either mathematically invalid or have unsatisfactory language quality. They also either ignore the context or require manual specification of a problem t…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  30. arXiv:2108.12093

    cs.LG

    Anomaly Detection on IT Operation Series via Online Matrix Profile

    Authors: Shi-Ying Lan, Run-Qing Chen, Wan-Lei Zhao

    Abstract: Anomaly detection on time series is a fundamental task in monitoring the Key Performance Indicators (KPIs) of IT systems. Many of the existing approaches in the literature show good performance while requiring a lot of training resources. In this paper, the online matrix profile, which requires no training, is proposed to address this issue. The anomalies are detected by referring to the past subs…

    Submitted 6 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: 10 pages, 6 figures; Shi-Ying Lan and Run-Qing Chen contributed equally

  31. arXiv:2108.09744

    cs.SE cs.PL

    Bugs4Q: A Benchmark of Real Bugs for Quantum Programs

    Authors: Pengzhan Zhao, Jianjun Zhao, Zhongtao Miao, Shuhan Lan

    Abstract: Realistic benchmarks of reproducible bugs and fixes are vital to good experimental evaluation of debugging and testing approaches. However, there is no suitable benchmark suite that can systematically evaluate the debugging and testing methods of quantum programs until now. This paper proposes Bugs4Q, a benchmark of thirty-six real, manually validated Qiskit bugs from four popular Qiskit elements…

    Submitted 20 September, 2021; v1 submitted 22 August, 2021; originally announced August 2021.

    Comments: The short version of this paper will appear in the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021), New Ideas and Emerging Results (NIER) track, November 15-19, 2021

  32. arXiv:2105.06464

    cs.CV cs.LG

    DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

    Authors: Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar

    Abstract: We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pai…

    Submitted 5 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: Tech Report

  33. arXiv:2104.11896

    cs.CV

    M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

    Authors: Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

    Abstract: We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids. M3DeTR is the first approach that unifies multiple point cloud representations, feature scales, as well as models mutual relationships between point clouds simultaneously using transfo…

    Submitted 22 October, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

  34. arXiv:2103.16847

    eess.IV cs.CV

    A Novel Deep ML Architecture by Integrating Visual Simultaneous Localization and Mapping (vSLAM) into Mask R-CNN for Real-time Surgical Video Analysis

    Authors: Ella Selina Lan

    Abstract: Seven million people suffer surgical complications each year, but with sufficient surgical training and review, 50% of these complications could be prevented. To improve surgical performance, existing research uses various deep learning (DL) technologies including convolutional neural networks (CNN) and recurrent neural networks (RNN) to automate surgical tool and workflow detection. However, the…

    Submitted 17 April, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Accepted and Published in ISBI 2022

    ACM Class: I.4.0

  35. arXiv:2008.04437

    cs.CV

    Distributed Multi-agent Video Fast-forwarding

    Authors: Shuyue Lan, Zhilu Wang, Amit K. Roy-Chowdhury, Ermin Wei, Qi Zhu

    Abstract: In many intelligent systems, a network of agents collaboratively perceives the environment for better and more efficient situation awareness. As these agents often have limited resources, it could be greatly beneficial to identify the content overlapping among camera views from different agents and leverage it for reducing the processing, transmission and storage of redundant/unimportant video fra…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: To appear at ACM Multimedia 2020

  36. arXiv:2007.12324

    cs.LG cs.AI

    Context-Aware Attentive Knowledge Tracing

    Authors: Aritra Ghosh, Neil Heffernan, Andrew S. Lan

    Abstract: Knowledge tracing (KT) refers to the problem of predicting future learner performance given their past performance in educational applications. Recent developments in KT using flexible deep neural network-based models excel at this task. However, these models often offer limited interpretability, thus making them insufficient for personalized learning, which requires using interpretable feedback a…

    Submitted 23 July, 2020; originally announced July 2020.

    Comments: Published in KDD 2020

  37. arXiv:2007.08556

    cs.CV

    InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

    Authors: Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis

    Abstract: Real-time 3D object detection is crucial for autonomous cars. Achieving promising performance with high efficiency, voxel-based approaches have received considerable attention. However, previous methods model the input space with features extracted from equally divided sub-regions without considering that point cloud is generally non-uniformly distributed over the space. To address this issue, we…

    Submitted 16 July, 2020; originally announced July 2020.

  38. arXiv:2005.12442

    cs.LG cs.AI stat.ML

    qDKT: Question-centric Deep Knowledge Tracing

    Authors: Shashank Sonkar, Andrew E. Waters, Andrew S. Lan, Phillip J. Grimaldi, Richard G. Baraniuk

    Abstract: Knowledge tracing (KT) models, e.g., the deep knowledge tracing (DKT) model, track an individual learner's acquisition of skills over time by examining the learner's performance on questions related to those skills. A practical limitation in most existing KT models is that all questions nested under a particular skill are treated as equivalent observations of a learner's ability, which is an inacc…

    Submitted 25 May, 2020; originally announced May 2020.

  39. arXiv:2005.00696

    cs.CL

    Robust and Interpretable Grounding of Spatial References with Relation Networks

    Authors: Tsung-Yen Yang, Andrew S. Lan, Karthik Narasimhan

    Abstract: Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations for spatial concepts. However, the lack of explicit reasoning over entities makes such approaches vulnerable to noise in input text or state observations…

    Submitted 7 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Findings of Empirical Methods in Natural Language Processing (EMNLP) 2020

  40. arXiv:2003.12125

    cs.CV

    SaccadeNet: A Fast and Accurate Object Detector

    Authors: Shiyi Lan, Zhou Ren, Yi Wu, Larry S. Davis, Gang Hua

    Abstract: Object detection is an essential step towards holistic scene understanding. Most existing object detection algorithms attend to certain object areas once and then predict the object locations. However, neuroscientists have revealed that humans do not look at the scene in fixed steadiness. Instead, human eyes move around, locating informative parts to understand the object location. This active per…

    Submitted 26 March, 2020; originally announced March 2020.

  41. arXiv:2001.10509

    cs.LG eess.SP stat.ML

    MSE-Optimal Neural Network Initialization via Layer Fusion

    Authors: Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer

    Abstract: Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders parameter learning susceptible to initialization. To address this issue, a variety of methods that rely on random parameter initialization or knowledge distillation…

    Submitted 28 January, 2020; originally announced January 2020.

    Comments: Extended version of the CISS 2020 paper containing the proof for convolutional layers

  42. arXiv:1908.09021  [pdf, other]

    cs.GT cs.LG math.OC stat.ML

    Geometrical Regret Matching

    Authors: Sizhong Lan

    Abstract: We argue that the existing regret matchings for Nash equilibrium approximation conduct "jumpy" strategy updating when the probabilities of future plays are set to be proportional to positive regret measures. We propose a geometrical regret matching which features "smooth" strategy updating. Our approach is simple, intuitive and natural. The analytical and numerical results show that, continuously…

    Submitted 23 January, 2020; v1 submitted 18 August, 2019; originally announced August 2019.

    Comments: 11 pages, 22 figures; https://github.com/lansiz/eqpt with code and hands-on demos

  43. arXiv:1907.06713  [pdf, other]

    cs.CV

    MaskPlus: Improving Mask Generation for Instance Segmentation

    Authors: Shichao Xu, Shuyue Lan, Qi Zhu

    Abstract: Instance segmentation is a promising yet challenging topic in computer vision. Recent approaches such as Mask R-CNN typically divide this problem into two parts -- a detection component and a mask generation branch, and mostly focus on the improvement of the detection part. In this paper, we present an approach that extends Mask R-CNN with five novel optimization techniques for improving the mask…

    Submitted 27 September, 2019; v1 submitted 15 July, 2019; originally announced July 2019.

  44. arXiv:1905.08831  [pdf, other]

    cs.SI cs.LG eess.SP stat.ML

    IdeoTrace: A Framework for Ideology Tracing with a Case Study on the 2016 U.S. Presidential Election

    Authors: Indu Manickam, Andrew S. Lan, Gautam Dasarathy, Richard G. Baraniuk

    Abstract: The 2016 United States presidential election has been characterized as a period of extreme divisiveness that was exacerbated on social media by the influence of fake news, trolls, and social bots. However, the extent to which the public became more polarized in response to these influences over the course of the election is not well understood. In this paper we propose IdeoTrace, a framework for (…

    Submitted 30 May, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: 9 pages, 4 figures, submitted to ASONAM 2019

  45. arXiv:1901.06408  [pdf]

    cs.GR physics.optics

    Metasurfaces for near-eye augmented reality

    Authors: Shoufeng Lan, Xueyue Zhang, Mohammad Taghinejad, Sean Rodrigues, Kyu-Tae Lee, Zhaocheng Liu, Wenshan Cai

    Abstract: Augmented reality (AR) has the potential to revolutionize the way in which information is presented by overlaying virtual information onto a person's direct view of their real-time surroundings. By placing the display on the surface of the eye, a contact lens display (CLD) provides a versatile solution for compact AR. However, an unaided human eye cannot visualize patterns on the CLD simply becaus…

    Submitted 18 January, 2019; originally announced January 2019.

    Comments: 23 pages, 9 figures

  46. arXiv:1811.07782  [pdf, other]

    cs.CV

    Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN

    Authors: Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis

    Abstract: Recent advances in deep convolutional neural networks (CNNs) have motivated researchers to adapt CNNs to directly model points in 3D point clouds. Modeling local structure has been proven to be important for the success of convolutional architectures, and researchers exploited the modeling of local point sets in the feature extraction hierarchy. However, limited attention has been paid to explicit…

    Submitted 19 November, 2018; originally announced November 2018.

  47. arXiv:1806.08468  [pdf, other]

    cs.SI cs.CY stat.AP

    Personalized Thread Recommendation for MOOC Discussion Forums

    Authors: Andrew S. Lan, Jonathan C. Spencer, Ziqi Chen, Christopher G. Brinton, Mung Chiang

    Abstract: Social learning, i.e., students learning from each other through social interactions, has the potential to significantly scale up instruction in online education. In many cases, such as in massive open online courses (MOOCs), social learning is facilitated through discussion forums hosted by course providers. In this paper, we propose a probabilistic model for the process of learners posting on su…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: To appear at ECML-PKDD 2018

  48. arXiv:1806.03551  [pdf, other]

    stat.ML cs.LG eess.SP

    An Estimation and Analysis Framework for the Rasch Model

    Authors: Andrew S. Lan, Mung Chiang, Christoph Studer

    Abstract: The Rasch model is widely used for item response analysis in applications ranging from recommender systems to psychology, education, and finance. While a number of estimators have been proposed for the Rasch model over the last decades, the available analytical performance guarantees are mostly asymptotic. This paper provides a framework that relies on a novel linear minimum mean-squared error (L-…

    Submitted 9 June, 2018; originally announced June 2018.

    Comments: To be presented at ICML 2018

  49. arXiv:1806.03547  [pdf, other]

    cs.IT eess.SP stat.ML

    Linear Spectral Estimators and an Application to Phase Retrieval

    Authors: Ramina Ghods, Andrew S. Lan, Tom Goldstein, Christoph Studer

    Abstract: Phase retrieval refers to the problem of recovering real- or complex-valued vectors from magnitude measurements. The best-known algorithms for this problem are iterative in nature and rely on so-called spectral initializers that provide accurate initialization vectors. We propose a novel class of estimators suitable for general nonlinear measurement systems, called linear spectral estimators (LSPE…

    Submitted 9 June, 2018; originally announced June 2018.

    Comments: To appear at ICML 2018, extended version with supplementary material

  50. arXiv:1805.09001  [pdf, other]

    q-bio.NC cs.LG stat.ML

    One-to-one Mapping between Stimulus and Neural State: Memory and Classification

    Authors: Sizhong Lan

    Abstract: Synaptic strength can be seen as probability to propagate impulse, and according to synaptic plasticity, function could exist from propagation activity to synaptic strength. If the function satisfies constraints such as continuity and monotonicity, neural network under external stimulus will always go to fixed point, and there could be one-to-one mapping between external stimulus and synaptic stre…

    Submitted 23 April, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: 8 pages, 15 figures, final for AIP Advances