Showing 1–50 of 107 results for author: Qi, C

  1. arXiv:2407.01163  [pdf, other]

    cs.LG cs.CV

    Benchmarking Predictive Coding Networks -- Made Simple

    Authors: Luca Pinchetti, Chang Qi, Oleh Lokshyn, Gaspard Olivers, Cornelius Emde, Mufeng Tang, Amine M'Charrak, Simon Frieder, Bayar Menzat, Rafal Bogacz, Thomas Lukasiewicz, Tommaso Salvatori

    Abstract: In this work, we tackle the problems of efficiency and scalability for predictive coding networks in machine learning. To do so, we first propose a library called PCX, whose focus lies on performance and simplicity, and provides a user-friendly, deep-learning oriented interface. Second, we use PCX to implement a large set of benchmarks for the community to use for their experiments. As most works…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 33 pages, 25 figures

    ACM Class: I.2.6

  2. arXiv:2406.09238  [pdf, other]

    cs.IT eess.SP

    Near-Field Multiuser Communications based on Sparse Arrays

    Authors: Kangjian Chen, Chenhao Qi, Geoffrey Ye Li, Octavia A. Dobre

    Abstract: This paper considers near-field multiuser communications based on sparse arrays (SAs). First, for the uniform SAs (USAs), we analyze the beam gains of channel steering vectors, which shows that increasing the antenna spacings can effectively improve the spatial resolution of the antenna arrays to enhance the sum rate of multiuser communications. Then, we investigate nonuniform SAs (NSAs) to mitiga…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.03113  [pdf, other]

    cs.RO cs.AI

    Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

    Authors: Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

    Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like…

    Submitted 5 May, 2024; originally announced May 2024.

  4. arXiv:2405.02243  [pdf, other]

    cs.RO

    Towards Improving Learning from Demonstration Algorithms via MCMC Methods

    Authors: Carl Qi, Edward Sun, Harry Zhang

    Abstract: Behavioral cloning, or more broadly, learning from demonstrations (LfD) is a promising direction for robot policy learning in complex scenarios. Albeit being straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, limiting its efficacy in real robot setups. In this work, we take one step towards improving learning from demonstration algorithms by leveraging impl…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2207.04638, arXiv:2204.03597 by other authors

  5. arXiv:2404.19531  [pdf, other]

    cs.CV

    MoST: Multi-modality Scene Tokenization for Motion Prediction

    Authors: Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou

    Abstract: Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e.g., failures in detecting open-vocabulary obstacles) while missing…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  6. arXiv:2403.08268  [pdf, other]

    cs.CV

    Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

    Authors: Yue Ma, Yingqing He, Hongfa Wang, Andong Wang, Chenyang Qi, Chengfei Cai, Xiu Li, Zhifeng Li, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: Despite recent advances in image-to-video generation, better controllability and local animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to pr…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Project Page: https://follow-your-click.github.io/ Github Page: https://github.com/mayuelala/FollowYourClick

  7. arXiv:2402.02321  [pdf, other]

    cs.LG

    Active Learning for Graphs with Noisy Structures

    Authors: Hongliang Chi, Cong Qi, Suhang Wang, Yao Ma

    Abstract: Graph Neural Networks (GNNs) have seen significant success in tasks such as node classification, largely contingent upon the availability of sufficient labeled nodes. Yet, the excessive cost of labeling large-scale graphs led to a focus on active learning on graphs, which aims for effective data selection to maximize downstream model performance. Notably, most existing methods assume reliable grap…

    Submitted 3 February, 2024; originally announced February 2024.

  8. Triple-Refined Hybrid-Field Beam Training for mmWave Extremely Large-Scale MIMO

    Authors: Kangjian Chen, Chenhao Qi, Octavia A. Dobre, Geoffrey Ye Li

    Abstract: This paper investigates beam training for extremely large-scale multiple-input multiple-output systems. By considering both the near field and far field, a triple-refined hybrid-field beam training scheme is proposed, where high-accuracy estimates of channel parameters are obtained through three steps of progressive beam refinement. First, the hybrid-field beam gain (HFBG)-based first refinement m…

    Submitted 20 January, 2024; originally announced January 2024.

    Journal ref: IEEE Transactions on Wireless Communications, 2024

  9. arXiv:2401.08976  [pdf]

    cs.LG eess.SP

    ACT-GAN: Radio map construction based on generative adversarial networks with ACT blocks

    Authors: Chen Qi, Yang Jingjing, Huang Ming, Zhou Qiang

    Abstract: The radio map, serving as a visual representation of electromagnetic spatial characteristics, plays a pivotal role in assessment of wireless communication networks and radio monitoring coverage. Addressing the issue of low accuracy existing in the current radio map construction, this paper presents a novel radio map construction method based on generative adversarial network (GAN) in which the Agg…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 11 pages, 10 figures

  10. arXiv:2312.11595  [pdf, other]

    cs.CV

    SPIRE: Semantic Prompt-Driven Image Restoration

    Authors: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

    Abstract: Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPI…

    Submitted 16 July, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by ECCV 2024; Webpage: https://chenyangqiqi.github.io/tip

  11. arXiv:2312.03793  [pdf, other]

    cs.CV

    AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

    Authors: Jiwen Yu, Xiaodong Cun, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang

    Abstract: Large-scale text-to-video (T2V) diffusion models have made great progress in recent years in terms of visual quality, motion and temporal consistency. However, the generation process is still a black box, where all attributes (e.g., appearance, motion) are learned and generated jointly without precise control ability other than rough text descriptions. Inspired by image animation which decouples the vi…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Project Page: https://vvictoryuki.github.io/animatezero.github.io/

  12. arXiv:2312.03047  [pdf, other]

    cs.CV

    MagicStick: Controllable Video Editing via Control Handle Transformations

    Authors: Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

    Abstract: Text-based video editing has recently attracted considerable interest in changing the style or replacing the objects with a similar structure. Beyond this, we demonstrate that properties such as shape, size, location, motion, etc., can also be edited in videos. Our key insight is that the keyframe transformations of the specific internal feature (e.g., edge maps of objects or human pose), can easi…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://magic-stick-edit.github.io/ Github repository: https://github.com/mayuelala/MagicStick

  13. arXiv:2311.16545  [pdf]

    cs.NI

    Unravelling DNS Performance: A Historical Examination of F-ROOT in Southeast Asia

    Authors: Jiajia Zhu, Chao Qi

    Abstract: The DNS root server system uses Anycast technology to provide resolution through widely distributed root nodes. In recent years, the F-root node has seen astonishing growth and now boasts the largest number of nodes among the 13 root servers. Based on Ripe Atlas measurement data, we examined the availability and query latency of the F-root within the Southeast Asian region historically. The collec…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 10 pages, 4 figures

  14. arXiv:2311.15069  [pdf, ps, other]

    cs.IT eess.SP

    Multiuser Beamforming for Partially-Connected Millimeter Wave Massive MIMO

    Authors: Chenhao Qi, Jinlin Hu, Yang Du, Arumugam Nallanathan

    Abstract: Multiuser beamforming is considered for partially-connected millimeter wave massive MIMO systems. Based on perfect channel state information (CSI), a low-complexity hybrid beamforming scheme that decouples the analog beamformer and the digital beamformer is proposed to maximize the sum-rate. The analog beamformer design is modeled as a phase alignment problem to harvest the array gain. Given the a…

    Submitted 25 November, 2023; originally announced November 2023.

  15. arXiv:2311.15066  [pdf, other]

    cs.IT eess.SP

    Beam Training and Tracking for Extremely Large-Scale MIMO Communications

    Authors: Kangjian Chen, Chenhao Qi, Cheng-Xiang Wang, Geoffrey Ye Li

    Abstract: In this paper, beam training and beam tracking are investigated for extremely large-scale multiple-input-multiple-output communication systems with partially-connected hybrid combining structures. Firstly, we propose a two-stage hybrid-field beam training scheme for both the near field and the far field. In the first stage, each subarray independently uses multiple far-field channel steering vecto…

    Submitted 25 November, 2023; originally announced November 2023.

  16. arXiv:2311.15062  [pdf, other]

    eess.SP cs.IT

    Simultaneous Beam Training and Target Sensing in ISAC Systems with RIS

    Authors: Kangjian Chen, Chenhao Qi, Octavia A. Dobre, Geoffrey Ye Li

    Abstract: This paper investigates an integrated sensing and communication (ISAC) system with reconfigurable intelligent surface (RIS). Our simultaneous beam training and target sensing (SBTTS) scheme enables the base station to perform beam training with the user terminals (UTs) and the RIS, and simultaneously to sense the targets. Based on our findings, the energy of the echoes from the RIS is accumulated…

    Submitted 25 November, 2023; originally announced November 2023.

  17. arXiv:2311.15060  [pdf, ps, other]

    eess.SP cs.IT

    Key Issues in Wireless Transmission for NTN-Assisted Internet of Things

    Authors: Chenhao Qi, Jing Wang, Leyi Lyu, Lei Tan, Jinming Zhang, Geoffrey Ye Li

    Abstract: Non-terrestrial networks (NTNs) have become appealing resolutions for seamless coverage in the next-generation wireless transmission, where a large number of Internet of Things (IoT) devices diversely distributed can be efficiently served. The explosively growing number of IoT devices brings a new challenge for massive connection. The long-distance wireless signal propagation in NTNs leads to seve…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 7 pages, 6 figures

  18. arXiv:2310.18451   

    cs.CY

    Fusion of the Power from Citations: Enhance your Influence by Integrating Information from References

    Authors: Cong Qi, Qin Liu, Kan Liu

    Abstract: Influence prediction plays a crucial role in the academic community. The amount of scholars' influence determines whether their work will be accepted by others. Most existing research focuses on predicting one paper's citation count after a period or identifying the most influential papers among the massive candidates, without concentrating on an individual paper's negative or positive impact on i…

    Submitted 25 June, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: There is a problem in section 3

  19. arXiv:2310.05163  [pdf, other]

    cs.CL

    An Investigation of LLMs' Inefficacy in Understanding Converse Relations

    Authors: Chengwen Qi, Bowen Li, Binyuan Hui, Bailin Wang, Jinyang Li, Jinwang Wu, Yuanjun Laili

    Abstract: Large Language Models (LLMs) have achieved remarkable success in many formal language oriented tasks, such as structural data-to-text and semantic parsing. However current benchmarks mostly follow the data distribution of the pre-training data of LLMs. Therefore, a natural question rises that do LLMs really understand the structured semantics of formal languages. In this paper, we investigate this…

    Submitted 13 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023

  20. arXiv:2310.00156  [pdf, other]

    cs.RO cs.AI

    Learning Generalizable Tool-use Skills through Trajectory Generation

    Authors: Carl Qi, Yilin Wu, Lifan Yu, Haoyue Liu, Bowen Jiang, Xingyu Lin, David Held

    Abstract: Autonomous systems that efficiently utilize tools can assist humans in completing many common tasks such as cooking and cleaning. However, current systems fall short of matching human-level of intelligence in terms of adapting to novel tools. Prior works based on affordance often make strong assumptions about the environments and cannot scale to more complex, contact-rich tasks. In this work, we t…

    Submitted 23 April, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    ACM Class: I.2.9

  21. arXiv:2309.14491  [pdf, other]

    cs.CV

    Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving

    Authors: Mahyar Najibi, Jingwei Ji, Yin Zhou, Charles R. Qi, Xinchen Yan, Scott Ettinger, Dragomir Anguelov

    Abstract: Closed-set 3D perception models trained on only a pre-defined set of object categories can be inadequate for safety critical applications such as autonomous driving where new object types can be encountered after deployment. In this paper, we present a multi-modal auto labeling pipeline capable of generating amodal 3D bounding boxes and tracklets for training models on open-set categories without…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  22. arXiv:2309.02563  [pdf, other]

    eess.IV cs.CV

    Evaluation Kidney Layer Segmentation on Whole Slide Imaging using Convolutional Neural Networks and Transformers

    Authors: Muhao Liu, Chenyang Qi, Shunxing Bao, Quan Liu, Ruining Deng, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: The segmentation of kidney layer structures, including cortex, outer stripe, inner stripe, and inner medulla within human kidney whole slide images (WSI) plays an essential role in automated image analysis in renal pathology. However, the current manual segmentation process proves labor-intensive and infeasible for handling the extensive digital pathology images encountered at a large scale. In re…

    Submitted 5 September, 2023; originally announced September 2023.

  23. arXiv:2308.03014  [pdf, other]

    cs.RO

    Learning Multiple Gaits within Latent Space for Quadruped Robots

    Authors: Jinze Wu, Yufei Xue, Chenkun Qi

    Abstract: Learning multiple gaits is non-trivial for legged robots, especially when encountering different terrains and velocity commands. In this work, we present an end-to-end training framework for learning multiple gaits for quadruped robots, tailored to the needs of robust locomotion, agile locomotion, and user's commands. A latent space is constructed concurrently by a gait encoder and a gait generato…

    Submitted 6 August, 2023; originally announced August 2023.

  24. arXiv:2306.03206  [pdf, other]

    cs.CV

    MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences

    Authors: Yingwei Li, Charles R. Qi, Yin Zhou, Chenxi Liu, Dragomir Anguelov

    Abstract: Occluded and long-range objects are ubiquitous and challenging for 3D object detection. Point cloud sequence data provide unique opportunities to improve such cases, as an occluded or distant object can be observed from different viewpoints or gets better visibility over time. However, the efficiency and effectiveness in encoding long-term sequence data can still be improved. In this work, we prop…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: CVPR 2023

  25. arXiv:2306.00926  [pdf, other]

    cs.CV

    Inserting Anybody in Diffusion Models via Celeb Basis

    Authors: Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng

    Abstract: Exquisite demand exists for customizing the pretrained large text-to-image model, e.g., Stable Diffusion, to generate innovative concepts, such as the users themselves. However, the newly-added concept from previous customization methods often shows weaker combination abilities than the original ones even given several images during training. We thus propose a new personalization method…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Project page: http://celeb-basis.github.io ; Github repository: https://github.com/ygtxr1997/CelebBasis

  26. arXiv:2304.03834  [pdf, other]

    cs.CV

    WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

    Authors: Kan Chen, Runzhou Ge, Hang Qiu, Rami AI-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng, Mustafa Baniodeh, Ivan Bogun, Weiyue Wang, Mingxing Tan, Dragomir Anguelov

    Abstract: Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the hu…

    Submitted 18 February, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: ICRA 2024 camera ready version. Dataset website: https://waymo.com/open/data/motion/

  27. arXiv:2304.02163  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    GINA-3D: Learning to Generate Implicit Neural Assets in the Wild

    Authors: Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov

    Abstract: Modeling the 3D world from sensor data for simulation is a scalable way of developing testing and validation environments for robotic learning problems such as autonomous driving. However, manually creating or re-creating real-world-like environments is difficult, expensive, and not scalable. Recent generative model techniques have shown promising progress to address such challenges by learning 3D…

    Submitted 28 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023; Our WOD-ObjectAsset can be accessed through waymo.com/open

  28. arXiv:2304.01064  [pdf, other]

    cs.CV eess.IV

    Real-time 6K Image Rescaling with Rate-distortion Optimization

    Authors: Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen

    Abstract: Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original one, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling me…

    Submitted 19 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023; Github Repository: https://github.com/AbnerVictor/HyperThumbnail

  29. Boundary-to-Solution Mapping for Groundwater Flows in a Toth Basin

    Authors: Jingwei Sun, Jun Li, Yonghong Hao, Cuiting Qi, Chunmei Ma, Huazhi Sun, Negash Begashaw, Gurcan Comet, Yi Sun, Qi Wang

    Abstract: In this paper, the authors propose a new approach to solving the groundwater flow equation in the Toth basin of arbitrary top and bottom topographies using deep learning. Instead of using traditional numerical solvers, they use a DeepONet to produce the boundary-to-solution mapping. This mapping takes the geometry of the physical domain along with the boundary conditions as inputs to output the st…

    Submitted 27 March, 2023; originally announced March 2023.

  30. arXiv:2303.10271  [pdf, other]

    cs.AR

    VPU-EM: An Event-based Modeling Framework to Evaluate NPU Performance and Power Efficiency at Scale

    Authors: Charles Qi, Yi Wang, Hui Wang, Yang Lu, Shiva Shankar Subramanian, Finola Cahill, Conall Tuohy, Victor Li, Xu Qian, Darren Crews, Ling Wang, Shivaji Roy, Andrea Deidda, Martin Power, Niall Hanrahan, Rick Richmond, Umer Cheema, Arnab Raha, Alessandro Palla, Gary Baugh, Deepak Mathaikutty

    Abstract: State-of-art NPUs are typically architected as a self-contained sub-system with multiple heterogeneous hardware computing modules, and a dataflow-driven programming model. There lacks well-established methodology and tools in the industry to evaluate and compare the performance of NPUs from different architectures. We present an event-based performance modeling framework, VPU-EM, targeting scalabl…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: 8 pages, 9 figures

    ACM Class: B.2.2; B.8.2

  31. arXiv:2303.09535  [pdf, other]

    cs.CV

    FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

    Authors: Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen

    Abstract: The diffusion-based generative models have achieved remarkable success in text-based image generation. However, since it contains enormous randomness in generation progress, it is still challenging to apply such models for real-world visual content editing, especially in videos. In this paper, we propose FateZero, a zero-shot text-based editing method on real-world videos without per-prompt traini…

    Submitted 11 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023 as an Oral Presentation. Project page: https://fate-zero-edit.github.io ; GitHub repository: https://github.com/ChenyangQiQi/FateZero

  32. Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows

    Authors: Chao Qi, Jianqin Yin, Jinghang Xu, Pengxiang Ding

    Abstract: This work introduces a new task of instance-incremental scene graph generation: Given a scene of the point cloud, representing it as a graph and automatically increasing novel instances. A graph denoting the object layout of the scene is finally generated. It is an important task since it helps to guide the insertion of novel 3D objects into a real-world scene in vision-based applications like aug…

    Submitted 28 August, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE TCSVT. The supplementary material is available in the media column of the journal version of the article

  33. arXiv:2301.02969  [pdf]

    cs.CV cs.AI

    Multi-scale multi-modal micro-expression recognition algorithm based on transformer

    Authors: Fengping Wang, Jie Li, Chun Qi, Lin Wang, Pan Wang

    Abstract: A micro-expression is a spontaneous unconscious facial muscle movement that can reveal the true emotions people attempt to hide. Although manual methods have made good progress and deep learning is gaining prominence. Due to the short duration of micro-expression and different scales of expressed in facial regions, existing algorithms cannot extract multi-modal multi-scale facial region features w…

    Submitted 10 January, 2023; v1 submitted 7 January, 2023; originally announced January 2023.

  34. arXiv:2212.08062  [pdf, other]

    cs.CV

    MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

    Authors: Bowen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen

    Abstract: In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects. First, as opposed to interpolating from sparse flow, we claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields. Second, inspired by face-swapping methods, we adaptively fuse the source identity during synthesis, so that the network better pre…

    Submitted 26 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: CVPR 2023, project page: https://meta-portrait.github.io

  35. arXiv:2212.03267  [pdf, other]

    cs.CV

    NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors

    Authors: Congyue Deng, Chiyu "Max" Jiang, Charles R. Qi, Xinchen Yan, Yin Zhou, Leonidas Guibas, Dragomir Anguelov

    Abstract: 2D-to-3D reconstruction is an ill-posed problem, yet humans are good at solving this problem due to their prior knowledge of the 3D world developed over years. Driven by this observation, we propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models. Formulating single-view reconstruction as an image-conditioned 3D generation problem, we optimize the N…

    Submitted 6 December, 2022; originally announced December 2022.

  36. arXiv:2210.15751  [pdf, other]

    cs.RO cs.AI

    Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation

    Authors: Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held

    Abstract: Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels. Previous methods typically either focus on short-horizon tasks or make strong assumptions that full-state information is available, which prevents their use on deformable objects. In this paper, we propose PlAnning with Spatial-Temporal Abstraction (PASTA), whic…

    Submitted 23 June, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Published at the Conference on Robot Learning (CoRL 2022)

  37. arXiv:2210.15492  [pdf]

    cs.CV eess.IV

    Reconstruction of compressed spectral imaging based on global structure and spectral correlation

    Authors: Pan Wang, Jie Li, Jieru Chen, Lin Wang, Chun Qi

    Abstract: In this paper, a convolutional sparse coding method based on global structure characteristics and spectral correlation is proposed for the reconstruction of compressive spectral images. The spectral data is regarded as the convolution sum of the convolution kernel and the corresponding coefficients, using the convolution kernel operates the global image information, preserving the structure inform…

    Submitted 9 January, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

  38. arXiv:2210.13729  [pdf, other]

    cs.AI cs.CL cs.CV

    Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty

    Authors: Wenting Xu, Zhenghua Xu, Junyang Chen, Chang Qi, Thomas Lukasiewicz

    Abstract: To reduce doctors' workload, deep-learning-based automatic medical report generation has recently attracted more and more research efforts, where deep convolutional neural networks (CNNs) are employed to encode the input images, and recurrent neural networks (RNNs) are used to decode the visual features into medical reports automatically. However, these state-of-the-art methods mainly suffer from…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: This paper is currently under peer review in IEEE TNNLS

  39. arXiv:2210.08375  [pdf, other]

    cs.CV cs.LG

    Improving the Intra-class Long-tail in 3D Detection via Rare Example Mining

    Authors: Chiyu Max Jiang, Mahyar Najibi, Charles R. Qi, Yin Zhou, Dragomir Anguelov

    Abstract: Continued improvements in deep learning architectures have steadily advanced the overall performance of 3D object detectors to levels on par with humans for certain tasks and datasets, where the overall performance is mostly driven by common examples. However, even the best performing models suffer from the most naive mistakes when it comes to rare examples that do not appear frequently in the tra…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2022

    MSC Class: 68T45

  40. arXiv:2210.08064  [pdf, other]

    cs.CV cs.RO

    LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

    Authors: Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov

    Abstract: Semantic segmentation of LiDAR point clouds is an important task in autonomous driving. However, training deep models via conventional supervised methods requires large datasets which are costly to label. It is critical to have label-efficient segmentation approaches to scale up the model to new operational domains or to improve performance on rare cases. While most prior works focus on indoor sce…

    Submitted 14 October, 2022; originally announced October 2022.

  41. arXiv:2210.08061  [pdf, other]

    cs.CV cs.LG cs.RO

    Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving

    Authors: Mahyar Najibi, Jingwei Ji, Yin Zhou, Charles R. Qi, Xinchen Yan, Scott Ettinger, Dragomir Anguelov

    Abstract: Learning-based perception and prediction modules in modern autonomous driving systems typically rely on expensive human annotation and are designed to perceive only a handful of predefined object categories. This closed-set paradigm is insufficient for the safety-critical autonomous driving task, where the autonomous vehicle needs to process arbitrarily many types of traffic participants and their…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: ECCV 2022

  42. arXiv:2210.05018  [pdf, other]

    cs.CV

    LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds

    Authors: Chenxi Liu, Zhaoqi Leng, Pei Sun, Shuyang Cheng, Charles R. Qi, Yin Zhou, Mingxing Tan, Dragomir Anguelov

    Abstract: Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving. However, arguably due to the higher-dimensional nature of the data (as compared to images), existing neural architectures exhibit a large variety in their designs, including but not limited to the views considered, the format of the neural features, and th…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: ECCV 2022

  43. arXiv:2208.00714  [pdf, other]

    cs.IT eess.SP

    Hybrid Precoding for Mixture Use of Phase Shifters and Switches in mmWave Massive MIMO

    Authors: Chenhao Qi, Qiang Liu, Xianghao Yu, Geoffrey Ye Li

    Abstract: A variable-phase-shifter (VPS) architecture with hybrid precoding for mixture use of phase shifters and switches, is proposed for millimeter wave massive multiple-input multiple-output communications. For the VPS architecture, a hybrid precoding design (HPD) scheme, called VPS-HPD, is proposed to optimize the phases according to the channel state information by alternately optimizing the analog pr…

    Submitted 1 August, 2022; originally announced August 2022.

  44. Real-time Streaming Video Denoising with Bidirectional Buffers

    Authors: Chenyang Qi, Junming Chen, Xin Yang, Qifeng Chen

    Abstract: Video streams are delivered continuously to save the cost of storage and device memory. Real-time denoising algorithms are typically adopted on the user device to remove the noise involved during the shooting and transmission of video streams. However, sliding-window-based methods feed multiple input frames for a single output and lack computation efficiency. Recent multi-output inference works pr…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM MM 2022; GitHub link: https://github.com/ChenyangQiQi/BSVD

  45. arXiv:2207.04638  [pdf, other]

    cs.RO

    Learning Closed-loop Dough Manipulation Using a Differentiable Reset Module

    Authors: Carl Qi, Xingyu Lin, David Held

    Abstract: Deformable object manipulation has many applications such as cooking and laundry folding in our daily lives. Manipulating elastoplastic objects such as dough is particularly challenging because dough lacks a compact state representation and requires contact-rich interactions. We consider the task of flattening a piece of dough into a specific shape from RGB-D images. While the task is seemingly in…

    Submitted 11 July, 2022; originally announced July 2022.

  46. arXiv:2206.03666  [pdf, other]

    cs.CV

    Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking

    Authors: Longlong Jing, Ruichi Yu, Henrik Kretzschmar, Kang Li, Charles R. Qi, Hang Zhao, Alper Ayvaci, Xu Chen, Dillon Cower, Yingwei Li, Yurong You, Han Deng, Congcong Li, Dragomir Anguelov

    Abstract: Monocular image-based 3D perception has become an active research area in recent years owing to its applications in autonomous driving. Approaches to monocular 3D perception including detection and tracking, however, often yield inferior performance when compared to LiDAR-based techniques. Through systematic analysis, we identified that per-object depth estimation accuracy is a major factor boundi…

    Submitted 7 June, 2022; originally announced June 2022.

    Journal ref: ICRA 2022

  47. arXiv:2206.01738  [pdf, other]

    eess.IV cs.CV

    RIDDLE: Lidar Data Compression with Range Image Deep Delta Encoding

    Authors: Xuanyu Zhou, Charles R. Qi, Yin Zhou, Dragomir Anguelov

    Abstract: Lidars are depth measuring sensors widely used in autonomous driving and augmented reality. However, the large volume of data produced by lidars can lead to high costs in data storage and transmission. While lidar data can be represented as two interchangeable representations: 3D point clouds and range images, most previous work focus on compressing the generic 3D point clouds. In this work, we sh…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: 14 pages, 10 figures; CVPR 2022

  48. arXiv:2205.05703  [pdf, other]

    cs.CV cs.RO

    Multi-Class 3D Object Detection with Single-Class Supervision

    Authors: Mao Ye, Chenxi Liu, Maoqing Yao, Weiyue Wang, Zhaoqi Leng, Charles R. Qi, Dragomir Anguelov

    Abstract: While multi-class 3D detectors are needed in many robotics applications, training them with fully labeled datasets can be expensive in labeling cost. An alternative approach is to have targeted single-class labels on disjoint data samples. In this paper, we are interested in training a multi-class 3D object detection model, while using these single-class labeled data. We begin by detailing the uni…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: ICRA 2022

  49. arXiv:2204.03597  [pdf, other]

    cs.LG cs.AI

    Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

    Authors: Carl Qi, Pieter Abbeel, Aditya Grover

    Abstract: The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approach infers the (unknown) reward function via inverse reinforcement learning (IRL) followed by maximizing this reward function via reinforcement learning (RL). The policies learned via these approaches are however very brittle in practice and deteriora…

    Submitted 18 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  50. arXiv:2203.06450  [pdf, ps, other]

    eess.SP cs.IT

    Low-Complexity Multicast Beamforming for Millimeter Wave Communications

    Authors: Zhaohui Li, Chenhao Qi, Geoffrey Ye Li

    Abstract: To develop a low-complexity multicast beamforming method for millimeter wave communications, we first propose a channel gain estimation method in this article. We use the beam sweeping to find the best codeword and its two neighboring codewords to form a composite beam. We then estimate the channel gain based on the composite beam, which is computed off-line by minimizing the variance of beam gain…

    Submitted 12 March, 2022; originally announced March 2022.