
Showing 1–5 of 5 results for author: Pei, B

Searching in archive cs.
  1. arXiv:2406.18070  [pdf, other]

    cs.CV

    EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

    Authors: Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao

    Abstract: In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed to cater to the uniqu…

    Submitted 30 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Champion solutions in the EgoVis CVPR 2024 workshop

  2. arXiv:2403.16182  [pdf, other]

    cs.CV

    EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

    Authors: Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao

    Abstract: Being able to map the activities of others into one's own point of view is a fundamental human skill, present even from a very early age. Taking a step toward understanding this human ability, we introduce EgoExoLearn, a large-scale dataset that emulates the human demonstration-following process, in which individuals record egocentric videos as they execute tasks guided by demonstration videos. Focusing…

    Submitted 5 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  3. arXiv:2403.15377  [pdf, other]

    cs.CV

    InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

    Authors: Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang

    Abstract: We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue. Our approach employs a progressive training paradigm that unifies the different self- or weakly-supervised learning frameworks of masked video token reconstruction, cross-modal contrastive learning, and next token predict…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: a technical report about video understanding

  4. arXiv:2403.09626  [pdf, other]

    cs.CV

    Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

    Authors: Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang

    Abstract: Understanding videos is one of the fundamental directions in computer vision research, with extensive efforts dedicated to exploring various architectures such as RNNs, 3D CNNs, and Transformers. Newly proposed state space model architectures, e.g., Mamba, show promise for extending their success in long-sequence modeling to video modeling. To assess whether Mamba can be a viable alternati…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Technical Report

  5. arXiv:2107.06170  [pdf]

    eess.SP cs.IT math.OC

    Robust Blind Source Separation by Soft Decision-Directed Non-Unitary Joint Diagonalization

    Authors: Wenjuan Liu, Dazheng Feng, Bingnan Pei, Mengdao Xing, Xinhong Meng, Qianru Wei

    Abstract: Approximate joint diagonalization of a set of matrices provides a powerful framework for numerous statistical signal processing applications. For non-unitary joint diagonalization (NUJD) based on the least-squares (LS) criterion, outliers, also referred to as anomalous or discordant observations, have a negative influence on performance, since squaring the residuals magnifies the effects of them…

    Submitted 28 June, 2021; originally announced July 2021.

    Comments: 19 pages, 9 figures