First Joint Egocentric Vision (EgoVis) Workshop

Held in Conjunction with CVPR 2024

17 June 2024 - Seattle, USA
Room: Summit 428

This joint workshop aims to be the focal point for the egocentric computer vision community to meet and discuss progress in this fast-growing research area. It addresses egocentric vision comprehensively, covering key research challenges in video understanding, multi-modal data, interaction learning, self-supervised learning, and AR/VR, with applications to cognitive science and robotics.

Overview

Wearable cameras, smart glasses, and AR/VR headsets are gaining importance for research and commercial use. They feature various sensors such as cameras, depth sensors, microphones, IMUs, and GPS. Advances in machine perception enable precise user localization (SLAM), eye tracking, and hand tracking. This data makes it possible to understand user behavior and unlocks new interaction possibilities for augmented reality. Egocentric devices may soon automatically recognize user actions, surroundings, gestures, and social relationships. These devices have broad applications in assistive technology, education, fitness, entertainment, gaming, eldercare, robotics, and augmented reality, positively impacting society.

Previously, research in this field faced challenges due to limited datasets in a data-intensive environment. However, the community's recent efforts have addressed this issue by releasing numerous large-scale datasets covering various aspects of egocentric perception, including HoloAssist, Aria Digital Twin, Aria Synthetic Environments, Ego4D, Ego-Exo4D, and EPIC-KITCHENS.

The goal of this workshop is to provide an exciting discussion forum for researchers working in this challenging and fast-growing area, and to unlock the potential of data-driven research on these datasets to further the state of the art.

Challenges

We welcome submissions to the challenges from March to May (see important dates) through the leaderboards linked below. Participants in the challenges are required to submit a technical report on their method; this is a requirement for the competition. Reports should be 2-6 pages including references, should use the CVPR format, and should be submitted through the CMT website.

HoloAssist Challenges

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Action Recognition Mahdi Rad, Microsoft, Switzerland Link
2 Mistake Detection Ishani Chakraborty, Microsoft, US Link
3 Intervention Type Prediction Taein Kwon, ETH Zurich, Switzerland Link

Aria Digital Twin Challenges

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Few-shot 3D Object Detection & Tracking Xiaqing Pan, Meta, US Link
2 3D Object Detection & Tracking Xiaqing Pan, Meta, US Link

Aria Synthetic Environments Challenges

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Scene Reconstruction Using Structured Language Vasileios Balntas, Meta, UK Link

Ego4D Challenges

Ego4D is a massive-scale egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video. Details of each challenge are provided below:

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Visual Queries 2D Santhosh Kumar Ramakrishnan, University of Texas, Austin, US Link
2 Visual Queries 3D Vincent Cartillier, Georgia Tech, US Link
3 Natural Language Queries Satwik Kottur, Meta, US Link
4 Moment Queries Chen Zhao & Merey Ramazanova, KAUST, SA Link
5 EgoTracks Hao Tang & Weiyao Wang, Meta, US Link
6 Goal Step Yale Song, Meta, US Link
7 Ego Schema Karttikeya Mangalam & Raiymbek Akshulakov, UC Berkeley, US Link
8 PNR Temporal Localization Yifei Huang, University of Tokyo, JP Link
9 Localization and Tracking Hao Jiang, Meta, US Link
10 Speech Transcription Leda Sari, Jachym Kolar & Vamsi Krishna Ithapu, Meta Reality Labs, US Link
11 Looking at me Eric Zhongcong Xu, National University of Singapore, Singapore Link
12 Short-term Anticipation Francesco Ragusa, University of Catania, IT Link
13 Long-term Anticipation Tushar Nagarajan, FAIR, US Link

Ego-Exo4D Challenges

Ego-Exo4D is a diverse, large-scale, multi-modal, multi-view video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Ego-Pose Body Pablo Arbelaez & Maria Camila Escobar Palomeque, Universidad de los Andes, Colombia Link
2 Ego-Pose Hands Jianbo Shi & Shan Shu, University of Pennsylvania, US Link

EPIC-Kitchens Challenges

Please check the EPIC-KITCHENS website for more information on the EPIC-KITCHENS challenges. Links to the individual challenges are also provided below.

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Action Recognition Jacob Chalk, University of Bristol, UK Link
2 Action Anticipation Antonino Furnari and Francesco Ragusa, University of Catania, IT Link
3 Action Detection Francesco Ragusa and Antonino Furnari, University of Catania, IT Link
4 Domain Adaptation for Action Recognition Toby Perrett, University of Bristol, UK Link
5 Multi-Instance Retrieval Michael Wray, University of Bristol, UK Link
6 Semi-Supervised Video-Object Segmentation Ahmad Dar Khalil, University of Bristol, UK Link
7 Hand-Object Segmentation Dandan Shan, University of Michigan, US Link
8 EPIC-SOUNDS Audio-Based Interaction Recognition Jacob Chalk, University of Bristol, UK Link
9 TREK-150 Object Tracking Matteo Dunnhofer, University of Udine, IT Link
10 EPIC-SOUNDS Audio-Based Interaction Detection Jacob Chalk, University of Bristol, UK Link

Winners

Benchmark Challenge Team Rank Winner Names Technical Report Code
EPIC-KITCHENS Action Recognition 1 Shuming Liu (KAUST)*; Lin Sui (Nanjing University); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Chen Zhao (KAUST); Bernard Ghanem (KAUST) Coming Soon... Coming Soon...
EPIC-KITCHENS Action Recognition 2 Yingxin Xia (DeepGlint); Ninghua Yang (DeepGlint)*; Kaicheng Yang (DeepGlint); Xiang An (DeepGlint); Xiangzi Dai (DeepGlint); Weimo Deng (DeepGlint); Ziyong Feng (DeepGlint) Coming Soon... Coming Soon...
EPIC-KITCHENS Action Recognition 3 Jilan Xu (Fudan University)*; Baoqi Pei (Zhejiang University); Yifei Huang (The University of Tokyo); Guo Chen (Nanjing University); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Yali Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tong Lu (Nanjing University); Limin Wang (Nanjing University); Yu Qiao (Shanghai Artificial Intelligence Laboratory) Coming Soon... Coming Soon...
EPIC-KITCHENS Action Detection 1 Shuming Liu (KAUST)*; Lin Sui (Nanjing University); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Chen Zhao (KAUST); Bernard Ghanem (KAUST) Coming Soon... Coming Soon...
EPIC-KITCHENS Action Detection 2 Yingxin Xia (DeepGlint); Ninghua Yang (DeepGlint)*; Kaicheng Yang (DeepGlint); Xiang An (DeepGlint); Xiangzi Dai (DeepGlint); Weimo Deng (DeepGlint); Ziyong Feng (DeepGlint) Coming Soon... Coming Soon...
EPIC-KITCHENS Action Detection 3 Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen Coming Soon... Coming Soon...
EPIC-KITCHENS Unsupervised Domain Adaptation for Action Recognition 1 Jilan Xu (Fudan University)*; Baoqi Pei (Zhejiang University); Yifei Huang (The University of Tokyo); Guo Chen (Nanjing University); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Yali Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tong Lu (Nanjing University); Limin Wang (Nanjing University); Yu Qiao (Shanghai Artificial Intelligence Laboratory) Coming Soon... Coming Soon...
EPIC-KITCHENS Multi-Instance Retrieval 1 Xiaoqi Wang (The Hong Kong Polytechnic University); Yi Wang (The Hong Kong Polytechnic University); Lap-Pui Chau (The Hong Kong Polytechnic University)* Coming Soon... Coming Soon...
EPIC-KITCHENS Multi-Instance Retrieval 2 Jilan Xu (Fudan University)*; Baoqi Pei (Zhejiang University); Yifei Huang (The University of Tokyo); Guo Chen (Nanjing University); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Yali Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tong Lu (Nanjing University); Limin Wang (Nanjing University); Yu Qiao (Shanghai Artificial Intelligence Laboratory) Coming Soon... Coming Soon...
EPIC-KITCHENS Multi-Instance Retrieval 3 Jiamin Cao (Xidian University)*; Lingqi Wang (Xidian University); Jiayao Hao (Xidian University); Shuyuan Yang (Xidian University); Licheng Jiao (Xidian University) Coming Soon... Coming Soon...
EPIC-KITCHENS Video Object Segmentation 1 Qinliang Wang (Xidian University)*; Xuejian Gou (Xidian University); Zhongjian Huang (Xidian University); Lingling Li (Xidian University); Fang Liu (Xidian University) Coming Soon... Coming Soon...
EPIC-KITCHENS Video Object Segmentation 2 Sen Jia (Xidian University)*; Xinyue Yu (Xidian University); Long Sun (Xidian University); Licheng Jiao (Xidian University); Shuyuan Yang (Xidian University) Coming Soon... Coming Soon...
EPIC-KITCHENS Video Object Segmentation 3 Libo Yan (Xidian University)*; Shizhan Zhao (Xidian University); Zhang Yanzhao (Xidian University); Xu Liu (Xidian University); Puhua Chen (Xidian University) Coming Soon... Coming Soon...
EPIC-KITCHENS Audio-Based Interaction Recognition 1 Lingqi Wang (Xidian University)*; Jiamin Cao (Xidian University); Xuejian Gou (Xidian University); Lingling Li (Xidian University); Fang Liu (Xidian University) Coming Soon... Coming Soon...
EPIC-KITCHENS Audio-Based Interaction Recognition 2 Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen Coming Soon... Coming Soon...
EPIC-KITCHENS Audio-Based Interaction Recognition 3 Shizhan Zhao (Xidian University)*; Libo Yan (Xidian University); Zhang Yanzhao (Xidian University); Licheng Jiao (Xidian University); Xu Liu (Xidian University); Yuwei Guo (Xidian University) Coming Soon... Coming Soon...
EPIC-KITCHENS Audio-Based Interaction Detection 1 Shuming Liu (KAUST)*; Lin Sui (Nanjing University); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Chen Zhao (KAUST); Bernard Ghanem (KAUST) Coming Soon... Coming Soon...
EPIC-KITCHENS Audio-Based Interaction Detection 2 Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen Coming Soon... Coming Soon...
EPIC-KITCHENS Audio-Based Interaction Detection 3 Xuejian Gou (Xidian University)*; Qinliang Wang (Xidian University); Jiamin Cao (Xidian University); Lingling Li (Xidian University); Fang Liu (Xidian University) Coming Soon... Coming Soon...
HoloAssist Mistake Detection 1 Michele Mazzamuto (University of Catania); Antonino Furnari (University of Catania); Giovanni Maria Farinella (University of Catania) Coming Soon... Coming Soon...
HoloAssist Action Recognition 1 Artem Merinov (Free University of Bozen-Bolzano); Oswald Lanz (Free University of Bozen-Bolzano) Coming Soon... Coming Soon...
Ego-Exo4D Ego-Pose Hands 1 Feng Chen (Lenovo Research); Ling Ding (Lenovo Research); Kanokphan Lertniphonphan (Lenovo Research); Jian Li (Lenovo Research); Kaer Huang (Lenovo Research); Zhepeng Wang (Lenovo Research) Coming Soon... Coming Soon...
Ego-Exo4D Ego-Pose Hands 2 Georgios Pavlakos (UT Austin); Dandan Shan (University of Michigan); Ilija Radosavovic (UC Berkeley); Angjoo Kanazawa (UC Berkeley); David Fouhey (New York University); Jitendra Malik (UC Berkeley) Coming Soon... Coming Soon...
Ego-Exo4D Ego-Pose Hands 3 Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Guo Chen (Nanjing University, Shanghai AI Laboratory); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Tong Lu (Nanjing University); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego-Exo4D Ego-Pose Body 1 Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Guo Chen (Nanjing University, Shanghai AI Laboratory); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Tong Lu (Nanjing University); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego-Exo4D Ego-Pose Body 2 Brent Yi (UC Berkeley); Vickie Ye (UC Berkeley); Georgios Pavlakos (UT Austin); Lea Müller (UC Berkeley); Maya Zheng (UC Berkeley); Yi Ma (UC Berkeley); Jitendra Malik (UC Berkeley); Angjoo Kanazawa (UC Berkeley) Coming Soon... Coming Soon...
Ego-Exo4D Ego-Pose Body 3 Congsheng Xu (Shanghai Jiaotong University) Coming Soon... Coming Soon...
Ego4D Visual Queries 2D 1 Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Guo Chen (Nanjing University, Shanghai AI Laboratory); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Tong Lu (Nanjing University); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Visual Queries 3D 1 Jinjie Mai (KAUST); Abdullah Hamdi (KAUST); Chen Zhao (KAUST); Silvio Giancola (KAUST); Bernard Ghanem (KAUST) Coming Soon... Coming Soon...
Ego4D Visual Queries 3D 2 Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Guo Chen (Nanjing University, Shanghai AI Laboratory); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Tong Lu (Nanjing University); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Limin Wang (Nanjing University); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Natural Language Queries 1 Yuping He (Nanjing University); Guo Chen (Nanjing University, Shanghai AI Laboratory); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Kanghua Pan (Nanjing University); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Tong Lu (Nanjing University); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Natural Language Queries 2 Haoyu Zhang (Harbin Institute of Technology); Yuquan Xie (Harbin Institute of Technology); Yisen Feng (Harbin Institute of Technology); Zaijing Li (Harbin Institute of Technology); Meng Liu (Shandong Jianzhu University); Liqiang Nie (Harbin Institute of Technology) Coming Soon... Coming Soon...
Ego4D Moments Queries 1 Shuming Liu (King Abdullah University of Science and Technology); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Bernard Ghanem (King Abdullah University of Science and Technology) Coming Soon... Coming Soon...
Ego4D Moments Queries 2 Kanghua Pan (Nanjing University); Yuping He (Nanjing University); Guo Chen (Nanjing University, Shanghai AI Laboratory); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Tong Lu (Nanjing University); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Goal Step 1 Carlos Plou (Universidad de Zaragoza); Lorenzo Mur-Labadia (Universidad de Zaragoza); Ruben Martinez-Cantin (Universidad de Zaragoza); Ana Murillo (Universidad de Zaragoza) Coming Soon... Coming Soon...
Ego4D Goal Step 2 Yuping He (Nanjing University); Guo Chen (Nanjing University, Shanghai AI Laboratory); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Kanghua Pan (Nanjing University); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Tong Lu (Nanjing University); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Goal Step 3 Haoyu Zhang (Harbin Institute of Technology); Yuquan Xie (Harbin Institute of Technology); Yisen Feng (Harbin Institute of Technology); Zaijing Li (Harbin Institute of Technology); Meng Liu (Shandong Jianzhu University); Liqiang Nie (Harbin Institute of Technology) Coming Soon... Coming Soon...
Ego4D Ego Schema 1 Haoyu Zhang (Harbin Institute of Technology); Yuquan Xie (Harbin Institute of Technology); Yisen Feng (Harbin Institute of Technology); Zaijing Li (Harbin Institute of Technology); Meng Liu (Shandong Jianzhu University); Liqiang Nie (Harbin Institute of Technology) Coming Soon... Coming Soon...
Ego4D Ego Schema 2 Noriyuki Kugo (Panasonic Connect); Tatsuya Ishibashi (Panasonic Connect); Kosuke Ono (Panasonic Connect); Yuji Sato (Panasonic Connect) Coming Soon... Coming Soon...
Ego4D Ego Schema 3 Ying Wang (NYU); Yanlai Yang (NYU); Mengye Ren (NYU) Coming Soon... Coming Soon...
Ego4D Looking at Me 1 Kanokphan Lertniphonphan (Lenovo Research); Jun Xie (Lenovo Research); Yaqing Meng (Chinese Academy of Sciences); Shijing Wang (Beijing Jiaotong University); Feng Chen (Lenovo Research); Zhepeng Wang (Lenovo Research) Coming Soon... Coming Soon...
Ego4D Looking at Me 2 Xin Li (University of Science and Technology Beijing); Xu Han (University of Science and Technology Beijing); Bochao Zou (University of Science and Technology Beijing); Huimin Ma (University of Science and Technology Beijing) Coming Soon... Coming Soon...
Ego4D Short-term Object Interaction Anticipation 1 Guo Chen (Nanjing University, Shanghai AI Laboratory); Yuping He (Nanjing University); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Yicheng Liu (Nanjing University); Kanghua Pan (Nanjing University); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Tong Lu (Nanjing University); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Short-term Object Interaction Anticipation 2 Lorenzo Mur-Labadia (Universidad de Zaragoza); Jose Guerrero (Universidad de Zaragoza); Ruben Martinez-Cantin (Universidad de Zaragoza); Giovanni Maria Farinella (University of Catania) Coming Soon... Coming Soon...
Ego4D Short-term Object Interaction Anticipation 3 Hyunjin Cho (Department of ECE, Seoul National University); Dong Un Kang (Department of ECE, Seoul National University); Se Young Chun (Department of ECE, Seoul National University) Coming Soon... Coming Soon...
Ego4D Long-term Action Anticipation 1 Yicheng Liu (Nanjing University); Guo Chen (Nanjing University, Shanghai AI Laboratory); Yuping He (Nanjing University); Baoqi Pei (Zhejiang University, Shanghai AI Laboratory); Kanghua Pan (Nanjing University); Jilan Xu (Fudan University, Shanghai AI Laboratory); Yifei Huang (University of Tokyo, Shanghai AI Laboratory); Yali Wang (Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory); Tong Lu (Nanjing University); Limin Wang (Nanjing University, Shanghai AI Laboratory); Yu Qiao (Shanghai AI Laboratory) Coming Soon... Coming Soon...
Ego4D Long-term Action Anticipation 2 Zeyun Zhong (Karlsruhe Institute of Technology); Manuel Martin (Fraunhofer IOSB); Frederik Diederichs (Fraunhofer IOSB); Jürgen Beyerer (Fraunhofer IOSB) Coming Soon... Coming Soon...

Call for Abstracts

You are invited to submit extended abstracts to the first edition of the joint egocentric vision workshop, which will be held alongside CVPR 2024 in Seattle.

These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work focused on the egocentric domain; it is not necessary to use the Ego4D dataset in your work. We expect a submission to address one or more topics within egocentric vision.

Format

Extended abstracts should be 2-4 pages in length, including figures, tables, and references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The 1st joint egocentric vision workshop gives authors the opportunity to present their work to the egocentric community and to gather discussion and feedback. Accepted work will be presented either as an oral presentation (virtual or in-person) or as a poster presentation. The review will be single-blind, so there is no need to anonymize your work; otherwise, submissions should follow the format of CVPR submissions (information can be found here). Accepted abstracts will not be published as part of a proceedings, so they can be uploaded to arXiv etc., and links will be provided on the workshop's webpage. Submissions will be managed with the CMT website.

Important Dates

Challenges Leaderboards Open March 2024
Challenges Leaderboards Close 30 May 2024
Challenges Technical Reports Deadline (on CMT) 5 June 2024 (23:59 PT)
Extended Abstract Deadline 10 May 2024 (23:59 PT)
Extended Abstract Notification to Authors 29 May 2024
Extended Abstracts ArXiv Deadline 12 June 2024
Workshop Date 17 June 2024

Program

All times are local to Seattle (Pacific Time).
Workshop Location: Room Summit 428

Time Event
09:00-09:15 Welcome and Introductions
09:15-09:45 Invited Keynote 1: Jim Rehg, University of Illinois Urbana-Champaign, US
09:45-10:20 HoloAssist Challenges
10:20-11:20 Coffee Break and Poster Session
11:20-11:50 Invited Keynote 2: Diane Larlus, Naver Labs Europe and MIAI Grenoble, FR
11:50-12:40 EPIC-KITCHENS Challenges
12:40-13:40 Lunch Break
13:40-14:10 EgoVis 2022/2023 Distinguished Paper Awards
14:10-14:40 Invited Keynote 3: Michael C. Frank & Bria Long, Stanford University, US
14:40-15:30 Project Aria Datasets & Challenges
15:30-16:00 Coffee Break
16:00-16:30 Invited Keynote 4: Fernando De la Torre, Carnegie Mellon University, US
16:30-17:40 Ego4D & Ego-Exo4D Challenges
17:40-18:00 Conclusion

Papers

Note to authors: Please hang your poster at the indicated poster number. Posters can be put up ONLY during the poster session time (10:20-11:20).

All workshop posters are in ARCH building 4E

Extended Abstracts

EgoVis Poster Number Title Authors arXiv Link
192 On the Application of Egocentric Computer Vision to Industrial Scenarios Vivek Prabhakar Chavan (Fraunhofer Institute); Oliver Heimann (Fraunhofer IPK); Jörg Krüger (TU-Berlin) link
193 Instance Tracking in 3D Scenes from Egocentric Videos Yunhan Zhao (University of California, Irvine); Haoyu Ma (University of California, Irvine); Shu Kong (Texas A&M University); Charless Fowlkes (UC Irvine) link
194 The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective Wenqi Jia (Georgia Institute of Technology) link
195 ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition Sanjoy Kundu (Auburn University); Sathyanarayanan N Aakur (Auburn University); Shubham Trehan (Auburn University) link
196 Object Aware Egocentric Online Action Detection Joungbin An (Yonsei University); Yunsu Park (Yonsei University); Hyolim Kang (Yonsei University); Seon Joo Kim (Yonsei University) link
197 From Observation to Abstractions: Efficient In-Context Learning from Human Feedback and Visual Demonstrations for VLM Agents Gabriel Sarch (Carnegie Mellon University); Lawrence Jang (Carnegie Mellon University); Michael J Tarr (Carnegie Mellon University); William W Cohen (Google AI); Kenneth Marino (Google DeepMind); Katerina Fragkiadaki (Carnegie Mellon University) link
198 Learning Mobile Manipulation Skills via Autonomous Exploration Russell Mendonca (Carnegie Mellon University); Deepak Pathak (Carnegie Mellon University) Coming soon...
199 RMem: Restricted Memory Banks Improve Video Object Segmentation Junbao Zhou (UIUC); Ziqi Pang (UIUC); Yu-Xiong Wang (University of Illinois at Urbana-Champaign) link
200 ENIGMA-51: Towards a Fine-Grained Understanding of Human Behavior in Industrial Scenarios Francesco Ragusa (University of Catania); Rosario Leonardi (University of Catania); Michele Mazzamuto (University of Catania); Claudia Bonanno (Università degli Studi di Catania); Rosario Scavo (University of Catania); Antonino Furnari (University of Catania); Giovanni Maria Farinella (University of Catania) link
201 Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? Rosario Leonardi (University of Catania); Antonino Furnari (University of Catania); Francesco Ragusa (University of Catania); Giovanni Maria Farinella (University of Catania) link
202 Contrastive Language Video Time Pre-training Hengyue Liu (UC Riverside); Kyle Min (Intel Labs); Hector A Valdez (Intel Corporation); Subarna Tripathi (Intel Labs) link
203 Identification of Conversation Partners from Egocentric Video Tobias Dorszewski (Technical University of Denmark); Søren Fuglsang (University Hospital of Copenhagen); Jens Hjortkjær (DTU) link
204 Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos Mi Luo (University of Texas at Austin); Zihui Xue (The University of Texas at Austin); Alex Dimakis (UT Austin); Kristen Grauman (Facebook AI Research & UT Austin) Coming soon...
205 HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model Khoa HV Vo (University of Arkansas); Thinh Phan (University of Arkansas); Kashu Yamazaki (University of Arkansas); Minh Q Tran (University of Arkansas); Ngan Le (University of Arkansas) link
206 Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera Inpyo Song (SungKyunKwan University); MinJun Joo (iislab); Joonhyung Kwon (Korea Aerospace University); Jangwon Lee (SungKyunKwan University) link
207 X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization Anna Kukleva (MPII); Fadime Sener (University of Bonn); Edoardo Remelli (Meta); Bugra Tekin (Meta); Eric Sauser (Meta); Bernt Schiele (MPI Informatics); Shugao Ma (Meta Reality Labs) link
208 HandFormer: Utilizing 3D Hand Pose for Egocentric Action Recognition Md Salman Shamil (National University of Singapore); Dibyadip Chatterjee (National University of Singapore); Fadime Sener (University of Bonn); Shugao Ma (Meta Reality Labs); Angela Yao (National University of Singapore) link
210 Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos Sagnik Majumder (University of Texas at Austin); Ziad Al-Halah (University of Utah); Kristen Grauman (University of Texas at Austin) link

Invited CVPR Papers

EgoVis Poster Number Title Authors arXiv Link CVPR 2024 Presentation Details
178 PREGO: online mistake detection in PRocedural EGOcentric videos Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso link Thursday, 20 June, 17:15 to 18:45
179 Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang link Thursday, 20 June, 17:15 to 18:45
180 EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik link Wednesday, 19 June, 10:30 to 12:00
181 SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman link Friday, 21 June, 17:00 to 18:30
182 Action Scene Graphs for Long-Form Understanding of Egocentric Videos Rodin Ivan, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella link Thursday, 20 June, 17:15 to 18:45
183 EgoGen: An Egocentric Synthetic Data Generator (CVPR HIGHLIGHT) Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang link Thursday, 20 June, 17:15 to 18:45
184 Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting (CVPR HIGHLIGHT) Taeho Kang, Youngki Lee link Wednesday, 19 June, 10:30 to 12:00
185 Retrieval-Augmented Egocentric Video Captioning Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie link Thursday, 20 June, 10:30 to 12:00
190 3D Human Pose Perception from Egocentric Stereo Videos (CVPR HIGHLIGHT) Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt link Wednesday, 19 June, 10:30 to 12:00
187 A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Giuseppe Averta link Thursday, 20 June, 17:15 to 18:45
188 Egocentric Full Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt link Wednesday, 19 June, 10:30 to 12:00
189 Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato link Wednesday, 19 June, 10:30 to 12:00
186 EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao link Friday, 21 June, 10:30 to 12:00
191 EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models (CVPR HIGHLIGHT) Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu link Thursday, 20 June, 10:30 to 12:00
193 Instance Tracking in 3D Scenes from Egocentric Videos Yunhan Zhao (University of California, Irvine); Haoyu Ma (University of California, Irvine); Shu Kong (Texas A&M University); Charless Fowlkes (UC Irvine) link Friday, 21 June, 10:30 to 12:00
207 X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization Anna Kukleva (MPII); Fadime Sener (University of Bonn); Edoardo Remelli (Meta); Bugra Tekin (Meta); Eric Sauser (Meta); Bernt Schiele (MPI Informatics); Shugao Ma (Meta Reality Labs) link Thursday, 20 June, 17:15 to 18:45
210 Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos Sagnik Majumder (University of Texas at Austin); Ziad Al-Halah (University of Utah); Kristen Grauman (University of Texas at Austin) link Thursday, 20 June, 17:15 to 18:45

Invited Speakers


Jim Rehg

University of Illinois Urbana-Champaign, USA


Diane Larlus

Naver Labs Europe and MIAI Grenoble


Fernando De la Torre

Carnegie Mellon University, USA


Michael C. Frank

Stanford University, USA


Bria Long

University of California, San Diego, USA

Workshop Organisers


Antonino Furnari

University of Catania


Angela Yao

National University of Singapore


Xin Wang

Microsoft Research


Tushar Nagarajan

FAIR, Meta


Huiyu Wang

FAIR, Meta


Jing Dong

Meta


Jakob Engel

FAIR, Meta


Siddhant Bansal

University of Bristol


Takuma Yagi

National Institute of Advanced Industrial Science and Technology

Co-organizing Advisors


Dima Damen

University of Bristol


Giovanni Maria Farinella

University of Catania


Kristen Grauman

UT Austin


Jitendra Malik

UC Berkeley


Richard Newcombe

Reality Labs Research


Marc Pollefeys

ETH Zurich


Yoichi Sato

University of Tokyo


David Crandall

Indiana University

Related Past Events

This workshop follows in the footsteps of these previous events:

EPIC-Kitchens and Ego4D Past Workshops:


Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Past Workshops:

Project Aria Past Tutorials: