Showing 1–17 of 17 results for author: Tekin, B

  1. arXiv:2403.19811

    cs.CV

    X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

    Authors: Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma

    Abstract: Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition. However, the adaptation of these models to egocentric videos has been largely unexplored. To address this gap, we propose a simple yet effective cross-modal adaptation framework, which we call X-MIC. Using a video adapter, o…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  2. arXiv:2403.17827

    cs.CV cs.AI cs.GR cs.LG

    DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

    Authors: Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin

    Abstract: Generating natural hand-object interactions in 3D is challenging as the resulting hand and object motions are expected to be physically plausible and semantically meaningful. Furthermore, generalization to unseen objects is hindered by the limited scale of available hand-object interaction datasets. We propose DiffH2O, a novel method to synthesize realistic, one or two-handed object interactions f…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project Page: https://diffh2o.github.io/

  3. arXiv:2311.18809

    cs.CV cs.RO

    FoundPose: Unseen Object Pose Estimation with Foundation Features

    Authors: Evin Pınar Örnek, Yann Labbé, Bugra Tekin, Lingni Ma, Cem Keskin, Christian Forster, Tomas Hodan

    Abstract: We propose FoundPose, a model-based method for 6D pose estimation of unseen objects from a single RGB image. The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training. In contrast, existing methods typically pre-train on large-scale, task-specific datasets in order to generalize to new objects and to bridge the image-to-model domain ga…

    Submitted 19 July, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  4. arXiv:2309.17024

    cs.CV

    HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

    Authors: Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys

    Abstract: Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community. This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale e…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  5. arXiv:2204.12223

    cs.CV

    Context-Aware Sequence Alignment using 4D Skeletal Augmentation

    Authors: Taein Kwon, Bugra Tekin, Siyu Tang, Marc Pollefeys

    Abstract: Temporal alignment of fine-grained human actions in videos is important for numerous applications in computer vision, robotics, and mixed reality. State-of-the-art methods directly learn image-based embedding space by leveraging powerful deep convolutional neural networks. While being straightforward, their results are far from satisfactory, the aligned videos exhibit severe temporal discontinuity…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: Project page: http://www.taeinkwon.com/projects/casa. Accepted to CVPR 2022 Oral

  6. arXiv:2111.09301

    cs.CV cs.AI

    Learning to Align Sequential Actions in the Wild

    Authors: Weizhe Liu, Bugra Tekin, Huseyin Coskun, Vibhav Vineet, Pascal Fua, Marc Pollefeys

    Abstract: State-of-the-art methods for self-supervised sequential action alignment rely on deep networks that find correspondences across videos in time. They either learn frame-to-frame mapping across sequences, which does not leverage temporal information, or assume monotonic alignment between each video pair, which ignores variations in the order of actions. As such, these methods are not able to deal wi…

    Submitted 17 November, 2021; originally announced November 2021.

  7. arXiv:2109.04409

    cs.CV

    Reconstructing and grounding narrated instructional videos in 3D

    Authors: Dimitri Zhukov, Ignacio Rocco, Ivan Laptev, Josef Sivic, Johannes L. Schönberger, Bugra Tekin, Marc Pollefeys

    Abstract: Narrated instructional videos often show and describe manipulations of similar objects, e.g., repairing a particular model of a car or laptop. In this work we aim to reconstruct such objects and to localize associated narrations in 3D. Contrary to the standard scenario of instance-level 3D reconstruction, where identical objects or scenes are present in all views, objects in different instructiona…

    Submitted 10 September, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

  8. arXiv:2104.11181

    cs.CV

    H2O: Two Hands Manipulating Objects for First Person Interaction Recognition

    Authors: Taein Kwon, Bugra Tekin, Jan Stuhmer, Federica Bogo, Marc Pollefeys

    Abstract: We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this end, we propose a method to create a unified dataset for egocentric 3D interaction recognition. Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each fram…

    Submitted 24 August, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021

  9. arXiv:2008.11239

    cs.CV

    HoloLens 2 Research Mode as a Tool for Computer Vision Research

    Authors: Dorin Ungureanu, Federica Bogo, Silvano Galliani, Pooja Sama, Xin Duan, Casey Meekhof, Jan Stühmer, Thomas J. Cashman, Bugra Tekin, Johannes L. Schönberger, Pawel Olszta, Marc Pollefeys

    Abstract: Mixed reality headsets, such as the Microsoft HoloLens 2, are powerful sensing devices with integrated compute capabilities, which makes it an ideal platform for computer vision research. In this technical report, we present HoloLens 2 Research Mode, an API and a set of tools enabling access to the raw sensor streams. We provide an overview of the API and explain how it can be used to build mixed…

    Submitted 25 August, 2020; originally announced August 2020.

  10. arXiv:2004.13449

    cs.CV

    Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction

    Authors: Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, Cordelia Schmid

    Abstract: Modeling hand-object manipulations is essential for understanding how humans interact with their environment. While of practical importance, estimating the pose of hands and objects during interactions is challenging due to the large mutual occlusions that occur during manipulation. Recent efforts have been directed towards fully-supervised methods that require large amounts of labeled training sa…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: CVPR 2020. See the project webpage at https://hassony2.github.io/handobjectconsist.html

  11. Domain-Specific Priors and Meta Learning for Few-Shot First-Person Action Recognition

    Authors: Huseyin Coskun, Zeeshan Zia, Bugra Tekin, Federica Bogo, Nassir Navab, Federico Tombari, Harpreet Sawhney

    Abstract: The lack of large-scale real datasets with annotations makes transfer learning a necessity for video activity understanding. We aim to develop an effective method for few-shot transfer learning for first-person action classification. We leverage independently trained local visual cues to learn representations that can be transferred from a source domain, which provides primitive action labels, to…

    Submitted 7 December, 2021; v1 submitted 22 July, 2019; originally announced July 2019.

    Comments: Paper has been accepted in Transactions on Pattern Analysis and Machine Intelligence

    Journal ref: year = {5555}, volume = {}, number = {01}, issn = {1939-3539}, pages = {1-1},

  12. arXiv:1904.05349

    cs.CV

    H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

    Authors: Bugra Tekin, Federica Bogo, Marc Pollefeys

    Abstract: We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. Given a single RGB image, our model jointly estimates the 3D hand and object poses, models their interactions, and recognizes the object and action classes with a single feed-forward pass through a neural network. We propose a single architecture that does not rely o…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 (Oral)

  13. arXiv:1711.08848

    cs.CV

    Real-Time Seamless Single Shot 6D Object Pose Prediction

    Authors: Bugra Tekin, Sudipta N. Sinha, Pascal Fua

    Abstract: We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task (Kehl et al., ICCV'17) that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional pos…

    Submitted 7 December, 2018; v1 submitted 23 November, 2017; originally announced November 2017.

    Comments: CVPR 2018

  14. arXiv:1611.05708

    cs.CV

    Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation

    Authors: Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua

    Abstract: Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simul…

    Submitted 10 April, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

  15. arXiv:1605.05180

    cs.CV

    Structured Prediction of 3D Human Pose with Deep Neural Networks

    Authors: Bugra Tekin, Isinsu Katircioglu, Mathieu Salzmann, Vincent Lepetit, Pascal Fua

    Abstract: Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from image to 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Lear…

    Submitted 17 May, 2016; originally announced May 2016.

  16. arXiv:1511.06692

    cs.CV

    Direct Prediction of 3D Body Poses from Motion Compensated Sequences

    Authors: Bugra Tekin, Artem Rozantsev, Vincent Lepetit, Pascal Fua

    Abstract: We propose an efficient approach to exploiting motion information from consecutive frames of a video sequence to recover the 3D pose of people. Previous approaches typically compute candidate poses in individual frames and then link them in a post-processing step to resolve ambiguities. By contrast, we directly regress from a spatio-temporal volume of bounding boxes to a 3D pose in the central fra…

    Submitted 2 September, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: Published in CVPR 2016. Supersedes arXiv:1504.08200

  17. arXiv:1504.08200   

    cs.CV

    Predicting People's 3D Poses from Short Sequences

    Authors: Bugra Tekin, Xiaolu Sun, Xinchao Wang, Vincent Lepetit, Pascal Fua

    Abstract: We propose an efficient approach to exploiting motion information from consecutive frames of a video sequence to recover the 3D pose of people. Instead of computing candidate poses in individual frames and then linking them, as is often done, we regress directly from a spatio-temporal block of frames to a 3D pose in the central one. We will demonstrate that this approach allows us to effectively o…

    Submitted 23 November, 2015; v1 submitted 30 April, 2015; originally announced April 2015.

    Comments: Superseded by arXiv:1511.06692