Showing 1–35 of 35 results for author: Simon, T

  1. arXiv:2407.13038  [pdf, other]

    cs.CV cs.LG

    Universal Facial Encoding of Codec Avatars from VR Headsets

    Authors: Shaojie Bai, Te-Li Wang, Chenghui Li, Akshay Venkatesh, Tomas Simon, Chen Cao, Gabriel Schwartz, Ryan Wrench, Jason Saragih, Yaser Sheikh, Shih-En Wei

    Abstract: Faithful real-time facial animation is essential for avatar-mediated telepresence in Virtual Reality (VR). To emulate authentic communication, avatar animation needs to be efficient and accurate: able to capture both extreme and subtle expressions within a few milliseconds to sustain the rhythm of natural conversations. The oblique and incomplete views of the face, variability in the donning of he…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024 (ACM Transactions on Graphics (TOG))

    Journal ref: ACM Trans. Graph. 43, 4, Article 93 (July 2024), 22 pages.

  2. arXiv:2405.02508  [pdf, other]

    cs.CV cs.GR

    Rasterized Edge Gradients: Handling Discontinuities Differentiably

    Authors: Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih

    Abstract: Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for…

    Submitted 16 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  3. arXiv:2403.18853  [pdf, other]

    physics.soc-ph cs.LG stat.AP

    Spatio-seasonal risk assessment of upward lightning at tall objects using meteorological reanalysis data

    Authors: Isabell Stucke, Deborah Morgenstern, Georg J. Mayr, Thorsten Simon, Achim Zeileis, Gerhard Diendorfer, Wolfgang Schulz, Hannes Pichler

    Abstract: This study investigates lightning at tall objects and evaluates the risk of upward lightning (UL) over the eastern Alps and its surrounding areas. While uncommon, UL poses a threat, especially to wind turbines, as the long-duration current of UL can cause significant damage. Current risk assessment methods overlook the impact of meteorological conditions, potentially underestimating UL risks. Ther…

    Submitted 18 March, 2024; originally announced March 2024.

  4. arXiv:2401.05334  [pdf, other]

    cs.CV cs.GR

    URHand: Universal Relightable Hands

    Authors: Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhöfer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito

    Abstract: Existing photorealistic relightable hand models require extensive identity-specific observations in different views, poses, and illuminations, and face challenges in generalizing to natural illuminations and novel identities. To bridge this gap, we present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities. Our model allows f…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Project Page https://frozenburning.github.io/projects/urhand/

  5. arXiv:2312.03704  [pdf, other]

    cs.GR cs.CV

    Relightable Gaussian Codec Avatars

    Authors: Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam

    Abstract: The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Ga…

    Submitted 27 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: CVPR2024 (Oral) Website: https://shunsukesaito.github.io/rgca/

  6. arXiv:2310.17768  [pdf, other]

    cs.CV

    A Dataset of Relighted 3D Interacting Hands

    Authors: Gyeongsik Moon, Shunsuke Saito, Weipeng Xu, Rohan Joshi, Julia Buffalini, Harley Bellan, Nicholas Rosen, Jesse Richardson, Mallorie Mize, Philippe de Bree, Tomas Simon, Bo Peng, Shubham Garg, Kevyn McPhail, Takaaki Shiratori

    Abstract: The two-hand interaction is one of the most challenging signals to analyze due to the self-similarity, complicated articulations, and occlusions of hands. Although several datasets have been proposed for the two-hand interaction analysis, all of them do not achieve 1) diverse and realistic image appearances and 2) diverse and large-scale groundtruth (GT) 3D poses at the same time. In this work, we…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023 (Datasets and Benchmarks Track)

  7. arXiv:2302.04868  [pdf, other]

    cs.CV cs.GR

    MEGANE: Morphable Eyeglass and Avatar Network

    Authors: Junxuan Li, Shunsuke Saito, Tomas Simon, Stephen Lombardi, Hongdong Li, Jason Saragih

    Abstract: Eyeglasses play an important role in the perception of identity. Authentic virtual representations of faces can benefit greatly from their inclusion. However, modeling the geometric and appearance interactions of glasses and the face of virtual representations of humans is challenging. Glasses and faces affect each other's geometry at their contact points, and also induce appearance changes due to…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Project page: https://junxuan-li.github.io/megane/

  8. arXiv:2302.04866  [pdf, other]

    cs.CV cs.GR

    RelightableHands: Efficient Neural Relighting of Articulated Hand Models

    Authors: Shun Iwase, Shunsuke Saito, Tomas Simon, Stephen Lombardi, Timur Bagautdinov, Rohan Joshi, Fabian Prada, Takaaki Shiratori, Yaser Sheikh, Jason Saragih

    Abstract: We present the first neural relighting approach for rendering high-fidelity personalized hands that can be animated in real-time under novel illumination. Our approach adopts a teacher-student framework, where the teacher learns appearance under a single point light from images captured in a light-stage, allowing us to synthesize hands in arbitrary illuminations but with heavy compute. Using image…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: 8 pages, 16 figures, Website: https://sh8.io/#/relightable_hands

  9. arXiv:2301.03360  [pdf, other]

    stat.ML cs.LG

    Upward lightning at wind turbines: Risk assessment from larger-scale meteorology

    Authors: Isabell Stucke, Deborah Morgenstern, Thorsten Simon, Georg J. Mayr, Achim Zeileis, Gerhard Diendorfer, Wolfgang Schulz, Hannes Pichler

    Abstract: Upward lightning (UL) has become an increasingly important threat to wind turbines as ever more of them are being installed for renewably producing electricity. The taller the wind turbine the higher the risk that the type of lightning striking the man-made structure is UL. UL can be much more destructive than downward lightning due to its long lasting initial continuous current leading to a large…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: 24 pages, 8 figures

  10. arXiv:2207.11243  [pdf, other]

    cs.CV cs.GR

    Multiface: A Dataset for Neural Face Rendering

    Authors: Cheng-hsin Wuu, Ningyuan Zheng, Scott Ardisson, Rohan Bali, Danielle Belko, Eric Brockmeyer, Lucas Evans, Timothy Godisart, Hyowon Ha, Xuhua Huang, Alexander Hypes, Taylor Koska, Steven Krenn, Stephen Lombardi, Xiaomin Luo, Kevyn McPhail, Laura Millerschoen, Michal Perdoch, Mark Pitts, Alexander Richard, Jason Saragih, Junko Saragih, Takaaki Shiratori, Tomas Simon, Matt Stewart, et al. (6 additional authors not shown)

    Abstract: Photorealistic avatars of human faces have come a long way in recent years, yet research along this area is limited by a lack of publicly available, high-quality datasets covering both, dense multi-view camera captures, and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reali…

    Submitted 26 June, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

  11. arXiv:2207.09774  [pdf, other]

    cs.CV

    Drivable Volumetric Avatars using Texel-Aligned Features

    Authors: Edoardo Remelli, Timur Bagautdinov, Shunsuke Saito, Tomas Simon, Chenglei Wu, Shih-En Wei, Kaiwen Guo, Zhe Cao, Fabian Prada, Jason Saragih, Yaser Sheikh

    Abstract: Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance that is indistinguishable from reality. In this work, we propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people. One challenge is driving an avatar while staying faithful to details and dynamics…

    Submitted 20 July, 2022; originally announced July 2022.

    Journal ref: SIGGRAPH 2022 Conference Proceedings

  12. arXiv:2203.17019  [pdf, other]

    eess.AS cs.LG cs.SD

    DeepFry: Identifying Vocal Fry Using Deep Neural Networks

    Authors: Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, Joseph Keshet

    Abstract: Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly f…

    Submitted 26 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted to Interspeech 2022

  13. arXiv:2111.05849  [pdf, other]

    cs.GR cs.CV

    Advances in Neural Rendering

    Authors: Ayush Tewari, Justus Thies, Ben Mildenhall, Pratul Srinivasan, Edgar Tretschk, Yifan Wang, Christoph Lassner, Vincent Sitzmann, Ricardo Martin-Brualla, Stephen Lombardi, Tomas Simon, Christian Theobalt, Matthias Niessner, Jonathan T. Barron, Gordon Wetzstein, Michael Zollhoefer, Vladislav Golyanik

    Abstract: Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene an…

    Submitted 30 March, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

    Comments: 33 pages, 14 figures, 5 tables; State of the Art Report at EUROGRAPHICS 2022

  14. arXiv:2105.10441  [pdf, other]

    cs.CV cs.AI cs.GR

    Driving-Signal Aware Full-Body Avatars

    Authors: Timur Bagautdinov, Chenglei Wu, Tomas Simon, Fabian Prada, Takaaki Shiratori, Shih-En Wei, Weipeng Xu, Yaser Sheikh, Jason Saragih

    Abstract: We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and genera…

    Submitted 25 June, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

  15. arXiv:2104.04638  [pdf, other]

    cs.CV

    Pixel Codec Avatars

    Authors: Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, Yaser Sheikh

    Abstract: Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances. In this work, we present the Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state of the art reconstruction performance while being computationally efficient and adaptive to th…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 Oral

  16. arXiv:2104.00683  [pdf, other]

    cs.CV cs.LG

    SimPoE: Simulated Character Control for 3D Human Pose Estimation

    Authors: Ye Yuan, Shih-En Wei, Tomas Simon, Kris Kitani, Jason Saragih

    Abstract: Accurate estimation of 3D human motion from monocular video requires modeling both kinematics (body motion without physical forces) and dynamics (motion with physical forces). To demonstrate this, we present SimPoE, a Simulation-based approach for 3D human Pose Estimation, which integrates image-based kinematic inference and physics-based dynamics modeling. SimPoE learns a policy that takes as inp…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 (Oral). Project page: https://www.ye-yuan.com/simpoe/

  17. arXiv:2103.01954  [pdf, other]

    cs.GR cs.CV

    Mixture of Volumetric Primitives for Efficient Neural Rendering

    Authors: Stephen Lombardi, Tomas Simon, Gabriel Schwartz, Michael Zollhoefer, Yaser Sheikh, Jason Saragih

    Abstract: Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representa…

    Submitted 6 May, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: 13 pages; SIGGRAPH 2021

  18. arXiv:2101.02697  [pdf, other]

    cs.CV

    PVA: Pixel-aligned Volumetric Avatars

    Authors: Amit Raj, Michael Zollhoefer, Tomas Simon, Jason Saragih, Shunsuke Saito, James Hays, Stephen Lombardi

    Abstract: Acquisition and rendering of photo-realistic human heads is a highly challenging research problem of particular importance for virtual telepresence. Currently, the highest quality is achieved by volumetric approaches trained in a person specific manner on multi-view data. These models better represent fine structure, such as hair, compared to simpler mesh-based models. Volumetric models typically…

    Submitted 7 January, 2021; originally announced January 2021.

    Comments: Project page located at https://volumetric-avatars.github.io/

  19. arXiv:2012.09955  [pdf, other]

    cs.CV cs.GR

    Learning Compositional Radiance Fields of Dynamic Human Heads

    Authors: Ziyan Wang, Timur Bagautdinov, Stephen Lombardi, Tomas Simon, Jason Saragih, Jessica Hodgins, Michael Zollhöfer

    Abstract: Photorealistic rendering of dynamic humans is an important ability for telepresence systems, virtual shopping, synthetic data generation, and more. Recently, neural rendering methods, which combine techniques from computer graphics and machine learning, have created high-fidelity models of humans and objects. Some of these methods do not produce results with high-enough fidelity for driveable huma…

    Submitted 17 December, 2020; originally announced December 2020.

  20. arXiv:2004.03805  [pdf, other]

    cs.CV cs.GR

    State of the Art on Neural Rendering

    Authors: Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt, Maneesh Agrawala, Eli Shechtman, Dan B Goldman, Michael Zollhöfer

    Abstract: Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: Eurographics 2020 survey paper

  21. arXiv:2004.00452  [pdf, other]

    cs.CV cs.GR

    PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

    Authors: Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo

    Abstract: Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: project page: https://shunsukesaito.github.io/PIFuHD

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

  22. arXiv:1909.13423  [pdf, other]

    cs.CV cs.LG

    Single-Network Whole-Body Pose Estimation

    Authors: Gines Hidalgo, Yaadhav Raaj, Haroon Idrees, Donglai Xiang, Hanbyul Joo, Tomas Simon, Yaser Sheikh

    Abstract: We present the first single-network approach for 2D whole-body pose estimation, which entails simultaneous localization of body, face, hands, and feet keypoints. Due to the bottom-up formulation, our method maintains constant real-time performance regardless of the number of people in the image. The network is trained in a single stage using multi-task learning, through an improved architecture wh…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  23. arXiv:1909.11784  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)

    Authors: Nikolaus Umlauf, Nadja Klein, Thorsten Simon, Achim Zeileis

    Abstract: Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: 48 pages, 12 figures

  24. Neural Volumes: Learning Dynamic Renderable Volumes from Images

    Authors: Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, Yaser Sheikh

    Abstract: Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity.…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to SIGGRAPH 2019

    Journal ref: ACM Transactions on Graphics (SIGGRAPH 2019) 38, 4, Article 65

  25. arXiv:1906.04158  [pdf, other]

    cs.CV cs.AI

    Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in A Triadic Interaction

    Authors: Hanbyul Joo, Tomas Simon, Mina Cikara, Yaser Sheikh

    Abstract: We present a new research task and a dataset to understand human social interactions via computational methods, to ultimately endow machines with the ability to encode and decode a broad channel of social signals humans use. This research direction is essential to make a machine that genuinely communicates with humans, which we call Social Artificial Intelligence. We first formulate the "social si…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: CVPR 2019

  26. arXiv:1904.10324  [pdf, other]

    cs.CV

    VITAMIN-E: VIsual Tracking And MappINg with Extremely Dense Feature Points

    Authors: Masashi Yokozuka, Shuji Oishi, Thompson Simon, Atsuhiko Banno

    Abstract: In this paper, we propose a novel indirect monocular SLAM algorithm called "VITAMIN-E," which is highly accurate and robust as a result of tracking extremely dense feature points. Typical indirect methods have difficulty in reconstructing dense geometry because of their careful feature point selection for accurate matching. Unlike conventional methods, the proposed method processes an enormous num…

    Submitted 16 December, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

  27. arXiv:1904.10037  [pdf, other]

    cs.CV cs.LG

    LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds

    Authors: Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh

    Abstract: We present LBS-AE; a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. As input, we take a sequence of point clouds to be registered as well as an artist-rigged mesh, i.e. a template mesh equipped with a linear-blend skinning (LBS) deformation space parameterized by a skeleton hierarchy. As output, we learn an LBS-based autoencoder that produces registered…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: In the Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

  28. arXiv:1812.08008  [pdf, other]

    cs.CV

    OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

    Authors: Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh

    Abstract: Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the…

    Submitted 30 May, 2019; v1 submitted 18 December, 2018; originally announced December 2018.

    Comments: Journal version of arXiv:1611.08050, with better accuracy and faster speed, release a new foot keypoint dataset: https://cmu-perceptual-computing-lab.github.io/foot_keypoint_dataset/

  29. An Empirical Evaluation of Allgatherv on Multi-GPU Systems

    Authors: Thomas B. Rolinger, Tyler A. Simon, Christopher D. Krieger

    Abstract: Applications for deep learning and big data analytics have compute and memory requirements that exceed the limits of a single GPU. However, effectively scaling out an application to multiple GPUs is challenging due to the complexities of communication between the GPUs, particularly for collective communication with irregular message sizes. In this work, we provide a performance evaluation of the A…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

  30. Parallel Sparse Tensor Decomposition in Chapel

    Authors: Thomas B. Rolinger, Tyler A. Simon, Christopher D. Krieger

    Abstract: In big-data analytics, using tensor decomposition to extract patterns from large, sparse multivariate data is a popular technique. Many challenges exist for designing parallel, high performance tensor decomposition algorithms due to irregular data accesses and the growing size of tensors that are processed. There have been many efforts at implementing shared-memory algorithms for tensor decomposit…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 5th Annual Chapel Implementers and Users Workshop (CHIUW 2018)

  31. Deep Appearance Models for Face Rendering

    Authors: Stephen Lombardi, Jason Saragih, Tomas Simon, Yaser Sheikh

    Abstract: We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: Accepted to SIGGRAPH 2018

    Journal ref: ACM Transactions on Graphics (SIGGRAPH 2018) 37, 4, Article 68

  32. arXiv:1801.01615  [pdf, other]

    cs.CV

    Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies

    Authors: Hanbyul Joo, Tomas Simon, Yaser Sheikh

    Abstract: We present a unified deformation model for the markerless capture of multiple scales of human movement, including facial expressions, body motion, and hand gestures. An initial model is generated by locally stitching together models of the individual parts of the human body, which we refer to as the "Frankenstein" model. This model enables the full expression of part movements, including face and…

    Submitted 4 January, 2018; originally announced January 2018.

  33. arXiv:1704.07809  [pdf, other]

    cs.CV

    Hand Keypoint Detection in Single Images using Multiview Bootstrapping

    Authors: Tomas Simon, Hanbyul Joo, Iain Matthews, Yaser Sheikh

    Abstract: We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outlie…

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: CVPR 2017

  34. arXiv:1612.03153  [pdf, other]

    cs.CV

    Panoptic Studio: A Massively Multiview System for Social Interaction Capture

    Authors: Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh

    Abstract: We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime…

    Submitted 9 December, 2016; originally announced December 2016.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  35. arXiv:1611.08050  [pdf, other]

    cs.CV

    Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

    Authors: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh

    Abstract: We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance…

    Submitted 13 April, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: Accepted as CVPR 2017 Oral. Video result: https://youtu.be/pW6nZXeWlGM