Showing 1–29 of 29 results for author: Grundmann, M

  1. arXiv:2402.08714  [pdf, other]

    cs.LG cs.AI

    PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

    Authors: Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou

    Abstract: Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, renderi…

    Submitted 27 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project page: https://fdeng18.github.io/prdp

  2. arXiv:2401.08864  [pdf, other]

    eess.AS cs.LG cs.SD

    Binaural Angular Separation Network

    Authors: Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann

    Abstract: We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional microphones without needing to collect real RIRs. By relying on specific angular regions and multiple room simulations, the model utilizes consistent time diffe…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  3. arXiv:2401.03078  [pdf, other]

    eess.AS cs.LG cs.SD

    StreamVC: Real-Time Low-Latency Voice Conversion

    Authors: Yang Yang, Yury Kartynnik, Yunpeng Li, Jiuqiang Tang, Xing Li, George Sung, Matthias Grundmann

    Abstract: We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing,…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  4. arXiv:2309.10858  [pdf, other]

    cs.CV

    On-device Real-time Custom Hand Gesture Recognition

    Authors: Esha Uboweja, David Tian, Qifei Wang, Yi-Chun Kuo, Joe Zou, Lu Wang, George Sung, Matthias Grundmann

    Abstract: Most existing hand gesture recognition (HGR) systems are limited to a predefined set of gestures. However, users and developers often want to recognize new, unseen gestures. This is challenging due to the vast diversity of all plausible hand shapes, e.g. it is impossible for developers to include all hand gestures in a predefined list. In this paper, we present a user-friendly framework that lets…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 5 pages, 6 figures; Accepted to ICCV Workshop on Computer Vision for Metaverse, Paris, France, 2023

  5. arXiv:2309.05782  [pdf, other]

    cs.CV

    Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction

    Authors: Ivan Grishchenko, Geng Yan, Eduard Gabriel Bazavan, Andrei Zanfir, Nikolai Chinaev, Karthik Raveendran, Matthias Grundmann, Cristian Sminchisescu

    Abstract: We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars. Our main contributions are: i) an annotation-free offline method for obtaining blendshape coefficients from real-world human scans, ii) a lightweight real-time…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages, 3 figures

  6. arXiv:2307.08996  [pdf, other]

    cs.CV

    Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond

    Authors: Yang Zhao, Tingbo Hou, Yu-Chuan Su, Xuhui Jia, Yandong Li, Matthias Grundmann

    Abstract: An authentic face restoration system is becoming increasingly demanding in many computer vision applications, e.g., image enhancement, video communication, and taking portrait. Most of the advanced face restoration models can recover high-quality faces from low-quality ones but usually fail to faithfully generate realistic and high-frequency details that are favored by users. To achieve authentic…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  7. arXiv:2307.02342  [pdf, other]

    cs.LO cs.CR cs.DC

    Towards a Formal Verification of the Lightning Network with TLA+

    Authors: Matthias Grundmann, Hannes Hartenstein

    Abstract: Payment channel networks are an approach to improve the scalability of blockchain-based cryptocurrencies. Because payment channel networks are used for transfer of financial value, their security in the presence of adversarial participants should be verified formally. We formalize the protocol of the Lightning Network, a payment channel network built for Bitcoin, and show that the protocol fulfill…

    Submitted 5 July, 2023; originally announced July 2023.

  8. arXiv:2306.12511  [pdf, other]

    cs.LG cs.CV

    Semi-Implicit Denoising Diffusion Models (SIDDMs)

    Authors: Yanwu Xu, Mingming Gong, Shaoan Xie, Wei Wei, Matthias Grundmann, Kayhan Batmanghelich, Tingbo Hou

    Abstract: Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) at…

    Submitted 10 October, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  9. arXiv:2304.11267  [pdf, other]

    cs.CV cs.LG eess.IV

    Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

    Authors: Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann

    Abstract: The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, commo…

    Submitted 16 June, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 4 pages (not including references), 2 figures, 2 tables. Accepted to Efficient Deep Learning for Computer Vision workshop 2023

  10. arXiv:2303.07486  [pdf, other]

    eess.AS cs.LG cs.SD

    Guided Speech Enhancement Network

    Authors: Yang Yang, Shao-Fu Shih, Hakan Erdogan, Jamie Menjay Lin, Chehung Lee, Yunpeng Li, George Sung, Matthias Grundmann

    Abstract: High quality speech capture has been widely studied for both voice communication and human computer interface reasons. To improve the capture performance, we can often find multi-microphone speech enhancement techniques deployed on various devices. Multi-microphone speech enhancement problem is often decomposed into two decoupled steps: a beamformer that provides spatial filtering and a single-cha…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

  11. arXiv:2208.11666  [pdf, other]

    cs.CV cs.LG

    Efficient Heterogeneous Video Segmentation at the Edge

    Authors: Jamie Menjay Lin, Siargey Pisarchyk, Juhyun Lee, David Tian, Tingbo Hou, Karthik Raveendran, Raman Sarokin, George Sung, Trent Tolley, Matthias Grundmann

    Abstract: We introduce an efficient video segmentation system for resource-limited edge devices leveraging heterogeneous compute. Specifically, we design network models by searching across multiple dimensions of specifications for the neural architectures and operations on top of already light-weight backbones, targeting commercially available edge inference engines. We further analyze and optimize the hete…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Published as a workshop paper at CVPRW CV4ARVR 2022

  12. arXiv:2206.11678  [pdf, other]

    cs.CV

    BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation

    Authors: Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, Cristian Sminchisescu

    Abstract: We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion capture from a single RGB image including avatar control, fitness tracking and AR/VR effects. Our main contributions include i) a novel method for 3D ground truth data acquisition, i…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 4 pages, 4 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, New Orleans, LA, 2022

  13. arXiv:2111.00038  [pdf, other]

    cs.CV

    On-device Real-time Hand Gesture Recognition

    Authors: George Sung, Kanstantsin Sokal, Esha Uboweja, Valentin Bazarevsky, Jonathan Baccash, Eduard Gabriel Bazavan, Chuo-Ling Chang, Matthias Grundmann

    Abstract: We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We cre…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: 5 pages, 6 figures; ICCV Workshop on Computer Vision for Augmented and Virtual Reality, Montreal, Canada, 2021

  14. arXiv:2108.00815  [pdf, other]

    cs.NI cs.CR

    Estimating the Peer Degree of Reachable Peers in the Bitcoin P2P Network

    Authors: Matthias Grundmann, Max Baumstark, Hannes Hartenstein

    Abstract: A recent spam wave of IP addresses in the Bitcoin P2P network allowed us to estimate the degree distribution of reachable peers in the network. The resulting distribution shows that about every second reachable peer runs with Bitcoin Core's default setting of a maximum of 125 concurrent connections and nearly all connection slots are taken. We validate this result and, in addition, use our observa…

    Submitted 15 December, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

  15. arXiv:2102.12774  [pdf, other]

    cs.CR cs.NI

    On the Estimation of the Number of Unreachable Peers in the Bitcoin P2P Network by Observation of Peer Announcements

    Authors: Matthias Grundmann, Hedwig Amberg, Hannes Hartenstein

    Abstract: Bitcoin is based on a P2P network that is used to propagate transactions and blocks. While the P2P network design intends to hide the topology of the P2P network, information about the topology is required to understand the network from a scientific point of view. Thus, there is a natural tension between the 'desire' for unobservability on the one hand, and for observability on the other hand. On…

    Submitted 25 February, 2021; originally announced February 2021.

  16. arXiv:2012.09988  [pdf, other]

    cs.CV

    Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations

    Authors: Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Artsiom Ablavatski, Matthias Grundmann

    Abstract: 3D object detection has recently become popular due to many applications in robotics, augmented reality, autonomy, and image retrieval. We introduce the Objectron dataset to advance the state of the art in 3D object detection and foster new research and applications, such as 3D object tracking, view synthesis, and improved 3D shape representation. The dataset contains object-centric short videos w…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: Github repo see https://github.com/google-research-datasets/Objectron

  17. arXiv:2010.08316  [pdf, other]

    cs.CR cs.DC

    Fundamental Properties of the Layer Below a Payment Channel Network (Extended Version)

    Authors: Matthias Grundmann, Hannes Hartenstein

    Abstract: Payment channel networks are a highly discussed approach for improving scalability of cryptocurrencies such as Bitcoin. As they allow processing transactions off-chain, payment channel networks are referred to as second layer technology, while the blockchain is the first layer. We uncouple payment channel networks from blockchains and look at them as first-class citizens. This brings up the questi…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: Extended version of short paper published at 4th International Workshop on Cryptocurrencies and Blockchain Technology - CBT 2020

  18. arXiv:2006.13194  [pdf, other]

    cs.CV

    Instant 3D Object Tracking with Applications in Augmented Reality

    Authors: Adel Ahmadyan, Tingbo Hou, Jianing Wei, Liangkai Zhang, Artsiom Ablavatski, Matthias Grundmann

    Abstract: Tracking object poses in 3D is a crucial building block for Augmented Reality applications. We propose an instant motion tracking system that tracks an object's pose in space (represented by its 3D bounding box) in real-time on mobile devices. Our system does not require any prior sensory calibration or initialization to function. We employ a deep neural network to detect objects and estimate thei…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 4 pages, five figures, CVPR Fourth Workshop on Computer Vision for AR/VR

  19. arXiv:2006.10962  [pdf, other]

    cs.CV

    Attention Mesh: High-fidelity Face Mesh Prediction in Real-time

    Authors: Ivan Grishchenko, Artsiom Ablavatski, Yury Kartynnik, Karthik Raveendran, Matthias Grundmann

    Abstract: We present Attention Mesh, a lightweight architecture for 3D face mesh prediction that uses attention to semantically meaningful regions. Our neural network is designed for real-time on-device inference and runs at over 50 FPS on a Pixel 2 phone. Our solution enables applications like AR makeup, eye tracking and AR puppeteering that rely on highly accurate landmarks for eye and lips regions. Our m…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 4 pages, 5 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020

  20. arXiv:2006.10214  [pdf, other]

    cs.CV

    MediaPipe Hands: On-device Real-time Hand Tracking

    Authors: Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann

    Abstract: We present a real-time on-device hand tracking pipeline that predicts hand skeleton from single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector, 2) a hand landmark model. It's implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrates real-time inference speed on mobile GPUs a…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: 5 pages, 7 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020

  21. arXiv:2006.10204  [pdf, other]

    cs.CV

    BlazePose: On-device Real-time Body Pose tracking

    Authors: Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, Matthias Grundmann

    Abstract: We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases like fitness tracking and sign language reco…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: 4 pages, 6 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, USA, 2020

  22. arXiv:2003.03522  [pdf, other]

    cs.CV

    MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak Shape Supervision

    Authors: Tingbo Hou, Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Matthias Grundmann

    Abstract: In this paper, we address the problem of detecting unseen objects from RGB images and estimating their poses in 3D. We propose two mobile friendly networks: MobilePose-Base and MobilePose-Shape. The former is used when there is only pose supervision, and the latter is for the case when shape supervision is available, even a weak one. We revisit shape features used in previous methods, including se…

    Submitted 7 March, 2020; originally announced March 2020.

  23. arXiv:1907.06796  [pdf, other]

    cs.CV

    Instant Motion Tracking and Its Applications to Augmented Reality

    Authors: Jianing Wei, Genzhi Ye, Tyler Mullen, Matthias Grundmann, Adel Ahmadyan, Tingbo Hou

    Abstract: Augmented Reality (AR) brings immersive experiences to users. With recent advances in computer vision and mobile computing, AR has scaled across platforms, and has increased adoption in major products. One of the key challenges in enabling AR features is proper anchoring of the virtual content to the real world, a process referred to as tracking. In this paper, we present a system for motion track…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, 2019

    Journal ref: CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, 2019

  24. arXiv:1907.06724  [pdf, other]

    cs.CV

    Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs

    Authors: Yury Kartynnik, Artsiom Ablavatski, Ivan Grishchenko, Matthias Grundmann

    Abstract: We present an end-to-end neural network-based model for inferring an approximate 3D mesh representation of a human face from single camera input for AR applications. The relatively dense mesh model of 468 vertices is well-suited for face-based AR effects. The proposed model demonstrates super-realtime inference speed on mobile GPUs (100-1000+ FPS, depending on the device and model variant) and a h…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: 4 pages, 4 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, USA, 2019

  25. arXiv:1907.05047  [pdf, other]

    cs.CV

    BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs

    Authors: Valentin Bazarevsky, Yury Kartynnik, Andrey Vakunov, Karthik Raveendran, Matthias Grundmann

    Abstract: We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, fac…

    Submitted 14 July, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

    Comments: 4 pages, 3 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, USA, 2019

  26. arXiv:1907.01989  [pdf, ps, other]

    cs.LG cs.CV cs.DC stat.ML

    On-Device Neural Net Inference with Mobile GPUs

    Authors: Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann

    Abstract: On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Re…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: Computer Vision and Pattern Recognition Workshop: Efficient Deep Learning for Computer Vision 2019

  27. arXiv:1906.08172  [pdf, other]

    cs.DC

    MediaPipe: A Framework for Building Perception Pipelines

    Authors: Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg, Matthias Grundmann

    Abstract: Building applications that perceive the world around them is challenging. A developer needs to (a) select and develop corresponding machine learning algorithms and models, (b) build a series of prototypes and demos, (c) balance resource consumption against the quality of the solutions, and finally (d) identify and mitigate problematic cases. The MediaPipe framework addresses all of these challenge…

    Submitted 14 June, 2019; originally announced June 2019.

  28. Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context

    Authors: S. Hussain Raza, Ahmad Humayun, Matthias Grundmann, David Anderson, Irfan Essa

    Abstract: We present an algorithm for finding temporally consistent occlusion boundaries in videos to support segmentation of dynamic scenes. We learn occlusion boundaries in a pairwise Markov random field (MRF) framework. We first estimate the probability of a spatio-temporal edge being an occlusion boundary by using appearance, flow, and geometric features. Next, we enforce occlusion boundary continuity…

    Submitted 25 October, 2015; originally announced October 2015.

    Comments: Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on

  29. Geometric Context from Videos

    Authors: S. Hussain Raza, Matthias Grundmann, Irfan Essa

    Abstract: We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes. Leveraging spatio-temporal video segmentation, we decompose a dynamic scene captured by a video into geometric classes, based on predictions made by region-classifiers that are trained on appearance and motion features. By examining the homogeneity of the prediction, we combine predictions across…

    Submitted 25 October, 2015; originally announced October 2015.

    Comments: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on