-
3D Reconstruction with Spatial Memory
Authors:
Hengyi Wang,
Lourdes Agapito
Abstract:
We present Spann3R, a novel approach for dense 3D reconstruction from ordered or unordered image collections. Built on the DUSt3R paradigm, Spann3R uses a transformer-based architecture to directly regress pointmaps from images without any prior knowledge of the scene or camera parameters. Unlike DUSt3R, which predicts per image-pair pointmaps each expressed in its local coordinate frame, Spann3R can predict per-image pointmaps expressed in a global coordinate system, thus eliminating the need for optimization-based global alignment. The key idea of Spann3R is to manage an external spatial memory that learns to keep track of all previous relevant 3D information. Spann3R then queries this spatial memory to predict the 3D structure of the next frame in a global coordinate system. Taking advantage of DUSt3R's pre-trained weights, and further fine-tuning on a subset of datasets, Spann3R shows competitive performance and generalization ability on various unseen datasets and can process ordered image collections in real time. Project page: https://hengyiwang.github.io/projects/spanner
Submitted 28 August, 2024;
originally announced August 2024.
-
RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation
Authors:
Mirgahney Mohamed,
Harry Jake Cunningham,
Marc P. Deisenroth,
Lourdes Agapito
Abstract:
Human motion generation is of paramount importance in computer animation. It is a challenging generative temporal modelling task due to the vast possibilities of human motion, high human sensitivity to motion coherence and the difficulty of accurately generating fine-grained motions. Recently, diffusion methods have been proposed for human motion generation due to their high sample quality and expressiveness. However, generated sequences still suffer from motion incoherence, are limited to short durations and simpler motions, and take considerable time during inference. To address these limitations, we propose RecMoDiffuse: Recurrent Flow Diffusion, a new recurrent diffusion formulation for temporal modelling. Previous work applies diffusion to the whole sequence without any temporal dependency, an approach that inherently makes temporal consistency hard to achieve. In contrast, our method explicitly enforces temporal constraints by means of normalizing flow models in the diffusion process and thereby extends diffusion to the temporal dimension. We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion. Our experiments show that RecMoDiffuse achieves comparable results with state-of-the-art methods while generating coherent motion sequences and reducing the computational overhead in the inference stage.
Submitted 11 June, 2024;
originally announced June 2024.
-
NPGA: Neural Parametric Gaussian Avatars
Authors:
Simon Giebenhain,
Tobias Kirschstein,
Martin Rünz,
Lourdes Agapito,
Matthias Nießner
Abstract:
The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian Splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud using per-primitive latent features which govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
Submitted 29 May, 2024;
originally announced May 2024.
-
HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
Authors:
David Svitov,
Pietro Morerio,
Lourdes Agapito,
Alessio Del Bue
Abstract:
We present HAHA, a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, like hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar, and reduced rendering artifacts. This allows us to handle the animation of small body parts such as fingers that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method achieves reconstruction quality on par with the state-of-the-art on SnapshotPeople, while using less than a third of the Gaussians. HAHA outperforms the previous state-of-the-art on novel poses from X-Humans both quantitatively and qualitatively.
Submitted 1 April, 2024;
originally announced April 2024.
-
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Authors:
Wonbong Jang,
Lourdes Agapito
Abstract:
We propose NViST, a transformer-based model for efficient and generalizable novel-view synthesis from a single image for real-world scenes. In contrast to many methods that are trained on synthetic data, object-centred scenarios, or in a category-specific manner, NViST is trained on MVImgNet, a large-scale dataset of casually-captured real-world videos of hundreds of object categories with diverse backgrounds. NViST transforms image inputs directly into a radiance field, conditioned on camera parameters via adaptive layer normalisation. In practice, NViST exploits fine-tuned masked autoencoder (MAE) features and translates them to 3D output tokens via cross-attention, while addressing occlusions with self-attention. To move away from object-centred datasets and enable full scene synthesis, NViST adopts a 6-DOF camera pose model and only requires relative pose, dropping the need for canonicalization of the training data, which removes a substantial barrier to it being used on casually captured datasets. We show results on unseen objects and categories from MVImgNet and even generalization to casual phone captures. We conduct qualitative and quantitative evaluations on MVImgNet and ShapeNet to show that our model represents a step forward towards enabling true in-the-wild generalizable novel-view synthesis from a single image. Project webpage: https://wbjang.github.io/nvist_webpage.
Submitted 1 April, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Authors:
Simon Giebenhain,
Tobias Kirschstein,
Markos Georgopoulos,
Martin Rünz,
Lourdes Agapito,
Matthias Nießner
Abstract:
We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry codes during inverse rendering. To increase the representational capacity of our expression space, we augment our backward deformation field with hyper-dimensions, thus improving color and geometry representation in topologically challenging expressions. Using MonoNPHM as a learned prior, we approach the task of 3D head reconstruction using signed distance field based volumetric rendering. By numerically inverting our backward deformation field, we incorporate a landmark loss using facial anchor points that are closely tied to our canonical geometry representation. To evaluate the task of dynamic face reconstruction from monocular RGB videos, we record 20 challenging Kinect sequences under casual conditions. MonoNPHM outperforms all baselines by a significant margin, and makes an important step towards easily accessible neural parametric face models through RGB tracking.
Submitted 29 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
Authors:
Hengyi Wang,
Jingwen Wang,
Lourdes Agapito
Abstract:
Neural rendering has demonstrated remarkable success in dynamic scene reconstruction. Thanks to the expressiveness of neural representations, prior works can accurately capture the motion and achieve high-fidelity reconstruction of the target object. Despite this, real-world video scenarios often feature large unobserved regions where neural representations struggle to achieve realistic completion. To tackle this challenge, we introduce MorpheuS, a framework for dynamic 360° surface reconstruction from a casually captured RGB-D video. Our approach models the target scene as a canonical field that encodes its geometry and appearance, in conjunction with a deformation field that warps points from the current frame to the canonical space. We leverage a view-dependent diffusion prior and distill knowledge from it to achieve realistic completion of unobserved regions. Experimental results on various real-world and synthetic datasets show that our method can achieve high-fidelity 360° surface reconstruction of a deformable object from a monocular RGB-D video.
Submitted 4 April, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
-
DynamicSurf: Dynamic Neural RGB-D Surface Reconstruction with an Optimizable Feature Grid
Authors:
Mirgahney Mohamed,
Lourdes Agapito
Abstract:
We propose DynamicSurf, a model-free neural implicit surface reconstruction method for high-fidelity 3D modelling of non-rigid surfaces from monocular RGB-D video. To cope with the lack of multi-view cues in monocular sequences of deforming surfaces, one of the most challenging settings for 3D reconstruction, DynamicSurf exploits depth, surface normals, and RGB losses to improve reconstruction fidelity and optimisation time. DynamicSurf learns a neural deformation field that maps a canonical representation of the surface geometry to the current frame. We depart from current neural non-rigid surface reconstruction models by designing the canonical representation as a learned feature grid which leads to faster and more accurate surface reconstruction than competing approaches that use a single MLP. We demonstrate DynamicSurf on public datasets and show that it can optimize sequences of varying frames with a 6× speedup over pure MLP-based approaches while achieving comparable results to the state-of-the-art methods. Project is available at https://mirgahney.github.io//DynamicSurf.io/.
Submitted 14 November, 2023;
originally announced November 2023.
-
Task-guided Domain Gap Reduction for Monocular Depth Prediction in Endoscopy
Authors:
Anita Rau,
Binod Bhattarai,
Lourdes Agapito,
Danail Stoyanov
Abstract:
Colorectal cancer remains one of the deadliest cancers in the world. In recent years computer-aided methods have aimed to enhance cancer screening and improve the quality and availability of colonoscopies by automatizing sub-tasks. One such task is predicting depth from monocular video frames, which can assist endoscopic navigation. As ground truth depth from standard in-vivo colonoscopy remains unobtainable due to hardware constraints, two approaches have aimed to circumvent the need for real training data: supervised methods trained on labeled synthetic data and self-supervised models trained on unlabeled real data. However, self-supervised methods depend on unreliable loss functions that struggle with edges, self-occlusion, and lighting inconsistency. Methods trained on synthetic data can provide accurate depth for synthetic geometries but do not use any geometric supervisory signal from real data and overfit to synthetic anatomies and properties. This work proposes a novel approach to leverage labeled synthetic and unlabeled real data. While previous domain adaptation methods indiscriminately enforce the distributions of both input data modalities to coincide, we focus on the end task, depth prediction, and translate only essential information between the input domains. Our approach results in more resilient and accurate depth maps of real colonoscopy sequences.
Submitted 2 October, 2023;
originally announced October 2023.
-
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation
Authors:
Mel Vecerik,
Carl Doersch,
Yi Yang,
Todor Davchev,
Yusuf Aytar,
Guangyao Zhou,
Raia Hadsell,
Lourdes Agapito,
Jon Scholz
Abstract:
For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
Submitted 31 August, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation
Authors:
Jingwen Wang,
Juan Tarrio,
Lourdes Agapito,
Pablo F. Alcantarilla,
Alexander Vakhitov
Abstract:
The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic mapping from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping. When segmenting a new frame we perform latent feature re-projection from previous frames based on differentiable rendering. Fusing re-projected feature maps from previous frames with current-frame features greatly improves image segmentation quality, compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that groups 3D map elements likely to belong to the same semantic classes, relying on surface normals. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while working in real-time. Moreover, it shows better cross-sensor generalization abilities compared to 3D CNNs, enabling training and inference with different depth sensors. Code and data will be released on project page: http://jingwenwang95.github.io/SeMLaPS
Submitted 13 October, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
Authors:
Mustafa Işık,
Martin Rünz,
Markos Georgopoulos,
Taras Khakhulin,
Jonathan Starck,
Lourdes Agapito,
Matthias Nießner
Abstract:
Representing human performance at high-fidelity is an essential building block in diverse applications, such as film production, computer games or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input, and enables playback from novel, unseen viewpoints. Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates by factorizing space-time into a temporal matrix-vector decomposition. This allows us to obtain temporally coherent reconstructions of human actors for long sequences, while representing high-resolution details even in the context of challenging motion. While most research focuses on synthesizing at resolutions of 4MP or lower, we address the challenge of operating at 12MP. To this end, we introduce ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160 cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data, making a significant step towards production-level quality novel view synthesis.
Submitted 11 May, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
Authors:
Hengyi Wang,
Jingwen Wang,
Lourdes Agapito
Abstract:
We present Co-SLAM, a neural RGB-D SLAM system based on a hybrid representation, that performs robust camera tracking and high-fidelity surface reconstruction in real time. Co-SLAM represents the scene as a multi-resolution hash-grid to exploit its high convergence speed and ability to represent high-frequency local features. In addition, Co-SLAM incorporates one-blob encoding, to encourage surface coherence and completion in unobserved areas. This joint parametric-coordinate encoding enables real-time and robust performance by bringing the best of both worlds: fast convergence and surface hole filling. Moreover, our ray sampling strategy allows Co-SLAM to perform global bundle adjustment over all keyframes instead of requiring keyframe selection to maintain a small number of active keyframes as competing neural SLAM approaches do. Experimental results show that Co-SLAM runs at 10-17Hz and achieves state-of-the-art scene reconstruction results, and competitive tracking performance in various datasets and benchmarks (ScanNet, TUM, Replica, Synthetic RGBD). Project page: https://hengyiwang.github.io/projects/CoSLAM
Submitted 27 April, 2023;
originally announced April 2023.
-
Learning Neural Parametric Head Models
Authors:
Simon Giebenhain,
Tobias Kirschstein,
Markos Georgopoulos,
Martin Rünz,
Lourdes Agapito,
Matthias Nießner
Abstract:
We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. At the core of our model lies a neural parametric representation that disentangles identity and expressions in disjoint latent spaces. To this end, we capture a person's identity in a canonical space as a signed distance field (SDF), and model facial expressions with a neural deformation field. In addition, our representation achieves high-fidelity local detail by introducing an ensemble of local fields centered around facial anchor points. To facilitate generalization, we train our model on a newly-captured dataset of over 5200 head scans from 255 different identities using a custom high-end 3D scanning setup. Our dataset significantly exceeds comparable existing datasets, both with respect to quality and completeness of geometry, averaging around 3.5M mesh faces per scan. Finally, we demonstrate that our approach outperforms state-of-the-art methods in terms of fitting error and reconstruction quality.
Submitted 14 April, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
GNPM: Geometric-Aware Neural Parametric Models
Authors:
Mirgahney Mohamed,
Lourdes Agapito
Abstract:
We propose Geometric Neural Parametric Models (GNPM), a learned parametric model that takes into account the local structure of data to learn disentangled shape and pose latent spaces of 4D dynamics, using a geometric-aware architecture on point clouds. Temporally consistent 3D deformations are estimated without the need for dense correspondences at training time, by exploiting cycle consistency. Besides its ability to learn dense correspondences, GNPMs also enable latent-space manipulations such as interpolation and shape/pose transfer. We evaluate GNPMs on various datasets of clothed humans, and show that it achieves comparable performance to state-of-the-art methods that require dense correspondences during training.
Submitted 21 September, 2022;
originally announced September 2022.
-
One-Shot Transfer of Affordance Regions? AffCorrs!
Authors:
Denis Hadjivelichkov,
Sicelukwanda Zwane,
Marc Peter Deisenroth,
Lourdes Agapito,
Dimitrios Kanoulas
Abstract:
In this work, we tackle one-shot visual search of object parts. Given a single reference image of an object with annotated affordance regions, we segment semantically corresponding parts within a target scene. We propose AffCorrs, an unsupervised model that combines the properties of pre-trained DINO-ViT's image descriptors and cyclic correspondences. We use AffCorrs to find corresponding affordances both for intra- and inter-class one-shot part segmentation. This task is more difficult than supervised alternatives, but enables future work such as learning affordances via imitation and assisted teleoperation.
Submitted 16 September, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
GO-Surf: Neural Feature Grid Optimization for Fast, High-Fidelity RGB-D Surface Reconstruction
Authors:
Jingwen Wang,
Tymoteusz Bleja,
Lourdes Agapito
Abstract:
We present GO-Surf, a direct feature grid optimization method for accurate and fast surface reconstruction from RGB-D sequences. We model the underlying scene with a learned hierarchical feature voxel grid that encapsulates multi-level geometric and appearance local information. Feature vectors are directly optimized such that after being tri-linearly interpolated, decoded by two shallow MLPs into signed distance and radiance values, and rendered via surface volume rendering, the discrepancy between synthesized and observed RGB/depth values is minimized. Our supervision signals (RGB, depth and approximate SDF) can be obtained directly from input images without any need for fusion or post-processing. We formulate a novel SDF gradient regularization term that encourages surface smoothness and hole filling while maintaining high frequency details. GO-Surf can optimize sequences of 1-2K frames in 15-45 minutes, a speedup of ×60 over NeuralRGB-D, the most related approach based on an MLP representation, while maintaining on par performance on standard benchmarks. Project page: https://jingwenwang95.github.io/go_surf/
Submitted 17 September, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Bimodal Camera Pose Prediction for Endoscopy
Authors:
Anita Rau,
Binod Bhattarai,
Lourdes Agapito,
Danail Stoyanov
Abstract:
Deducing the 3D structure of endoscopic scenes from images is exceedingly challenging. In addition to deformation and view-dependent lighting, tubular structures like the colon present problems stemming from their self-occluding and repetitive anatomical structure. In this paper, we propose SimCol, a synthetic dataset for camera pose estimation in colonoscopy, and a novel method that explicitly learns a bimodal distribution to predict the endoscope pose. Our dataset replicates real colonoscope motion and highlights the drawbacks of existing methods. We publish 18k RGB images from simulated colonoscopy with corresponding depth and camera poses and make our data generation environment in Unity publicly available. We evaluate different camera pose prediction methods and demonstrate that, when trained on our data, they generalize to real colonoscopy sequences, and our bimodal approach outperforms prior unimodal work.
Submitted 15 December, 2023; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings
Authors:
Mel Vecerik,
Jackie Kay,
Raia Hadsell,
Lourdes Agapito,
Jon Scholz
Abstract:
Dense object tracking, the ability to localize specific object points with pixel-level accuracy, is an important computer vision task with numerous downstream applications in robotics. Existing approaches either compute dense keypoint embeddings in a single forward pass, meaning the model is trained to track everything at once, or allocate their full capacity to a sparse predefined set of points, trading generality for accuracy. In this paper we explore a middle ground based on the observation that the number of relevant points at a given time is typically small, e.g. grasp points on a target object. Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding that indicates which point to track. Our central finding is that this approach provides the generality of dense-embedding models, while offering accuracy significantly closer to sparse-keypoint approaches. We present results illustrating this capacity vs. accuracy trade-off, and demonstrate the ability to zero-shot transfer to new object instances (within-class) using a real-robot pick-and-place task.
Submitted 13 December, 2021; v1 submitted 9 December, 2021;
originally announced December 2021.
-
CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
Authors:
Wonbong Jang,
Lourdes Agapito
Abstract:
CodeNeRF is an implicit 3D neural representation that learns the variation of object shapes and textures across a category and can be trained, from a set of posed images, to synthesize novel views of unseen objects. Unlike the original NeRF, which is scene specific, CodeNeRF learns to disentangle shape and texture by learning separate embeddings. At test time, given a single unposed image of an unseen object, CodeNeRF jointly estimates camera viewpoint, and shape and appearance codes via optimization. Unseen objects can be reconstructed from a single image, and then rendered from new viewpoints or their shape and texture edited by varying the latent codes. We conduct experiments on the SRN benchmark, which show that CodeNeRF generalises well to unseen objects and achieves on-par performance with methods that require known camera pose at test time. Our results on real-world images demonstrate that CodeNeRF can bridge the sim-to-real gap. Project page: https://github.com/wayne1123/code-nerf
Submitted 3 September, 2021;
originally announced September 2021.
-
DSP-SLAM: Object Oriented SLAM with Deep Shape Priors
Authors:
Jingwen Wang,
Martin Rünz,
Lourdes Agapito
Abstract:
We propose DSP-SLAM, an object-oriented SLAM system that builds a rich and accurate joint map of dense 3D models for foreground objects, and sparse landmark points to represent the background. DSP-SLAM takes as input the 3D point cloud reconstructed by a feature-based SLAM system and equips it with the ability to enhance its sparse map with dense reconstructions of detected objects. Objects are detected via semantic instance segmentation, and their shape and pose are estimated using category-specific deep shape embeddings as priors, via a novel second order optimization. Our object-aware bundle adjustment builds a pose-graph to jointly optimize camera poses, object locations and feature points. DSP-SLAM can operate at 10 frames per second on 3 different input modalities: monocular, stereo, or stereo+LiDAR. We demonstrate DSP-SLAM operating at almost frame rate on monocular-RGB sequences from the Freiburg and Redwood-OS datasets, and on stereo+LiDAR sequences on the KITTI odometry dataset, showing that it achieves high-quality full object reconstructions, even from partial observations, while maintaining a consistent global map. Our evaluation shows improvements in object pose and shape reconstruction with respect to recent deep prior-based reconstruction methods and reductions in camera tracking drift on the KITTI dataset.
Submitted 22 October, 2021; v1 submitted 21 August, 2021;
originally announced August 2021.
-
Multi-person Implicit Reconstruction from a Single Image
Authors:
Armin Mustafa,
Akin Caliskan,
Lourdes Agapito,
Adrian Hilton
Abstract:
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image. Existing multi-person methods suffer from two main drawbacks: they are often model-based and therefore cannot capture accurate 3D models of people with loose clothing and hair; or they require manual intervention to resolve occlusions or interactions. Our method addresses both limitations by introducing the first end-to-end learning approach to perform model-free implicit reconstruction for realistic 3D capture of multiple clothed people in arbitrary poses (with occlusions) from a single image. Our network simultaneously estimates the 3D geometry of each person and their 6DOF spatial locations, to obtain a coherent multi-human reconstruction. In addition, we introduce a new synthetic dataset that depicts images with a varying number of inter-occluded humans and a variety of clothing and hair styles. We demonstrate robust, high-resolution reconstructions on images of multiple humans with complex occlusions, loose clothing and a large variety of poses and scenes. Our quantitative evaluation on both synthetic and real-world datasets demonstrates state-of-the-art performance with significant improvements in the accuracy and completeness of the reconstructions over competing approaches.
Submitted 19 April, 2021;
originally announced April 2021.
-
SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera
Authors:
Denis Tome,
Thiemo Alldieck,
Patrick Peluse,
Gerard Pons-Moll,
Lourdes Agapito,
Hernan Badino,
Fernando De la Torre
Abstract:
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward looking fish-eye cameras installed on the rim of a head mounted VR device. This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions that result in drastic differences in resolution between lower and upper body. We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions. The quantitative evaluation, on synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state of the art egocentric approaches. To tackle the lack of labelled data we also introduce a large photo-realistic synthetic dataset. xR-EgoPose offers high quality renderings of people with diverse skin tones, body shapes and clothing, performing a range of actions. Our experiments show that the high variability in our new synthetic training corpus leads to good generalization to real world footage and to state of the art results on real world datasets with ground truth. Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top performing approaches on the more classic problem of 3D human pose from a third person viewpoint.
Submitted 2 November, 2020;
originally announced November 2020.
-
S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency
Authors:
Mel Vecerik,
Jean-Baptiste Regli,
Oleg Sushkov,
David Barker,
Rugile Pevceviciute,
Thomas Rothörl,
Christopher Schuster,
Raia Hadsell,
Lourdes Agapito,
Jonathan Scholz
Abstract:
A robot's ability to act is fundamentally constrained by what it can perceive. Many existing approaches to visual representation learning utilize general-purpose training criteria, e.g. image reconstruction, smoothness in latent space, or usefulness for control, or else make use of large datasets annotated with specific features (bounding boxes, segmentations, etc.). However, both approaches often struggle to capture the fine detail required for precision tasks on specific objects, e.g. grasping and mating a plug and socket. We argue that these difficulties arise from a lack of geometric structure in these models. In this work we advocate semantic 3D keypoints as a visual representation, and present a semi-supervised training objective that can allow instance or category-level keypoints to be trained to 1-5 millimeter accuracy with minimal supervision. Furthermore, unlike local texture-based approaches, our model integrates contextual information from a large area and is therefore robust to occlusion, noise, and lack of discernible texture. We demonstrate that this ability to locate semantic keypoints enables high level scripting of human understandable behaviours. Finally, we show that these keypoints provide a good way to define reward functions for reinforcement learning and are a good representation for training agents.
Submitted 13 October, 2020; v1 submitted 30 September, 2020;
originally announced September 2020.
-
DiverseNet: When One Right Answer is not Enough
Authors:
Michael Firman,
Neill D. F. Campbell,
Lourdes Agapito,
Gabriel J. Brostow
Abstract:
Many structured prediction tasks in machine vision have a collection of acceptable answers, instead of one definitive ground truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a single test-time prediction for each query, failing to find other modes in the output space. Existing methods that allow for sampling often sacrifice speed or accuracy.
We introduce a simple method for training a neural network, which enables diverse structured predictions to be made for each test-time query. For a single input, we learn to predict a range of possible answers. We compare favorably to methods that seek diversity through an ensemble of networks. Such stochastic multiple choice learning faces mode collapse, where one or more ensemble members fail to receive any training signal. Our best performing solution can be deployed for various tasks, and just involves small modifications to the existing single-mode architecture, loss function, and training regime. We demonstrate that our method results in quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.
Submitted 24 August, 2020;
originally announced August 2020.
-
FroDO: From Detections to 3D Objects
Authors:
Kejie Li,
Martin Rünz,
Meng Tang,
Lingni Ma,
Chen Kong,
Tanner Schmidt,
Ian Reid,
Lourdes Agapito,
Julian Straub,
Steven Lovegrove,
Richard Newcombe
Abstract:
Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel learnt space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object. A shape code is regressed using an encoder network before optimizing shape and pose further under the learnt shape priors using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.
Submitted 11 May, 2020;
originally announced May 2020.
-
xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera
Authors:
Denis Tome,
Patrick Peluse,
Lourdes Agapito,
Hernan Badino
Abstract:
We present a new solution to egocentric 3D body pose estimation from monocular images captured from a downward looking fish-eye camera installed on the rim of a head mounted virtual reality device. This unusual viewpoint, just 2 cm away from the user's face, leads to images with unique visual appearance, characterized by severe self-occlusions and strong perspective distortions that result in a drastic difference in resolution between lower and upper body. Our contribution is two-fold. Firstly, we propose a new encoder-decoder architecture with a novel dual branch decoder designed specifically to account for the varying uncertainty in the 2D joint locations. Our quantitative evaluation, both on synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state of the art egocentric pose estimation approaches. Our second contribution is a new large-scale photorealistic synthetic dataset - xR-EgoPose - offering 383K frames of high quality renderings of people with a diversity of skin tones, body shapes and clothing, in a variety of backgrounds and lighting conditions, performing a range of actions. Our experiments show that the high variability in our new synthetic training corpus leads to good generalization to real world footage and to state of the art results on real world datasets with ground truth. Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top performing approaches on the more classic problem of 3D human pose from a third person viewpoint.
Submitted 23 July, 2019;
originally announced July 2019.
-
3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds
Authors:
Adrian Penate-Sanchez,
Lourdes Agapito
Abstract:
We present 3D Pick & Mix, a new 3D shape retrieval system that provides users with a new level of freedom to explore 3D shape and Internet image collections by introducing the ability to reason about objects at the level of their constituent parts. While classic retrieval systems can only formulate simple searches such as "find the 3D model that is most similar to the input image", our new approach can formulate advanced and semantically meaningful search queries such as: "find me the 3D model that best combines the design of the legs of the chair in image 1 but with no armrests, like the chair in image 2". Many applications could benefit from such rich queries: users could browse through catalogues of furniture and pick and mix parts, combining for example the legs of a chair from one shop and the armrests from another shop.
Submitted 2 November, 2018;
originally announced November 2018.
-
Rethinking Pose in 3D: Multi-stage Refinement and Recovery for Markerless Motion Capture
Authors:
Denis Tome,
Matteo Toso,
Lourdes Agapito,
Chris Russell
Abstract:
We propose a CNN-based approach for multi-camera markerless motion capture of the human body. Unlike existing methods that first perform pose estimation on individual cameras and generate 3D models as post-processing, our approach makes use of 3D reasoning throughout a multi-stage approach. This novelty allows us to use provisional 3D models of human pose to rethink where the joints should be located in the image and to recover from past mistakes. Our principled refinement of 3D human poses lets us make use of image cues, even from images where we previously misdetected joints, to refine our estimates as part of an end-to-end approach. Finally, we demonstrate how the high-quality output of our multi-camera setup can be used as an additional training source to improve the accuracy of existing single camera models.
Submitted 4 August, 2018;
originally announced August 2018.
-
MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects
Authors:
Martin Rünz,
Maud Buffier,
Lourdes Agapito
Abstract:
We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera.
As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable real-time object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map, unlike recent semantics enabled SLAM systems that perform voxel-level semantic segmentation. We show augmented-reality applications that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic.
Submitted 22 October, 2018; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Training VAEs Under Structured Residuals
Authors:
Garoe Dorta,
Sara Vicente,
Lourdes Agapito,
Neill D. F. Campbell,
Ivor Simpson
Abstract:
Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer range correlations.
Submitted 31 July, 2018; v1 submitted 3 April, 2018;
originally announced April 2018.
-
Ab Initio Electron-Phonon Interactions Using Atomic Orbital Wavefunctions
Authors:
Luis A. Agapito,
Marco Bernardi
Abstract:
The interaction between electrons and lattice vibrations determines key physical properties of materials, including their electrical and heat transport, excited electron dynamics, phase transitions, and superconductivity. We present a new ab initio method that employs atomic orbital (AO) wavefunctions to compute the electron-phonon (e-ph) interactions in materials and interpolate the e-ph coupling matrix elements to fine Brillouin zone grids. We detail the numerical implementation of such AO-based e-ph calculations, and benchmark them against direct density functional theory calculations and Wannier function (WF) interpolation. The key advantages of AOs over WFs for e-ph calculations are outlined. Since AOs are fixed basis functions associated with the atoms, they circumvent the need to generate a material-specific localized basis set with a trial-and-error approach, as is needed in WFs. Therefore, AOs are ideal to compute e-ph interactions in chemically and structurally complex materials for which WFs are challenging to generate, and are also promising for high-throughput materials discovery. While our results focus on AOs, the formalism we present generalizes e-ph calculations to arbitrary localized basis sets, with WFs recovered as a special case.
Submitted 16 March, 2018;
originally announced March 2018.
-
Structured Uncertainty Prediction Networks
Authors:
Garoe Dorta,
Sara Vicente,
Lourdes Agapito,
Neill D. F. Campbell,
Ivor Simpson
Abstract:
This paper is the first work to propose a network to predict a structured uncertainty distribution for a synthesized image. Previous approaches have been mostly limited to predicting diagonal covariance matrices. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation.
We demonstrate that our model can accurately reconstruct ground truth correlated residual distributions for synthetic datasets and generate plausible high frequency samples for real face images. We also illustrate the use of these predicted covariances for structure preserving image denoising.
Submitted 23 March, 2018; v1 submitted 20 February, 2018;
originally announced February 2018.
-
Charge Transport in Organic Molecular Semiconductors from First Principles: The Band-Like Hole Mobility in Naphthalene Crystal
Authors:
Nien-En Lee,
Jin-Jian Zhou,
Luis A. Agapito,
Marco Bernardi
Abstract:
Predicting charge transport in organic molecular crystals is notoriously challenging. Carrier mobility calculations in organic semiconductors are dominated by quantum chemistry methods based on charge hopping, which are laborious and only moderately accurate. We compute from first principles the electron-phonon scattering and the phonon-limited hole mobility of naphthalene crystal in the framework of ab initio band theory. Our calculations combine GW electronic bandstructures, ab initio electron-phonon scattering, and the Boltzmann transport equation. The calculated hole mobility is in very good agreement with experiment between 100$-$300 K, and we can predict its temperature dependence with high accuracy. We show that scattering between inter-molecular phonons and holes regulates the mobility, though intra-molecular phonons possess the strongest coupling with holes. We revisit the common belief that only rigid molecular motions affect carrier dynamics in organic molecular crystals. Our work provides a quantitative and rigorous framework to compute charge transport in organic crystals, and is a first step toward reconciling band theory and carrier hopping computational methods.
Submitted 12 March, 2018; v1 submitted 1 December, 2017;
originally announced December 2017.
-
The AFLOW Fleet for Materials Discovery
Authors:
Cormac Toher,
Corey Oses,
David Hicks,
Eric Gossett,
Frisco Rose,
Pinku Nath,
Demet Usanmaz,
Denise C. Ford,
Eric Perim,
Camilo E. Calderon,
Jose J. Plata,
Yoav Lederer,
Michal Jahnátek,
Wahyu Setyawan,
Shidong Wang,
Junkai Xue,
Kevin Rasch,
Roman V. Chepulskii,
Richard H. Taylor,
Geena Gomez,
Harvey Shi,
Andrew R. Supka,
Rabih Al Rahal Al Orabi,
Priya Gopal,
Frank T. Cerasoli
, et al. (26 additional authors not shown)
Abstract:
The traditional paradigm for materials discovery has been recently expanded to incorporate substantial data driven research. With the intent to accelerate the development and the deployment of new technologies, the AFLOW Fleet for computational materials design automates high-throughput first principles calculations, and provides tools for data verification and dissemination for a broad community of users. AFLOW incorporates different computational modules to robustly determine thermodynamic stability, electronic band structures, vibrational dispersions, thermo-mechanical properties and more. The AFLOW data repository is publicly accessible online at aflow.org, with more than 1.7 million materials entries and a panoply of queryable computed properties. Tools to programmatically search and process the data, as well as to perform online machine learning predictions, are also available.
Submitted 1 December, 2017;
originally announced December 2017.
-
Better Together: Joint Reasoning for Non-rigid 3D Reconstruction with Specularities and Shading
Authors:
Qi Liu-Yin,
Rui Yu,
Lourdes Agapito,
Andrew Fitzgibbon,
Chris Russell
Abstract:
We demonstrate the use of shape-from-shading (SfS) to improve both the quality and the robustness of 3D reconstruction of dynamic objects captured by a single camera. Unlike previous approaches that made use of SfS as a post-processing step, we offer a principled integrated approach that solves dynamic object tracking and reconstruction and SfS as a single unified cost function. Moving beyond Lambertian SfS, we propose a general approach that models both specularities and shading while simultaneously tracking and reconstructing general dynamic objects. Solving these problems jointly prevents the kinds of tracking failures from which pipeline approaches cannot recover. We show state-of-the-art results both qualitatively and quantitatively.
Submitted 4 August, 2017;
originally announced August 2017.
-
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
Authors:
Martin Rünz,
Lourdes Agapito
Abstract:
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live stream of RGB-D images as input and segments the scene into different objects (using either motion or semantic cues) while simultaneously tracking and reconstructing their 3D shape in real time. We use a multiple model fitting approach where each object can move independently from the background and still be effectively tracked and its shape fused over time using only the information from pixels associated with that object label. Previous attempts to deal with dynamic scenes have typically considered moving regions as outliers, and consequently do not model their shape or track their motion over time. In contrast, we enable the robot to maintain 3D models for each of the segmented objects and to improve them over time through fusion. As a result, our system can enable a robot to maintain a scene description at the object level which has the potential to allow interactions with its working environment, even in the case of dynamic scenes.
Submitted 20 June, 2017;
originally announced June 2017.
-
Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
Authors:
Denis Tome,
Chris Russell,
Lourdes Agapito
Abstract:
We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks. We take an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations. The entire process is trained end-to-end, is extremely efficient and obtains state-of-the-art results on Human3.6M, outperforming previous approaches both on 2D and 3D errors.
Submitted 11 October, 2017; v1 submitted 1 January, 2017;
originally announced January 2017.
-
Accurate $ab~initio$ tight-binding Hamiltonians: effective tools for electronic transport and optical spectroscopy from first principles
Authors:
Pino D'Amico,
Luis A. Agapito,
Alessandra Catellani,
Alice Ruini,
Stefano Curtarolo,
Marco Fornari,
Marco Buongiorno Nardelli,
Arrigo Calzolari
Abstract:
The calculations of electronic transport coefficients and optical properties require a very dense interpolation of the electronic band structure in reciprocal space that is computationally expensive and may have issues with band crossing and degeneracies. Capitalizing on a recently developed pseudo-atomic orbital projection technique, we exploit the exact tight-binding representation of the first-principles electronic structure for the purposes of (1) providing an efficient strategy to explore the full band structure $E_n({\bf k})$, (2) computing the momentum operator by directly differentiating the Hamiltonian, and (3) calculating the imaginary part of the dielectric function. This enables us to determine the Boltzmann transport coefficients and the optical properties within the independent particle approximation. In addition, the local nature of the tight-binding representation facilitates the calculation of the ballistic transport within the Landauer theory for systems with hundreds of atoms. In order to validate our approach we study the multi-valley band structure of CoSb$_3$ and a large core-shell nanowire using the ACBN0 functional. In CoSb$_3$ we point out the many band minima contributing to the electronic transport that enhance the thermoelectric properties; for the core-shell nanowire we identify possible mechanisms for photo-current generation and justify the presence of protected transport channels in the wire.
Submitted 19 August, 2016;
originally announced August 2016.
-
Accurate Tight-Binding Hamiltonians for 2D and Layered Materials
Authors:
Luis Agapito,
Marco Fornari,
Davide Ceresoli,
Andrea Ferretti,
Stefano Curtarolo,
Marco Buongiorno Nardelli
Abstract:
We present a scheme to controllably improve the accuracy of tight-binding Hamiltonian matrices derived by projecting the solutions of plane-wave ab initio calculations on atomic orbital basis sets. By systematically increasing the completeness of the basis set of atomic orbitals, we are able to optimize the quality of the band structure interpolation over wide energy ranges including unoccupied states. This methodology is applied to the case of interlayer and image states, which appear several eV above the Fermi level in materials with large interstitial regions or surfaces such as graphite and graphene. Due to their spatial localization in the empty regions inside or outside of the system, these states have been inaccessible to traditional tight-binding models and even to ab initio calculations with atom-centered basis functions.
Submitted 11 January, 2016;
originally announced January 2016.
-
Solving Jigsaw Puzzles with Linear Programming
Authors:
Rui Yu,
Chris Russell,
Lourdes Agapito
Abstract:
We propose a novel Linear Program (LP) based formulation for solving jigsaw puzzles. We formulate jigsaw solving as a set of successive global convex relaxations of the standard NP-hard formulation, that can describe both jigsaws with pieces of unknown position and puzzles of unknown position and orientation. The main contribution and strength of our approach comes from the LP assembly strategy. In contrast to existing greedy methods, our LP solver exploits all the pairwise matches simultaneously, and computes the position of each piece/component globally. The main advantages of our LP approach include: (i) a reduced sensitivity to local minima compared to greedy approaches, since our successive approximations are global and convex and (ii) an increased robustness to the presence of mismatches in the pairwise matches due to the use of a weighted L1 penalty. To demonstrate the effectiveness of our approach, we test our algorithm on public jigsaw datasets and show that it outperforms state-of-the-art methods.
Submitted 13 November, 2015;
originally announced November 2015.
-
Accurate tight-binding Hamiltonian matrices from ab-initio calculations: Minimal basis sets
Authors:
Luis A. Agapito,
Sohrab Ismail-Beigi,
Stefano Curtarolo,
Marco Fornari,
Marco Buongiorno Nardelli
Abstract:
Projection of Bloch states obtained from quantum-mechanical calculations onto atomic orbitals is the fastest scheme to construct ab-initio tight-binding Hamiltonian matrices. However, the presence of spurious states and unphysical hybridizations of the tight-binding eigenstates has hindered the applicability of this construction. Here we demonstrate that those spurious effects are due to the inclusion of Bloch states with low projectability. The mechanism for the formation of those effects is derived analytically. We present an improved scheme for the removal of the spurious states which results in an efficient scheme for the construction of highly accurate ab-initio tight-binding Hamiltonians.
Submitted 19 October, 2015; v1 submitted 8 September, 2015;
originally announced September 2015.
-
Improved predictions of the physical properties of Zn- and Cd-based wide band-gap semiconductors: a validation of the ACBN0 functional
Authors:
Priya Gopal,
Marco Fornari,
Stefano Curtarolo,
Luis A. Agapito,
Laalitha S. I. Liyanage,
Marco Buongiorno Nardelli
Abstract:
We study the physical properties of Zn$X$ ($X$=O, S, Se, Te) and Cd$X$ ($X$=O, S, Se, Te) in the zinc-blende, rock-salt, and wurtzite structures using the recently developed fully $ab$ $initio$ pseudo-hybrid Hubbard density functional ACBN0. We find that both the electronic and vibrational properties of these wide band-gap semiconductors are systematically improved over the PBE values and closely reproduce the experimental measurements. Similar accuracy is found for the structural parameters, especially the bulk modulus. ACBN0 results compare well with hybrid functional calculations at a fraction of the computational cost.
Submitted 20 May, 2015;
originally announced May 2015.
-
Lifting Object Detection Datasets into 3D
Authors:
Joao Carreira,
Sara Vicente,
Lourdes Agapito,
Jorge Batista
Abstract:
While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions such as 3D scanning or manual design, which scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from: (i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate camera viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest in joint object recognition and 3D reconstruction from a single image.
Submitted 31 July, 2016; v1 submitted 22 March, 2015;
originally announced March 2015.
-
Reformulation of DFT+U as a pseudo-hybrid Hubbard density functional
Authors:
Luis A. Agapito,
Stefano Curtarolo,
Marco Buongiorno Nardelli
Abstract:
The accurate prediction of the electronic properties of materials at a low computational expense is a necessary condition for the development of effective high-throughput quantum-mechanics (HTQM) frameworks for accelerated materials discovery. HTQM infrastructures rely on the predictive capability of Density Functional Theory (DFT), the method of choice for the first-principles study of materials properties. However, DFT suffers from approximations that result in a somewhat inaccurate description of the electronic band structure of semiconductors and insulators. In this article we introduce ACBN0, a pseudo-hybrid Hubbard density functional that yields an improved prediction of the band structure of insulators such as transition-metal oxides, as shown for TiO2, MnO, NiO and ZnO, with only a negligible increase in computational cost.
Submitted 21 October, 2014; v1 submitted 12 June, 2014;
originally announced June 2014.
-
Effective and accurate representation of extended Bloch states on finite Hilbert spaces
Authors:
Luis A. Agapito,
Andrea Ferretti,
Arrigo Calzolari,
Stefano Curtarolo,
Marco Buongiorno Nardelli
Abstract:
We present a straightforward, noniterative projection scheme that can represent the electronic ground state of a periodic system on a finite atomic-orbital-like basis, up to a predictable number of electronic states and with controllable accuracy. By co-filtering the projections of plane-wave Bloch states with high-kinetic-energy components, the richness of the finite space and thus the number of exactly-reproduced bands can be selectively increased at a negligible computational cost, an essential requirement for the design of efficient algorithms for electronic structure simulations of realistic material systems and massive high-throughput investigations.
Submitted 30 September, 2013;
originally announced October 2013.
-
Strain-induced topological insulator phase transition in HgSe
Authors:
Lars Winterfeld,
Luis A. Agapito,
Jin Li,
Nicholas Kioussis,
Peter Blaha,
Yong P. Chen
Abstract:
Using ab initio electronic structure calculations we investigate the change of the band structure and the nu_0 topological invariant in HgSe (non-centrosymmetric system) under two different types of uniaxial strain along the [001] and [110] directions, respectively. Both compressive [001] and [110] strain lead to the opening of a (crystal field) band gap (with a maximum value of about 37 meV) in the vicinity of Gamma, and the concomitant formation of a camel-back- (inverse camel-back-) shape valence (conduction) band along the direction perpendicular to the strain with a minimum (maximum) at Gamma. We find that the Z_2 invariant nu_0=1, which demonstrates conclusively that HgSe is a strong topological insulator (TI). With further increase of the strain the band gap decreases, vanishing at a critical strain value (which depends on the strain type) where HgSe undergoes a transition from a strong TI to a trivial (normal) insulator. HgSe exhibits a similar behavior under a tensile [110] uniaxial strain. On the other hand, HgSe remains a normal insulator under a [001] tensile uniaxial strain. Complementary electronic structure calculations of the non-polar (110) surface under compressive [110] tensile strain show two Dirac cones at the Gamma point whose spin chiral states are associated with the top and bottom slab surfaces.
Submitted 17 February, 2013;
originally announced February 2013.
-
Aviram-Ratner rectifying mechanism for DNA base pair sequencing through graphene nanogaps
Authors:
Luis A. Agapito,
Jacob Gayles,
Christian Wolowiec,
Nicholas Kioussis
Abstract:
We demonstrate that biological molecules such as Watson-Crick DNA base pairs can behave as biological Aviram-Ratner electrical rectifiers because of the spatial separation and weak hydrogen bonding between the nucleobases. We have performed a parallel computational implementation of the ab-initio non-equilibrium Green's function (NEGF) theory to determine the electrical response of graphene---base-pair---graphene junctions. The results show an asymmetric (rectifying) current-voltage response for the Cytosine-Guanine base pair adsorbed on a graphene nanogap. In sharp contrast, we find a symmetric response for the Thymine-Adenine case. We propose applying the asymmetry of the current-voltage response as a sensing criterion to the technological challenge of rapid DNA sequencing via graphene nanogaps.
Submitted 17 February, 2012; v1 submitted 30 December, 2011;
originally announced January 2012.
-
Approaching the Intrinsic Bandgap in Suspended High-Mobility Graphene Nanoribbons
Authors:
Ming-Wei Lin,
Cheng Ling,
Luis A. Agapito,
Nicholas Kioussis,
Yiyang Zhang,
Mark Ming-Cheng Cheng,
Wei L. Wang,
Efthimios Kaxiras,
Zhixian Zhou
Abstract:
We report electrical transport measurements on a suspended ultra-low-disorder graphene nanoribbon (GNR) with nearly atomically smooth edges that reveal a high mobility exceeding 3000 cm$^2$ V$^{-1}$ s$^{-1}$ and an intrinsic bandgap. The experimentally derived bandgap is in quantitative agreement with the results of our electronic-structure calculations on chiral GNRs with comparable width taking into account the electron-electron interactions, indicating that the origin of the bandgap in non-armchair GNRs is partially due to the magnetic zigzag edges.
Submitted 27 May, 2011;
originally announced May 2011.
-
Room-Temperature High On/Off Ratio in Suspended Graphene Nanoribbon Field Effect Transistors
Authors:
Ming-Wei Lin,
Cheng Ling,
Yiyang Zhang,
Hyeun Joong Yoon,
Mark Ming-Cheng Cheng,
Luis A. Agapito,
Nicholas Kioussis,
Noppi Widjaja,
Zhixian Zhou
Abstract:
We have fabricated suspended few-layer (1-3 layers) graphene nanoribbon field effect transistors from unzipped multiwall carbon nanotubes. Electrical transport measurements show that current-annealing effectively removes the impurities on the suspended graphene nanoribbons, uncovering the intrinsic ambipolar transfer characteristic of graphene. Further increasing the annealing current creates a narrow constriction in the ribbon, leading to the formation of a large band-gap and subsequent high on/off ratio (which can exceed 10$^4$). Such fabricated devices are thermally and mechanically stable: repeated thermal cycling has little effect on their electrical properties. This work shows for the first time that ambipolar field effect characteristics and high on/off ratios at room temperature can be achieved in relatively wide graphene nanoribbons (15 nm to 50 nm) by controlled current annealing.
Submitted 8 April, 2011;
originally announced April 2011.