Showing 1–42 of 42 results for author: Prisacariu, V A

Searching in archive cs.
  1. arXiv:2408.13912  [pdf, other]

    cs.CV cs.LG

    Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

    Authors: Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu

    Abstract: In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a "foundation" 3D geometry reconstruction method, MASt3R, by extend…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Our project page can be found at: https://splatt3r.active.vision/

  2. arXiv:2408.11085  [pdf, other]

    cs.CV

    GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

    Authors: Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Zirui Wang, Ming Cheng, Victor Adrian Prisacariu, Tristan Braud

    Abstract: We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences.…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: The project page is available at https://gsloc.active.vision

  3. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  4. arXiv:2404.14409  [pdf, other]

    cs.CV

    CrossScore: Towards Multi-View Image Evaluation and Scoring

    Authors: Zirui Wang, Wenjing Bian, Victor Adrian Prisacariu

    Abstract: We introduce a novel cross-reference image quality assessment method that effectively fills the gap in the image assessment landscape, complementing the array of established evaluation schemes -- ranging from full-reference metrics like SSIM, no-reference metrics such as NIQE, to general-reference metrics including FID, and Multi-modal-reference metrics, e.g., CLIPScore. Utilising a neural network…

    Submitted 23 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted at ECCV 2024. Project page: https://crossscore.active.vision

  5. arXiv:2404.14351  [pdf, other]

    cs.CV

    Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

    Authors: Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, Áron Monszpart, Daniyar Turmukhambetov, Victor Adrian Prisacariu

    Abstract: We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they repeat triangulation of sparse 3D points and registration of more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of…

    Submitted 26 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ECCV 2024, Project page: https://nianticlabs.github.io/acezero/

  6. arXiv:2404.09884  [pdf, other]

    cs.CV cs.LG

    Map-Relative Pose Regression for Visual Re-Localization

    Authors: Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu, Eric Brachmann

    Abstract: Pose regression networks predict the camera pose of a query image relative to a known environment. Within this family of methods, absolute pose regression (APR) has recently shown promising accuracy in the range of a few centimeters in position error. APR networks encode the scene geometry implicitly in their weights. To achieve high accuracy, they require vast amounts of training data that, reali…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024, Highlight Paper

  7. arXiv:2404.06337  [pdf, other]

    cs.CV

    Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

    Authors: Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu, Eric Brachmann

    Abstract: Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences. Usually, correspondences are 2D-to-2D and the pose we estimate is defined only up to scale. Some applications, aiming at instant augmented reality anywhere, require scale-metric pose estimates, and hence, they rely on external depth estimators to recover the scale. We present Mic…

    Submitted 9 April, 2024; originally announced April 2024.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  8. arXiv:2403.08733  [pdf, other]

    cs.CV

    GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

    Authors: Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu

    Abstract: We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editin…

    Submitted 14 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: ECCV2024, Project Website: https://gaussctrl.active.vision/

  9. arXiv:2402.10728  [pdf, other]

    eess.IV cs.CV

    Semi-weakly-supervised neural network training for medical image registration

    Authors: Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Dean C. Barratt, Victor A. Prisacariu, Yipeng Hu

    Abstract: For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) has been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialis…

    Submitted 16 February, 2024; originally announced February 2024.

  10. arXiv:2310.07449  [pdf, other]

    cs.CV

    PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

    Authors: Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, Philip Torr

    Abstract: Neural surface reconstruction is sensitive to the camera pose noise, even if state-of-the-art pose estimators like COLMAP or ARKit are used. More importantly, existing Pose-NeRF joint optimisation methods have struggled to improve pose accuracy in challenging real-world scenarios. To overcome the challenges, we introduce the pose residual field (PoRF), a novel implicit representation that uses an…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024. Find the project page at https://porf.active.vision/

  11. arXiv:2309.04820  [pdf, other]

    cs.CV cs.LG

    ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting

    Authors: Michael A. Hobley, Victor A. Prisacariu

    Abstract: Class-agnostic counting methods enumerate objects of an arbitrary class, providing tremendous utility in many fields. Prior works have limited usefulness as they require either a set of examples of the type to be counted or that the query image contains only a single type of object. A significant factor in these shortcomings is the lack of a dataset to properly address counting in settings with mo…

    Submitted 12 July, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

  12. arXiv:2306.01596  [pdf, other]

    cs.CV

    Two-View Geometry Scoring Without Correspondences

    Authors: Axel Barroso-Laguna, Eric Brachmann, Victor Adrian Prisacariu, Gabriel J. Brostow, Daniyar Turmukhambetov

    Abstract: Camera pose estimation for two-view geometry traditionally relies on RANSAC. Normally, a multitude of image correspondences leads to a pool of proposed hypotheses, which are then scored to find a winning model. The inlier count is generally regarded as a reliable indicator of "consensus". We examine this scoring heuristic, and find that it favors disappointing models under certain circumstances. A…

    Submitted 2 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  13. arXiv:2305.18492  [pdf, other]

    cs.LG cs.AI

    DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific Clustering Using Side Information

    Authors: Michael A. Hobley, Victor A. Prisacariu

    Abstract: We present a novel approach, in which we learn to cluster data directly from side information, in the form of a small set of pairwise examples. Unlike previous methods, with or without side information, we do not need to know the number of clusters, their centers or any kind of distance metric for similarity. Our method is able to divide the same data points in various ways dependent on the needs…

    Submitted 29 May, 2023; originally announced May 2023.

  14. arXiv:2305.14059  [pdf, other]

    cs.CV cs.LG

    Accelerated Coordinate Encoding: Learning to Relocalize in Minutes using RGB and Poses

    Authors: Eric Brachmann, Tommaso Cavallari, Victor Adrian Prisacariu

    Abstract: Learning-based visual relocalizers exhibit leading pose accuracy, but require hours or days of training. Since training needs to happen on each new scene again, long training times make learning-based relocalization impractical for most applications, despite its promise of high accuracy. In this paper we show how such a system can actually achieve the same accuracy in less than 5 minutes. We start…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: CVPR 2023 Highlight

  15. arXiv:2305.13312  [pdf, other]

    cs.CV

    Contextualising Implicit Representations for Semantic Tasks

    Authors: Theo W. Costain, Kejie Li, Victor A. Prisacariu

    Abstract: Prior works have demonstrated that implicit representations trained only for reconstruction tasks typically generate encodings that are not useful for semantic tasks. In this work, we propose a method that contextualises the encodings of implicit representations, enabling their use in downstream tasks (e.g. semantic segmentation), without requiring access to the original training data or encoding…

    Submitted 22 May, 2023; originally announced May 2023.

  16. arXiv:2305.02385  [pdf, other]

    cs.CV

    SimSC: A Simple Framework for Semantic Correspondence with Temperature Learning

    Authors: Xinghui Li, Kai Han, Xingchen Wan, Victor Adrian Prisacariu

    Abstract: We propose SimSC, a remarkably simple framework, to address the problem of semantic matching only based on the feature backbone. We discover that when fine-tuning an ImageNet pre-trained backbone on the semantic matching task, L2 normalization of the feature map, a standard procedure in feature matching, produces an overly smooth matching distribution and significantly hinders the fine-tuning process…

    Submitted 3 May, 2023; originally announced May 2023.

  17. arXiv:2303.10087  [pdf, other]

    cs.CV

    Neural Refinement for Absolute Pose Regression with Feature Synthesis

    Authors: Shuai Chen, Yash Bhalgat, Xinghui Li, Jiawang Bian, Kejie Li, Zirui Wang, Victor Adrian Prisacariu

    Abstract: Absolute Pose Regression (APR) methods use deep neural networks to directly regress camera poses from RGB images. However, the predominant APR architectures only rely on 2D operations during inference, resulting in limited accuracy of pose estimation due to the lack of 3D geometry constraints or priors. In this work, we propose a test-time refinement pipeline that leverages implicit geometric cons…

    Submitted 29 February, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Paper Accepted by CVPR 2024. Project Page: http://nefes.active.vision. Code will be released at https://github.com/ActiveVisionLab/NeFeS

  18. arXiv:2303.01932  [pdf, other]

    cs.CV cs.LG

    MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices

    Authors: Kejie Li, Jia-Wang Bian, Robert Castle, Philip H. S. Torr, Victor Adrian Prisacariu

    Abstract: High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation. However, it is difficult to create a replica of an object in reality, and even 3D reconstructions generated by 3D scanners have artefacts that cause biases in evaluation. To address this issue, we introduce a novel multi-view RGBD dataset captured using a mobile device, which includes highly precise 3D ground…

    Submitted 9 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: To appear at CVPR 2023

  19. arXiv:2212.07388  [pdf, other]

    cs.CV

    NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

    Authors: Wenjing Bian, Zirui Wang, Kejie Li, Jia-Wang Bian, Victor Adrian Prisacariu

    Abstract: Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These pr…

    Submitted 14 April, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

  20. arXiv:2210.08951  [pdf, other]

    cs.CV

    Approximating Continuous Convolutions for Deep Network Compression

    Authors: Theo W. Costain, Victor Adrian Prisacariu

    Abstract: We present ApproxConv, a novel method for compressing the layers of a convolutional neural network. Reframing conventional discrete convolution as continuous convolution of parametrised functions over space, we use functional approximations to capture the essential structures of CNN filters with fewer parameters than conventional operations. Our method is able to reduce the size of trained CNN lay…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  21. arXiv:2210.05494  [pdf, other]

    cs.CV

    Map-free Visual Relocalization: Metric Pose Relative to a Single Image

    Authors: Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Áron Monszpart, Victor Adrian Prisacariu, Daniyar Turmukhambetov, Eric Brachmann

    Abstract: Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: ECCV2022 camera-ready. 14 pages + 4 reference pages

  22. arXiv:2209.05160  [pdf, other]

    eess.IV cs.CV

    Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registration

    Authors: Yiwen Li, Yunguan Fu, Iani Gayo, Qianye Yang, Zhe Min, Shaheer Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Henkjan Huisman, Dean Barratt, Victor Adrian Prisacariu, Yipeng Hu

    Abstract: The prowess that makes few-shot learning desirable in medical image analysis is the efficient use of the support image data, which are labelled to classify or segment new classes, a task that otherwise requires substantially more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively ada…

    Submitted 25 August, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: accepted by Medical Image Analysis

  23. arXiv:2204.01139  [pdf, other]

    cs.CV

    BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion

    Authors: Kejie Li, Yansong Tang, Victor Adrian Prisacariu, Philip H. S. Torr

    Abstract: Dense 3D reconstruction from a stream of depth images is the key to many mixed reality and robotic applications. Although methods based on Truncated Signed Distance Function (TSDF) Fusion have advanced the field over the years, the TSDF volume representation is confronted with striking a balance between the robustness to noisy measurements and maintaining the level of detail. We present Bi-level N…

    Submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR 2022

  24. arXiv:2204.00559  [pdf, other]

    cs.CV

    DFNet: Enhance Absolute Pose Regression with Direct Feature Matching

    Authors: Shuai Chen, Xinghui Li, Zirui Wang, Victor Adrian Prisacariu

    Abstract: We introduce a camera relocalization pipeline that combines absolute pose regression (APR) and direct feature matching. By incorporating exposure-adaptive novel view synthesis, our method successfully addresses photometric distortions in outdoor environments that existing photometric-based methods fail to handle. With domain-invariant feature matching, our solution improves pose regression accurac…

    Submitted 20 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: ECCV 2022. Code released at https://github.com/ActiveVisionLab/DFNet

  25. arXiv:2201.06358  [pdf, other]

    eess.IV cs.CV

    Few-shot image segmentation for cross-institution male pelvic organs using registration-assisted prototypical learning

    Authors: Yiwen Li, Yunguan Fu, Qianye Yang, Zhe Min, Wen Yan, Henkjan Huisman, Dean Barratt, Victor Adrian Prisacariu, Yipeng Hu

    Abstract: The ability to adapt medical image segmentation networks for a novel class such as an unseen anatomical or pathological structure, when only a few labelled examples of this class are available from local healthcare providers, is sought-after. This potentially addresses two widely recognised limitations in deploying modern deep learning models to clinical practice, expertise-and-labour-intensive la…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: To appear in the proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI) 2022

  26. arXiv:2110.11742  [pdf, other]

    cs.CV cs.AI

    Few-shot Semantic Segmentation with Self-supervision from Pseudo-classes

    Authors: Yiwen Li, Gratianus Wesley Putra Data, Yunguan Fu, Yipeng Hu, Victor Adrian Prisacariu

    Abstract: Despite the success of deep learning methods for semantic segmentation, few-shot semantic segmentation remains a challenging task due to the limited training data and the generalisation requirement for unseen classes. While recent progress has been particularly encouraging, we discover that existing methods tend to have poor performance in terms of meanIoU when query images contain other semantic…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: To appear in the proceedings of the British Machine Vision Conference (BMVC) 2021

  27. arXiv:2107.01899  [pdf, other]

    cs.CV

    Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

    Authors: Wenjing Bian, Zirui Wang, Kejie Li, Victor Adrian Prisacariu

    Abstract: We propose Ray-ONet to reconstruct detailed 3D models from monocular images efficiently. By predicting a series of occupancy probabilities along a ray that is back-projected from a pixel in the camera coordinate, our method Ray-ONet improves the reconstruction accuracy in comparison with Occupancy Networks (ONet), while reducing the network inference complexity to $O(N^2)$. As a result, Ray-ONet a…

    Submitted 22 October, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: accepted in BMVC 2021

  28. arXiv:2104.09169  [pdf, other]

    cs.CV

    LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments

    Authors: Henry Howard-Jenkins, Jose-Raul Ruiz-Sarmiento, Victor Adrian Prisacariu

    Abstract: We present LaLaLoc to localise in environments without the need for prior visitation, and in a manner that is robust to large changes in scene appearance, such as a full rearrangement of furniture. Specifically, LaLaLoc performs localisation through latent representations of room layout. LaLaLoc learns a rich embedding space shared between RGB panoramas and layouts inferred from a known floor plan…

    Submitted 12 October, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: As presented at the International Conference on Computer Vision (ICCV) 2021

  29. arXiv:2102.07064  [pdf, other]

    cs.CV

    NeRF--: Neural Radiance Fields Without Known Camera Parameters

    Authors: Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, Victor Adrian Prisacariu

    Abstract: Considering the problem of novel view synthesis (NVS) from only a set of 2D images, we simplify the training process of Neural Radiance Field (NeRF) on forward-facing scenes by removing the requirement of known or pre-computed camera parameters, including both intrinsics and 6DoF poses. To this end, we propose NeRF$--$, with three contributions: First, we show that the camera parameters can be joi…

    Submitted 6 April, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Project page see https://nerfmm.active.vision. Add a break point analysis experiment and release a BLEFF dataset

  30. arXiv:2101.12690  [pdf, other]

    cs.CV

    Towards Generalising Neural Implicit Representations

    Authors: Theo W. Costain, Victor Adrian Prisacariu

    Abstract: Neural implicit representations have shown substantial improvements in efficiently storing 3D data, when compared to conventional formats. However, the focus of existing work has mainly been on storage and subsequent reconstruction. In this work, we show that training neural representations for reconstruction tasks alongside conventional tasks can produce more general encodings that admit equal qu…

    Submitted 17 October, 2022; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: ECCVW 2022

  31. arXiv:2008.09965  [pdf, other]

    cs.CV

    Neighbourhood-Insensitive Point Cloud Normal Estimation Network

    Authors: Zirui Wang, Victor Adrian Prisacariu

    Abstract: We introduce a novel self-attention-based normal estimation network that is able to focus softly on relevant points and adjust the softness by learning a temperature parameter, making it able to work naturally and effectively within a large neighbourhood range. As a result, our model outperforms all existing normal estimation algorithms by a large margin, achieving 94.1% accuracy in comparison wit…

    Submitted 15 January, 2021; v1 submitted 23 August, 2020; originally announced August 2020.

    Comments: Accepted in BMVC 2020 as oral presentation. Code available at https://code.active.vision and project page at http://ninormal.active.vision

  32. arXiv:2007.07743  [pdf, other]

    cs.CV

    Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes

    Authors: Marcelo Gennari do Nascimento, Theo W. Costain, Victor Adrian Prisacariu

    Abstract: We propose a novel method for neural network quantization that casts the neural architecture search problem as one of hyperparameter search to find non-uniform bit distributions throughout the layers of a CNN. We perform the search assuming a Multi-Task Gaussian Processes prior, which splits the problem into multiple tasks, each corresponding to a different number of training epochs, and explore the s…

    Submitted 20 July, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: Accepted for publication at ECCV 2020. Code available at https://code.active.vision. Updated for typo

  33. arXiv:2006.08844  [pdf, other]

    cs.CV

    Dual-Resolution Correspondence Networks

    Authors: Xinghui Li, Kai Han, Shuda Li, Victor Adrian Prisacariu

    Abstract: We tackle the problem of establishing dense pixel-wise correspondences between a pair of images. In this work, we introduce Dual-Resolution Correspondence Networks (DualRC-Net), to obtain pixel-wise correspondences in a coarse-to-fine manner. DualRC-Net extracts both coarse- and fine- resolution feature maps. The coarse maps are used to produce a full but coarse 4D correlation tensor, which is the…

    Submitted 28 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020, code at https://dualrcnet.active.vision/

  34. arXiv:1912.01438  [pdf, other]

    cs.CV cs.RO

    FlowNet3D++: Geometric Losses For Deep Scene Flow Estimation

    Authors: Zirui Wang, Shuda Li, Henry Howard-Jenkins, Victor Adrian Prisacariu, Min Chen

    Abstract: We present FlowNet3D++, a deep scene flow estimation network. Inspired by classical methods, FlowNet3D++ incorporates geometric constraints in the form of point-to-plane distance and angular alignment between individual vectors in the flow field, into FlowNet3D. We demonstrate that the addition of these geometric loss terms improves the previous state-of-the-art FlowNet3D accuracy from 57.85% to 63.43…

    Submitted 26 April, 2021; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: WACV 2020

  35. arXiv:1912.00673  [pdf, other]

    cs.LG stat.ML

    GroSS: Group-Size Series Decomposition for Grouped Architecture Search

    Authors: Henry Howard-Jenkins, Yiwen Li, Victor A. Prisacariu

    Abstract: We present a novel approach which is able to explore the configuration of grouped convolutions within neural networks. Group-size Series (GroSS) decomposition is a mathematical formulation of tensor factorisation into a series of approximations of increasing rank terms. GroSS allows for dynamic and differentiable selection of factorisation rank, which is analogous to a grouped convolution. Therefo…

    Submitted 16 July, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted for publication at ECCV 2020

  36. arXiv:1901.01928  [pdf, other]

    cs.CV

    DSConv: Efficient Convolution Operator

    Authors: Marcelo Gennari, Roger Fawcett, Victor Adrian Prisacariu

    Abstract: Quantization is a popular way of increasing the speed and lowering the memory usage of Convolutional Neural Networks (CNNs). When labelled training data is available, network weights and activations have successfully been quantized down to 1-bit. The same cannot be said about the scenario when labelled training data is not available, e.g. when quantizing a pre-trained model, where current approaches…

    Submitted 7 November, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

    Journal ref: The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 5148-5157

  37. Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade

    Authors: Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord, Julien Valentin, Victor A. Prisacariu, Luigi Di Stefano, Philip H. S. Torr

    Abstract: Camera pose estimation is an important problem in computer vision. Common techniques either match the current image against keyframes with known poses, directly regress the pose, or establish correspondences between keypoints in the image and points in the scene to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achie…

    Submitted 2 July, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorship

    MSC Class: 68T45

  38. Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation

    Authors: Stuart Golodetz, Tommaso Cavallari, Nicholas A Lord, Victor A Prisacariu, David W Murray, Philip H S Torr

    Abstract: Reconstructing dense, volumetric models of real-world 3D scenes is important for many tasks, but capturing large scenes can take significant time, and the risk of transient changes to the scene goes up as the capture time increases. These are good reasons to want instead to capture several smaller sub-scenes that can be joined to make the whole scene. Achieving this has traditionally been difficul…

    Submitted 2 July, 2019; v1 submitted 25 January, 2018; originally announced January 2018.

    Comments: Stuart Golodetz, Tommaso Cavallari and Nicholas Lord assert joint first authorship

    MSC Class: 68T45

    Journal ref: IEEE Transactions on Visualization and Computer Graphics 24(11):2895-2905, 2018

  39. arXiv:1708.00783  [pdf, other]

    cs.CV

    InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure

    Authors: Victor Adrian Prisacariu, Olaf Kähler, Stuart Golodetz, Michael Sapienza, Tommaso Cavallari, Philip H S Torr, David W Murray

    Abstract: Volumetric models have become a popular representation for 3D scenes in recent years. One breakthrough leading to their popularity was KinectFusion, which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a TSDF leads to most of the simplicity and efficiency that can…

    Submitted 2 August, 2017; originally announced August 2017.

    Comments: This article largely supersedes arxiv:1410.0925 (it describes version 3 of the InfiniTAM framework)

  40. SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes

    Authors: Stuart Golodetz, Michael Sapienza, Julien P. C. Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor A. Prisacariu, Olaf Kähler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H. S. Torr

    Abstract: We present an open-source, real-time implementation of SemanticPaint, a system for geometric reconstruction, object-class segmentation and learning of 3D scenes. Using our system, a user can walk into a room wearing a depth camera and a virtual reality headset, and both densely reconstruct the 3D scene and interactively segment the environment into object classes such as 'chair', 'floor' and 'tabl…

    Submitted 13 October, 2015; originally announced October 2015.

    Comments: 33 pages, Project: http://www.semantic-paint.com, Code: https://github.com/torrvision/spaint

    ACM Class: I.2.10

  41. arXiv:1509.04232  [pdf, other]

    cs.CV

    gSLICr: SLIC superpixels at over 250Hz

    Authors: Carl Yuheng Ren, Victor Adrian Prisacariu, Ian D Reid

    Abstract: We introduce a parallel GPU implementation of the Simple Linear Iterative Clustering (SLIC) superpixel segmentation. Using a single graphics card, our implementation achieves speedups of up to $83\times$ over the standard sequential implementation. Our implementation is fully compatible with the standard sequential implementation and the software is now available online and is open source.

    Submitted 14 September, 2015; originally announced September 2015.

  42. arXiv:1410.0925  [pdf, other]

    cs.CV

    A Framework for the Volumetric Integration of Depth Images

    Authors: Victor Adrian Prisacariu, Olaf Kähler, Ming Ming Cheng, Carl Yuheng Ren, Julien Valentin, Philip H. S. Torr, Ian D. Reid, David W. Murray

    Abstract: Volumetric models have become a popular representation for 3D scenes in recent years. One of the breakthroughs leading to their popularity was KinectFusion, where the focus is on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a truncated signed distance function leads to most…

    Submitted 23 October, 2014; v1 submitted 3 October, 2014; originally announced October 2014.

    Comments: 17 pages, 8 figures