-
Zero-shot detection of buildings in mobile LiDAR using Language Vision Model
Authors:
June Moh Goo,
Zichao Zeng,
Jan Boehm
Abstract:
Recent advances have demonstrated that Language Vision Models (LVMs) surpass the existing State-of-the-Art (SOTA) in two-dimensional (2D) computer vision tasks, motivating attempts to apply LVMs to three-dimensional (3D) data. While LVMs are efficient and effective in addressing various downstream 2D vision tasks without training, they face significant challenges when it comes to point clouds, a r…
▽ More
Recent advances have demonstrated that Language Vision Models (LVMs) surpass the existing State-of-the-Art (SOTA) in two-dimensional (2D) computer vision tasks, motivating attempts to apply LVMs to three-dimensional (3D) data. While LVMs are efficient and effective in addressing various downstream 2D vision tasks without training, they face significant challenges when it comes to point clouds, a representative format for representing 3D data. It is more difficult to extract features from 3D data and there are challenges due to large data sizes and the cost of the collection and labelling, resulting in a notably limited availability of datasets. Moreover, constructing LVMs for point clouds is even more challenging due to the requirements for large amounts of data and training time. To address these issues, our research aims to 1) apply the Grounded SAM through Spherical Projection to transfer 3D to 2D, and 2) experiment with synthetic data to evaluate its effectiveness in bridging the gap between synthetic and real-world data domains. Our approach exhibited high performance with an accuracy of 0.96, an IoU of 0.85, precision of 0.92, recall of 0.91, and an F1 score of 0.92, confirming its potential. However, challenges such as occlusion problems and pixel-level overlaps of multi-label points during spherical image generation remain to be addressed in future studies.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Zero-shot Building Age Classification from Facade Image Using GPT-4
Authors:
Zichao Zeng,
June Moh Goo,
Xinglei Wang,
Bin Chi,
Meihui Wang,
Jan Boehm
Abstract:
A building's age of construction is crucial for supporting many geospatial applications. Much current research focuses on estimating building age from facade images using deep learning. However, building an accurate deep learning model requires a considerable amount of labelled training data, and the trained models often have geographical constraints. Recently, large pre-trained vision language mo…
▽ More
A building's age of construction is crucial for supporting many geospatial applications. Much current research focuses on estimating building age from facade images using deep learning. However, building an accurate deep learning model requires a considerable amount of labelled training data, and the trained models often have geographical constraints. Recently, large pre-trained vision language models (VLMs) such as GPT-4 Vision, which demonstrate significant generalisation capabilities, have emerged as potential training-free tools for dealing with specific vision tasks, but their applicability and reliability for building information remain unexplored. In this study, a zero-shot building age classifier for facade images is developed using prompts that include logical instructions. Taking London as a test case, we introduce a new dataset, FI-London, comprising facade images and building age epochs. Although the training-free classifier achieved a modest accuracy of 39.69%, the mean absolute error of 0.85 decades indicates that the model can predict building age epochs successfully albeit with a small bias. The ensuing discussion reveals that the classifier struggles to predict the age of very old buildings and is challenged by fine-grained predictions within 2 decades. Overall, the classifier utilising GPT-4 Vision is capable of predicting the rough age epoch of a building from a single facade image without any training.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Self-supervised Visualisation of Medical Image Datasets
Authors:
Ifeoma Veronica Nwabufo,
Jan Niklas Böhm,
Philipp Berens,
Dmitry Kobak
Abstract:
Self-supervised learning methods based on data augmentations, such as SimCLR, BYOL, or DINO, allow obtaining semantically meaningful representations of image datasets and are widely used prior to supervised fine-tuning. A recent self-supervised learning method, $t$-SimCNE, uses contrastive learning to directly train a 2D representation suitable for visualisation. When applied to natural image data…
▽ More
Self-supervised learning methods based on data augmentations, such as SimCLR, BYOL, or DINO, allow obtaining semantically meaningful representations of image datasets and are widely used prior to supervised fine-tuning. A recent self-supervised learning method, $t$-SimCNE, uses contrastive learning to directly train a 2D representation suitable for visualisation. When applied to natural image datasets, $t$-SimCNE yields 2D visualisations with semantically meaningful clusters. In this work, we used $t$-SimCNE to visualise medical image datasets, including examples from dermatology, histology, and blood microscopy. We found that increasing the set of data augmentations to include arbitrary rotations improved the results in terms of class separability, compared to data augmentations used for natural images. Our 2D representations show medically relevant structures and can be used to aid data exploration and annotation, improving on common approaches for data visualisation.
△ Less
Submitted 24 July, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Evaluating end-to-end entity linking on domain-specific knowledge bases: Learning about ancient technologies from museum collections
Authors:
Sebastian Cadavid-Sanchez,
Khalil Kacem,
Rafael Aparecido Martins Frade,
Johannes Boehm,
Thomas Chaney,
Danial Lashkari,
Daniel Simig
Abstract:
To study social, economic, and historical questions, researchers in the social sciences and humanities have started to use increasingly large unstructured textual datasets. While recent advances in NLP provide many tools to efficiently process such data, most existing approaches rely on generic solutions whose performance and suitability for domain-specific tasks is not well understood. This work…
▽ More
To study social, economic, and historical questions, researchers in the social sciences and humanities have started to use increasingly large unstructured textual datasets. While recent advances in NLP provide many tools to efficiently process such data, most existing approaches rely on generic solutions whose performance and suitability for domain-specific tasks is not well understood. This work presents an attempt to bridge this domain gap by exploring the use of modern Entity Linking approaches for the enrichment of museum collection data. We collect a dataset comprising of more than 1700 texts annotated with 7,510 mention-entity pairs, evaluate some off-the-shelf solutions in detail using this dataset and finally fine-tune a recent end-to-end EL model on this data. We show that our fine-tuned model significantly outperforms other approaches currently available in this domain and present a proof-of-concept use case of this model. We release our dataset and our best model.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Unsupervised visualization of image datasets using contrastive learning
Authors:
Jan Niklas Böhm,
Philipp Berens,
Dmitry Kobak
Abstract:
Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet, these approaches only produce meaningful results if the nearest neighbors themselves are meaningful. For images represented in pixel space this is not the case, as distances in pixel space are often not capturing our sense of similarity and therefore neighbo…
▽ More
Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet, these approaches only produce meaningful results if the nearest neighbors themselves are meaningful. For images represented in pixel space this is not the case, as distances in pixel space are often not capturing our sense of similarity and therefore neighbors are not semantically close. This problem can be circumvented by self-supervised approaches based on contrastive learning, such as SimCLR, relying on data augmentation to generate implicit neighbors, but these methods do not produce two-dimensional embeddings suitable for visualization. Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data. T-SimCNE combines ideas from contrastive learning and neighbor embeddings, and trains a parametric mapping from the high-dimensional pixel space into two dimensions. We show that the resulting 2D embeddings achieve classification accuracy comparable to the state-of-the-art high-dimensional SimCLR representations, thus faithfully capturing semantic relationships. Using t-SimCNE, we obtain informative visualizations of the CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and highlighting artifacts and outliers.
△ Less
Submitted 28 February, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
From $t$-SNE to UMAP with contrastive learning
Authors:
Sebastian Damrich,
Jan Niklas Böhm,
Fred A. Hamprecht,
Dmitry Kobak
Abstract:
Neighbor embedding methods $t$-SNE and UMAP are the de facto standard for visualizing high-dimensional datasets. Motivated from entirely different viewpoints, their loss functions appear to be unrelated. In practice, they yield strongly differing embeddings and can suggest conflicting interpretations of the same data. The fundamental reasons for this and, more generally, the exact relationship bet…
▽ More
Neighbor embedding methods $t$-SNE and UMAP are the de facto standard for visualizing high-dimensional datasets. Motivated from entirely different viewpoints, their loss functions appear to be unrelated. In practice, they yield strongly differing embeddings and can suggest conflicting interpretations of the same data. The fundamental reasons for this and, more generally, the exact relationship between $t$-SNE and UMAP have remained unclear. In this work, we uncover their conceptual connection via a new insight into contrastive learning methods. Noise-contrastive estimation can be used to optimize $t$-SNE, while UMAP relies on negative sampling, another contrastive method. We find the precise relationship between these two contrastive methods and provide a mathematical characterization of the distortion introduced by negative sampling. Visually, this distortion results in UMAP generating more compact embeddings with tighter clusters compared to $t$-SNE. We exploit this new conceptual connection to propose and implement a generalization of negative sampling, allowing us to interpolate between (and even extrapolate beyond) $t$-SNE and UMAP and their respective embeddings. Moving along this spectrum of embeddings leads to a trade-off between discrete / local and continuous / global structures, mitigating the risk of over-interpreting ostensible features of any single embedding. We provide a PyTorch implementation.
△ Less
Submitted 28 February, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
Curiosity-driven 3D Object Detection Without Labels
Authors:
David Griffiths,
Jan Boehm,
Tobias Ritschel
Abstract:
In this paper we set out to solve the task of 6-DOF 3D object detection from 2D images, where the only supervision is a geometric representation of the objects we aim to find. In doing so, we remove the need for 6-DOF labels (i.e., position, orientation etc.), allowing our network to be trained on unlabeled images in a self-supervised manner. We achieve this through a neural network which learns a…
▽ More
In this paper we set out to solve the task of 6-DOF 3D object detection from 2D images, where the only supervision is a geometric representation of the objects we aim to find. In doing so, we remove the need for 6-DOF labels (i.e., position, orientation etc.), allowing our network to be trained on unlabeled images in a self-supervised manner. We achieve this through a neural network which learns an explicit scene parameterization which is subsequently passed into a differentiable renderer. We analyze why analysis-by-synthesis-like losses for supervision of 3D scene structure using differentiable rendering is not practical, as it almost always gets stuck in local minima of visual ambiguities. This can be overcome by a novel form of training, where an additional network is employed to steer the optimization itself to explore the entire parameter space i.e., to be curious, and hence, to resolve those ambiguities and find workable minima.
△ Less
Submitted 15 October, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Attraction-Repulsion Spectrum in Neighbor Embeddings
Authors:
Jan Niklas Böhm,
Philipp Berens,
Dmitry Kobak
Abstract:
Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using $k$NN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between th…
▽ More
Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using $k$NN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE using the exaggeration parameter yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures and yields higher $k$NN recall. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimisation strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lie Laplacian Eigenmaps. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto the attraction-repulsion spectrum, and highlight the inherent trade-offs between them.
△ Less
Submitted 18 October, 2022; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Finding Your (3D) Center: 3D Object Detection Using a Learned Loss
Authors:
David Griffiths,
Jan Boehm,
Tobias Ritschel
Abstract:
Massive semantically labeled datasets are readily available for 2D images, however, are much harder to achieve for 3D scenes. Objects in 3D repositories like ShapeNet are labeled, but regrettably only in isolation, so without context. 3D scenes can be acquired by range scanners on city-level scale, but much fewer with semantic labels. Addressing this disparity, we introduce a new optimization proc…
▽ More
Massive semantically labeled datasets are readily available for 2D images, however, are much harder to achieve for 3D scenes. Objects in 3D repositories like ShapeNet are labeled, but regrettably only in isolation, so without context. 3D scenes can be acquired by range scanners on city-level scale, but much fewer with semantic labels. Addressing this disparity, we introduce a new optimization procedure, which allows training for 3D detection with raw 3D scans while using as little as 5% of the object labels and still achieve comparable performance. Our optimization uses two networks. A scene network maps an entire 3D scene to a set of 3D object centers. As we assume the scene not to be labeled by centers, no classic loss, such as Chamfer can be used to train it. Instead, we use another network to emulate the loss. This loss network is trained on a small labeled subset and maps a non centered 3D object in the presence of distractions to its own center. This function is very similar - and hence can be used instead of - the gradient the supervised loss would provide. Our evaluation documents competitive fidelity at a much lower level of supervision, respectively higher quality at comparable supervision. Supplementary material can be found at: https://dgriffiths3.github.io.
△ Less
Submitted 22 July, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Parallel Computation of tropical varieties, their positive part, and tropical Grassmannians
Authors:
Dominik Bendle,
Janko Boehm,
Yue Ren,
Benjamin Schröter
Abstract:
In this article, we present a massively parallel framework for computing tropicalizations of algebraic varieties which can make use of finite symmetries. We compute the tropical Grassmannian TGr$_0(3,8)$, and show that it refines the $15$-dimensional skeleton of the Dressian Dr$(3,8)$ with the exception of $23$ special cones for which we construct explicit obstructions to the realizability of thei…
▽ More
In this article, we present a massively parallel framework for computing tropicalizations of algebraic varieties which can make use of finite symmetries. We compute the tropical Grassmannian TGr$_0(3,8)$, and show that it refines the $15$-dimensional skeleton of the Dressian Dr$(3,8)$ with the exception of $23$ special cones for which we construct explicit obstructions to the realizability of their tropical linear spaces. Moreover, we propose algorithms for identifying maximal-dimensional tropical cones which belong to the positive tropicalization. These algorithms exploit symmetries of the tropical variety even though the positive tropicalization need not be symmetric. We compute the maximal-dimensional cones of the positive Grassmannian TGr$^+(3,8)$ and compare them to the cluster complex of the classical Grassmannian Gr$(3,8)$.
△ Less
Submitted 30 March, 2020;
originally announced March 2020.
-
Random growth on a Ramanujan graph
Authors:
Janko Boehm,
Michael Joswig,
Lars Kastner,
Andrew Newman
Abstract:
The behavior of a certain random growth process is analyzed on arbitrary regular and non-regular graphs. Our argument is based on the Expander Mixing Lemma, which entails that the results are strongest for Ramanujan graphs, which asymptotically maximize the spectral gap. Further, we consider Erdős--Rényi random graphs and compare our theoretical results with computational experiments on flip graph…
▽ More
The behavior of a certain random growth process is analyzed on arbitrary regular and non-regular graphs. Our argument is based on the Expander Mixing Lemma, which entails that the results are strongest for Ramanujan graphs, which asymptotically maximize the spectral gap. Further, we consider Erdős--Rényi random graphs and compare our theoretical results with computational experiments on flip graphs of point configurations. The latter is relevant for enumerating triangulations.
△ Less
Submitted 22 October, 2019; v1 submitted 26 August, 2019;
originally announced August 2019.
-
SynthCity: A large scale synthetic point cloud
Authors:
David Griffiths,
Jan Boehm
Abstract:
With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks, however the ability for models to generalise from synthetic data t…
▽ More
With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks, however the ability for models to generalise from synthetic data to real world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exist which, if proved effective can be exploited. We therefore argue that research in this domain would be of significant use. In this paper we present SynthCity an open dataset to help aid research. SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is assigned a label from one of nine categories. We generate our point cloud in a typical Urban/Suburban environment using the Blensor plugin for Blender.
△ Less
Submitted 10 July, 2019;
originally announced July 2019.
-
A review on deep learning techniques for 3D sensed data classification
Authors:
David Griffiths,
Jan Boehm
Abstract:
Over the past decade deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic 3D sensed data understanding, such as point clouds, is comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing there is a high demand for algorithms that can learn to automatically und…
▽ More
Over the past decade deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic 3D sensed data understanding, such as point clouds, is comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches including; RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using literature to justify the areas where future research would be most valuable.
△ Less
Submitted 9 July, 2019;
originally announced July 2019.
-
Weighted Point Cloud Augmentation for Neural Network Training Data Class-Imbalance
Authors:
David Griffiths,
Jan Boehm
Abstract:
Recent developments in the field of deep learning for 3D data have demonstrated promising potential for end-to-end learning directly from point clouds. However, many real-world point clouds contain a large class im-balance due to the natural class im-balance observed in nature. For example, a 3D scan of an urban environment will consist mostly of road and facade, whereas other objects such as pole…
▽ More
Recent developments in the field of deep learning for 3D data have demonstrated promising potential for end-to-end learning directly from point clouds. However, many real-world point clouds contain a large class im-balance due to the natural class im-balance observed in nature. For example, a 3D scan of an urban environment will consist mostly of road and facade, whereas other objects such as poles will be under-represented. In this paper we address this issue by employing a weighted augmentation to increase classes that contain fewer points. By mitigating the class im-balance present in the data we demonstrate that a standard PointNet++ deep neural network can achieve higher performance at inference on validation data. This was observed as an increase of F1 score of 19% and 25% on two test benchmark datasets; ScanNet and Semantic3D respectively where no class im-balance pre-processing had been performed. Our networks performed better on both highly-represented and under-represented classes, which indicates that the network is learning more robust and meaningful features when the loss function is not overly exposed to only a few classes.
△ Less
Submitted 9 April, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Bad Primes in Computational Algebraic Geometry
Authors:
Janko Boehm,
Wolfram Decker,
Claus Fieker,
Santiago Laplagne,
Gerhard Pfister
Abstract:
Computations over the rational numbers often suffer from intermediate coefficient swell. One solution to this problem is to apply the given algorithm modulo a number of primes and then lift the modular results to the rationals. This method is guaranteed to work if we use a sufficiently large set of good primes. In many applications, however, there is no efficient way of excluding bad primes. In th…
▽ More
Computations over the rational numbers often suffer from intermediate coefficient swell. One solution to this problem is to apply the given algorithm modulo a number of primes and then lift the modular results to the rationals. This method is guaranteed to work if we use a sufficiently large set of good primes. In many applications, however, there is no efficient way of excluding bad primes. In this note, we describe a technique for rational reconstruction which will nevertheless return the correct result, provided the number of good primes in the selected set of primes is large enough. We give a number of illustrating examples which are implemented using the computer algebra system Singular and the programming language Julia. We discuss applications of our technique in computational algebraic geometry.
△ Less
Submitted 22 February, 2017;
originally announced February 2017.