-
NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Authors:
Varun Jampani,
Kevis-Kokitsi Maninis,
Andreas Engelhardt,
Arjun Karpur,
Karen Truong,
Kyle Sargent,
Stefan Popov,
André Araujo,
Ricardo Martin-Brualla,
Kaushal Patel,
Daniel Vlasic,
Vittorio Ferrari,
Ameesh Makadia,
Ce Liu,
Yuanzhen Li,
Howard Zhou
Abstract:
Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose NAVI: a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation. Project page: https://navidataset.github.io
Submitted 13 October, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Estimating Generic 3D Room Structures from 2D Annotations
Authors:
Denys Rozumnyi,
Stefan Popov,
Kevis-Kokitsi Maninis,
Matthias Nießner,
Vittorio Ferrari
Abstract:
Indoor rooms are among the most common use cases in 3D scene understanding. Current state-of-the-art methods for this task are driven by large annotated datasets. Room layouts are especially important, consisting of structural elements in 3D, such as walls, floors, and ceilings. However, they are difficult to annotate, especially on pure RGB video. We propose a novel method to produce generic 3D room layouts just from 2D segmentation masks, which are easy for humans to annotate. Based on these 2D annotations, we automatically reconstruct 3D plane equations for the structural elements and their spatial extent in the scene, and connect adjacent elements at the appropriate contact edges. We annotate and publicly release 2246 3D room layouts on the RealEstate10k dataset, containing YouTube videos. We demonstrate the high quality of these 3D layout annotations with extensive experiments.
Submitted 21 December, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
Authors:
Kevis-Kokitsi Maninis,
Stefan Popov,
Matthias Nießner,
Vittorio Ferrari
Abstract:
We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects. We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation. Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor. Many steps are performed automatically, and the tasks performed by humans are simple, well-specified, and require only limited reasoning in 3D. This makes them feasible for crowd-sourcing and has allowed us to construct a large-scale dataset by annotating real-estate videos from YouTube. Our dataset CAD-Estate offers 101k instances of 12k unique CAD models placed in the 3D representations of 20k videos. In comparison to Scan2CAD, the largest existing dataset with CAD model annotations on real scenes, CAD-Estate has 7x more instances and 4x more unique CAD models. We showcase the benefits of pre-training a Mask2CAD model on CAD-Estate for the task of automatic 3D object reconstruction and pose estimation, demonstrating that it leads to performance improvements on the popular Scan2CAD benchmark. The dataset is available at https://github.com/google-research/cad-estate.
Submitted 14 August, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
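The 9-DoF pose mentioned in the abstract above can be illustrated with a small sketch. It assumes the usual split into 3 translation, 3 rotation, and 3 per-axis scale parameters, composed as translation after rotation after scale. The function names, the Euler-angle convention (rotation about x, then y, then z), and the composition order are assumptions for illustration only; they are not taken from the CAD-Estate code.

```python
import math


def pose_matrix(translation, euler_xyz, scale):
    """Build a 4x4 matrix M = T * R * S from 9 pose parameters.

    Assumed convention (not from the paper): R = Rz * Ry * Rx, angles in radians.
    """
    rx, ry, rz = euler_xyz
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot = [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]
    # Multiplying R by diag(scale) scales column j of R by scale[j].
    matrix = [
        [rot[i][j] * scale[j] for j in range(3)] + [translation[i]]
        for i in range(3)
    ]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix


def apply_pose(matrix, point):
    """Map a point from the CAD model frame into the scene frame."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix[:3])


# Scale a unit corner by (2, 3, 4), then shift it by 1 along x.
m = pose_matrix((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 3.0, 4.0))
print(apply_pose(m, (1.0, 1.0, 1.0)))
```

Per-axis scale is what distinguishes this from a 7-DoF similarity transform, which has a single uniform scale factor.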
-
Symbolic expression generation via Variational Auto-Encoder
Authors:
Sergei Popov,
Mikhail Lazarev,
Vladislav Belavin,
Denis Derkach,
Andrey Ustyuzhanin
Abstract:
There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widely used deep neural networks do not provide interpretable solutions. Meanwhile, symbolic expressions give us a clear relation between observations and the target variable. However, at the moment, there is no dominant solution for the symbolic regression task, and we aim to reduce this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via variational autoencoder (VAE). In a nutshell, we suggest using a VAE to generate mathematical expressions, and our training strategy forces generated formulas to fit a given dataset. Our framework allows encoding a priori knowledge of the formulas into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that our method outperforms the competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported SOTA by 20%. We demonstrate that this value depends on the dataset and can be even higher.
Submitted 15 January, 2023;
originally announced January 2023.
-
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers
Authors:
Michał J. Tyszkiewicz,
Kevis-Kokitsi Maninis,
Stefan Popov,
Vittorio Ferrari
Abstract:
We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos. It relies on two alternative ways to represent its knowledge: as a global 3D grid of features and an array of view-specific 2D grids. We progressively exchange information between the two with a dedicated bidirectional attention mechanism. We exploit knowledge about the image formation process to significantly sparsify the attention weight matrix, making our architecture feasible on current hardware, both in terms of memory and computation. We attach a DETR-style head on top of the 3D feature grid in order to detect the objects in the scene and to predict their 3D pose and 3D shape. Compared to previous methods, our architecture is single stage, end-to-end trainable, and it can reason holistically about a scene from multiple video frames without needing a brittle tracking step. We evaluate our method on the challenging Scan2CAD dataset, where we outperform (1) recent state-of-the-art methods for 3D object pose estimation from RGB videos; and (2) a strong alternative method combining Multi-view Stereo with RGB-D CAD alignment. We plan to release our source code.
Submitted 26 August, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Voting-based probabilistic consensuses and their applications in distributed ledgers
Authors:
Serguei Popov,
Sebastian Müller
Abstract:
We review probabilistic models known as majority dynamics (also known as threshold Voter Models) and discuss their possible applications for achieving consensus in cryptocurrency systems. In particular, we show that using this approach straightforwardly for practical consensus in a Byzantine setting can be problematic and requires extensive further research. We then discuss the FPC consensus protocol, which circumvents the problems mentioned above by using external randomness.
Submitted 6 August, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos
Authors:
Kevis-Kokitsi Maninis,
Stefan Popov,
Matthias Nießner,
Vittorio Ferrari
Abstract:
We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects. Our method can process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in it, thus aligning them in a common 3D coordinate frame. The core idea of our method is to integrate neural network predictions from individual frames with a temporally global, multi-view constraint optimization formulation. This integration process resolves the scale and depth ambiguities in the per-frame predictions, and generally improves the estimate of all pose parameters. By leveraging multi-view constraints, our method also resolves occlusions and handles objects that are out of view in individual frames, thus reconstructing all objects into a single globally consistent CAD representation of the scene. In comparison to the state-of-the-art single-frame method Mask2CAD that we build on, we achieve substantial improvements on the Scan2CAD dataset (from 11.6% to 30.7% class average accuracy).
Submitted 25 January, 2022; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Embedding Words in Non-Vector Space with Unsupervised Graph Learning
Authors:
Max Ryabinin,
Sergei Popov,
Liudmila Prokhorenkova,
Elena Voita
Abstract:
It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our setting, each word is a node in a weighted graph and the distance between words is the shortest path distance between the corresponding nodes. We adopt a recent method for learning a representation of data in the form of a differentiable weighted graph and use it to modify the GloVe training algorithm. We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks. Our analysis reveals that the structure of the learned graphs is hierarchical and similar to that of WordNet, while the geometry is highly non-trivial and contains subgraphs with different local topology.
Submitted 6 October, 2020;
originally announced October 2020.
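The abstract above defines the distance between two words as the shortest-path distance between their nodes in a weighted graph. A minimal sketch of that distance, using Dijkstra's algorithm on a hand-made toy graph, is given below. The words and edge weights are invented for illustration, and the learning of the graph itself (the differentiable part of GraphGlove) is not reproduced here.

```python
import heapq


def build_graph(edges):
    """Turn (word_a, word_b, weight) triples into an undirected adjacency dict."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def word_distance(graph, source, target):
    """Shortest-path distance between two words (Dijkstra, non-negative weights)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")  # the words are not connected


# Hypothetical toy graph; the weights are made up, not learned.
toy = build_graph([
    ("animal", "dog", 1.0),
    ("animal", "cat", 1.0),
    ("dog", "puppy", 0.5),
    ("cat", "dog", 3.0),
])
print(word_distance(toy, "cat", "puppy"))  # 2.5, via "animal" and "dog"
```

In this toy graph the direct cat-dog edge is longer than the detour through the shared parent "animal", which is the kind of hierarchical structure the abstract says the learned graphs exhibit.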
-
Efficient Full Image Interactive Segmentation by Leveraging Within-image Appearance Similarity
Authors:
Mykhaylo Andriluka,
Stefano Pellegrini,
Stefan Popov,
Vittorio Ferrari
Abstract:
We propose a new approach to interactive full-image semantic segmentation which enables quickly collecting training data for new datasets with previously unseen semantic classes (a demo is available at https://youtu.be/yUk8D5gEX-o). We leverage a key observation: propagation from labeled to unlabeled pixels does not necessarily require class-specific knowledge, but can be done purely based on appearance similarity within an image. We build on this observation and propose an approach capable of jointly propagating pixel labels from multiple classes without having explicit class-specific appearance models. To enable long-range propagation, our approach first globally measures appearance similarity between labeled and unlabeled pixels across the entire image. Then it locally integrates per-pixel measurements, which improves the accuracy at boundaries and removes noisy label switches in homogeneous regions. We also design an efficient manual annotation interface that extends the traditional polygon drawing tools with a suite of additional convenient features (and add automatic propagation to it). Experiments with human annotators on the COCO Panoptic Challenge dataset show that the combination of our better manual interface and our novel automatic propagation mechanism reduces annotation time by more than a factor of 2 compared to polygon drawing. We also test our method on the ADE-20k and Fashionista datasets without any dataset-specific adaptation or retraining of our model, demonstrating that it can generalize to new datasets and visual classes.
Submitted 16 July, 2020;
originally announced July 2020.
-
CoReNet: Coherent 3D scene reconstruction from a single RGB image
Authors:
Stefan Popov,
Pablo Bauszat,
Vittorio Ferrari
Abstract:
Advances in deep learning techniques have allowed recent work to reconstruct the shape of a single object given only one RGB image as input. Building on common encoder-decoder architectures for this task, we propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models, while at the same time encoding fine object details without an excessive memory footprint; (3) a reconstruction loss tailored to capture overall object geometry. Furthermore, we adapt our model to address the harder task of reconstructing multiple objects from a single image. We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space. We also handle occlusions and resolve them by hallucinating the missing object parts in the 3D volume. We validate the impact of our contributions experimentally both on synthetic data from ShapeNet as well as real images from Pix3D. Our method improves over the state-of-the-art single-object methods on both datasets. Finally, we evaluate performance quantitatively on multiple object reconstruction with synthetic scenes assembled from ShapeNet objects.
Submitted 5 August, 2020; v1 submitted 27 April, 2020;
originally announced April 2020.
-
Editable Neural Networks
Authors:
Anton Sinitsin,
Vsevolod Plokhotnyuk,
Dmitriy Pyrkin,
Sergei Popov,
Artem Babenko
Abstract:
These days deep neural networks are ubiquitously used in a wide range of tasks, from image classification and machine translation to face identification and self-driving cars. In many applications, a single model error can lead to devastating financial, reputational and even life-threatening consequences. Therefore, it is crucially important to correct model mistakes quickly as they appear. In this work, we investigate the problem of neural network editing: how one can efficiently patch a mistake of the model on a particular sample, without influencing the model behavior on other samples. Namely, we propose Editable Training, a model-agnostic training technique that encourages fast editing of the trained model. We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks.
Submitted 22 July, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds
Authors:
Albert Pumarola,
Stefan Popov,
Francesc Moreno-Noguer,
Vittorio Ferrari
Abstract:
Flow-based generative models have highly desirable properties like exact log-likelihood evaluation and exact latent-variable inference; however, they are still in their infancy and have not received as much attention as alternative generative models. In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for multi-modal data modeling. C-Flow is based on a parallel sequence of invertible mappings in which a source flow guides the target flow at every step, enabling fine-grained control over the generation process. We also devise a new strategy to model unordered 3D point clouds that, in combination with the conditioning scheme, makes it possible to address 3D reconstruction from a single image and its inverse problem of rendering an image given a point cloud. We demonstrate our conditioning method to be very adaptable, being also applicable to image manipulation, style transfer and multi-modal image-to-image mapping in a diversity of domains, including RGB images, segmentation maps, and edge masks.
Submitted 3 April, 2020; v1 submitted 15 December, 2019;
originally announced December 2019.
-
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Authors:
Sergei Popov,
Stanislav Morozov,
Artem Babenko
Abstract:
Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in the important case of heterogeneous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is insufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data.
Submitted 19 September, 2019; v1 submitted 13 September, 2019;
originally announced September 2019.
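The abstract above says NODE generalizes ensembles of oblivious decision trees. As background that is not part of the abstract: an oblivious tree uses the same feature and threshold for every node at a given depth, so a tree of depth d reduces to d comparisons that index a table of 2^d leaf values. The sketch below shows only this hard, non-differentiable building block. The function names and example numbers are hypothetical, and the differentiable relaxation that NODE trains end-to-end is not reproduced.

```python
def oblivious_tree_predict(x, features, thresholds, leaf_values):
    """Evaluate one oblivious decision tree on a feature vector x.

    features[i] and thresholds[i] define the single split shared by all
    nodes at depth i; leaf_values has 2 ** len(features) entries.
    """
    index = 0
    for feature, threshold in zip(features, thresholds):
        bit = 1 if x[feature] > threshold else 0
        index = (index << 1) | bit  # each depth contributes one bit
    return leaf_values[index]


def ensemble_predict(x, trees):
    """Average the outputs of several oblivious trees."""
    return sum(oblivious_tree_predict(x, *tree) for tree in trees) / len(trees)


# Hypothetical depth-2 tree: split on feature 0 at 0.5, then feature 1 at 2.0.
tree = ([0, 1], [0.5, 2.0], [10.0, 20.0, 30.0, 40.0])
print(oblivious_tree_predict([0.7, 1.0], *tree))  # bits 1,0 -> leaf 2 -> 30.0
print(ensemble_predict([0.1, 5.0], [tree, tree]))  # bits 0,1 -> leaf 1 -> 20.0
```

Because the whole tree is a table lookup driven by a few comparisons, replacing the hard comparisons with smooth functions is a natural route to gradient-based training, which is the direction the abstract describes.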
-
FPC-BI: Fast Probabilistic Consensus within Byzantine Infrastructures
Authors:
Serguei Popov,
William J Buchanan
Abstract:
This paper presents a novel leaderless protocol (FPC-BI: Fast Probabilistic Consensus within Byzantine Infrastructures) with low communication complexity, which allows a set of nodes to come to a consensus on the value of a single bit. The paper assumes that part of the nodes are Byzantine, and are thus controlled by an adversary who intends to either delay the consensus or break it (that is, to make at least a couple of honest nodes come to different conclusions). We prove that, nevertheless, the protocol works with high probability when its parameters are suitably chosen. Alongside this, the paper also provides explicit estimates on the probability that the protocol finalizes in the consensus state in a given time. This protocol could be applied to reaching consensus in decentralized cryptocurrency systems. A special feature of it is that it makes use of a sequence of random numbers which are either provided by a trusted source or generated by the nodes themselves using some decentralized random number generating protocol. This increases the overall trustworthiness of the infrastructure. A core contribution of the paper is that it uses a very weak consensus to obtain a strong consensus on the value of a bit, which can relate to the validity of a transaction.
Submitted 13 September, 2020; v1 submitted 26 May, 2019;
originally announced May 2019.
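A rough sketch of a voting round in the spirit of the protocol described above is given below. It assumes that each node queries k randomly chosen nodes and adopts opinion 1 if the mean of the sampled opinions exceeds a threshold, and that the threshold is the shared random number the abstract mentions, drawn uniformly from [beta, 1 - beta] each round. These details, the parameter names, and the absence of Byzantine nodes and of a finalization rule are all simplifying assumptions; this is not the protocol as specified in the paper.

```python
import random


def voting_round(opinions, k, threshold, rng):
    """One round: every node samples k opinions and compares the mean to the threshold."""
    updated = []
    for _ in opinions:
        sample = [rng.choice(opinions) for _ in range(k)]  # sampling with replacement
        mean = sum(sample) / k
        updated.append(1 if mean > threshold else 0)
    return updated


def run_voting(opinions, k=10, beta=0.3, rounds=20, seed=0):
    """Repeat voting rounds with a fresh shared random threshold per round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # Stand-in for the common random number; all nodes use the same value.
        threshold = rng.uniform(beta, 1.0 - beta)
        opinions = voting_round(opinions, k, threshold, rng)
    return opinions


start = [1] * 60 + [0] * 40
final = run_voting(start)
print(sum(final), "of", len(final), "nodes hold opinion 1")
```

The point of the shared random threshold is that an adversary cannot know in advance which side of the threshold a balanced network will fall on, which makes it harder to keep honest nodes split.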
-
Large-scale interactive object segmentation with human annotators
Authors:
Rodrigo Benenson,
Stefan Popov,
Vittorio Ferrari
Abstract:
Manually annotating object segmentation masks is very time consuming. Interactive object segmentation methods offer a more efficient alternative where a human annotator and a machine segmentation model collaborate. In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We plan to release this data publicly, forming the largest existing dataset for instance segmentation. Moreover, by re-annotating part of the COCO dataset, we show that we can produce instance masks 3 times faster than traditional polygon drawing tools while also providing better quality. (3) We present a technique for automatically estimating the quality of the produced masks which exploits indirect signals from the annotation process.
Submitted 17 April, 2019; v1 submitted 26 March, 2019;
originally announced March 2019.
-
High-speed PAM4-based Optical SDM Interconnects with Directly Modulated Long-wavelength VCSEL
Authors:
Joris Van Kerrebrouck,
Xiaodan Pang,
Oskars Ozolins,
Rui Lin,
Aleksejs Udalcovs,
Lu Zhang,
Haolin Li,
Silvia Spiga,
Markus-Christian Amann,
Lin Gan,
Ming Tang,
Songnian Fu,
Richard Schatz,
Gunnar Jacobsen,
Sergei Popov,
Deming Liu,
Weijun Tong,
Guy Torfs,
Johan Bauwelinck,
Jiajia Chen,
Xin Yin
Abstract:
This paper reports the demonstration of high-speed PAM-4 transmission using a 1.5-μm single-mode vertical cavity surface emitting laser (SM-VCSEL) over multicore fiber with 7 cores over different distances. We have successfully generated up to 70 Gbaud 4-level pulse amplitude modulation (PAM-4) signals with a VCSEL in optical back-to-back, and transmitted 50 Gbaud PAM-4 signals over both 1-km dispersion-uncompensated and 10-km dispersion-compensated links in each core, enabling a total data throughput of 700 Gbps over the 7-core fiber. Moreover, 56 Gbaud PAM-4 over 1 km has also been shown, although not all cores provide the required $3.8 \times 10^{-3}$ bit error rate (BER) for the 7% overhead hard-decision forward error correction (7% OH-HDFEC). The limited bandwidth of the VCSEL and the adverse chromatic dispersion of the fiber are suppressed with pre-equalization based on accurate end-to-end channel characterizations. With a digital post-equalization, BER performance below the 7% OH-HDFEC limit is achieved over all cores. The demonstrated results show a great potential to realize high-capacity and compact short-reach optical interconnects for data centers.
Submitted 13 November, 2018;
originally announced December 2018.
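The 700 Gbps figure in the abstract above follows from simple arithmetic: PAM-4 carries log2(4) = 2 bits per symbol, so 50 Gbaud gives 100 Gbps per core, and 7 cores give 700 Gbps. The short sketch below reproduces this. The net-rate helper, which divides the gross rate by (1 + overhead) to account for the 7% FEC overhead, is an added illustration based on the usual definition of overhead and is not a number reported in the abstract.

```python
import math


def gross_rate_gbps(symbol_rate_gbaud, levels, cores):
    """Aggregate line rate: symbols/s times bits per symbol times number of cores."""
    bits_per_symbol = math.log2(levels)
    return symbol_rate_gbaud * bits_per_symbol * cores


def net_rate_gbps(gross_gbps, fec_overhead):
    """Payload rate left after FEC, assuming overhead = redundancy / payload."""
    return gross_gbps / (1.0 + fec_overhead)


gross = gross_rate_gbps(50, 4, 7)
print(gross)                       # 700.0
print(net_rate_gbps(gross, 0.07))  # roughly 654 Gbps of payload
```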
-
The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale
Authors:
Alina Kuznetsova,
Hassan Rom,
Neil Alldrin,
Jasper Uijlings,
Ivan Krasin,
Jordi Pont-Tuset,
Shahab Kamali,
Stefan Popov,
Matteo Malloci,
Alexander Kolesnikov,
Tom Duerig,
Vittorio Ferrari
Abstract:
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows sharing and adapting the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias. Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, we provide 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). We annotated visual relationships between them, which support visual relationship detection, an emerging task that requires structured reasoning. We provide in-depth comprehensive statistics about the dataset, we validate the quality of the annotations, we study how the performance of several modern models evolves with increasing amounts of training data, and we demonstrate two applications made possible by having unified annotations of multiple types coexisting in the same images. We hope that the scale, quality, and variety of Open Images V4 will foster further research and innovation even beyond the areas of image classification, object detection, and visual relationship detection.
Submitted 21 February, 2020; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Equilibria in the Tangle
Authors:
Serguei Popov,
Olivia Saa,
Paulo Finardi
Abstract:
We analyse the Tangle --- a DAG-valued stochastic process where new vertices get attached to the graph at Poissonian times, and the attachment's locations are chosen by means of random walks on that graph. These new vertices, also thought of as "transactions", are issued by many players (which are the nodes of the network), independently. The main application of this model is that it is used as a…
▽ More
We analyse the Tangle --- a DAG-valued stochastic process where new vertices get attached to the graph at Poissonian times, and the attachment's locations are chosen by means of random walks on that graph. These new vertices, also thought of as "transactions", are issued by many players (which are the nodes of the network), independently. The main application of this model is that it is used as a base for the IOTA cryptocurrency system (www.iota.org). We prove existence of "almost symmetric" Nash equilibria for the system where a part of players tries to optimize their attachment strategies. Then, we also present simulations that show that the "selfish" players will nevertheless cooperate with the network by choosing attachment strategies that are similar to the "recommended" one.
Submitted 3 July, 2019; v1 submitted 14 December, 2017;
originally announced December 2017.
-
Revisiting knowledge transfer for training object class detectors
Authors:
Jasper Uijlings,
Stefan Popov,
Vittorio Ferrari
Abstract:
We propose to revisit knowledge transfer for training object detectors on target classes from weakly supervised training images, helped by a set of source classes with bounding-box annotations. We present a unified knowledge transfer framework based on training a single neural network multi-class object detector over all source classes, organized in a semantic hierarchy. This generates proposals with scores at multiple levels in the hierarchy, which we use to explore knowledge transfer over a broad range of generality, ranging from class-specific (bicycle to motorbike) to class-generic (objectness to any class). Experiments on the 200 object classes in the ILSVRC 2013 detection dataset show that our technique: (1) leads to much better performance on the target classes (70.3% CorLoc, 36.9% mAP) than a weakly supervised baseline which uses manually engineered objectness [11] (50.5% CorLoc, 25.4% mAP). (2) delivers target object detectors reaching 80% of the mAP of their fully supervised counterparts. (3) outperforms the best reported transfer learning results on this dataset (+41% CorLoc and +3% mAP over [18, 46], +16.2% mAP over [32]). Moreover, we also carry out several across-dataset knowledge transfer experiments [27, 24, 35] and find that (4) our technique outperforms the weakly supervised baseline in all dataset pairs by 1.5x-1.9x, establishing its general applicability.
Submitted 28 March, 2018; v1 submitted 21 August, 2017;
originally announced August 2017.
-
Analytical Estimation in Differential Optical Transmission Systems Influenced by Equalization Enhanced Phase Noise
Authors:
Tianhua Xu,
Gunnar Jacobsen,
Sergei Popov,
Tiegen Liu,
Yimo Zhang,
Polina Bayvel
Abstract:
An analytical model is presented for assessing the performance of the bit-error-rate (BER) in the differential m-level phase shift keying (m-PSK) transmission systems, where the influence of equalization enhanced phase noise (EEPN) has been considered. Theoretical analysis has been carried out in differential quadrature phase shift keying (DQPSK), differential 8-PSK (D8PSK), and differential 16-PSK (D16PSK) coherent optical transmission systems. The influence of EEPN on the BER performance, in terms of signal-to-noise ratio (SNR), is investigated for different fiber dispersions, LO laser linewidths, symbol rates, and modulation formats. Our analytical model achieves a good agreement with previously reported EEPN induced BER floors, and can give an accurate prediction for the DQPSK system, and a leading-order approximation for the D8PSK and the D16PSK systems.
Submitted 22 August, 2016;
originally announced September 2016.
-
Analytical Investigations on Carrier Phase Recovery in Dispersion-Unmanaged n-PSK Coherent Optical Communication Systems
Authors:
Tianhua Xu,
Gunnar Jacobsen,
Sergei Popov,
Jie Li,
Tiegen Liu,
Yimo Zhang,
Polina Bayvel
Abstract:
Using coherent optical detection and digital signal processing, laser phase noise and equalization enhanced phase noise can be effectively mitigated using the feed-forward and feed-back carrier phase recovery approaches. In this paper, theoretical analyses of feed-back and feed-forward carrier phase recovery methods have been carried out in the long-haul high-speed n-level phase shift keying (n-PSK) optical fiber communication systems, involving a one-tap normalized least-mean-square (LMS) algorithm, a block-wise average algorithm, and a Viterbi-Viterbi algorithm. The analytical expressions for evaluating the estimated carrier phase and for predicting the bit-error-rate (BER) performance (such as the BER floors) have been presented and discussed in the n-PSK coherent optical transmission systems by considering both the laser phase noise and the equalization enhanced phase noise. The results indicate that the Viterbi-Viterbi carrier phase recovery algorithm outperforms the one-tap normalized LMS and the block-wise average algorithms for small phase noise variance (or effective phase noise variance), while the one-tap normalized LMS algorithm shows a better performance than the other two algorithms for large phase noise variance (or effective phase noise variance). In addition, the one-tap normalized LMS algorithm is more sensitive to the level of modulation formats.
Submitted 21 September, 2016; v1 submitted 22 August, 2016;
originally announced August 2016.
-
Phase Noise Influence in Optical OFDM Systems employing RF Pilot Tone for Phase Noise Cancellation
Authors:
Gunnar Jacobsen,
Leonid G. Kazovsky,
Tianhua Xu,
Sergei Popov,
Jie Li,
Yimo Zhang,
Ari T. Friberg
Abstract:
For coherent and direct-detection Orthogonal Frequency Division Multiplexed (OFDM) systems employing radio frequency (RF) pilot tone phase noise cancellation the influence of laser phase noise is evaluated. Novel analytical results for the common phase error and for the (modulation dependent) inter carrier interference are evaluated based upon Gaussian statistics for the laser phase noise. In the evaluation it is accounted for that the laser phase noise is filtered in the correlation signal detection. Numerical results are presented for OFDM systems with 4 and 16 PSK modulation, 200 OFDM bins and baud rate of 1 GS/s. It is found that about 225 km transmission is feasible for the coherent 4PSK-OFDM system over normal (G.652) fiber.
Submitted 19 July, 2016;
originally announced July 2016.
-
Phase Noise Influence in Long-range Coherent Optical OFDM Systems with Delay Detection, IFFT Multiplexing and FFT Demodulation
Authors:
Gunnar Jacobsen,
Tianhua Xu,
Sergei Popov,
Sergey Sergeyev,
Yimo Zhang
Abstract:
We present a study of the influence of dispersion induced phase noise for CO-OFDM systems using FFT multiplexing/IFFT demultiplexing techniques (software based). The software based system provides a method for a rigorous evaluation of the phase noise variance caused by Common Phase Error (CPE) and Inter-Carrier Interference (ICI) including - for the first time to our knowledge - in explicit form the effect of equalization enhanced phase noise (EEPN). This, in turn, leads to an analytic BER specification. Numerical results focus on a CO-OFDM system with 10-25 GS/s QPSK channel modulation. A worst case constellation configuration is identified for the phase noise influence and the resulting BER is compared to the BER of a conventional single channel QPSK system with the same capacity as the CO-OFDM implementation. Results are evaluated as a function of transmission distance. For both types of systems, the phase noise variance increases significantly with increasing transmission distance. For a total capacity of 400 (1000) Gbit/s, the transmission distance to have the BER < 10^-2 for the worst case CO-OFDM design is less than 800 and 460 km, respectively, whereas for a single channel QPSK system it is less than 1400 and 560 km.
Submitted 21 July, 2016;
originally announced July 2016.
-
Analysis of chromatic dispersion compensation and carrier phase recovery in long-haul optical transmission system influenced by equalization enhanced phase noise
Authors:
Tianhua Xu,
Gunnar Jacobsen,
Sergei Popov,
Jie Li,
Sergey Sergeyev,
Ari T. Friberg,
Tiegen Liu,
Yimo Zhang
Abstract:
The performance of long-haul coherent optical fiber transmission systems is significantly affected by the equalization enhanced phase noise (EEPN), due to the interaction between the electronic dispersion compensation (EDC) and the laser phase noise. In this paper, we present a comprehensive study on different chromatic dispersion (CD) compensation and carrier phase recovery (CPR) approaches, in the n-level phase shift keying (n-PSK) and the n-level quadrature amplitude modulation (n-QAM) coherent optical transmission systems, considering the impacts of EEPN. Four CD compensation methods are considered: time-domain equalization (TDE), frequency-domain equalization (FDE), and least mean square (LMS) adaptive equalization are applied for EDC, while dispersion compensating fiber (DCF) is employed for optical dispersion compensation (ODC). Meanwhile, three carrier phase recovery methods are also involved: a one-tap normalized least mean square (NLMS) algorithm, a block-wise average (BWA) algorithm, and a Viterbi-Viterbi (VV) algorithm. Numerical simulations have been carried out in a 28-Gbaud dual-polarization quadrature phase shift keying (DP-QPSK) coherent transmission system, and the results indicate that the origin of EEPN depends on the choice of chromatic dispersion compensation method, and that the effects of EEPN also behave moderately differently under different carrier phase recovery scenarios.
Submitted 29 March, 2017; v1 submitted 19 June, 2016;
originally announced July 2016.
-
Influence of Pre- and Post-compensation of Chromatic Dispersion on Equalization Enhanced Phase Noise in Coherent Multilevel Systems
Authors:
Gunnar Jacobsen,
Marisol Lidón,
Tianhua Xu,
Sergei Popov,
Ari T. Friberg,
Yimo Zhang
Abstract:
In this paper we present a comparative study in order to specify the influence of equalization enhanced phase noise (EEPN) for pre- and post-compensation of chromatic dispersion in high capacity and high constellation systems. This is - to our knowledge - the first detailed study in this area for pre-compensation systems. Our main results show that the local oscillator phase noise determines the EEPN influence in post-compensation implementations whereas the transmitter laser determines the EEPN in pre-compensation implementations. As a result of significance for the implementation of practical longer-range systems it is to be emphasized that the use of chromatic dispersion equalization in the optical domain - e.g. by the use of dispersion compensation fibers - eliminates the EEPN entirely. Thus, this seems a good option for such systems operating at high constellations in the future.
Submitted 19 July, 2016; v1 submitted 17 July, 2016;
originally announced July 2016.
-
Phase Noise Influence in Coherent Optical OFDM Systems with RF Pilot Tone: Digital IFFT Multiplexing and FFT Demodulation
Authors:
Gunnar Jacobsen,
Tianhua Xu,
Sergei Popov,
Jie Li,
Ari T. Friberg,
Yimo Zhang
Abstract:
We present a comparative study of the influence of dispersion induced phase noise for CO-OFDM systems using Tx channel multiplexing and Rx matched filter (analogue hardware based); and FFT multiplexing/IFFT demultiplexing techniques (software based). An RF carrier pilot tone is used to mitigate the phase noise influence. From the analysis, it appears that the phase noise influence for the two OFDM implementations is very similar. The software based system provides a method for a rigorous evaluation of the phase noise variance caused by Common Phase Error (CPE) and Inter-Carrier Interference (ICI) and this, in turn, leads to a BER specification. Numerical results focus on a CO-OFDM system with 1GS/s QPSK channel modulation. Worst case BER results are evaluated and compared to the BER of a QPSK system with the same capacity as the OFDM implementation. Results are evaluated as a function of transmission distance, and for the QPSK system the influence of equalization enhanced phase noise (EEPN) is included. For both types of systems, the phase noise variance increases significantly with increasing transmission distance. An important and novel observation is that the two types of systems have very nearly the same BER as a function of transmission distance for the same capacity. For the high capacity QPSK implementation, the increase in BER is due to EEPN, whereas for the OFDM approach it is due to the dispersion caused walk-off of the RF pilot tone relative to the OFDM signal channels. For a total capacity of 400 Gb/s, the transmission distance to have the BER < 10^-4 is less than 277 km.
Submitted 19 July, 2016; v1 submitted 17 July, 2016;
originally announced July 2016.
-
Dynamic physical layer equalization in optical communication networks
Authors:
Tianhua Xu,
Gunnar Jacobsen,
Jie Li,
Mark Leeson,
Sergei Popov
Abstract:
In optical transport networks, signal lightpaths between two terminal nodes can be different due to current network conditions. Thus the transmission distance and accumulated dispersion in the lightpath cannot be predicted. Therefore, the adaptive compensation of dynamic dispersion is necessary in such networks to enable flexible routing and switching. In this paper, we present a detailed analysis on the adaptive dispersion compensation using the least-mean-square (LMS) algorithm in coherent optical communication networks. It is found that the variable-step-size LMS equalizer can achieve the same performance with a lower complexity, compared to the traditional LMS algorithm.
Submitted 7 May, 2018; v1 submitted 13 June, 2016;
originally announced June 2016.
-
Digital Adaptive Carrier Phase Estimation in Multi-Level Phase Shift Keying Coherent Optical Communication Systems
Authors:
Tianhua Xu,
Tiegen Liu,
Yimo Zhang,
Gunnar Jacobsen,
Jie Li,
Sergei Popov
Abstract:
Adaptive carrier phase estimation is analyzed in long-haul high speed n-level phase shift keying (n-PSK) optical fiber communication systems based on the one-tap normalized least-mean-square (LMS) algorithm. The closed-form expressions for the estimated carrier phase and the bit-error-rate floor have been derived in the n-PSK coherent optical transmission systems. The results show that the one-tap normalized LMS algorithm performs well in the carrier phase estimation, but becomes less effective as the modulation level increases, in the compensation of both intrinsic laser phase noise and equalization enhanced phase noise.
Submitted 20 July, 2016; v1 submitted 13 March, 2016;
originally announced April 2016.
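The one-tap normalized LMS carrier phase estimator named in this abstract can be illustrated with a minimal sketch. It assumes a common decision-directed formulation (a single complex tap `w`, updated by the error between the hard decision and the tap output, normalized by the input power); the abstract does not confirm that this is the authors' exact variant. The function names, the step size `mu`, and the constellation convention exp(j*2*pi*k/m) are all illustrative assumptions.

```python
import cmath
import math


def psk_decision(y, m=4):
    """Index k of the m-PSK point exp(j*2*pi*k/m) closest in angle to y."""
    return round(cmath.phase(y) / (2 * math.pi / m)) % m


def one_tap_nlms(received, mu=0.1, m=4):
    """Decision-directed one-tap NLMS phase tracking (illustrative sketch).

    Returns the list of decided symbol indices and the final tap value.
    Assumes every received sample is non-zero.
    """
    w = 1 + 0j  # single complex tap, initialised to "no rotation"
    decisions = []
    for x in received:
        y = w * x                              # phase-corrected sample
        k = psk_decision(y, m)                 # hard decision
        d = cmath.exp(2j * math.pi * k / m)    # decided constellation point
        e = d - y                              # decision error
        # Normalised LMS update: step scaled by the input power |x|^2.
        w += mu * e * x.conjugate() / abs(x) ** 2
        decisions.append(k)
    return decisions, w
```

For a noiseless signal with a constant phase offset `phi` smaller than pi/m, the tap converges geometrically towards exp(-j*phi), which undoes the rotation.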
-
Carrier Phase Estimation in Dispersion-Unmanaged Optical Transmission Systems
Authors:
Tianhua Xu,
Polina Bayvel,
Tiegen Liu,
Yimo Zhang,
Gunnar Jacobsen,
Jie Li,
Sergei Popov
Abstract:
The study on carrier phase estimation (CPE) approaches, involving a one-tap normalized least-mean-square (NLMS) algorithm, a block-wise average algorithm, and a Viterbi-Viterbi algorithm, has been carried out in the long-haul high-capacity dispersion-unmanaged coherent optical systems. The closed-form expressions and analytical predictions for bit-error-rate behaviors in these CPE methods have been analyzed by considering both the laser phase noise and the equalization enhanced phase noise. It is found that the Viterbi-Viterbi algorithm outperforms the one-tap NLMS and the block-wise average algorithms for a small phase noise variance (or effective phase noise variance), while the three CPE methods converge to a similar performance for a large phase noise variance (or effective phase noise variance). In addition, the differences between the three CPE approaches become smaller for higher-level modulation formats.
Submitted 29 March, 2017; v1 submitted 2 March, 2016;
originally announced March 2016.
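As a rough illustration of the Viterbi-Viterbi estimator compared in this abstract, the sketch below uses the textbook m-th power form: raising each m-PSK symbol to the m-th power strips the data modulation, and the angle of the block average divided by m gives the common phase. It assumes a constellation of the form exp(j*2*pi*k/m); the function name and block handling are illustrative and not taken from the paper.

```python
import cmath
import math


def viterbi_viterbi_phase(symbols, m=4):
    """Estimate the common carrier phase of a block of m-PSK symbols.

    Illustrative sketch: assumes constellation points exp(j*2*pi*k/m).
    The estimate is only unambiguous within (-pi/m, pi/m].
    """
    # The m-th power maps every constellation point to the same angle m*phi.
    acc = sum(s ** m for s in symbols)
    # Averaging over the block suppresses noise; dividing by m undoes the power.
    return cmath.phase(acc) / m
```

The m-fold phase ambiguity is inherent to this approach, which is one reason such estimators are often paired with differential encoding.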
-
Close-Form Expression of One-Tap Normalized LMS Carrier Phase Recovery in Optical Communication Systems
Authors:
Tianhua Xu,
Gunnar Jacobsen,
Sergei Popov,
Jie Li,
Tiegen Liu,
Yimo Zhang
Abstract:
The performance of long-haul high speed coherent optical fiber communication systems is significantly degraded by the laser phase noise and the equalization enhanced phase noise (EEPN). In this paper, the analysis of the one-tap normalized least-mean-square (LMS) carrier phase recovery (CPR) is carried out and the closed-form expression is investigated for quadrature phase shift keying (QPSK) coherent optical fiber communication systems, in compensating both laser phase noise and equalization enhanced phase noise. Numerical simulations have also been implemented to verify the theoretical analysis. It is found that the one-tap normalized least-mean-square algorithm gives the same analytical expression for predicting CPR bit-error-rate (BER) floors as the traditional differential carrier phase recovery, when both the laser phase noise and the equalization enhanced phase noise are taken into account.
Submitted 5 October, 2016; v1 submitted 22 February, 2016;
originally announced February 2016.