-
Quantum-State-Specific Reaction Rate Measurements for the Photo-induced Reaction Ca$^+$ + O$_2$ $\rightarrow$ CaO$^+$ + O
Authors:
Philipp C. Schmid,
Mikhail I. Miller,
James Greenberg,
Thanh L. Nguyen,
John F. Stanton,
H. J. Lewandowski
Abstract:
Atoms and molecules often react at different rates depending on their internal quantum states. Thus, controlling which internal states are populated can be used to manipulate the reactivity and can lead to a more detailed understanding of reaction mechanisms. We demonstrate this control of reactions by studying the excited state reaction Ca$^+$ + O$_2$ $\rightarrow$ CaO$^+$ + O. This reaction is exothermic only if Ca$^+$ is in one of its excited electronic states. Using laser-cooling and electrodynamic trapping, we cool and trap Ca$^+$ at millikelvin temperatures for several minutes. We can then change the fraction of time the ions spend in each of the two excited states by adjusting the detunings of the cooling lasers. This allows us to disentangle the reactions that begin with Ca$^+$ in the $^2$P$_{1/2}$-state from the ones where Ca$^+$ is in the $^2$D$_{3/2}$-state. Using time-of-flight mass spectrometry, we determine independent reaction rate constants for Ca$^+$ in both electronically excited quantum states.
Submitted 14 January, 2019;
originally announced January 2019.
-
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Authors:
Joseph Roth,
Sourish Chaudhuri,
Ondrej Klejch,
Radhika Marvin,
Andrew Gallagher,
Liat Kaver,
Sharadh Ramaswamy,
Arkadiusz Stopczynski,
Cordelia Schmid,
Zhonghua Xi,
Caroline Pantofaru
Abstract:
Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual dataset for this task has constrained algorithm evaluations with respect to data diversity, environments, and accuracy. This has made comparisons and improvements difficult. In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker) that will be released publicly to facilitate algorithm development and enable comparisons. The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible. This dataset contains about 3.65 million human-labeled frames, or about 38.5 hours of face tracks, and the corresponding audio. We also present a new audio-visual approach for active speaker detection, and analyze its performance, demonstrating both its strength and the contributions of the dataset.
Submitted 24 May, 2019; v1 submitted 4 January, 2019;
originally announced January 2019.
-
Adaptive Density Estimation for Generative Models
Authors:
Thomas Lucas,
Konstantin Shmelkov,
Karteek Alahari,
Cordelia Schmid,
Jakob Verbeek
Abstract:
Unsupervised learning of generative models has seen tremendous progress over recent years, in particular due to generative adversarial networks (GANs), variational autoencoders, and flow-based models. GANs have dramatically improved sample quality, but suffer from two drawbacks: (i) they mode-drop, i.e., do not cover the full support of the training data, and (ii) they do not allow for likelihood evaluations on held-out data. In contrast, likelihood-based training encourages models to cover the full support of the training data, but yields poorer samples. These mutual shortcomings can in principle be addressed by training generative latent variable models in a hybrid adversarial-likelihood manner. However, we show that commonly made parametric assumptions create a conflict between them, making successful hybrid models non-trivial. As a solution, we propose to use deep invertible transformations in the latent variable decoder. This approach allows for likelihood computations in image space, is more efficient than fully invertible models, and can take full advantage of adversarial training. We show that our model significantly improves over existing hybrid models: offering GAN-like samples, IS and FID scores that are competitive with fully adversarial models, and improved likelihood scores.
Submitted 3 January, 2020; v1 submitted 4 January, 2019;
originally announced January 2019.
-
Active learning for efficiently training emulators of computationally expensive mathematical models
Authors:
Alexandra G. Ellis,
Rowan Iskandar,
Christopher H. Schmid,
John B. Wong,
Thomas A. Trikalinos
Abstract:
An emulator is a fast-to-evaluate statistical approximation of a detailed mathematical model (simulator). When used in lieu of simulators, emulators can expedite tasks that require many repeated evaluations, such as sensitivity analyses, policy optimization, model calibration, and value-of-information analyses. Emulators are developed using the output of simulators at specific input values (design points). Developing an emulator that closely approximates the simulator can require many design points, which becomes computationally expensive. We describe a self-terminating active learning algorithm to efficiently develop emulators tailored to a specific emulation task, and compare it with algorithms that optimize geometric criteria (random Latin hypercube sampling and maximum projection designs) and other active learning algorithms (treed Gaussian Processes that optimize typical active learning criteria). We compared the algorithms' root mean square error (RMSE) and maximum absolute deviation from the simulator (MAX) for seven benchmark functions and in a prostate cancer screening model. In the empirical analyses, in simulators with greatly varying smoothness over the input domain, active learning algorithms resulted in emulators with smaller RMSE and MAX for the same number of design points. In all other cases, all algorithms performed comparably. The proposed algorithm attained satisfactory performance in all analyses, had smaller variability than the treed Gaussian Processes (it is deterministic), and, on average, performed similarly to or better than the treed Gaussian Processes in 6 out of 7 benchmark functions and in the prostate cancer model.
Submitted 3 January, 2020; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Detecting unseen visual relations using analogies
Authors:
Julia Peyre,
Ivan Laptev,
Cordelia Schmid,
Josef Sivic
Abstract:
We seek to detect visual relations in images in the form of triplets t = (subject, predicate, object), such as "person riding dog", where training examples of the individual entities are available but their combinations are unseen at training. This is an important set-up due to the combinatorial nature of visual relations: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplet. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on three challenging datasets: on HICO-DET, our model achieves significant improvement over a strong baseline for both frequent and unseen triplets, and we observe similar improvement for the retrieval of unseen triplets with out-of-vocabulary predicates on the COCO-a dataset as well as the challenging unusual triplets in the UnRel dataset.
Submitted 22 September, 2019; v1 submitted 13 December, 2018;
originally announced December 2018.
-
A Structured Model For Action Detection
Authors:
Yubo Zhang,
Pavel Tokmakov,
Martial Hebert,
Cordelia Schmid
Abstract:
A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition, or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand. While this is an obviously attractive approach, it is not applicable in all scenarios. We claim that action detection is one such challenging problem: the models that need to be trained are large, and labeled data is expensive to obtain. To address this limitation, we propose to incorporate domain knowledge into the structure of the model, simplifying optimization. In particular, we augment a standard I3D network with a tracking module to aggregate long-term motion patterns, and use a graph convolutional network to reason about interactions between actors and objects. Evaluated on the challenging AVA dataset, the proposed approach improves over the I3D baseline by 5.5% mAP and over the state-of-the-art by 4.8% mAP.
Submitted 5 June, 2019; v1 submitted 9 December, 2018;
originally announced December 2018.
-
Modulated Policy Hierarchies
Authors:
Alexander Pashevich,
Danijar Hafner,
James Davidson,
Rahul Sukthankar,
Cordelia Schmid
Abstract:
Solving tasks with sparse rewards is a main challenge in reinforcement learning. While hierarchical controllers are an intuitive approach to this problem, current methods often require manual reward shaping, alternating training phases, or manually defined subtasks. We introduce modulated policy hierarchies (MPH), which can learn end-to-end to solve tasks from sparse rewards. To achieve this, we study different modulation signals and exploration for hierarchical controllers. Specifically, we find that communicating via bit-vectors is more efficient than selecting one out of multiple skills, as it enables mixing between them. To facilitate exploration, MPH uses its different time scales for temporally extended intrinsic motivation at each level of the hierarchy. We evaluate MPH on the robotics tasks of pushing and sparse block stacking, where it outperforms recent baselines.
Submitted 30 November, 2018;
originally announced December 2018.
-
Déjà Vu: an empirical evaluation of the memorization properties of ConvNets
Authors:
Alexandre Sablayrolles,
Matthijs Douze,
Cordelia Schmid,
Hervé Jégou
Abstract:
Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting. This paper considers the related question of "membership inference", where the goal is to determine if an image was used during training. We consider it from three complementary angles. We show how to detect which dataset was used to train a model, and in particular whether some validation images were used at train time. We then analyze explicit memorization and extend classical random label experiments to the problem of learning a model that predicts if an image belongs to an arbitrary set. Finally, we propose a new approach to infer membership when a few of the top layers are not available or have been fine-tuned, and show that lower layers still carry information about the training samples. To support our findings, we conduct large-scale experiments on ImageNet and subsets of YFCC-100M with modern architectures such as VGG and ResNet.
Submitted 17 September, 2018;
originally announced September 2018.
-
On the Importance of Visual Context for Data Augmentation in Scene Understanding
Authors:
Nikita Dvornik,
Julien Mairal,
Cordelia Schmid
Abstract:
Performing data augmentation for learning deep neural networks is known to be important for training visual recognition systems. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. While simple image transformations can already improve predictive performance in most vision tasks, larger gains can be obtained by leveraging task-specific prior knowledge. In this work, we consider object detection, semantic and instance segmentation and augment the training images by blending objects in existing scenes, using instance segmentation annotations. We observe that randomly pasting objects on images hurts the performance, unless the object is placed in the right context. To resolve this issue, we propose an explicit context model by using a convolutional neural network, which predicts whether an image region is suitable for placing a given object or not. In our experiments, we show that our approach is able to improve object detection, semantic and instance segmentation on the PASCAL VOC12 and COCO datasets, with significant gains in a limited annotation scenario, i.e. when only one category is annotated. We also show that the method is not limited to datasets that come with expensive pixel-wise instance annotations and can be used when only bounding boxes are available, by employing weakly-supervised learning for instance mask approximation.
Submitted 19 September, 2019; v1 submitted 6 September, 2018;
originally announced September 2018.
-
Actor-Centric Relation Network
Authors:
Chen Sun,
Abhinav Shrivastava,
Carl Vondrick,
Kevin Murphy,
Rahul Sukthankar,
Cordelia Schmid
Abstract:
Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level and model temporal context with 3D ConvNets. Here, we go one step further and model spatio-temporal relations to capture the interactions between human actors, relevant objects and scene elements essential to differentiate similar human actions. Our approach is weakly supervised and mines the relevant elements automatically with an actor-centric relational network (ACRN). ACRN computes and accumulates pair-wise relation information from actor and global scene features, and generates relation features for action classification. It is implemented as neural networks and can be trained jointly with an existing action detection system. We show that ACRN outperforms alternative approaches which capture relation information, and that the proposed framework improves upon the state-of-the-art performance on JHMDB and AVA. A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.
Submitted 28 July, 2018;
originally announced July 2018.
-
End-to-End Incremental Learning
Authors:
Francisco M. Castro,
Manuel J. Marín-Jiménez,
Nicolás Guil,
Cordelia Schmid,
Karteek Alahari
Abstract:
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model, a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
Submitted 3 September, 2018; v1 submitted 25 July, 2018;
originally announced July 2018.
-
How good is my GAN?
Authors:
Konstantin Shmelkov,
Cordelia Schmid,
Karteek Alahari
Abstract:
Generative adversarial networks (GANs) are one of the most popular methods for generating images today. While impressive results have been validated by visual inspection, a number of quantitative criteria have emerged only recently. We argue here that the existing ones are insufficient and need to be adapted to the task at hand. In this paper we introduce two measures based on image classification, GAN-train and GAN-test, which approximate the recall (diversity) and precision (quality of the image) of GANs respectively. We evaluate a number of recent GAN approaches based on these two measures and demonstrate a clear difference in performance. Furthermore, we observe that the increasing difficulty of the dataset, from CIFAR10 over CIFAR100 to ImageNet, shows an inverse correlation with the quality of the GANs, as clearly evident from our measures.
Submitted 25 July, 2018;
originally announced July 2018.
-
Modeling Visual Context is Key to Augmenting Object Detection Datasets
Authors:
Nikita Dvornik,
Julien Mairal,
Cordelia Schmid
Abstract:
Performing data augmentation for learning deep neural networks is well known to be important for training visual recognition systems. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. For object detection, classical approaches for data augmentation consist of generating images obtained by basic geometrical transformations and color changes of original training images. In this work, we go one step further and leverage segmentation annotations to increase the number of object instances present in the training data. For this approach to be successful, we show that appropriately modeling the visual context surrounding objects is crucial to place them in the right environment. Otherwise, we show that the previous strategy actually hurts. With our context model, we achieve significant mean average precision improvements when few labeled examples are available on the VOC'12 benchmark.
Submitted 19 July, 2018;
originally announced July 2018.
-
Effects of Predictive Real-Time Traffic Signal Information
Authors:
Vadim Sokolov,
David W. Etherington,
Christian Schmid,
Dominik Karbowski,
Aymeric Rousseau,
Muhammad Imran
Abstract:
This paper analyzes the impact of providing car drivers with predictive information on traffic signal timing in real-time, including time-to-green and green-wave speed recommendations. Over a period of six months, the behavior of 121 drivers in everyday urban driving was analyzed with and without access to live traffic signal information. In a first period, drivers had the information-providing service disabled in order to establish a baseline behavior; after that initial phase, the service was activated. In both cases, data from smartphone and vehicle sensors was collected, including speed, acceleration, fuel rate, acceleration and brake pedal positions. We estimated the changes in the driving behavior which result from drivers' receiving the traffic signal timing information by carefully comparing distributions of acceleration/deceleration patterns through statistical analysis. Our analysis demonstrates that there is a positive effect of providing traffic signal timing information to the drivers.
Submitted 9 November, 2018; v1 submitted 7 July, 2018;
originally announced July 2018.
-
A flexible model for training action localization with varying levels of supervision
Authors:
Guilhem Chéron,
Jean-Baptiste Alayrac,
Ivan Laptev,
Cordelia Schmid
Abstract:
Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less-demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals ranging from video-level class labels to the full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method with a fraction of the supervision used by previous methods. The flexibility of our model enables joint learning from data with different levels of annotation. Experimental results demonstrate a significant gain by adding a few fully supervised examples to otherwise weakly labeled videos.
Submitted 27 November, 2018; v1 submitted 29 June, 2018;
originally announced June 2018.
-
Modeling Spatio-Temporal Human Track Structure for Action Localization
Authors:
Guilhem Chéron,
Anton Osokin,
Ivan Laptev,
Cordelia Schmid
Abstract:
This paper addresses spatio-temporal localization of human actions in video. In order to localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions on the level of person tracks. Our model is trained to simultaneously recognize and localize action classes in time and is based on two-layer gated recurrent units (GRU) applied separately to two streams, i.e. appearance and optical flow streams. When used together with state-of-the-art person detection and tracking, our model is shown to substantially improve spatio-temporal action localization in videos. The gain is shown to be mainly due to improved temporal localization. We evaluate our method on two recent datasets for spatio-temporal action localization, UCF101-24 and DALY, demonstrating a significant improvement over the state of the art.
Submitted 28 June, 2018;
originally announced June 2018.
-
Synthetic simulations of the extragalactic sky seen by eROSITA. I. Pre-launch selection functions from Monte-Carlo simulations
Authors:
N. Clerc,
M. E. Ramos-Ceja,
J. Ridl,
G. Lamer,
H. Brunner,
F. Hofmann,
J. Comparat,
F. Pacaud,
F. Käfer,
T. H. Reiprich,
A. Merloni,
C. Schmid,
T. Brand,
J. Wilms,
P. Friedrich,
A. Finoguenov,
T. Dauser,
I. Kreykenbohm
Abstract:
Studies of galaxy clusters provide stringent constraints on models of structure formation. Provided that selection effects are under control, large X-ray surveys are well suited to derive cosmological parameters, in particular those governing the dark energy equation of state. We forecast the capabilities of the all-sky eROSITA (the extended ROentgen Survey with an Imaging Telescope Array) survey to be achieved by the early 2020s. We pay special attention to modeling the entire chain from photon emission to source detection and cataloguing. The selection function of galaxy clusters for the upcoming eROSITA mission is investigated by means of extensive and dedicated Monte-Carlo simulations. Employing a combination of accurate instrument characterization and state-of-the-art source detection techniques, we determine a cluster detection efficiency based on the cluster fluxes and sizes. Using this eROSITA cluster selection function, we find that eROSITA will detect a total of $\sim 10^5$ clusters in the extra-galactic sky. This number of clusters will allow eROSITA to put stringent constraints on cosmological models. We show that incomplete assumptions on selection effects, such as neglecting the distribution of cluster sizes, induce a bias in the derived value of cosmological parameters. Synthetic simulations of the eROSITA sky capture the essential characteristics impacting the next-generation galaxy cluster surveys and they highlight parameters requiring tight monitoring in order to avoid biases in cosmological analyses.
Submitted 22 June, 2018;
originally announced June 2018.
-
Spreading vectors for similarity search
Authors:
Alexandre Sablayrolles,
Matthijs Douze,
Cordelia Schmid,
Hervé Jégou
Abstract:
Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods. State-of-the-art techniques learn parameters of quantizers on training data for optimal performance, thus adapting quantizers to the data. In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net whose last layer forms a fixed parameter-free quantizer, such as pre-defined points of a hyper-sphere. As a proxy objective, we design and train a neural network that favors uniformity in the spherical latent space, while preserving the neighborhood structure after the mapping. We propose a new regularizer derived from the Kozachenko--Leonenko differential entropy estimator to enforce uniformity and combine it with a locality-aware triplet loss. Experiments show that our end-to-end approach outperforms most learned quantization methods, and is competitive with the state of the art on widely adopted benchmarks. Furthermore, we show that training without the quantization step results in almost no difference in accuracy, but yields a generic catalyzer that can be applied with any subsequent quantizer.
Submitted 30 August, 2019; v1 submitted 8 June, 2018;
originally announced June 2018.
-
Unsupervised Learning of Artistic Styles with Archetypal Style Analysis
Authors:
Daan Wynen,
Cordelia Schmid,
Julien Mairal
Abstract:
In this paper, we introduce an unsupervised learning approach to automatically discover, summarize, and manipulate artistic styles from large collections of paintings. Our method is based on archetypal analysis, which is an unsupervised learning technique akin to sparse coding with a geometric interpretation. When applied to deep image representations from a collection of artworks, it learns a dictionary of archetypal styles, which can be easily visualized. After training the model, the style of a new image, which is characterized by local statistics of deep visual features, is approximated by a sparse convex combination of archetypes. This enables us to interpret which archetypal styles are present in the input image, and in which proportion. Finally, our approach allows us to manipulate the coefficients of the latent archetypal decomposition, and achieve various special effects such as style enhancement, transfer, and interpolation between multiple archetypes.
Submitted 2 October, 2018; v1 submitted 28 May, 2018;
originally announced May 2018.
-
Actor and Observer: Joint Modeling of First and Third-Person Videos
Authors:
Gunnar A. Sigurdsson,
Abhinav Gupta,
Cordelia Schmid,
Ali Farhadi,
Karteek Alahari
Abstract:
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor). Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction, with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos, involving 112 people, with 4000 paired videos. This enables learning the link between the two, actor and observer perspectives. Thereby, we address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person to the abundant third-person data on the web. We use this data to learn a joint representation of first and third-person videos, with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain.
Submitted 25 April, 2018;
originally announced April 2018.
-
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Authors:
Gunnar A. Sigurdsson,
Abhinav Gupta,
Cordelia Schmid,
Ali Farhadi,
Karteek Alahari
Abstract:
In Actor and Observer we introduced a dataset linking the first and third-person video understanding domains, the Charades-Ego Dataset. In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego furthermore shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, captioning, and new tasks utilizing the cross-modal nature of the data.
Submitted 30 April, 2018; v1 submitted 25 April, 2018;
originally announced April 2018.
-
BodyNet: Volumetric Inference of 3D Human Body Shapes
Authors:
Gül Varol,
Duygu Ceylan,
Bryan Russell,
Jimei Yang,
Ersin Yumer,
Ivan Laptev,
Cordelia Schmid
Abstract:
Human shape estimation is an important task for video editing, animation, and the fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.
Submitted 18 August, 2018; v1 submitted 13 April, 2018;
originally announced April 2018.
-
LCR-Net++: Multi-person 2D and 3D Pose Detection in Natural Images
Authors:
Gregory Rogez,
Philippe Weinzaepfel,
Cordelia Schmid
Abstract:
We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D poses of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our Localization-Classification-Regression architecture, named LCR-Net, contains 3 main components: 1) the pose proposal generator that suggests candidate poses at different locations in the image; 2) a classifier that scores the different pose proposals; and 3) a regressor that refines pose proposals both in 2D and 3D. All three stages share the convolutional feature layers and are trained jointly. The final pose estimation is obtained by integrating over neighboring pose hypotheses, which is shown to improve over a standard non-maximum suppression algorithm. Our method recovers full-body 2D and 3D poses, hallucinating plausible body parts when the persons are partially occluded or truncated by the image boundary. Our approach significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment. Moreover, it shows promising results on real images for both single and multi-person subsets of the MPII 2D pose benchmark and demonstrates satisfying 3D pose results even for multi-person images.
Submitted 13 January, 2019; v1 submitted 1 March, 2018;
originally announced March 2018.
-
Image-based Synthesis for Deep 3D Human Pose Estimation
Authors:
Grégory Rogez,
Cordelia Schmid
Abstract:
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a $K$-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms most of the published works in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for real-world images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. Compared to data generated from more classical rendering engines, our synthetic images do not require any domain adaptation or fine-tuning stage.
Submitted 12 February, 2018;
originally announced February 2018.
-
Learning to Segment Moving Objects
Authors:
Pavel Tokmakov,
Cordelia Schmid,
Karteek Alahari
Abstract:
We study the problem of segmenting moving objects in unconstrained videos. Given a video, the task is to segment all the objects that exhibit independent motion in at least one frame. We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation. The framework is a two-stream neural network with an explicit memory module. The two streams encode appearance and motion cues in a video sequence respectively, while the memory module captures the evolution of objects over time, exploiting the temporal consistency. The motion stream is a convolutional neural network trained on synthetic videos to segment independently moving objects in the optical flow field. The module to build a 'visual memory' in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.
For every pixel in a frame of a test video, our approach assigns an object or background label based on the learned spatio-temporal features as well as the 'visual memory' specific to the video. We evaluate our method extensively on three benchmarks, DAVIS, Freiburg-Berkeley motion segmentation dataset and SegTrack. In addition, we provide an extensive ablation study to investigate both the choice of the training data and the influence of each component in the proposed framework.
Submitted 1 December, 2017;
originally announced December 2017.
-
Incremental Learning of Object Detectors without Catastrophic Forgetting
Authors:
Konstantin Shmelkov,
Cordelia Schmid,
Karteek Alahari
Abstract:
Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data. They suffer from "catastrophic forgetting" - an abrupt degradation of performance on the original set of classes, when the training objective is adapted to the new classes. We present a method to address this issue, and learn object detectors incrementally, when neither the original training data nor annotations for the original classes in the new training set are available. The core of our proposed solution is a loss function to balance the interplay between predictions on the new classes and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the updated networks. This incremental learning can be performed multiple times, for a new set of classes in each step, with a moderate drop in performance compared to the baseline network trained on the ensemble of data. We present object detection results on the PASCAL VOC 2007 and COCO datasets, along with a detailed empirical analysis of the approach.
Submitted 23 August, 2017;
originally announced August 2017.
-
BlitzNet: A Real-Time Deep Network for Scene Understanding
Authors:
Nikita Dvornik,
Konstantin Shmelkov,
Julien Mairal,
Cordelia Schmid
Abstract:
Real-time scene understanding has become crucial in many applications such as autonomous driving. In this paper, we propose a deep architecture, called BlitzNet, that jointly performs object detection and semantic segmentation in one forward pass, allowing real-time computations. Besides the computational gain of having a single network to perform several tasks, we show that object detection and semantic segmentation benefit from each other in terms of accuracy. Experimental results for VOC and COCO datasets show state-of-the-art performance for object detection and segmentation among real time systems.
Submitted 9 August, 2017;
originally announced August 2017.
-
Exponential Random Graph Models with Big Networks: Maximum Pseudolikelihood Estimation and the Parametric Bootstrap
Authors:
Christian S. Schmid,
Bruce A. Desmarais
Abstract:
With the growth of interest in network data across fields, the Exponential Random Graph Model (ERGM) has emerged as the leading approach to the statistical analysis of network data. ERGM parameter estimation requires the approximation of an intractable normalizing constant. Simulation methods represent the state-of-the-art approach to approximating the normalizing constant, leading to estimation by Monte Carlo maximum likelihood (MCMLE). MCMLE is accurate when a large sample of networks is used to approximate the normalizing constant. However, MCMLE is computationally expensive, and may be prohibitively so if the size of the network is on the order of 1,000 nodes (i.e., one million potential ties) or greater. When the network is large, one option is maximum pseudolikelihood estimation (MPLE). The standard MPLE is simple and fast, but generally underestimates standard errors. We show that a resampling method---the parametric bootstrap---results in accurate coverage probabilities for confidence intervals. We find that bootstrapped MPLE can be run in 1/5th the time of MCMLE. We study the relative performance of MCMLE and MPLE with simulation studies, and illustrate the two different approaches by applying them to a network of bills introduced in the United States Senate.
Submitted 8 August, 2017;
originally announced August 2017.
-
Weakly-supervised learning of visual relations
Authors:
Julia Peyre,
Ivan Laptev,
Cordelia Schmid,
Josef Sivic
Abstract:
This paper introduces a novel approach for modeling visual relations between pairs of objects. We call a relation a triplet of the form (subject, predicate, object) where the predicate is typically a preposition (e.g., 'under', 'in front of') or a verb ('hold', 'ride') that links a pair of objects (subject, object). Learning such relations is challenging as the objects have different spatial configurations and appearances depending on the relation in which they occur. Another major challenge comes from the difficulty of obtaining annotations, especially at box-level, for all possible triplets, which makes both learning and evaluation difficult. The contributions of this paper are threefold. First, we design strong yet flexible visual features that encode the appearance and spatial configuration for pairs of objects. Second, we propose a weakly-supervised discriminative clustering model to learn relations from image-level labels only. Third, we introduce a new challenging dataset of unusual relations (UnRel) together with an exhaustive annotation, which enables accurate evaluation of visual relation retrieval. We show experimentally that our model achieves state-of-the-art results on the visual relationship dataset, significantly improving performance on previously unseen relations (zero-shot learning), and confirm this observation on our newly introduced UnRel dataset.
Submitted 29 July, 2017;
originally announced July 2017.
-
High resolution ion trap time-of-flight mass spectrometer for cold trapped ion experiments
Authors:
Philipp C. Schmid,
James Greenberg,
Mikhail I. Miller,
Kevin Loeffler,
Heather J. Lewandowski
Abstract:
Trapping molecular ions that have been sympathetically cooled with laser-cooled atomic ions is a useful platform for exploring cold ion chemistry. We designed and characterized a new experimental apparatus for probing chemical reaction dynamics between molecular cations and neutral radicals at temperatures below 1 K. The ions are trapped in a linear quadrupole radio-frequency trap and sympathetically cooled by co-trapped, laser-cooled, atomic ions. The ion trap is coupled to a time-of-flight mass spectrometer to readily identify product ion species, as well as to accurately determine trapped ion numbers. We discuss, and present in detail, the design of this ion trap time-of-flight mass spectrometer, as well as the electronics required for driving the trap and mass spectrometer. Furthermore, we measure the performance of this system, which yields mass resolutions of $m/\Delta m \geq 1100$ over a wide mass range, and discuss its relevance for future measurements in chemical reaction kinetics and dynamics.
Submitted 21 July, 2017;
originally announced July 2017.
-
Detecting Parts for Action Localization
Authors:
Nicolas Chesneau,
Grégory Rogez,
Karteek Alahari,
Cordelia Schmid
Abstract:
In this paper, we propose a new framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations. This is achieved by training a novel human part detector that scores visible parts while regressing full-body bounding boxes. The core of our method is a convolutional neural network which learns part proposals specific to certain body parts. These are then combined to detect people robustly in each frame. Our tracking algorithm connects the image detections temporally to extract full-body human tubes. We apply our new tube extraction method on the problem of human action localization, on the popular JHMDB dataset, and a very recent challenging dataset DALY (Daily Action Localization in YouTube), showing state-of-the-art results.
Submitted 21 July, 2017; v1 submitted 19 July, 2017;
originally announced July 2017.
-
Developing the Path Signature Methodology and its Application to Landmark-based Human Action Recognition
Authors:
Weixin Yang,
Terry Lyons,
Hao Ni,
Cordelia Schmid,
Lianwen Jin
Abstract:
Landmark-based human action recognition in videos is a challenging task in computer vision. One key step is to design a generic approach that generates discriminative features for the spatial structure and temporal dynamics. To this end, we regard the evolving landmark data as a high-dimensional path and apply non-linear path signature techniques to provide an expressive, robust, non-linear, and interpretable representation for the sequential events. We do not extract signature features from the raw path; rather, we propose path disintegrations and path transformations as preprocessing steps. Path disintegrations turn a high-dimensional path linearly into a collection of lower-dimensional paths; some of these paths are in pose space while others are defined over a multiscale collection of temporal intervals. Path transformations decorate the paths with additional coordinates in standard ways to allow the truncated signatures of transformed paths to expose additional features. For spatial representation, we apply the signature transform to vectorize the paths that arise out of pose disintegration, and for temporal representation, we apply it again to describe this evolving vectorization. Finally, all the features are collected together to constitute the input vector of a linear single-hidden-layer fully-connected network for classification. Experimental results on four datasets demonstrate that the proposed feature set, with only a linear shallow network and DropConnect, is effective and achieves results comparable to those of state-of-the-art deep networks while remaining interpretable.
Submitted 12 December, 2019; v1 submitted 13 July, 2017;
originally announced July 2017.
-
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Authors:
Chunhui Gu,
Chen Sun,
David A. Ross,
Carl Vondrick,
Caroline Pantofaru,
Yeqing Li,
Sudheendra Vijayanarasimhan,
George Toderici,
Susanna Ricco,
Rahul Sukthankar,
Cordelia Schmid,
Jitendra Malik
Abstract:
This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly.
AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding.
Submitted 30 April, 2018; v1 submitted 23 May, 2017;
originally announced May 2017.
-
SCNet: Learning Semantic Correspondence
Authors:
Kai Han,
Rafael S. Rezende,
Bumsub Ham,
Kwan-Yee K. Wong,
Minsu Cho,
Cordelia Schmid,
Jean Ponce
Abstract:
This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category. Previous approaches focus on either combining a spatial regularizer with hand-crafted features, or learning a correspondence model for appearance only. We propose instead a convolutional neural network architecture, called SCNet, for learning a geometrically plausible model for semantic correspondence. SCNet uses region proposals as matching primitives, and explicitly incorporates geometric consistency in its loss function. It is trained on image pairs obtained from the PASCAL VOC 2007 keypoint dataset, and a comparative evaluation on several standard benchmarks demonstrates that the proposed approach substantially outperforms both recent deep learning architectures and previous methods based on hand-crafted features.
Submitted 17 August, 2017; v1 submitted 11 May, 2017;
originally announced May 2017.
-
Action Tubelet Detector for Spatio-Temporal Action Localization
Authors:
Vicky Kalogeiton,
Philippe Weinzaepfel,
Vittorio Ferrari,
Cordelia Schmid
Abstract:
Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level that are then linked or tracked across time. In this paper, we leverage the temporal continuity of videos instead of operating at the frame level. We propose the ACtion Tubelet detector (ACT-detector) that takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. The same way state-of-the-art object detectors rely on anchor boxes, our ACT-detector is based on anchor cuboids. We build upon the SSD framework. Convolutional features are extracted for each frame, while scores and regressions are based on the temporal stacking of these features, thus exploiting information from a sequence. Our experimental results show that leveraging sequences of frames significantly improves detection performance over using individual frames. The gain of our tubelet detector can be explained by both more accurate scores and more precise localization. Our ACT-detector outperforms the state-of-the-art methods for frame-mAP and video-mAP on the J-HMDB and UCF-101 datasets, in particular at high overlap thresholds.
Submitted 21 August, 2017; v1 submitted 4 May, 2017;
originally announced May 2017.
-
SfM-Net: Learning of Structure and Motion from Video
Authors:
Sudheendra Vijayanarasimhan,
Susanna Ricco,
Cordelia Schmid,
Rahul Sukthankar,
Katerina Fragkiadaki
Abstract:
We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera and rigid object motions, converts those into a dense frame-to-frame motion field (optical flow), differentiably warps frames in time to match pixels and back-propagates. The model can be trained with various degrees of supervision: 1) self-supervised by the re-projection photometric error (completely unsupervised), 2) supervised by ego-motion (camera motion), or 3) supervised by depth (e.g., as provided by RGBD sensors). SfM-Net extracts meaningful depth estimates and successfully estimates frame-to-frame camera rotations and translations. It often successfully segments the moving objects in the scene, even though such supervision is never provided.
Submitted 25 April, 2017;
originally announced April 2017.
-
Learning Video Object Segmentation with Visual Memory
Authors:
Pavel Tokmakov,
Karteek Alahari,
Cordelia Schmid
Abstract:
This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence respectively, while the memory module captures the evolution of objects over time. The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired automatically without any manually-annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allow spatial information to be propagated over time. We evaluate our method extensively on two benchmarks, DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results. For example, our approach outperforms the top method on the DAVIS dataset by nearly 6%. We also provide an extensive ablative analysis to investigate the influence of each component in the proposed framework.
Submitted 12 July, 2017; v1 submitted 19 April, 2017;
originally announced April 2017.
-
Proposal Flow: Semantic Correspondences from Object Proposals
Authors:
Bumsub Ham,
Minsu Cho,
Cordelia Schmid,
Jean Ponce
Abstract:
Finding image correspondences remains a challenging problem in the presence of intra-class variations and large changes in scene layout. Semantic flow methods are designed to handle images depicting different instances of the same object or scene category. We introduce a novel approach to semantic flow, dubbed proposal flow, that establishes reliable correspondences using object proposals. Unlike prevailing semantic flow approaches that operate on pixels or regularly sampled local regions, proposal flow benefits from the characteristics of modern object proposals, which exhibit high repeatability at multiple scales, and can take advantage of both local and geometric consistency constraints among proposals. We also show that the corresponding sparse proposal flow can effectively be transformed into a conventional dense flow field. We introduce two new challenging datasets that can be used to evaluate both general semantic flow techniques and region-based approaches such as proposal flow. We use these benchmarks to compare different matching algorithms, object proposals, and region features within proposal flow to the state of the art in semantic flow. This comparison, along with experiments on standard datasets, demonstrates that proposal flow significantly outperforms existing semantic flow methods in various settings.
Submitted 21 March, 2017;
originally announced March 2017.
-
Learning from Synthetic Humans
Authors:
Gül Varol,
Javier Romero,
Xavier Martin,
Naureen Mahmood,
Michael J. Black,
Ivan Laptev,
Cordelia Schmid
Abstract:
Estimating human pose, shape, and motion from images and videos are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.
Submitted 19 January, 2018; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Learning Motion Patterns in Videos
Authors:
Pavel Tokmakov,
Karteek Alahari,
Cordelia Schmid
Abstract:
The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved. We address this challenging task by learning motion patterns in videos. The core of our approach is a fully convolutional network, which is learned entirely from synthetic video sequences, and their ground-truth optical flow and motion segmentation. This encoder-decoder style architecture first learns a coarse representation of the optical flow field features, and then refines it iteratively to produce motion labels at the original high-resolution. We further improve this labeling with an objectness map and a conditional random field, to account for errors in optical flow, and also to focus on moving "things" rather than "stuff". The output label of each pixel denotes whether it has undergone independent motion, i.e., irrespective of camera motion. We demonstrate the benefits of this learning framework on the moving object segmentation task, where the goal is to segment all objects in motion. Our approach outperforms the top method on the recently released DAVIS benchmark dataset, comprising real-world sequences, by 5.6%. We also evaluate on the Berkeley motion segmentation database, achieving state-of-the-art results.
Submitted 10 April, 2017; v1 submitted 21 December, 2016;
originally announced December 2016.
-
Little String Defects and Bala-Carter Theory
Authors:
Nathan Haouzi,
Christian Schmid
Abstract:
We give a physical realization of the Bala-Carter labels that classify nilpotent orbits of semi-simple Lie algebras, for the case $\mathfrak{g}=A,D,E$. We start from type IIB string theory compactified on an $ADE$ singularity and study the six-dimensional (2,0) $\mathfrak{g}$-type little string on a Riemann surface with punctures. The defects are introduced as D-branes wrapping the 2-cycles of the singularity. At low energies, the little string becomes the (2,0) conformal field theory of type $\mathfrak{g}$. As an application, we derive the full list of $E_n$ little string defects, and their Bala-Carter label in the CFT limit. Furthermore, we investigate new relations between the quiver gauge theory describing the D-brane defects at low energies, and the weighted Dynkin diagrams of $\mathfrak{g}$. We also give a physical version of the dimension formula of a nilpotent orbit based on its weighted Dynkin diagram.
Submitted 16 December, 2016; v1 submitted 6 December, 2016;
originally announced December 2016.
-
Areas of Attention for Image Captioning
Authors:
Marco Pedersoli,
Thomas Lucas,
Cordelia Schmid,
Jakob Verbeek
Abstract:
We propose "Areas of Attention", a novel attention-based model for automatic image captioning. Our approach models the dependencies between image regions, caption words, and the state of an RNN language model, using three pairwise interactions. In contrast to previous attention-based approaches that associate image regions only to the RNN state, our method allows a direct association between caption words and image regions. During training these associations are inferred from image-level captions, akin to weakly-supervised object detector training. These associations help to improve captioning by localizing the corresponding regions during testing. We also propose and compare different ways of generating attention areas: CNN activation grids, object proposals, and spatial transformer nets applied in a convolutional fashion. Spatial transformers give the best results. They allow for image-specific attention areas, and can be trained jointly with the rest of the network. Our attention mechanism and spatial transformer attention areas together yield state-of-the-art results on the MSCOCO dataset.
Submitted 25 August, 2017; v1 submitted 3 December, 2016;
originally announced December 2016.
-
Einstein's $R^{\hat{0} \hat{0}}$ equation for non-relativistic sources derived from Einstein's inertial motion and the Newtonian law for relative acceleration
Authors:
Christoph Schmid
Abstract:
With Einstein's inertial motion (free-falling and non-rotating relative to gyroscopes), geodesics for non-relativistic particles can intersect repeatedly, allowing one to compute the space-time curvature $R^{\hat{0} \hat{0}}$ exactly. Einstein's $R^{\hat{0} \hat{0}}$ for strong gravitational fields and for relativistic source-matter is identical with the Newtonian expression for the relative radial acceleration of neighboring free-falling test-particles, spherically averaged.--- Einstein's field equations follow from Newtonian experiments, local Lorentz-covariance, and energy-momentum conservation combined with the Bianchi identity.
Submitted 7 September, 2016;
originally announced September 2016.
-
Little String Origin of Surface Defects
Authors:
Nathan Haouzi,
Christian Schmid
Abstract:
We derive the codimension-two defects of 4d $\mathcal{N} = 4$ Super Yang-Mills (SYM) theory from the (2, 0) little string. The origin of the little string is type IIB theory compactified on an ADE singularity. The defects are D-branes wrapping the 2-cycles of the singularity. We use this construction to make contact with the description of SYM defects due to Gukov and Witten [arXiv:hep-th/0612073]. Furthermore, we derive from a geometric perspective the complete nilpotent orbit classification of codimension-two defects, and the connection to ADE-type Toda CFT. The only data needed to specify the defects is a set of weights of the algebra obeying certain constraints, which we give explicitly. We highlight the differences between the defect classification in the little string theory and its (2, 0) CFT limit.
Submitted 18 December, 2016; v1 submitted 25 August, 2016;
originally announced August 2016.
-
Einstein's equations from Einstein's inertial motion and Newton's law for relative acceleration
Authors:
Christoph Schmid
Abstract:
We show that Einstein's $R^{\hat{0} \hat{0}}$ equation for nonrelativistic matter and strong gravitational fields is identical with Newton's equation for relative radial acceleration of neighbouring freefalling particles, spherically averaged. These laws are explicitly identical with the primary observer's (1) space-time slicing by radial 4-geodesics, (2) radially parallel Local Ortho-Normal Bases, LONBs, (3) Riemann normal 3-coordinates. Hats on indices denote LONBs. General relativity follows from Newton's law of relative acceleration, Einstein's inertial motion, Lorentz covariance, and energy-momentum conservation combined with the Bianchi identity. The gravitational field equation of Newton-Gauss and Einstein's $R^{\hat{0} \hat{0}}$ equation are identical and linear in the gravitational field for an inertial primary observer.---
Einstein's equivalence between fictitious forces and gravitational forces is formulated as an equivalence theorem in the equations of motion. With this, the gravitational field equation of 19th-century Newtonian physics and Einstein's equation for $R^{\hat{0} \hat{0}}$ are identical and bilinear in the gravitational forces for non-inertial primary observers.---
$R^{\hat{0} \hat{0}} = - div \vec{E}_g$ and $2 R^{\hat{i} \hat{0}} = - (curl \vec{B}_g)^{\hat{i}}$ hold exactly for inertial primary observers, if one uses our LONBs. The gravitational $\vec{E}_g, \vec{B}_g $ are measured exactly with quasistatic particles via $(d/dt) p_{\hat{i}}$ and $(d/dt) S_{\hat{i}}$ in correspondence with the electromagnetic $\vec{E}$ and $\vec{B}$. The $(\vec{E}_g, \vec{B}_g)$ are identical with the observer's Ricci connection along his worldline, $(\omega_{\hat{a} \hat{b}})_{\hat{0}}$.
Submitted 28 July, 2016;
originally announced July 2016.
-
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
Authors:
Grégory Rogez,
Cordelia Schmid
Abstract:
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images.
Submitted 28 October, 2016; v1 submitted 7 July, 2016;
originally announced July 2016.
-
The Detailed Science Case for the Maunakea Spectroscopic Explorer: the Composition and Dynamics of the Faint Universe
Authors:
Alan McConnachie,
Carine Babusiaux,
Michael Balogh,
Simon Driver,
Pat Côté,
Helene Courtois,
Luke Davies,
Laura Ferrarese,
Sarah Gallagher,
Rodrigo Ibata,
Nicolas Martin,
Aaron Robotham,
Kim Venn,
Eva Villaver,
Jo Bovy,
Alessandro Boselli,
Matthew Colless,
Johan Comparat,
Kelly Denny,
Pierre-Alain Duc,
Sara Ellison,
Richard de Grijs,
Mirian Fernandez-Lorenzo,
Ken Freeman,
Raja Guhathakurta
, et al. (152 additional authors not shown)
Abstract:
MSE is an 11.25m aperture observatory with a 1.5 square degree field of view that will be fully dedicated to multi-object spectroscopy. More than 3200 fibres will feed spectrographs operating at low (R ~ 2000 - 3500) and moderate (R ~ 6000) spectral resolution, and approximately 1000 fibres will feed spectrographs operating at high (R ~ 40000) resolution. MSE is designed to enable transformational science in areas as diverse as tomographic mapping of the interstellar and intergalactic media; the in-situ chemical tagging of thick disk and halo stars; connecting galaxies to their large scale structure; measuring the mass functions of cold dark matter sub-halos in galaxy and cluster-scale hosts; reverberation mapping of supermassive black holes in quasars; and next generation cosmological surveys using redshift space distortions and peculiar velocities. MSE is an essential follow-up facility to current and next generations of multi-wavelength imaging surveys, including LSST, Gaia, Euclid, WFIRST, PLATO, and the SKA, and is designed to complement and go beyond the science goals of other planned and current spectroscopic capabilities like VISTA/4MOST, WHT/WEAVE, AAT/HERMES and Subaru/PFS. It is an ideal feeder facility for E-ELT, TMT and GMT, and provides the missing link between wide field imaging and small field precision astronomy. MSE is optimized for high throughput, high signal-to-noise observations of the faintest sources in the Universe with high quality calibration and stability being ensured through the dedicated operational mode of the observatory. (abridged)
Submitted 31 May, 2016;
originally announced June 2016.
-
Human Action Localization with Sparse Spatial Supervision
Authors:
Philippe Weinzaepfel,
Xavier Martin,
Cordelia Schmid
Abstract:
We introduce an approach for spatio-temporal human action localization using sparse spatial supervision. Our method leverages the large amount of annotated humans available today and extracts human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Given these high-quality human tubes and temporal supervision, we select positive and negative tubes with very sparse spatial supervision, i.e., only one spatially annotated frame per instance. The selected tubes allow us to effectively learn a spatio-temporal action detector based on dense trajectories or CNNs. We conduct experiments on existing action localization benchmarks: UCF-Sports, J-HMDB and UCF-101. Our results show that our approach, despite using sparse spatial supervision, performs on par with methods using full supervision, i.e., one bounding box annotation per frame. To further validate our method, we introduce DALY (Daily Action Localization in YouTube), a dataset for realistic action localization in space and time. It contains high quality temporal and spatial annotations for 3.6k instances of 10 actions in 31 hours of videos (3.3M frames). It is an order of magnitude larger than existing datasets, with more diversity in appearance and long untrimmed videos.
Submitted 23 May, 2017; v1 submitted 17 May, 2016;
originally announced May 2016.
-
Long-term Temporal Convolutions for Action Recognition
Authors:
Gül Varol,
Ivan Laptev,
Cordelia Schmid
Abstract:
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level of a few video frames, failing to model actions at their full temporal extent. In this work we learn video representations using neural networks with long-term temporal convolutions (LTC). We demonstrate that LTC-CNN models with increased temporal extents improve the accuracy of action recognition. We also study the impact of different low-level representations, such as raw values of video pixels and optical flow vector fields, and demonstrate the importance of high-quality optical flow estimation for learning accurate action models. We report state-of-the-art results on two challenging benchmarks for human action recognition: UCF101 (92.7%) and HMDB51 (67.2%).
Submitted 2 June, 2017; v1 submitted 15 April, 2016;
originally announced April 2016.
-
Optimisation of the Read-out Electronics of Muon Drift-Tube Chambers for Very High Background Rates at HL-LHC and Future Colliders
Authors:
Sebastian Nowak,
Sergey Abovyan,
Philipp Gadow,
Katharina Ecker,
David Fink,
Markus Fras,
Oliver Kortner,
Hubert Kroha,
Felix Mueller,
Robert Richter,
Clemens Schmid,
Korbinian Schmidt-Sommerfeld,
Yazhou Zhao
Abstract:
In the ATLAS Muon Spectrometer, Monitored Drift Tube (MDT) chambers and sMDT chambers with half of the tube diameter of the MDTs are used for precision muon track reconstruction. The sMDT chambers are designed for operation at high counting rates due to neutron and gamma background irradiation expected for the HL-LHC and future hadron colliders. The existing MDT read-out electronics uses bipolar signal shaping which causes an undershoot of opposite polarity and same charge after a signal pulse. At high counting rates and short electronics dead time used for the sMDTs, signal pulses pile up on the undershoot of preceding background pulses leading to a reduction of the signal amplitude and a jitter in the drift time measurement and, therefore, to a degradation of drift tube efficiency and spatial resolution. In order to further increase the rate capability of sMDT tubes, baseline restoration can be used in the read-out electronics to suppress the pile-up effects. A discrete bipolar shaping circuit with baseline restoration has been developed and used for reading out sMDT tubes under irradiation with a 24 MBq $^{90}$Sr source. The measurement results show a substantial improvement of the performance of the sMDT tubes at high counting rates.
Submitted 29 March, 2016;
originally announced March 2016.