Continuous diffusion for categorical data

Authors: Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler

Abstract: Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.

Submitted 15 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: 26 pages, 8 figures; corrections and additional information about hyperparameters

arXiv:2211.04236 [pdf, other]

Self-conditioned Embedding Diffusion for Text Generation

Authors: Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Abstract: Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models - while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 15 pages

arXiv:2209.11142 [pdf, other]

A Generalist Neural Algorithmic Learner

Authors: Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova, Andreea Deac, Beatrice Bevilacqua, Yaroslav Ganin, Charles Blundell, Petar Veličković

Abstract: The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.

Submitted 3 December, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: To appear at LoG 2022 (Spotlight talk). 23 pages, 11 figures

arXiv:2105.02769 [pdf, other]

Computer-Aided Design as Language

Authors: Yaroslav Ganin, Sergey Bartunov, Yujia Li, Ethan Keller, Stefano Saliceti

Abstract: Computer-Aided Design (CAD) applications are used in manufacturing to model everything from coffee mugs to sports cars. These programs are complex and require years of training and experience to master. A component of all CAD models particularly difficult to make are the highly structured 2D sketches that lie at the heart of every 3D construction. In this work, we propose a machine learning model capable of automatically generating such sketches. Through this, we pave the way for developing intelligent tools that would help engineers create better designs with less effort. Our method is a combination of a general-purpose language modeling technique alongside an off-the-shelf data serialization protocol. We show that our approach has enough flexibility to accommodate the complexity of the domain and performs well for both unconditional synthesis and image-to-sketch translation.

Submitted 6 May, 2021; originally announced May 2021.

Comments: 24 pages, 11 figures, 3 tables

arXiv:2002.10880 [pdf, other]

PolyGen: An Autoregressive Generative Model of 3D Meshes

Authors: Charlie Nash, Yaroslav Ganin, S. M. Ali Eslami, Peter W. Battaglia

Abstract: Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.

Submitted 23 February, 2020; originally announced February 2020.

arXiv:1910.01007 [pdf, other]

Unsupervised Doodling and Painting with Improved SPIRAL

Authors: John F. J. Mellor, Eunbyung Park, Yaroslav Ganin, Igor Babuschkin, Tejas Kulkarni, Dan Rosenbaum, Andy Ballard, Theophane Weber, Oriol Vinyals, S. M. Ali Eslami

Abstract: We investigate using reinforcement learning agents as generative models of images (extending arXiv:1804.01118). A generative agent controls a simulated painting environment, and is trained with rewards provided by a discriminator network simultaneously trained to assess the realism of the agent's samples, either unconditional or reconstructions. Compared to prior work, we make a number of improvements to the architectures of the agents and discriminators that lead to intriguing and at times surprising results. We find that when sufficiently constrained, generative agents can learn to produce images with a degree of visual abstraction, despite having only ever seen real photographs (no human brush strokes). And given enough time with the painting environment, they can produce images with considerable realism. These results show that, under the right circumstances, some aspects of human drawing can emerge from simulated embodiment, without the need for external supervision, imitation or social cues. Finally, we note the framework's potential for use in creative applications.

Submitted 2 October, 2019; originally announced October 2019.

Comments: See https://learning-to-paint.github.io for an interactive version of this paper, with videos

ACM Class: I.2; I.4

arXiv:1804.01118 [pdf, other]

Synthesizing Programs for Images using Reinforced Adversarial Learning

Authors: Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S. M. Ali Eslami, Oriol Vinyals

Abstract: Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator's output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, Omniglot, CelebA) and synthetic 3D datasets.

Submitted 3 April, 2018; originally announced April 2018.

Comments: 12 pages, 13 figures

arXiv:1712.04120 [pdf, other]

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

Authors: Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

Abstract: Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbs-sampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps.

Submitted 11 December, 2017; originally announced December 2017.

Comments: NIPS 2017

arXiv:1607.07215 [pdf, other]

DeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation

Authors: Yaroslav Ganin, Daniil Kononenko, Diana Sungatullina, Victor Lempitsky

Abstract: In this work, we consider the task of generating highly-realistic images of a given face with a redirected gaze. We treat this problem as a specific instance of conditional image generation and suggest a new deep architecture that can handle this task very well as revealed by numerical comparison with prior art and a user study. Our deep architecture performs coarse-to-fine warping with an additional intensity correction of individual pixels. All these operations are performed in a feed-forward manner, and the parameters associated with different operations are learned jointly in the end-to-end fashion. After learning, the resulting neural network can synthesize images with manipulated gaze, while the redirection angle can be selected arbitrarily from a certain range and provided as an input to the network.

Submitted 26 July, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

Comments: Fixed typos, 14 + 2 + 2 pages, ECCV 2016

arXiv:1512.05300 [pdf, other]

Multiregion Bilinear Convolutional Neural Networks for Person Re-Identification

Authors: Evgeniya Ustinova, Yaroslav Ganin, Victor Lempitsky

Abstract: In this work we propose a new architecture for person re-identification. As the task of re-identification is inherently associated with embedding learning and non-rigid appearance description, our architecture is based on the deep bilinear convolutional network (Bilinear-CNN) that has been proposed recently for fine-grained classification of highly non-rigid objects. While the last stages of the original Bilinear-CNN architecture completely removes the geometric information from consideration by performing orderless pooling, we observe that a better embedding can be learned by performing bilinear pooling in a more local way, where each pooling is confined to a predefined region. Our architecture thus represents a compromise between traditional convolutional networks and bilinear CNNs and strikes a balance between rigid matching and completely ignoring spatial information. We perform the experimental validation of the new architecture on the three popular benchmark datasets (Market-1501, CUHK01, CUHK03), comparing it to baselines that include Bilinear-CNN as well as prior art. The new architecture outperforms the baseline on all three datasets, while performing better than state-of-the-art on two out of three. The code and the pretrained models of the approach can be found at https://github.com/madkn/MultiregionBilinearCNN-ReId.

Submitted 6 September, 2017; v1 submitted 16 December, 2015; originally announced December 2015.

Comments: in AVSS 2017

arXiv:1505.07818 [pdf, other]

Domain-Adversarial Training of Neural Networks

Authors: Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky

Abstract: We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for descriptor learning task in the context of person re-identification application.

Submitted 26 May, 2016; v1 submitted 28 May, 2015; originally announced May 2015.

Comments: Published in JMLR: http://jmlr.org/papers/v17/15-239.html

Journal ref: Journal of Machine Learning Research 2016, vol. 17, p. 1-35

arXiv:1412.6553 [pdf, other]

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

Authors: Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, Victor Lempitsky

Abstract: We propose a simple two-step approach for speeding up convolution layers within large convolutional neural networks based on tensor decomposition and discriminative fine-tuning. Given a layer, we use non-linear least squares to compute a low-rank CP-decomposition of the 4D convolution kernel tensor into a sum of a small number of rank-one tensors. At the second step, this decomposition is used to replace the original convolutional layer with a sequence of four convolutional layers with small kernels. After such replacement, the entire network is fine-tuned on the training data using standard backpropagation process. We evaluate this approach on two CNNs and show that it is competitive with previous approaches, leading to higher obtained CPU speedups at the cost of lower accuracy drops for the smaller of the two networks. Thus, for the 36-class character classification CNN, our approach obtains a 8.5x CPU speedup of the whole network with only minor accuracy drop (1% from 91% to 90%). For the standard ImageNet architecture (AlexNet), the approach speeds up the second convolution layer by a factor of 4x at the cost of $1\%$ increase of the overall top-5 classification error.

Submitted 24 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

arXiv:1409.7495 [pdf, other]

Unsupervised Domain Adaptation by Backpropagation

Authors: Yaroslav Ganin, Victor Lempitsky

Abstract: Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amount of labeled data from the source domain and large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving adaptation effect in the presence of big domain shifts and outperforming previous state-of-the-art on Office datasets.

Submitted 27 February, 2015; v1 submitted 26 September, 2014; originally announced September 2014.

arXiv:1406.6558 [pdf, other]

$ N^4 $-Fields: Neural Network Nearest Neighbor Fields for Image Transforms

Authors: Yaroslav Ganin, Victor Lempitsky

Abstract: We propose a new architecture for difficult image processing operations, such as natural edge detection or thin object segmentation. The architecture is based on a simple combination of convolutional neural networks with the nearest neighbor search. We focus our attention on the situations when the desired image transformation is too hard for a neural network to learn explicitly. We show that in such situations, the use of the nearest neighbor search on top of the network output allows to improve the results considerably and to account for the underfitting effect during the neural network training. The approach is validated on three challenging benchmarks, where the performance of the proposed architecture matches or exceeds the state-of-the-art.

Submitted 3 July, 2014; v1 submitted 25 June, 2014; originally announced June 2014.

arXiv:1303.1968 [pdf, ps, other]

doi 10.1088/1742-6596/428/1/012002

Mott localization in the correlated superconductor Cs3C60 resulting from the molecular Jahn-Teller effect

Authors: Katalin Kamaras, Gyongyi Klupp, Peter Matus, Alexey Y. Ganin, Alec McLennan, Matthew J. Rosseinsky, Yasuhiro Takabayashi, Martin T. McDonald, Kosmas Prassides

Abstract: Cs3C60 is a correlated superconductor under pressure, but an insulator under ambient conditions. The mechanism causing this insulating behavior is the combination of Mott localization and the dynamic Jahn-Teller effect. We show evidence from infrared spectroscopy for the dynamic Jahn-Teller distortion. The continuous change with temperature of the splitting of infrared lines is typical Jahn-Teller behavior, reflecting the change in population of solid-state conformers. We conclude that the electronic and magnetic solid-state properties of the insulating state are controlled by molecular phenomena. We estimate the time scale of the dynamic Jahn-Teller effect to be above 10^(-11) s and the energy difference between the conformers less than 20 cm-1.

Submitted 8 March, 2013; originally announced March 2013.

Comments: 6 pages, 4 figures, 1 supplementary movie; XXI International Symposium on the Jahn-Teller Effect, Tsukuba, Japan, August 26-31, 2012

Journal ref: Journal of Physics: Conference Series 428 (2013) 012002

arXiv:1204.5971 [pdf, ps, other]

doi 10.1103/PhysRevB.86.075406

Raman response of Stage-1 graphite intercalation compounds revisited

Authors: J. C. Chacón-Torres, A. Y. Ganin, M. J. Rosseinsky, T. Pichler

Abstract: We present a detailed in-situ Raman analysis of stage-1 KC8, CaC6, and LiC6 graphite intercalation compounds (GIC) to unravel their intrinsic finger print. Four main components were found between 1200 cm-1 and 1700 cm-1, and each of them were assigned to a corresponding vibrational mode. From a detailed line shape analysis of the intrinsic Fano-lines of the G- and D-line response we precisely determine the position (ωph), line width (Γph) and asymmetry (q) from each component. The comparison to the theoretical calculated line width and position of each component allow us to extract the electron-phonon coupling constant of these compounds. A coupling constant λph < 0.06 was obtained. This highlights that Raman active modes alone are not sufficient to explain the superconductivity within the electron-phonon coupling mechanism in CaC6 and KC8.

Submitted 13 July, 2012; v1 submitted 26 April, 2012; originally announced April 2012.

Comments: 6 pages, 3 figures, 2 tables

Journal ref: Phys. Rev. B 86, 075406 (2012)

arXiv:1202.0375 [pdf, other]

doi 10.1103/PhysRevB.85.064519

Anomalous dependence of the c-axis polarized Fe B$_{1g}$ phonon mode with Fe and Se concentrations in Fe$_{1+y}$Te$_{1-x}$Se$_x$

Authors: Y. J. Um, A. Subedi, P. Toulemonde, A. Y. Ganin, L. Boeri, M. Rahlenbeck, Y. Liu, C. T. Lin, S. J. E. Carlsson, A. Sulpice, M. J. Rosseinsky, B. Keimer, M. Le Tacon

Abstract: We report an investigation of the lattice dynamical properties in a range of Fe$_{1+y}$Te$_{1-x}$Se$_{x}$ compounds, with special emphasis on the c-axis polarized vibration of Fe with B$_{1g}$ symmetry, a Raman active mode common to all families of Fe-based superconductors. We have carried out a systematic study of the temperature dependence of this phonon mode as a function of Se $x$ and excess Fe $y$ concentrations. In parent compound Fe$_{1+y}$Te, we observe an unconventional broadening of the phonon between room temperature and magnetic ordering temperature $T_N$. The situation smoothly evolves towards a regular anharmonic behavior as Te is substituted for Se and long range magnetic order is replaced by superconductivity. Irrespective to Se contents, excess Fe is shown to provide an additional damping channel for the B$_{1g}$ phonon at low temperatures. We performed Density Functional Theory (DFT) ab-initio calculations within the local density approximation (LDA) to calculate the phonon frequencies including magnetic polarization and Fe non-stoichiometry in the Virtual Crystal Approximation (VCA). We obtained a good agreement with the measured phonon frequencies in the Fe-deficient samples, while the effects of Fe excess are poorly reproduced. This may be due to excess Fe-induced local magnetism and low energy magnetic fluctuations that cannot be treated accurately within these approaches. As recently revealed by neutron scattering and $\mu$-SR studies, these phenomena occur in the temperature range where anomalous decay of the B$_{1g}$ phonon is observed, and suggests a peculiar coupling of this mode with local moments and spin fluctuations in Fe$_{1+y}$Te$_{1-x}$Se$_{x}$.

Submitted 23 February, 2012; v1 submitted 2 February, 2012; originally announced February 2012.

Comments: 11 pages, 7 figures, 4 tables, to appear in Phys. Rev. B

Journal ref: Phys. Rev. B 85, 064519 (2012)

arXiv:1102.0488 [pdf]

doi 10.1039/C1SC00070E

Cation vacancy order in the K0.8+xFe1.6-ySe2 system: five-fold cell expansion accommodates 20% tetrahedral vacancies

Authors: J. Bacsa, A. Y. Ganin, Y. Takabayashi, K. E. Christensen, K. Prassides, M. J. Rosseinsky, J. B. Claridge

Abstract: Ordering of the tetrahedral site vacancies in two crystals of refined compositions K0.93(1)Fe1.52(1)Se2 and K0.862(3)Fe1.563(4)Se2 produces a fivefold expansion of the parent ThCr2Si2 unit cell in the ab plane which can accommodate 20% vacancies on a single site within the square FeSe layer. The iron charge state is maintained close to +2 by coupling of the level of alkali metal and iron vacancies, producing a potential doping mechanism which can operate at both average and local structure levels.

Submitted 18 February, 2011; v1 submitted 2 February, 2011; originally announced February 2011.

Comments: 5 pages, 3 figures; accepted for publication in Chemical Science (Chem. Sci.), DOI:10.1039/C1SC00070E

Journal ref: Chem. Sci., 2011, 2 (6), 1054 - 1058

arXiv:1007.3914 [pdf, ps, other]

doi 10.1103/PhysRevB.82.104514

Anisotropic fluctuations and quasiparticle excitations in FeSe_0.5Te_0.5

Authors: A. Serafin, A. I. Coldea, A. Y. Ganin, M. J. Rosseinsky, K. Prassides, D. Vignolles, A. Carrington

Abstract: We present data for the temperature dependence of the magnetic penetration depth lambda(T), heat capacity C(T), resistivity R(T) and magnetic torque tau for highly homogeneous single crystal samples of Fe1.0Se0.44(4)Te0.56(4). lambda(T) was measured down to 200 mK in zero field. We find lambda(T) follows a power law lambda~T^n with n = 2.2 +/- 0.1. This is similar to some 122 iron-arsenides and likely results from a sign-changing pairing state combined with strong scattering. Magnetic fields of up to B = 55 T or 14 T were used for the tau(B) and C(T)/R(T) measurements respectively. The specific heat, resistivity and torque measurements were used to map out the (H,T) phase diagram in this material. All three measurements were conducted on exactly the same single crystal sample so that the different information revealed by these probes is clearly distinguished. Heat capacity data strongly resemble those found for the high Tc cuprates, where strong fluctuation effects wipe out the phase transition at Hc2. Unusually, here we find the fluctuation effects appear to be strongly anisotropic.

Submitted 1 October, 2010; v1 submitted 22 July, 2010; originally announced July 2010.

Comments: 10 pages, 9 figures, submitted to PRB

Journal ref: Phys. Rev. B 82, 104514 (2010)

arXiv:1006.3411 [pdf, ps, other]

doi 10.1103/PhysRevB.82.140508

Two-electronic component behavior in the multiband FeSe$_{0.42}$Te$_{0.58}$ superconductor

Authors: D. Arcon, P. Jeglic, A. Zorko, A. Potocnik, A. Y. Ganin, Y. Takabayashi, M. J. Rosseinsky, K. Prassides

Abstract: We report X-band EPR and $^{125}$Te and $^{77}$Se NMR measurements on single-crystalline superconducting FeSe$_{0.42}$Te$_{0.58}$ ($T_c$ = 11.5(1) K). The data provide evidence for the coexistence of intrinsic localized and itinerant electronic states. In the normal state, localized moments couple to itinerant electrons in the Fe(Se,Te) layers and affect the local spin susceptibility and spin fluctuations. Below $T_c$, spin fluctuations become rapidly suppressed and an unconventional superconducting state emerges in which $1/T_1$ is reduced at a much faster rate than expected for conventional $s$- or $s_\pm$-wave symmetry. We suggest that the localized states arise from the strong electronic correlations within one of the Fe-derived bands. The multiband electronic structure together with the electronic correlations thus determine the normal and superconducting states of the FeSe$_{1-x}$Te$_x$ family, which appears much closer to other high-$T_c$ superconductors than previously anticipated.

Submitted 17 June, 2010; originally announced June 2010.

Comments: 5 pages, 4 figures

Journal ref: Physical Review B 82, 140508(R) (2010)

arXiv:0912.3152 [pdf, ps, other]

doi 10.1103/PhysRevLett.104.097002

Strong electron correlations in the normal state of FeSe0.42Te0.58

Authors: A. Tamai, A. Y. Ganin, E. Rozbicki, J. Bacsa, W. Meevasana, P. D. C. King, M. Caffio, R. Schaub, S. Margadonna, K. Prassides, M. J. Rosseinsky, F. Baumberger

Abstract: We investigate the normal state of the '11' iron-based superconductor FeSe0.42Te0.58 by angle-resolved photoemission. Our data reveal a highly renormalized quasiparticle dispersion characteristic of a strongly correlated metal. We find sheet-dependent effective carrier masses between ~ 3 - 16 m_e corresponding to a mass enhancement over band structure values of m*/m_band ~ 6 - 20. This is nearly an order of magnitude higher than the renormalization reported previously for iron-arsenide superconductors of the '1111' and '122' families but fully consistent with the bulk specific heat.

Submitted 10 February, 2010; v1 submitted 16 December, 2009; originally announced December 2009.

Comments: 5 pages, 4 figures, to appear in Phys. Rev. Lett.

Journal ref: Phys. Rev. Lett. 104, 097002 (2010)

Showing 1–21 of 21 results for author: Ganin, Y