Showing 1–6 of 6 results for author: Faieta, B

Searching in archive cs.
  1. arXiv:2302.11710

    cs.CV

    Controlled and Conditional Text to Image Generation with Diffusion Prior

    Authors: Pranav Aggarwal, Hareesh Ravi, Naveen Marri, Sachin Kelkar, Fengbin Chen, Vinh Khuc, Midhun Harikumar, Ritiz Tambi, Sudharshan Reddy Kakumanu, Purvak Lapsiya, Alvin Ghouas, Sarah Saber, Malavika Ramprasad, Baldo Faieta, Ajinkya Kale

    Abstract: Denoising Diffusion models have shown remarkable performance in generating diverse, high quality images from text. Numerous techniques have been proposed on top of or in alignment with models like Stable Diffusion and Imagen that generate images directly from text. A lesser explored approach is DALLE-2's two step process comprising a Diffusion Prior that generates a CLIP image embedding from text…

    Submitted 1 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

  2. arXiv:2208.04807

    cs.CV

    HyperNST: Hyper-Networks for Neural Style Transfer

    Authors: Dan Ruta, Andrew Gilbert, Saeid Motiian, Baldo Faieta, Zhe Lin, John Collomosse

    Abstract: We present HyperNST; a neural style transfer (NST) technique for the artistic stylization of images, based on Hyper-networks and the StyleGAN2 architecture. Our contribution is a novel method for inducing style transfer parameterized by a metric space, pre-trained for style-based visual search (SBVS). We show for the first time that such space may be used to drive NST, enabling the application and…

    Submitted 9 August, 2022; originally announced August 2022.

  3. arXiv:2203.05321

    cs.CV cs.CL

    StyleBabel: Artistic Style Tagging and Captioning

    Authors: Dan Ruta, Andrew Gilbert, Pranav Aggarwal, Naveen Marri, Ajinkya Kale, Jo Briggs, Chris Speed, Hailin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin, John Collomosse

    Abstract: We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. StyleBabel was collected via an iterative method, inspired by 'Grounded Theory': a qualitative approach that enables annotation while co…

    Submitted 11 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

  4. arXiv:2104.12836

    cs.CV

    Multimodal Contrastive Training for Visual Representation Learning

    Authors: Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

    Abstract: We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy prediction task in a single domain, our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation sim…

    Submitted 26 April, 2021; originally announced April 2021.

  5. arXiv:2103.09776

    cs.CV

    ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity

    Authors: Dan Ruta, Saeid Motiian, Baldo Faieta, Zhe Lin, Hailin Jin, Alex Filipkowski, Andrew Gilbert, John Collomosse

    Abstract: We present ALADIN (All Layer AdaIN); a novel architecture for searching images based on the similarity of their artistic style. Representation learning is critical to visual search, where distance in the learned search embedding reflects image similarity. Learning an embedding that discriminates fine-grained variations in style is hard, due to the difficulty of defining and labelling style. ALADIN…

    Submitted 17 March, 2021; originally announced March 2021.

  6. arXiv:1905.13339

    cs.CV cs.IR

    Multitask Text-to-Visual Embedding with Titles and Clickthrough Data

    Authors: Pranav Aggarwal, Zhe Lin, Baldo Faieta, Saeid Motiian

    Abstract: Text-visual (or called semantic-visual) embedding is a central problem in vision-language research. It typically involves mapping of an image and a text description to a common feature space through a CNN image encoder and a RNN language encoder. In this paper, we propose a new method for learning text-visual embedding using both image titles and click-through data from an image search engine. We…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 4 pages. Language and Vision Workshop, in conjunction with CVPR 2019