Zum Hauptinhalt springen

Showing 1–8 of 8 results for author: Kajic, I

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.07009  [pdf, other

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, :, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh , et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  2. arXiv:2406.14774  [pdf, other

    cs.LG cs.CL cs.CV

    Evaluating Numerical Reasoning in Text-to-Image Models

    Authors: Ivana Kajić, Olivia Wiles, Isabela Albuquerque, Matthias Bauer, Su Wang, Jordi Pont-Tuset, Aida Nematzadeh

    Abstract: Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range of text-to-image models on numerical reasoning tasks of varying difficulty, and show that even the most advanced models have only rudimentary numerical skills. Specifically, their ability to correctly… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2405.16759  [pdf, other

    cs.CV cs.LG

    Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

    Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

    Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models. without the needs for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2404.16820  [pdf, other

    cs.CV

    Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

    Authors: Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Chris Knutsen, Cyrus Rashtchian, Jordi Pont-Tuset, Aida Nematzadeh

    Abstract: While text-to-image (T2I) generative models have become ubiquitous, they do not necessarily generate images that align with a given prompt. While previous work has evaluated T2I alignment by proposing metrics, benchmarks, and templates for collecting human judgements, the quality of these components is not systematically measured. Human-rated prompt sets are generally small and the reliability of… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Data and code will be released at: https://github.com/google-deepmind/gecko_benchmark_t2i

  5. arXiv:2303.07172  [pdf, other

    cs.AI cs.CV cs.LG

    Evaluating Visual Number Discrimination in Deep Neural Networks

    Authors: Ivana Kajić, Aida Nematzadeh

    Abstract: The ability to discriminate between large and small quantities is a core aspect of basic numerical competence in both humans and animals. In this work, we examine the extent to which the state-of-the-art neural networks designed for vision exhibit this basic ability. Motivated by studies in animal and infant numerical cognition, we use the numerical bisection procedure to test number discriminatio… ▽ More

    Submitted 13 March, 2023; originally announced March 2023.

  6. arXiv:2211.01480  [pdf, other

    cs.MA cs.CL cs.HC

    Over-communicate no more: Situated RL agents learn concise communication protocols

    Authors: Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski

    Abstract: While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other. Much research on communication emergence uses reinforcement learning (RL) and explores unsituated communication in one-step referential tasks -- the tasks are not temporally interactive and lac… ▽ More

    Submitted 2 November, 2022; originally announced November 2022.

  7. arXiv:2205.12191  [pdf, other

    cs.CL cs.AI cs.CV cs.LG

    Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization

    Authors: Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh

    Abstract: Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA). The quality of such models is commonly assessed by measuring their performance on unseen data that typically comes from the same distribution as the training data. However, when evaluated under out-of-distribu… ▽ More

    Submitted 1 April, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Findings of EACL 2023. Aishwarya, Ivana, Emanuele and Aida had equal first author contributions. Elnaz and Anita had equal contributions. Aida and Aishwarya had equal senior contributions

  8. arXiv:2004.01097  [pdf, other

    cs.LG cs.CL cs.MA stat.ML

    Learning to cooperate: Emergent communication in multi-agent navigation

    Authors: Ivana Kajić, Eser Aygün, Doina Precup

    Abstract: Emergent communication in artificial agents has been studied to understand language evolution, as well as to develop artificial systems that learn to communicate with humans. We show that agents performing a cooperative navigation task in various gridworld environments learn an interpretable communication protocol that enables them to efficiently, and in many cases, optimally, solve the task. An a… ▽ More

    Submitted 30 June, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: Accepted to CogSci 2020