
Showing 1–6 of 6 results for author: Khan, F F

Searching in archive cs.
  1. arXiv:2408.03940  [pdf, other]

    cs.CV

    How Well Can Vision Language Models See Image Details?

    Authors: Chenhui Gou, Abdulwahab Felemban, Faizan Farooq Khan, Deyao Zhu, Jianfei Cai, Hamid Rezatofighi, Mohamed Elhoseiny

    Abstract: Large Language Model-based Vision-Language Models (LLM-based VLMs) have demonstrated impressive results in various vision-language understanding tasks. However, how well these VLMs can see image detail beyond the semantic level remains unclear. In our study, we introduce a pixel value prediction task (PVP) to explore "How Well Can Vision Language Models See Image Details?" and to assist VLMs in pe…

    Submitted 7 August, 2024; originally announced August 2024.

  2. arXiv:2402.02453  [pdf, other]

    cs.CV

    AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art

    Authors: Faizan Farooq Khan, Diana Kim, Divyansh Jha, Youssef Mohamed, Hanna H Chang, Ahmed Elgammal, Luba Elliott, Mohamed Elhoseiny

    Abstract: Discovering the creative potentials of a random signal to various artistic expressions in aesthetic and conceptual richness is a ground for the recent success of generative machine learning as a way of art creation. To understand the new artistic medium better, we conduct a comprehensive analysis to position AI-generated art within the context of human art heritage. Our comparative analysis is bas…

    Submitted 4 February, 2024; originally announced February 2024.

  3. arXiv:2304.05390  [pdf, other]

    cs.CV cs.AI cs.LG

    HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

    Authors: Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny

    Abstract: In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks heavily rely on subjective human evaluation, limiting their ability to holistically assess the model's capabilities. Furthermore, there is a significant gap between efforts in developi…

    Submitted 23 November, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: ICCV 2023

  4. arXiv:2204.07660  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

    Authors: Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny

    Abstract: Datasets that capture the connection between vision, language, and affection are limited, causing a lack of understanding of the emotional aspect of human intelligence. As a step in this direction, the ArtEmis dataset was recently introduced as a large-scale dataset of emotional reactions to images along with language explanations of these chosen emotions. We observed a significant emotional bias…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: 8 pages, Accepted at CVPR 22, for more details see https://www.artemisdataset-v2.org

  5. Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor

    Authors: Anchit Gupta, Faizan Farooq Khan, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C. V. Jawahar

    Abstract: This paper proposes a video editor based on OpenShot with several state-of-the-art facial video editing algorithms as added functionalities. Our editor provides an easy-to-use interface to apply modern lip-syncing algorithms interactively. Apart from lip-syncing, the editor also uses audio and facial re-enactment to generate expressive talking faces. The manual control improves the overall experie…

    Submitted 16 October, 2021; originally announced October 2021.

    Comments: 9 pages, 7 figures, accepted in ICVGIP 2021

  6. arXiv:2109.08043  [pdf]

    cs.CV

    Generating Dataset For Large-scale 3D Facial Emotion Recognition

    Authors: Faizan Farooq Khan, Syed Zulqarnain Gilani

    Abstract: The tremendous development in deep learning has led facial expression recognition (FER) to receive much attention in the past few years. Although 3D FER has an inherent edge over its 2D counterpart, work on 2D images has dominated the field. The main reason for the slow development of 3D FER is the unavailability of large training and large test datasets. Recognition accuracies have already satura…

    Submitted 16 September, 2021; originally announced September 2021.