Showing 1–7 of 7 results for author: Fazly, A

  1. arXiv:2401.13594  [pdf, other]

    cs.CL cs.AI

    Graph Guided Question Answer Generation for Procedural Question-Answering

    Authors: Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez

    Abstract: In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural t…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to EACL 2024 as long paper. 25 pages including appendix

    MSC Class: I.2.7

  2. arXiv:2310.08312  [pdf, other]

    cs.CV cs.LG

    GePSAn: Generative Procedure Step Anticipation in Cooking Videos

    Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations i…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: published at ICCV 2023

  3. arXiv:2211.00113  [pdf, other]

    cs.LG cs.CV

    SAGE: Saliency-Guided Mixup with Optimal Rearrangements

    Authors: Avery Ma, Nikita Dvornik, Ran Zhang, Leila Pishdad, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: Data augmentation is a key element for training accurate models by reducing overfitting and improving generalization. For image classification, the most popular data augmentation techniques range from simple photometric and geometrical transformations, to more complex methods that use visual saliency to craft new training examples. As augmentation methods get more complex, their ability to increas…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2022. Code: https://github.com/SamsungLabs/SAGE

  4. arXiv:2210.14862  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Semantic Parsing: From Images to Abstract Meaning Representation

    Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

    Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs o…

    Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: published in CoNLL 2022

  5. arXiv:2210.04996  [pdf, other]

    cs.CV cs.AI

    Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization

    Authors: Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson

    Abstract: In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. An established approach to this problem is to rely on a given list of steps. However, in reality, there is often more than one way to execute a procedure successfully, by following the set of steps in slightly varying orders. Thus, for successful localization in a given video, recent works r…

    Submitted 31 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: ECCV'22, oral

    Journal ref: ECCV 2022

  6. arXiv:2204.09268  [pdf, other]

    cs.LG cs.CL cs.CV cs.IR

    Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations

    Authors: Leila Pishdad, Ran Zhang, Konstantinos G. Derpanis, Allan Jepson, Afsaneh Fazly

    Abstract: Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: 13 pages, 7 figures

  7. arXiv:1911.01474  [pdf, other]

    cs.HC cs.AI cs.CL cs.CV

    VASTA: A Vision and Language-assisted Smartphone Task Automation System

    Authors: Alborz Rezazadeh Sereshkeh, Gary Leung, Krish Perumal, Caleb Phillips, Minfan Zhang, Afsaneh Fazly, Iqbal Mohomed

    Abstract: We present VASTA, a novel vision and language-assisted Programming By Demonstration (PBD) system for smartphone task automation. Development of a robust PBD automation system requires overcoming three key challenges: first, how to make a particular demonstration robust to positional and visual changes in the user interface (UI) elements; secondly, how to recognize changes in the automation paramet…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: Submitted to ACM IUI'20, 10 figures, 11 pages