Showing 1–3 of 3 results for author: Abdelsalam, M A

  1. arXiv:2407.11393  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

    Authors: Kalliopi Basioti, Mohamed A. Abdelsalam, Federico Fancellu, Vladimir Pavlovic, Afsaneh Fazly

    Abstract: Controllable Image Captioning (CIC) aims at generating natural language descriptions for an image, conditioned on information provided by end users, e.g., regions, entities or events of interest. However, available image-language datasets mainly contain captions that describe the entirety of an image, making them ineffective for training CIC models that can potentially attend to any subset of regi…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2310.08312  [pdf, other]

    cs.CV cs.LG

    GePSAn: Generative Procedure Step Anticipation in Cooking Videos

    Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations i…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Published at ICCV 2023

  3. arXiv:2210.14862  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Semantic Parsing: From Images to Abstract Meaning Representation

    Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

    Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs o…

    Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Published in CoNLL 2022