Showing 1–6 of 6 results for author: Kesen, I

Searching in archive cs.
  1. arXiv:2407.12498

    cs.CL cs.CV

    Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning

    Authors: Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

    Abstract: The linguistic capabilities of Multimodal Large Language Models (MLLMs) are critical for their effective application across diverse tasks. This study aims to evaluate the performance of MLLMs on the VALSE benchmark, focusing on the efficacy of few-shot In-Context Learning (ICL) and Chain-of-Thought (CoT) prompting. We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in mode…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Preprint. 33 pages, 17 Figures, 3 Tables

  2. arXiv:2404.16621

    cs.LG cs.AI cs.CL

    Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

    Authors: Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem

    Abstract: The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for…

    Submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2311.07022

    cs.CL cs.AI cs.CV

    ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

    Authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem

    Abstract: With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm foo…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Preprint. 48 pages, 22 figures, 10 tables

  4. arXiv:2211.04576

    cs.CL cs.AI

    Detecting Euphemisms with Literal Descriptions and Visual Imagery

    Authors: İlker Kesen, Aykut Erdem, Erkut Erdem, Iacer Calixto

    Abstract: This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive or unpleasant issues like addiction and death. The ambiguous nature of euphemistic words or expressions makes it challenging to detect their actual meaning within a context. In…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 7 pages, 1 table, 1 figure. Accepted to the 3rd Workshop on Figurative Language Processing at EMNLP 2022. https://github.com/ilkerkesen/euphemism

  5. arXiv:2012.04293

    cs.AI cs.CL cs.CV

    CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions

    Authors: Tayfun Ates, M. Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, Deniz Yuret

    Abstract: Humans are able to perceive, understand and reason about causal events. Developing models with similar physical and causal understanding capabilities is a long-standing goal of artificial intelligence. As a step towards this direction, we introduce CRAFT, a new video question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 58K video and q…

    Submitted 1 March, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted to Findings of ACL 2022

  6. arXiv:2003.12739

    cs.CV cs.CL cs.LG

    Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters

    Authors: İlker Kesen, Ozan Arkan Can, Erkut Erdem, Aykut Erdem, Deniz Yuret

    Abstract: How to best integrate linguistic and perceptual processing in multi-modal tasks that involve language and vision is an important open problem. In this work, we argue that the common practice of using language in a top-down manner, to direct visual attention over high-level visual features, may not be optimal. We hypothesize that the use of language to also condition the bottom-up processing from p…

    Submitted 23 June, 2022; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: 13 pages, 6 figures, 6 tables. Appeared in MULA Workshop at CVPR 2022

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4610-4620