Showing 1–16 of 16 results for author: Ben-Ari, R

  1. arXiv:2405.18065

    cs.CV cs.AI

    EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

    Authors: Issar Tzachor, Boaz Lerner, Matan Levy, Michael Green, Tal Berkovitz Shalev, Gavriel Habib, Dvir Samuel, Noam Korngut Zailer, Or Shimshi, Nir Darshan, Rami Ben-Ari

    Abstract: The task of Visual Place Recognition (VPR) is to predict the location of a query image from a database of geo-tagged images. Recent studies in VPR have highlighted the significant advantage of employing pre-trained foundation models like DINOv2 for the VPR task. However, these models are often deemed inadequate for VPR without further fine-tuning on task-specific data. In this paper, we propose a…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.18025

    cs.CV cs.AI

    Unveiling the Power of Diffusion Features For Personalized Segmentation and Retrieval

    Authors: Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik

    Abstract: Personalized retrieval and segmentation aim to locate specific instances within a dataset based on an input image and a short description of the reference instance. While supervised methods are effective, they require extensive labeled data for training. Recently, self-supervised foundation models have been introduced to these tasks showing comparable results to supervised methods. However, a sign…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2312.12540

    cs.CV

    Regularized Newton Raphson Inversion for Text-to-Image Diffusion Models

    Authors: Dvir Samuel, Barak Meiri, Nir Darshan, Shai Avidan, Gal Chechik, Rami Ben-Ari

    Abstract: Diffusion inversion is the problem of taking an image and a text prompt that describes it and finding a noise latent that would generate the image. Most current inversion techniques operate by approximately solving an implicit equation and may converge slowly or yield poor reconstructed images. Here, we formulate the problem as finding the roots of an implicit equation and design a method to solve…

    Submitted 27 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  4. arXiv:2312.11078

    cs.CV

    Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback

    Authors: Boaz Lerner, Nir Darshan, Rami Ben-Ari

    Abstract: With such a massive growth in the number of images stored, efficient search in a database has become a crucial endeavor managed by image retrieval systems. Image Retrieval with Relevance Feedback (IRRF) involves iterative human interaction during the retrieval process, yielding more meaningful outcomes. This process can be generally cast as a binary classification problem with only few label…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: A short version of this paper was presented in ICCV-Out Of Distribution Generalization on Computer Vision (OOD-CV) Workshop 2023. See also https://github.com/eccv22-ood-workshop/eccv22-ood-workshop.github.io/blob/new/camera_ready/CameraReady%2053.pdf

  5. arXiv:2307.06751

    cs.CV

    Watch Where You Head: A View-biased Domain Gap in Gait Recognition and Unsupervised Adaptation

    Authors: Gavriel Habib, Noa Barzilay, Or Shimshi, Rami Ben-Ari, Nir Darshan

    Abstract: Gait Recognition is a computer vision task aiming to identify people by their walking patterns. Although existing methods often show high performance on specific datasets, they lack the ability to generalize to unseen scenarios. Unsupervised Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised manner on a source domain, to an unlabelled target domain. There are only a few wo…

    Submitted 10 December, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted to WACV 2024

  6. arXiv:2306.08687

    cs.CV cs.AI

    Norm-guided latent space exploration for text-to-image generation

    Authors: Dvir Samuel, Rami Ben-Ari, Nir Darshan, Haggai Maron, Gal Chechik

    Abstract: Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, the latent space of initial seeds is still not well understood and its structure was shown to impact the generation of various concepts. Specifically, simple operations like interpolation and finding the centroid of a set of seeds perform poorly when using sta…

    Submitted 5 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  7. arXiv:2305.20062

    cs.CV

    Chatting Makes Perfect: Chat-based Image Retrieval

    Authors: Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

    Abstract: Chats emerge as an effective user-friendly approach for information retrieval, and are successfully employed in many domains, such as customer service, healthcare, and finance. However, existing image retrieval approaches typically address the case of a single query-to-image round, and the use of chats for image retrieval has been mostly overlooked. In this work, we introduce ChatIR: a chat-based…

    Submitted 5 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Camera Ready version for NeurIPS 2023

  8. arXiv:2304.14530

    cs.CV cs.LG

    Generating images of rare concepts using pre-trained diffusion models

    Authors: Dvir Samuel, Rami Ben-Ari, Simon Raviv, Nir Darshan, Gal Chechik

    Abstract: Text-to-image diffusion models can synthesize high-quality images, but they have various limitations. Here we highlight a common failure mode of these models, namely, generating uncommon concepts and structured concepts like hand palms. We show that their limitation is partly due to the long-tail nature of their training data: web-crawled data sets are strongly unbalanced, causing models to under-…

    Submitted 27 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted to AAAI 2024

  9. arXiv:2303.09429

    cs.CV

    Data Roaming and Quality Assessment for Composed Image Retrieval

    Authors: Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

    Abstract: The task of Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. However, current CoIR datasets are orders of magnitude smaller compared to other vision and language (V&L) datasets. Additionally, some of these datasets have noticeable issues, such as queries containing redundant modalities. To address thes…

    Submitted 20 December, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Camera Ready version for AAAI 2024

  10. Learnable Optimal Sequential Grouping for Video Scene Detection

    Authors: Daniel Rotman, Yevgeny Yaroker, Elad Amrani, Udi Barzelay, Rami Ben-Ari

    Abstract: Video scene detection is the task of dividing videos into temporal semantic chapters. This is an important preliminary step before attempting to analyze heterogeneous video content. Recently, Optimal Sequential Grouping (OSG) was proposed as a powerful unsupervised solution to solve a formulation of the video scene detection problem. In this work, we extend the capabilities of OSG to the learning…

    Submitted 17 May, 2022; originally announced May 2022.

    Journal ref: In Proceedings of the 28th ACM International Conference on Multimedia, pp. 1958-1966. 2020

  11. arXiv:2111.14792

    cs.CV

    Classification-Regression for Chart Comprehension

    Authors: Matan Levy, Rami Ben-Ari, Dani Lischinski

    Abstract: Chart question answering (CQA) is a task used for assessing chart comprehension, which is fundamentally different from understanding natural images. CQA requires analyzing the relationships between the textual and the visual components of a chart, in order to answer general questions or infer numerical values. Most existing CQA datasets and models are based on simplifying assumptions that often en…

    Submitted 11 July, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: ECCV 2022

  12. arXiv:2004.10141

    cs.CV

    TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition

    Authors: Rami Ben-Ari, Mor Shpigel, Ophir Azulai, Udi Barzelay, Daniel Rotman

    Abstract: Classification of new class entities requires collecting and annotating hundreds or thousands of samples, which is often prohibitively costly. Few-shot learning suggests learning to classify new classes using just a few examples. Only a small number of studies address the challenge of few-shot learning on spatio-temporal patterns such as videos. In this paper, we present the Temporal Aware Embedding…

    Submitted 17 July, 2021; v1 submitted 21 April, 2020; originally announced April 2020.

    Journal ref: Published in Learning from Limited and Imperfect Data (L2ID) Workshop - CVPR 2021

  13. arXiv:2003.03186

    cs.CV

    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning

    Authors: Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein

    Abstract: One of the key factors in enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods often choose to ignore…

    Submitted 10 December, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: Accepted to AAAI 2021

    ACM Class: I.2.10; I.4; I.5

  14. arXiv:1905.11137

    cs.CV

    Learning to Detect and Retrieve Objects from Unlabeled Videos

    Authors: Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein

    Abstract: Learning an object detector or retrieval requires a large data set with manual annotations. Such data sets are expensive and time consuming to create and therefore difficult to obtain on a large scale. In this work, we propose to exploit the natural correlation in narrations and the visual presence of objects in video, to learn an object detector and retrieval without any manual labeling involved.

    Submitted 19 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: ICCV 2019 Workshop on Multi-modal Video Analysis and Moments in Time Challenge

    ACM Class: I.2.10; I.4; I.5

  15. Weakly and Semi Supervised Detection in Medical Imaging via Deep Dual Branch Net

    Authors: Ran Bakalo, Jacob Goldberger, Rami Ben-Ari

    Abstract: This study presents a novel deep learning architecture for multi-class classification and localization of abnormalities in medical imaging illustrated through experiments on mammograms. The proposed network combines two learning branches. One branch is for region classification with a newly added normal-region class. The second branch is a region detection branch for ranking regions relative to one anot…

    Submitted 19 March, 2020; v1 submitted 29 April, 2019; originally announced April 2019.

    Journal ref: Neurocomputing, Volume 421, 15 January 2021, Pages 15-25

  16. arXiv:1904.12319

    cs.CV

    Classification and Detection in Mammograms with Weak Supervision via Dual Branch Deep Neural Net

    Authors: Ran Bakalo, Rami Ben-Ari, Jacob Goldberger

    Abstract: The high cost of generating expert annotations poses a strong limitation for supervised machine learning methods in medical imaging. Weakly supervised methods may provide a solution to this tangle. In this study, we propose a novel deep learning architecture for multi-class classification of mammograms according to the severity of their containing anomalies, having only a global tag over the imag…

    Submitted 28 April, 2019; originally announced April 2019.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2019 (oral)