
Showing 1–30 of 30 results for author: Fried, O

  1. arXiv:2408.12570

    cs.CL cs.LG

    Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

    Authors: Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, Edden M Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, Ido Blass, Inbal Magar, et al. (36 additional authors not shown)

    Abstract: We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, wi…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Webpage: https://www.ai21.com/jamba

  2. arXiv:2406.14551

    cs.CV

    Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

    Authors: Eyal Michaeli, Ohad Fried

    Abstract: Fine-grained visual classification (FGVC) involves classifying closely related sub-classes. This task is difficult due to the subtle differences between classes and the high intra-class variance. Moreover, FGVC datasets are typically small and challenging to gather, thus highlighting a significant need for effective data augmentation. Recent advancements in text-to-image diffusion models offer new…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review. Code is available at https://github.com/EyalMichaeli/SaSPA-Aug

  3. arXiv:2406.14510

    cs.CV cs.AI cs.GR

    V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data

    Authors: Rotem Shalev-Arkushin, Aharon Azulay, Tavi Halperin, Eitan Richardson, Amit H. Bermano, Ohad Fried

    Abstract: Diffusion-based generative models have recently shown remarkable image and video editing capabilities. However, local video editing, particularly removal of small attributes like glasses, remains a challenge. Existing methods either alter the videos excessively, generate unrealistic artifacts, or fail to perform the requested edit consistently throughout the video. In this work, we focus on consis…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.06508

    cs.CV cs.AI cs.GR

    Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

    Authors: Sigal Raab, Inbar Gat, Nathan Sala, Guy Tevet, Rotem Shalev-Arkushin, Ohad Fried, Amit H. Bermano, Daniel Cohen-Or

    Abstract: Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily center on handlin…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Video: https://www.youtube.com/watch?v=s5oo3sKV0YU, Project page: https://monkeyseedocg.github.io, Code: https://github.com/MonkeySeeDoCG/MoMo-code

  5. arXiv:2406.01594

    cs.CV cs.GR cs.LG

    DiffUHaul: A Training-Free Method for Object Dragging in Images

    Authors: Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie

    Abstract: Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to lacking spatial reasoning. In this work, we propose a training-free method,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page is available at https://omriavrahami.com/diffuhaul/

  6. arXiv:2312.04145

    cs.CV cs.GR cs.LG

    Diffusing Colors: Image Colorization with Text Guided Diffusion

    Authors: Nir Zabari, Aharon Azulay, Alexey Gorkor, Tavi Halperin, Ohad Fried

    Abstract: The colorization of grayscale images is a complex and subjective task with significant challenges. Despite recent progress in employing large-scale datasets with deep neural networks, difficulties with controllability and visual quality persist. To tackle these issues, we present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts. This integrat…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: SIGGRAPH Asia 2023

  7. arXiv:2311.10093

    cs.CV cs.GR cs.LG

    The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

    Authors: Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

    Abstract: Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, the users that use these models struggle with the generation of consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development, asset design, advertising, and more. Current methods typically rely on multiple pre-existing images…

    Submitted 5 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to SIGGRAPH 2024. Project page is available at https://omriavrahami.com/the-chosen-one/

  8. arXiv:2306.00950

    cs.CV cs.AI cs.GR cs.LG

    Differential Diffusion: Giving Each Pixel Its Strength

    Authors: Eran Levin, Ohad Fried

    Abstract: Diffusion models have revolutionized image generation and editing, producing state-of-the-art results in conditioned and unconditioned image synthesis. While current techniques enable user control over the degree of change in an image edit, the controllability is limited to global changes over an entire edited region. This paper introduces a novel framework that enables customization of the amount…

    Submitted 28 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Project Page: https://differential-diffusion.github.io/

    ACM Class: I.3.3

  9. arXiv:2305.16311

    cs.CV cs.GR cs.LG

    Break-A-Scene: Extracting Multiple Concepts from a Single Image

    Authors: Omri Avrahami, Kfir Aberman, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

    Abstract: Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition:…

    Submitted 4 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH Asia 2023. Project page: https://omriavrahami.com/break-a-scene/ Video: https://www.youtube.com/watch?v=-9EA-BhizgM

  10. arXiv:2303.10762

    cs.CV

    Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis

    Authors: Sergey Sinitsa, Ohad Fried

    Abstract: The generation of high-quality images has become widely accessible and is a rapidly evolving process. As a result, anyone can generate images that are indistinguishable from real ones. This leads to a wide range of applications, including malicious usage with deceptive intentions. Despite advances in detection techniques for generated images, a robust detection method still eludes us. Furthermore,…

    Submitted 11 July, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

  11. arXiv:2212.01470

    cs.CV

    Prediction of Scene Plausibility

    Authors: Or Nachmias, Ohad Fried, Ariel Shamir

    Abstract: Understanding the 3D world from 2D images involves more than detection and segmentation of the objects within the scene. It also includes the interpretation of the structure and arrangement of the scene elements. Such understanding is often rooted in recognizing the physical world and its limitations, and in prior knowledge as to how similar typical scenes are arranged. In this research we pose a…

    Submitted 6 December, 2022; v1 submitted 2 December, 2022; originally announced December 2022.

  12. arXiv:2212.00773

    cs.CV

    FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection

    Authors: Gil Knafo, Ohad Fried

    Abstract: Video synthesis methods rapidly improved in recent years, allowing easy creation of synthetic humans. This poses a problem, especially in the era of social media, as synthetic videos of speaking humans can be used to spread misinformation in a convincing manner. Thus, there is a pressing need for accurate and robust deepfake detection methods, that can detect forgery techniques not seen during tra…

    Submitted 7 February, 2024; v1 submitted 1 December, 2022; originally announced December 2022.

  13. arXiv:2211.16488

    cs.CV

    Taming Normalizing Flows

    Authors: Shimon Malnick, Shai Avidan, Ohad Fried

    Abstract: We propose an algorithm for taming Normalizing Flow models - changing the probability that the model will produce a specific image or image category. We focus on Normalizing Flows because they can calculate the exact generation probability likelihood for a given image. We demonstrate taming using models that generate human faces, a subdomain with many interesting privacy and bias considerations. O…

    Submitted 3 April, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

  14. arXiv:2211.14802

    cs.CV

    Neural Font Rendering

    Authors: Daniel Anderson, Ariel Shamir, Ohad Fried

    Abstract: Recent advances in deep learning techniques and applications have revolutionized artistic creation and manipulation in many domains (text, images, music); however, fonts have not yet been integrated with deep learning architectures in a manner that supports their multi-scale nature. In this work we aim to bridge this gap, proposing a network architecture capable of rasterizing glyphs in multiple s…

    Submitted 29 November, 2022; v1 submitted 27 November, 2022; originally announced November 2022.

  15. SpaText: Spatio-Textual Representation for Controllable Image Generation

    Authors: Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin

    Abstract: Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText - a new method for text-to-image gen…

    Submitted 19 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page available at: https://omriavrahami.com/spatext

  16. arXiv:2211.13807

    cs.CV

    GEFF: Improving Any Clothes-Changing Person ReID Model using Gallery Enrichment with Face Features

    Authors: Daniel Arkushin, Bar Cohen, Shmuel Peleg, Ohad Fried

    Abstract: In the Clothes-Changing Re-Identification (CC-ReID) problem, given a query sample of a person, the goal is to determine the correct identity based on a labeled gallery in which the person appears in different clothes. Several models tackle this challenge by extracting clothes-independent features. However, the performance of these models is still lower for the clothes-changing setting compared to…

    Submitted 21 November, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  17. arXiv:2211.13613

    cs.CV cs.CL

    Ham2Pose: Animating Sign Language Notation into Pose Sequences

    Authors: Rotem Shalev-Arkushin, Amit Moryossef, Ohad Fried

    Abstract: Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. To achieve this goal, we propose the first method for animating a text written in HamNoSys, a lexical Sign language notation, into signed pose sequences. As HamNoSys is universal by design, our proposed method offers a generic solution invariant to the target S…

    Submitted 1 April, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  18. arXiv:2206.02779

    cs.CV cs.GR cs.LG

    Blended Latent Diffusion

    Authors: Omri Avrahami, Ohad Fried, Dani Lischinski

    Abstract: The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models has finally enabled text-based interfaces for creating and editing images. Handling generic images requires a diverse underlying generative model, hence the latest works utilize diffusion models, which were shown to surpass GANs in terms of diversity. One major drawback of…

    Submitted 30 April, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted to SIGGRAPH 2023. Project page: https://omriavrahami.com/blended-latent-diffusion-page/

  19. arXiv:2203.16626

    cs.CV cs.GR

    DDNeRF: Depth Distribution Neural Radiance Fields

    Authors: David Dadon, Ohad Fried, Yacov Hel-Or

    Abstract: In recent years, the field of implicit neural representation has progressed significantly. Models such as neural radiance fields (NeRF), which uses relatively small neural networks, can represent high-quality scenes and achieve state-of-the-art results for novel view synthesis. Training these types of networks, however, is still computationally very expensive. We present depth distribution neural…

    Submitted 30 March, 2022; originally announced March 2022.

  20. arXiv:2203.15926

    cs.CV cs.GR

    Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images

    Authors: Ayush Tewari, Mallikarjun B R, Xingang Pan, Ohad Fried, Maneesh Agrawala, Christian Theobalt

    Abstract: Learning 3D generative models from a dataset of monocular images enables self-supervised 3D reasoning and controllable synthesis. State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis. Images are synthesized by rendering the volumes from a given camera. These models can disentangle the 3D scene from the camera viewpoint in any generated image.…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  21. arXiv:2203.15065

    cs.CV

    DeepShadow: Neural Shape from Shadow

    Authors: Asaf Karnieli, Ohad Fried, Yacov Hel-Or

    Abstract: This paper presents DeepShadow, a one-shot method for recovering the depth map and surface normals from photometric stereo shadow maps. Previous works that try to recover the surface normals from photometric stereo images treat cast shadows as a disturbance. We show that the self and cast shadows not only do not disturb 3D reconstruction, but can be used alone, as a strong learning signal, to reco…

    Submitted 30 October, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: ECCV 2022. Project page available at https://asafkar.github.io/deepshadow/

  22. Blended Diffusion for Text-driven Editing of Natural Images

    Authors: Omri Avrahami, Dani Lischinski, Ohad Fried

    Abstract: Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, wit…

    Submitted 28 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: CVPR 2022. Code is available at: https://omriavrahami.com/blended-diffusion-page/

  23. arXiv:2106.03847

    cs.LG cs.CV cs.GR cs.NE

    GAN Cocktail: mixing GANs without dataset access

    Authors: Omri Avrahami, Dani Lischinski, Ohad Fried

    Abstract: Today's generative models are capable of synthesizing high-fidelity images, but each model specializes on a specific target domain. This raises the need for model merging: combining two or more pretrained generative models into a single unified one. In this work we tackle the problem of model merging, given two constraints that often come up in the real world: (1) no access to the original trainin…

    Submitted 11 July, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: ECCV 2022. Project page is available at: https://omriavrahami.com/GAN-cocktail-page/

  24. Endless Loops: Detecting and Animating Periodic Patterns in Still Images

    Authors: Tavi Halperin, Hanit Hakim, Orestis Vantzos, Gershon Hochman, Netai Benaim, Lior Sassy, Michael Kupchik, Ofir Bibi, Ohad Fried

    Abstract: We present an algorithm for producing a seamless animated loop from a single image. The algorithm detects periodic structures, such as the windows of a building or the steps of a staircase, and generates a non-trivial displacement vector field that maps each segment of the structure onto a neighboring segment along a user- or auto-selected main direction of motion. This displacement field is used,…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: SIGGRAPH 2021. Project page: https://pub.res.lightricks.com/endless-loops/ . Video: https://youtu.be/8ZYUvxWuD2Y

    Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 142. Publication date: August 2021

  25. arXiv:2011.10688

    cs.CV cs.GR cs.MM

    Iterative Text-based Editing of Talking-heads Using Neural Retargeting

    Authors: Xinwei Yao, Ohad Fried, Kayvon Fatahalian, Maneesh Agrawala

    Abstract: We present a text-based tool for editing talking-head video that enables an iterative editing workflow. On each iteration users can edit the wording of the speech, further refine mouth motions if necessary to reduce artifacts and manipulate non-verbal aspects of the performance by inserting mouth gestures (e.g. a smile) or changing the overall performance style (e.g. energetic, mumble). Our tool r…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Project Website is https://davidyao.me/projects/text2vid

  26. arXiv:2004.03805

    cs.CV cs.GR

    State of the Art on Neural Rendering

    Authors: Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt, Maneesh Agrawala, Eli Shechtman, Dan B Goldman, Michael Zollhöfer

    Abstract: Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: Eurographics 2020 survey paper

  27. arXiv:2003.09764

    cs.CV

    Lifespan Age Transformation Synthesis

    Authors: Roy Or-El, Soumyadip Sengupta, Ohad Fried, Eli Shechtman, Ira Kemelmacher-Shlizerman

    Abstract: We address the problem of single photo age progression and regression - the prediction of how a person might look in the future, or how they looked in the past. Most existing aging methods are limited to changing the texture, overlooking transformations in head shape that occur during the human aging and growth process. This limits the applicability of previous methods to aging of adults to slightly…

    Submitted 24 July, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

    Comments: ECCV 2020 Camera-Ready version. Main Changes: 1. Added Ethics & Bias statement in the supplementary material 2. Comparison figures to PyGAN [46] and S2GAN [13] were removed due to copyright issues. These figures can be found in the project's webpage (link is provided in the paper). 3. Added links to the code and dataset (Github)

  28. arXiv:1906.01524

    cs.CV cs.GR cs.LG

    Text-based Editing of Talking-head Video

    Authors: Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala

    Abstract: Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video wi…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: A version with higher resolution images can be downloaded from the authors' website

  29. arXiv:1902.04285

    cs.GR cs.SD

    Puppet Dubbing

    Authors: Ohad Fried, Maneesh Agrawala

    Abstract: Dubbing puppet videos to make the characters (e.g. Kermit the Frog) convincingly speak a new speech track is a popular activity with many examples of well-known puppets speaking lines from films or singing rap songs. But manually aligning puppet mouth movements to match a new speech track is tedious as each syllable of the speech must match a closed-open-closed segment of mouth movement for the du…

    Submitted 12 February, 2019; originally announced February 2019.

    Journal ref: Eurographics Symposium on Rendering, 2019

  30. arXiv:1807.03130

    cs.CV

    Unsupervised Natural Image Patch Learning

    Authors: Dov Danon, Hadar Averbuch-Elor, Ohad Fried, Daniel Cohen-Or

    Abstract: Learning a metric of natural image patches is an important tool for analyzing images. An efficient means is to train a deep network to map an image patch to a vector space, in which the Euclidean distance reflects patch similarity. Previous attempts learned such an embedding in a supervised manner, requiring the availability of many annotated images. In this paper, we present an unsupervised embed…

    Submitted 28 June, 2018; originally announced July 2018.