Showing 1–12 of 12 results for author: Harari, D

  1. arXiv:2408.00498

    cs.CV

    How Effective are Self-Supervised Models for Contact Identification in Videos

    Authors: Malitha Gunawardhana, Limalka Sadith, Liel David, Daniel Harari, Muhammad Haris Khan

    Abstract: The exploration of video content via Self-Supervised Learning (SSL) models has unveiled a dynamic field of study, emphasizing both the complex challenges and unique opportunities inherent in this area. Despite the growing body of research, the ability of SSL models to detect physical contacts in videos remains largely unexplored, particularly the effectiveness of methods such as downstream supervi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 15 pages, 6 figures

  2. arXiv:2403.02782

    cs.CV

    Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

    Authors: Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan

    Abstract: In this paper, we explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos. Existing works have attained partial success by extensively leveraging various sources of information av…

    Submitted 15 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures, (supplementary material: 9 pages, 5 figures), accepted to CVPR 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, pages 18816-18826

  3. arXiv:2110.08744

    cs.AI q-bio.NC

    A model for full local image interpretation

    Authors: Guy Ben-Yosef, Liav Assif, Daniel Harari, Shimon Ullman

    Abstract: We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene. Humans can identify in an image meaningful components almost everywhere, and identifying these components is an essential part of the visual process, and of understanding the surrounding scene and its potential meaning to the viewer. Detailed interpretation is beyond the scope of cur…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: Published in the Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci), 2015

    Journal ref: https://cogsci.mindmodeling.org/2015/papers/0048/

  4. arXiv:2109.13445

    cs.CV cs.AI cs.LG q-bio.NC stat.ML

    Emergent Neural Network Mechanisms for Generalization to Objects in Novel Orientations

    Authors: Avi Cooper, Xavier Boix, Daniel Harari, Spandan Madan, Hanspeter Pfister, Tomotake Sasaki, Pawan Sinha

    Abstract: The capability of Deep Neural Networks (DNNs) to recognize objects in orientations outside the distribution of the training data is not well understood. We present evidence that DNNs are capable of generalizing to objects in novel orientations by disseminating orientation-invariance obtained from familiar objects seen from many viewpoints. This capability strengthens when training the DNN with an…

    Submitted 13 July, 2023; v1 submitted 27 September, 2021; originally announced September 2021.

  5. arXiv:2006.05249

    q-bio.NC cs.AI cs.CV

    What takes the brain so long: Object recognition at the level of minimal images develops for up to seconds of presentation time

    Authors: Hanna Benoni, Daniel Harari, Shimon Ullman

    Abstract: Rich empirical evidence has shown that visual object recognition in the brain is fast and effortless, with relevant brain signals reported to start as early as 80 ms. Here we study the time trajectory of the recognition process at the level of minimal recognizable images (termed MIRC). These are images that can be recognized reliably, but in which a minute change of the image (reduction by either…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 7 pages, 2 figures, 1 table

  6. arXiv:1812.05455

    cs.CV

    Using Motion and Internal Supervision in Object Recognition

    Authors: Daniel Harari

    Abstract: In this thesis we address two related aspects of visual object recognition: the use of motion information, and the use of internal supervision, to help unsupervised learning. These two aspects are inter-related in the current study, since image motion is used for internal supervision, via the detection of spatiotemporal events of active-motion and the use of tracking. Most current work in object r…

    Submitted 13 December, 2018; originally announced December 2018.

    Comments: PhD dissertation, 87 pages, 51 figures, 7 tables

  7. arXiv:1804.04604

    cs.CV cs.AI cs.RO q-bio.NC

    Discovery and usage of joint attention in images

    Authors: Daniel Harari, Joshua B. Tenenbaum, Shimon Ullman

    Abstract: Joint visual attention is characterized by two or more individuals looking at a common target at the same time. The ability to identify joint attention in scenes, the people involved, and their common target, is fundamental to the understanding of social interactions, including others' intentions and goals. In this work we deal with the extraction of joint attention events, and the use of such eve…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: 6 pages, 3 figures

  8. arXiv:1804.03576

    cs.CV

    Large Field and High Resolution: Detecting Needle in Haystack

    Authors: Hadar Gorodissky, Daniel Harari, Shimon Ullman

    Abstract: The growing use of convolutional neural networks (CNN) for a broad range of visual tasks, including tasks involving fine details, raises the problem of applying such networks to a large field of view, since the amount of computations increases significantly with the number of pixels. To deal effectively with this difficulty, we develop and compare methods of using CNNs for the task of small target…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: 15 pages, 7 figures

  9. arXiv:1611.09819

    q-bio.NC cs.AI cs.CV cs.LG

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Authors: Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman

    Abstract: Humans are remarkably adept at interpreting the gaze direction of other individuals in their surroundings. This skill is at the core of the ability to engage in joint visual attention, which is essential for establishing social interactions. How accurate are humans in determining the gaze direction of others in lifelike scenes, when they can move their heads and eyes freely, and what are the sourc…

    Submitted 29 November, 2016; originally announced November 2016.

    Comments: Daniel Harari and Tao Gao contributed equally to this work

    Report number: Center for Brains, Minds and Machines Memo No. 059

  10. arXiv:1610.09625

    q-bio.NC cs.CV cs.LG

    Discovering containment: from infants to machines

    Authors: Shimon Ullman, Nimrod Dorfman, Daniel Harari

    Abstract: Current artificial learning systems can recognize thousands of visual categories, or play Go at a champion's level, but cannot explain infants' learning, in particular the ability to learn complex concepts without guidance, in a specific order. A notable example is the category of 'containers' and the notion of containment, one of the earliest spatial relations to be learned, starting already at 2.…

    Submitted 30 October, 2016; originally announced October 2016.

    Journal ref: Cognition 183 (2019) 67-81

  11. arXiv:1603.08079

    cs.CV cs.AI cs.CL

    Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

    Authors: Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, Shimon Ullman

    Abstract: Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, r…

    Submitted 26 March, 2016; originally announced March 2016.

    Comments: EMNLP 2015

    Journal ref: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015, pages 1477-1487

  12. arXiv:1412.2672

    cs.AI cs.CV

    When Computer Vision Gazes at Cognition

    Authors: Tao Gao, Daniel Harari, Joshua Tenenbaum, Shimon Ullman

    Abstract: Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them versus away, little is known about human ability to discriminate a third person gaze directed towards objects that…

    Submitted 8 December, 2014; originally announced December 2014.

    Comments: Tao Gao and Daniel Harari contributed equally to this work

    Report number: CBMM Memo No. 025, MIT