Showing 1–10 of 10 results for author: Borenstein, N

Searching in archive cs.
  1. arXiv:2406.19238

    cs.CL cs.CY cs.LG

    Revealing Fine-Grained Values and Opinions in Large Language Models

    Authors: Dustin Wright, Arnav Arora, Nadav Borenstein, Srishti Yadav, Serge Belongie, Isabelle Augenstein

    Abstract: Uncovering latent values and opinions in large language models (LLMs) can help identify biases and mitigate potential harm. Recently, this has been approached by presenting LLMs with survey questions and quantifying their stances towards morally and politically charged statements. However, the stances generated by LLMs can vary greatly depending on how they are prompted, and there are many ways to…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 28 pages, 20 figures, 7 tables

  2. arXiv:2406.04289

    cs.CL

    What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

    Authors: Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell

    Abstract: What can large language models learn? By definition, language models (LM) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, in contrast, we seek to understand the empirical learnability. U…

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  3. arXiv:2402.14177

    cs.SI cs.CY

    Investigating Human Values in Online Communities

    Authors: Nadav Borenstein, Arnav Arora, Lucie-Aimée Kaffee, Isabelle Augenstein

    Abstract: Human values play a vital role as an analytical tool in social sciences, enabling the study of diverse dimensions within society as a whole and among individual communities. This paper addresses the limitations of traditional survey-based studies of human values by proposing a computational application of Schwartz's values framework to Reddit, a platform organized into distinct online communities.…

    Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  4. arXiv:2312.12681

    cs.CL cs.AI

    Imitation of Life: A Search Engine for Biologically Inspired Design

    Authors: Hen Emuna, Nadav Borenstein, Xin Qian, Hyeonsu Kang, Joel Chan, Aniket Kittur, Dafna Shahaf

    Abstract: Biologically Inspired Design (BID), or Biomimicry, is a problem-solving methodology that applies analogies from nature to solve engineering challenges. For example, Speedo engineers designed swimsuits based on shark skin. Finding relevant biological solutions for real-world problems poses significant challenges, both due to the limited biological knowledge engineers and designers typically possess…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: To be published in the AAAI 2024 Proceedings Main Track

  5. arXiv:2311.09000

    cs.CL

    Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers

    Authors: Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov

    Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present a holistic end-to-end solution for annotating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels concerning the verifiability and factu…

    Submitted 16 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 30 pages, 13 figures

  6. arXiv:2310.18343

    cs.CL

    PHD: Pixel-Based Language Modeling of Historical Documents

    Authors: Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein

    Abstract: The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional approach to analysing historical documents involves converting them from images to text using OCR, a process that overlooks the potential benefits of treating them as images and introduces high levels of noise. To bridge this gap, we take advantage of recent advancement…

    Submitted 4 November, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to the main conference of EMNLP 2023

  7. arXiv:2305.12376

    cs.CL cs.CY cs.LG

    Measuring Intersectional Biases in Historical Documents

    Authors: Nadav Borenstein, Karolina Stańczak, Thea Rolskov, Natália da Silva Perez, Natacha Klein Käfer, Isabelle Augenstein

    Abstract: Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and tran…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  8. arXiv:2305.10928

    cs.CL cs.LG

    Multilingual Event Extraction from Historical Newspaper Adverts

    Authors: Nadav Borenstein, Natalia da Silva Perez, Isabelle Augenstein

    Abstract: NLP methods can aid historians in analyzing textual materials in greater volumes than manually feasible. Developing such methods poses substantial challenges though. First, acquiring large, annotated historical datasets is difficult, as only domain experts can reliably label them. Second, most available off-the-shelf NLP models are trained on modern language texts, rendering them significantly les…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to the main track of ACL 2023

  9. arXiv:2110.08893

    cs.CV

    Temporally stable video segmentation without video annotations

    Authors: Aharon Azulay, Tavi Halperin, Orestis Vantzos, Nadav Borenstein, Ofir Bibi

    Abstract: Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmen…

    Submitted 17 March, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3449-3458. 2022

  10. arXiv:2106.03048

    cs.CL cs.AI

    How Did This Get Funded?! Automatically Identifying Quirky Scientific Achievements

    Authors: Chen Shani, Nadav Borenstein, Dafna Shahaf

    Abstract: Humor is an important social phenomenon, serving complex social and psychological functions. However, despite being studied for millennia humor is computationally not well understood, often considered an AI-complete problem. In this work, we introduce a novel setting in humor mining: automatically detecting funny and unusual scientific papers. We are inspired by the Ig Nobel prize, a satirical pri…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: To be published in the main conference of ACL-IJCNLP 2021. Code and dataset can be found here: https://github.com/nadavborenstein/Iggy