Showing 1–2 of 2 results for author: Jorgensen, O

Searching in archive cs.
  1. arXiv:2312.03813  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Activation Steering in Language Models with Mean-Centring

    Authors: Ole Jorgensen, Dylan Cope, Nandi Schoots, Murray Shanahan

    Abstract: Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it involves finding steering vectors. This is difficult because engineers do not typically know how features are represented in these models. We seek to address this issue by applying the idea of mean-centring to steering vectors. We find that taking the average of a…

    Submitted 6 December, 2023; originally announced December 2023.

  2. arXiv:2310.13439  [pdf, other]

    cs.CL cs.AI

    Self-Consistency of Large Language Models under Ambiguity

    Authors: Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau

    Abstract: Large language models (LLMs) that do not give consistent answers across contexts are problematic when used for tasks with expectations of consistency, e.g., question-answering, explanations, etc. Our work presents an evaluation benchmark for self-consistency in cases of under-specification where two or more answers can be correct. We conduct a series of behavioral experiments on the OpenAI model s…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: BlackboxNLP @ EMNLP 2023