Showing 1–16 of 16 results for author: de Wynter, A

Searching in archive cs.
  1. arXiv:2408.15409  [pdf, other]

    cs.CL

    Awes, Laws, and Flaws From Today's LLM Research

    Authors: Adrian de Wynter

    Abstract: We perform a critical examination of the scientific methodology behind contemporary large language model (LLM) research. For this we assess over 2,000 research works based on criteria typical of what is considered good research (e.g. presence of statistical tests and reproducibility) and cross-validate it with arguments that are at the centre of controversy (e.g., claims of emergent behaviour, the…

    Submitted 29 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Under review -- v1 was an old draft with an unrevised abstract (oops)

  2. arXiv:2404.14397  [pdf, other]

    cs.CL cs.CY cs.LG

    RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

    Authors: Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, et al. (8 additional authors not shown)

    Abstract: Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety still remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end we introduce RTP-LX, a human-transc…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Work in progress

  3. arXiv:2404.01230  [pdf, other]

    cs.CL

    LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

    Authors: Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

    Abstract: This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly. Strategic reasoning is distinguished by its focus on the dynamic and uncertain nature of interact…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  4. arXiv:2403.05468  [pdf, other]

    cs.CL cs.AI cs.CV

    Will GPT-4 Run DOOM?

    Authors: Adrian de Wynter

    Abstract: We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom. This large language model (LLM) is able to run and play the game with only a few instructions, plus a textual description--generated by the model itself from screenshots--about the state of the game being observed. We find that GPT-4 can play the game to a passable degree: it is able to manipulat…

    Submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2312.06562  [pdf, other]

    cs.CL cs.AI cs.LG math.CT

    On Meta-Prompting

    Authors: Adrian de Wynter, Xun Wang, Qilong Gu, Si-Qing Chen

    Abstract: Certain statistical models are capable of interpreting input strings as instructions, or prompts, and carry out tasks based on them. Many approaches to prompting and pre-training these models involve the automated generation of these prompts. We call these approaches meta-prompting, or prompting to obtain prompts. We propose a theoretical framework based on category theory to generalize and descri…

    Submitted 11 December, 2023; originally announced December 2023.

  6. arXiv:2309.16938  [pdf, other]

    cs.CL

    "I'd Like to Have an Argument, Please": Argumentative Reasoning in Large Language Models

    Authors: Adrian de Wynter, Tangming Yuan

    Abstract: We evaluate the ability of two large language models (LLMs) to perform argumentative reasoning. We experiment with argument mining (AM) and argument pair extraction (APE), and evaluate the LLMs' ability to recognize arguments under progressively more abstract input and output (I/O) representations (e.g., arbitrary label sets, graphs, etc.). Unlike the well-known evaluation of prompt phrasings, abstractio…

    Submitted 10 June, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted to COMMA '24. Final, peer-reviewed version to appear in the proceedings

  7. arXiv:2309.07462  [pdf, other]

    cs.CL

    Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

    Authors: Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram

    Abstract: Large Language Models (LLMs) excel in various Natural Language Processing (NLP) tasks, yet their evaluation, particularly in languages beyond the top $20$, remains inadequate due to the limitations of existing benchmarks and metrics. Employing LLMs as evaluators to rank or score other models' outputs emerges as a viable solution, addressing the constraints tied to human annotators and established benchma…

    Submitted 13 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to EACL 2024 findings

  8. arXiv:2308.07556  [pdf, other]

    cs.CL cs.LG

    A User-Centered Evaluation of Spanish Text Simplification

    Authors: Adrian de Wynter, Anthony Hevia, Si-Qing Chen

    Abstract: We present an evaluation of text simplification (TS) in Spanish for a production system, by means of two corpora focused on both complex-sentence and complex-word identification. We compare the most prevalent Spanish-specific readability scores with neural networks, and show that the latter are consistently better at predicting user preferences regarding TS. As part of our analysis, we find that m…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Data at https://github.com/microsoft/BrevE-CLaro

  9. An Evaluation on Large Language Model Outputs: Discourse and Memorization

    Authors: Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, Si-Qing Chen

    Abstract: We present an empirical evaluation of various outputs generated by nine of the most widely available large language models (LLMs). Our analysis is done with off-the-shelf, readily available tools. We find a correlation between percentage of memorized text, percentage of unique text, and overall output quality, when measured with respect to output pathologies such as counterfactual and logically-fl…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Preprint. Under review

  10. Turing Completeness and Sid Meier's Civilization

    Authors: Adrian de Wynter

    Abstract: We prove that three strategy video games from the Sid Meier's Civilization series (Sid Meier's Civilization: Beyond Earth, Sid Meier's Civilization V, and Sid Meier's Civilization VI) are Turing complete. We achieve this by building three universal Turing machines, one for each game, using only the elements present in the games, and using their internal rules and mechanics as the transition function…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: Preprint

    Journal ref: IEEE Transactions on Games (Volume: 15, Issue: 2, June 2023)

  11. arXiv:2010.10499  [pdf, other]

    cs.CL cs.LG

    Optimal Subarchitecture Extraction For BERT

    Authors: Adrian de Wynter, Daniel J. Perry

    Abstract: We extract an optimal subset of architectural parameters for the BERT architecture from Devlin et al. (2018) by applying recent breakthroughs in algorithms for neural architecture search. This optimal subset, which we refer to as "Bort", is demonstrably smaller, having an effective (that is, not counting the embedding layer) size of $5.5\%$ the original BERT-large architecture, and $16\%$ of the n…

    Submitted 6 November, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: Preprint. Under review. Corrected typos on v2

  12. arXiv:2010.08542  [pdf, other]

    cs.CL cs.CR cs.LG

    Mischief: A Simple Black-Box Attack Against Transformer Architectures

    Authors: Adrian de Wynter

    Abstract: We introduce Mischief, a simple and lightweight method to produce a class of human-readable, realistic adversarial examples for language models. We perform exhaustive experimentations of our algorithm on four transformer-based architectures, across a variety of downstream tasks, as well as under varying concentrations of said examples. Our findings show that the presence of Mischief-generated adve…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: Technical report

  13. arXiv:2010.08512  [pdf, ps, other]

    cs.LG cs.DS

    An Approximation Algorithm for Optimal Subarchitecture Extraction

    Authors: Adrian de Wynter

    Abstract: We consider the problem of finding the set of architectural parameters for a chosen deep neural network which is optimal under three metrics: parameter size, inference speed, and error rate. In this paper we state the problem formally, and present an approximation algorithm that, for a large subset of instances, behaves like an FPTAS with an approximation error of $\rho \leq |1 - \epsilon|$, and that runs in…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: Preprint. Under review. Original submission does not present the bibliography issues from this version

  14. arXiv:2010.07990  [pdf, ps, other]

    cs.LG cs.AI cs.DS

    An Algorithm for Learning Smaller Representations of Models With Scarce Data

    Authors: Adrian de Wynter

    Abstract: We present a greedy algorithm for solving binary classification problems in situations where the dataset is either too small or not fully representative of the problem being solved, and obtaining more data is not possible. This algorithm is of particular interest when training small models that have trouble generalizing. It relies on a trained model with loose accuracy constraints, an iterative hy…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: Preprint. Under review

  15. arXiv:1908.09942  [pdf, ps, other]

    cs.LG cs.CC cs.NE stat.ML

    On the Bounds of Function Approximations

    Authors: Adrian de Wynter

    Abstract: Within machine learning, the subfield of Neural Architecture Search (NAS) has recently garnered research attention due to its ability to improve upon human-designed models. However, the computational requirements for finding an exact solution to this problem are often intractable, and the design of the search space still requires manual intervention. In this paper we attempt to establish a formali…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: Accepted as a full paper at ICANN 2019. The final, authenticated publication will be available at https://doi.org/10.1007/978-3-030-30487-4_32

    Journal ref: In: Tetko, I. V. et al. (eds.) ICANN 2019. LNCS, vol 11727. Springer, Heidelberg, pp. 401-417

  16. arXiv:1908.09936  [pdf, other]

    cs.LG cs.CL stat.ML

    Leveraging External Knowledge for Out-Of-Vocabulary Entity Labeling

    Authors: Adrian de Wynter, Lambert Mathias

    Abstract: Dealing with previously unseen slots is a challenging problem in a real-world multi-domain dialogue state tracking task. Other approaches rely on predefined mappings to generate candidate slot keys, as well as their associated values. This, however, may fail when the key, the value, or both, are not seen during training. To address this problem we introduce a neural network that leverages external…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: 8 pages