Showing 1–3 of 3 results for author: Matoshi, V

  1. arXiv:2306.09237  [pdf, other]

    cs.CL cs.AI cs.LG

    One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support

    Authors: Ronja Stern, Vishvaksenan Rasiah, Veton Matoshi, Srinanda Brügger Bose, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus

    Abstract: Recent strides in Large Language Models (LLMs) have saturated many Natural Language Processing (NLP) benchmarks, emphasizing the need for more challenging ones to properly assess LLM capabilities. However, domain-specific and multilingual benchmarks are rare because they require in-depth expertise to develop. Still, most public models are trained predominantly on English corpora, while other langu…

    Submitted 21 August, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    MSC Class: 68T50 ACM Class: I.2

  2. arXiv:2306.02069  [pdf, other]

    cs.CL cs.AI cs.LG

    MultiLegalPile: A 689GB Multilingual Legal Corpus

    Authors: Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho

    Abstract: Large, high-quality datasets are crucial for training Large Language Models (LLMs). However, so far, there are few datasets available for specialized critical domains such as law and the available ones are often only for the English language. We curate and release MultiLegalPile, a 689GB corpus in 24 languages from 17 jurisdictions. The MultiLegalPile corpus, which includes diverse legal data sour…

    Submitted 19 May, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2024

    MSC Class: 68T50 ACM Class: I.2

  3. LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

    Authors: Joel Niklaus, Veton Matoshi, Pooja Rani, Andrea Galassi, Matthias Stürmer, Ilias Chalkidis

    Abstract: Lately, propelled by the phenomenal advances around the transformer architecture, the legal NLP field has enjoyed spectacular growth. To measure progress, well curated and challenging benchmarks are crucial. However, most benchmarks are English only and in legal NLP specifically there is no multilingual benchmark available yet. Additionally, many benchmarks are saturated, with the best models clea…

    Submitted 8 January, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Published at EMNLP Findings 2023

    MSC Class: 68T50 ACM Class: I.2

    Journal ref: EMNLP Findings 2023