SaulLM-7B: A pioneering Large Language Model for Law
Authors:
Pierre Colombo,
Telmo Pessoa Pires,
Malik Boudiaf,
Dominic Culver,
Rui Melo,
Caio Corro,
Andre F. T. Martins,
Fabrizio Esposito,
Vera Lúcia Raposo,
Sofia Morgado,
Michael Desa
Abstract:
In this paper, we introduce SaulLM-7B, a large language model (LLM) tailored for the legal domain. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens. SaulLM-7B exhibits state-of-the-art proficiency in understanding and processing legal documents. Additionally, we present a novel instructional fine-tuning method that leverages legal datasets to further enhance SaulLM-7B's performance in legal tasks. SaulLM-7B is released under the MIT License.
Submitted 7 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.