-
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Authors:
Varun Magesh,
Faiz Surani,
Matthew Dahl,
Mirac Suzgun,
Christopher D. Manning,
Daniel E. Ho
Abstract:
Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as "eliminating" (Casetext, 2023) or "avoid[ing]" hallucinations (Thomson Reuters, 2023), or guaranteeing "hallucination-free" legal citations (LexisNexis, 2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first preregistered empirical evaluation of AI-driven legal research tools. We demonstrate that the providers' claims are overstated. While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time. We also document substantial differences between systems in responsiveness and accuracy. Our article makes four key contributions. It is the first to assess and report the performance of RAG-based proprietary legal AI tools. Second, it introduces a comprehensive, preregistered dataset for identifying and understanding vulnerabilities in these systems. Third, it proposes a clear typology for differentiating between hallucinations and accurate legal responses. Last, it provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains a central open question for the responsible integration of AI into law.
Submitted 30 May, 2024;
originally announced May 2024.
-
Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE
Authors:
Christian Møller Dahl,
Torben Johansen,
Christian Vedel
Abstract:
This paper introduces a new tool, OccCANINE, to automatically transform occupational descriptions into the HISCO classification system. The manual work involved in processing and classifying occupational descriptions is error-prone, tedious, and time-consuming. We finetune a preexisting language model (CANINE) to do this automatically, thereby performing in seconds and minutes what previously took days and weeks. The model is trained on 14 million pairs of occupational descriptions and HISCO codes in 13 different languages contributed by 22 different sources. Our approach is shown to have accuracy, recall, and precision above 90 percent. Our tool breaks the metaphorical HISCO barrier and makes this data readily available for analysis of occupational structures with broad applicability in economics, economic history, and various related disciplines.
Submitted 2 April, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Over-the-Air Federated Learning with Phase Noise: Analysis and Countermeasures
Authors:
Martin Dahl,
Erik G. Larsson
Abstract:
Wirelessly connected devices can collaboratively train a machine learning model using federated learning, where the aggregation of model updates occurs using over-the-air computation. Carrier frequency offset caused by imprecise clocks in devices will cause the phase of the over-the-air channel to drift randomly, such that late symbols in a coherence block are transmitted with lower quality than early symbols. To mitigate the effect of degrading symbol quality, we propose a scheme where one of the permutations Roll, Flip and Sort is applied to gradients before transmission. Through simulations we show that the permutations can both improve and degrade learning performance. Furthermore, we derive the expectation and variance of the gradient estimate, which is shown to grow exponentially with the number of symbols in a coherence block.
Submitted 16 January, 2024;
originally announced January 2024.
-
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Authors:
Matthew Dahl,
Varun Magesh,
Mirac Suzgun,
Daniel E. Ho
Abstract:
Do large language models (LLMs) know the law? These models are increasingly being used to augment legal practice, education, and research, yet their revolutionary potential is threatened by the presence of hallucinations -- textual output that is not consistent with legal facts. We present the first systematic evidence of these hallucinations, documenting LLMs' varying performance across jurisdictions, courts, time periods, and cases. Our work makes four key contributions. First, we develop a typology of legal hallucinations, providing a conceptual framework for future research in this area. Second, we find that legal hallucinations are alarmingly prevalent, occurring between 58% of the time (ChatGPT 4) and 88% of the time (Llama 2) when these models are asked specific, verifiable questions about random federal court cases. Third, we illustrate that LLMs often fail to correct a user's incorrect legal assumptions in a contra-factual question setup. Fourth, we provide evidence that LLMs cannot always predict, or do not always know, when they are producing legal hallucinations. Taken together, our findings caution against the rapid and unsupervised integration of popular LLMs into legal tasks. Even experienced lawyers must remain wary of legal hallucinations, and the risks are highest for those who stand to benefit from LLMs the most -- pro se litigants or those without access to traditional legal resources.
Submitted 21 June, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access
Authors:
Zheng Chen,
Martin Dahl,
Erik G. Larsson
Abstract:
In this work, we focus on the communication aspect of decentralized learning, which involves multiple agents training a shared machine learning model using decentralized stochastic gradient descent (D-SGD) over distributed data. In particular, we investigate the impact of broadcast transmission and probabilistic random access policy on the convergence performance of D-SGD, considering the broadcast nature of wireless channels and the link dynamics in the communication topology. Our results demonstrate that optimizing the access probability to maximize the expected number of successful links is a highly effective strategy for accelerating the system convergence.
Submitted 7 July, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
DARE: A large-scale handwritten date recognition system
Authors:
Christian M. Dahl,
Torben S. D. Johansen,
Emil N. Sørensen,
Christian E. Westermann,
Simon F. Wittrock
Abstract:
Handwritten text recognition for historical documents is an important task but it remains difficult due to a lack of sufficient training data in combination with a large variability of writing styles and degradation of historical documents. While recurrent neural network architectures are commonly used for handwritten text recognition, they are often computationally expensive to train and the benefit of recurrence drastically differs by task. For these reasons, it is important to consider non-recurrent architectures. In the context of handwritten date recognition, we propose an architecture based on the EfficientNetV2 class of models that is fast to train, robust to parameter choices, and accurately transcribes handwritten dates from a number of sources. For training, we introduce a database containing almost 10 million tokens, originating from more than 2.2 million handwritten dates which are segmented from different historical documents. As dates are some of the most common information on historical documents, and with historical archives containing millions of such documents, the efficient and automatic transcription of dates has the potential to lead to significant cost-savings over manual transcription. We show that training on handwritten text with high variability in writing styles results in robust models for general handwritten text recognition and that transfer learning from the DARE system increases transcription accuracy substantially, allowing one to obtain high accuracy even when using a relatively small training sample.
Submitted 2 October, 2022;
originally announced October 2022.
-
Applications of Machine Learning in Document Digitisation
Authors:
Christian M. Dahl,
Torben S. D. Johansen,
Emil N. Sørensen,
Christian E. Westermann,
Simon F. Wittrock
Abstract:
Data acquisition forms the primary step in all empirical research. The availability of data directly impacts the quality and extent of conclusions and insights. In particular, larger and more detailed datasets provide convincing answers even to complex research questions. The main problem is that 'large and detailed' usually implies 'costly and difficult', especially when the data medium is paper and books. Human operators and manual transcription have been the traditional approach for collecting historical data. We instead advocate the use of modern machine learning techniques to automate the digitisation process. We give an overview of the potential for applying machine digitisation for data collection through two illustrative applications. The first demonstrates that unsupervised layout classification applied to raw scans of nurse journals can be used to construct a treatment indicator. Moreover, it allows an assessment of assignment compliance. The second application uses attention-based neural networks for handwritten text recognition in order to transcribe age and birth and death dates from a large collection of Danish death certificates. We describe each step in the digitisation pipeline and provide implementation insights.
Submitted 5 February, 2021;
originally announced February 2021.
-
Time Series (re)sampling using Generative Adversarial Networks
Authors:
Christian M. Dahl,
Emil N. Sørensen
Abstract:
We propose a novel bootstrap procedure for dependent data based on Generative Adversarial Networks (GANs). We show that the dynamics of common stationary time series processes can be learned by GANs and demonstrate that GANs trained on a single sample path can be used to generate additional samples from the process. We find that temporal convolutional neural networks provide a suitable design for the generator and discriminator, and that convincing samples can be generated on the basis of a vector of iid normal noise. We demonstrate the finite sample properties of GAN sampling and the suggested bootstrap using simulations where we compare the performance to circular block bootstrapping in the case of resampling an AR(1) time series process. We find that resampling using the GAN can outperform circular block bootstrapping in terms of empirical coverage.
Submitted 30 January, 2021;
originally announced February 2021.
-
HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition
Authors:
Christian M. Dahl,
Torben Johansen,
Emil N. Sørensen,
Simon Wittrock
Abstract:
Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Probably the single most important identifier for linking is personal names. However, personal names are prone to enumeration and transcription errors and although modern linking methods are designed to handle such challenges, these sources of errors are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database which consists of more than 3.3 million names. The database contains more than 105 thousand unique names with a total of more than 1.1 million images of personal names, which proves useful for transfer learning to other settings. We provide three examples hereof, obtaining significantly improved transcription accuracy on both Danish and US census data. In addition, we present benchmark results for deep learning models automatically transcribing the personal names from the scanned documents. Through making more challenging large-scale databases publicly available we hope to foster more sophisticated, accurate, and robust models for handwritten text recognition.
Submitted 10 March, 2022; v1 submitted 22 January, 2021;
originally announced January 2021.
-
Privacy-preserving collaborative machine learning on genomic data using TensorFlow
Authors:
Cheng Hong,
Zhicong Huang,
Wen-jie Lu,
Hunter Qu,
Li Ma,
Morten Dahl,
Jason Mancuso
Abstract:
Machine learning (ML) methods have been widely used in genomic studies. However, genomic data are often held by different stakeholders (e.g. hospitals, universities, and healthcare companies) who consider the data as sensitive information, even though they desire to collaborate. To address this issue, recent works have proposed solutions using Secure Multi-party Computation (MPC), which train on the decentralized data in a way that the participants could learn nothing from each other beyond the final trained model.
We design and implement several MPC-friendly ML primitives, including class weight adjustment and parallelizable approximation of activation function. In addition, we develop the solution as an extension to TF Encrypted (Dahl et al., 2018), enabling us to quickly experiment with enhancements of both machine learning techniques and cryptographic protocols while leveraging the advantages of TensorFlow's optimizations. Our implementation compares favorably with state-of-the-art methods, winning first place in Track IV of the iDASH2019 secure genome analysis competition.
Submitted 29 February, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.
-
A ROS2 based communication architecture for control in collaborative and intelligent automation systems
Authors:
Endre Erős,
Martin Dahl,
Kristofer Bengtsson,
Atieh Hanna,
Petter Falkman
Abstract:
Collaborative robots are becoming part of intelligent automation systems in modern industry. Development and control of such systems differ from traditional automation methods and consequently lead to new challenges. Thankfully, Robot Operating System (ROS) provides a communication platform and a vast variety of tools and utilities that can aid that development. However, it is hard to use ROS in large-scale automation systems due to communication issues in a distributed setup, hence the development of ROS2. In this paper, a ROS2 based communication architecture is presented together with an industrial use-case of a collaborative and intelligent automation system.
Submitted 23 May, 2019;
originally announced May 2019.
-
Sequence Planner - Automated Planning and Control for ROS2-based Collaborative and Intelligent Automation Systems
Authors:
Martin Dahl,
Endre Erös,
Atieh Hanna,
Kristofer Bengtsson,
Petter Falkman
Abstract:
Systems based on the Robot Operating System (ROS) are easy to extend with new on-line algorithms and devices. However, there is relatively little support for coordinating a large number of heterogeneous sub-systems. In this paper we propose an architecture to model and control collaborative and intelligent automation systems in a hierarchical fashion.
Submitted 14 March, 2019;
originally announced March 2019.
-
A generic framework for privacy preserving deep learning
Authors:
Theo Ryffel,
Andrew Trask,
Morten Dahl,
Bobby Wagner,
Jason Mancuso,
Daniel Rueckert,
Jonathan Passerat-Palmbach
Abstract:
We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning.
Submitted 13 November, 2018; v1 submitted 9 November, 2018;
originally announced November 2018.
-
Private Machine Learning in TensorFlow using Secure Computation
Authors:
Morten Dahl,
Jason Mancuso,
Yann Dupis,
Ben Decoste,
Morgan Giraud,
Ian Livingstone,
Justin Patriquin,
Gavin Uhma
Abstract:
We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protocols, and an expanded set of familiar tooling. We give an open source implementation of a state-of-the-art protocol and report on concrete benchmarks using typical models from private machine learning.
Submitted 23 October, 2018; v1 submitted 18 October, 2018;
originally announced October 2018.
-
Gamifying the Escape from the Engineering Method Prison - An Innovative Board Game to Teach the Essence Theory to Future Project Managers and Software Engineers
Authors:
Kai-Kristian Kemell,
Juhani Risku,
Arthur Evensen,
Pekka Abrahamsson,
Aleksander Madsen Dahl,
Lars Henrik Grytten,
Agata Jedryszek,
Petter Rostrup,
Anh Nguyen-Duc
Abstract:
Software Engineering is an engineering discipline but lacks a solid theoretical foundation. One effort in remedying this situation has been the SEMAT Essence specification. Essence consists of a language for modeling Software Engineering (SE) practices and methods and a kernel containing what its authors describe as being elements that are present in every software development project. In practice, it is a method-agnostic project management tool for SE projects. Using the language of the specification, Essence can be used to model any software development method or practice. Thus, the specification can potentially be applied to any software development context, making it a powerful tool. However, due to the manual work and the learning process involved in modeling practices with Essence, its initial adoption can be taxing for development teams. Due to the importance of project management in SE projects, new project management tools such as Essence are valuable, and facilitating their adoption is consequently important. To tackle this issue in the case of Essence, we present a game-based approach to teaching the use of Essence. In this paper, we gamify the learning process by means of an innovative board game. The game is empirically validated in a study involving students from the IT faculty of the University of Jyväskylä (n=61). Based on the results, we report the effectiveness of the game-based approach to teaching both Essence and SE project work.
Submitted 23 September, 2018;
originally announced September 2018.