Showing 1–15 of 15 results for author: Dahl, M

Searching in archive cs.
  1. arXiv:2405.20362  [pdf, other]

    cs.CL cs.CY

    Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

    Authors: Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho

    Abstract: Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, c…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Our dataset, tool outputs, and labels will be made available upon publication. This version of the manuscript (May 30, 2024) is updated to reflect an evaluation of Westlaw's AI-Assisted Research

  2. arXiv:2402.13604  [pdf, other]

    cs.CL econ.EM

    Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE

    Authors: Christian Møller Dahl, Torben Johansen, Christian Vedel

    Abstract: This paper introduces a new tool, OccCANINE, to automatically transform occupational descriptions into the HISCO classification system. The manual work involved in processing and classifying occupational descriptions is error-prone, tedious, and time-consuming. We finetune a preexisting language model (CANINE) to do this automatically, thereby performing in seconds and minutes what previously took…

    Submitted 2 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: All code and guides on how to use OccCANINE are available on GitHub: https://github.com/christianvedels/OccCANINE

    ACM Class: I.2.7; I.7.0

  3. arXiv:2401.08183  [pdf, ps, other]

    eess.SP cs.DC

    Over-the-Air Federated Learning with Phase Noise: Analysis and Countermeasures

    Authors: Martin Dahl, Erik G. Larsson

    Abstract: Wirelessly connected devices can collaboratively train a machine learning model using federated learning, where the aggregation of model updates occurs using over-the-air computation. Carrier frequency offset caused by imprecise clocks in devices will cause the phase of the over-the-air channel to drift randomly, such that late symbols in a coherence block are transmitted with lower quality than ear…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted in CISS 2024

  4. arXiv:2401.01301  [pdf, other]

    cs.CL cs.AI cs.CY

    Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

    Authors: Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho

    Abstract: Do large language models (LLMs) know the law? These models are increasingly being used to augment legal practice, education, and research, yet their revolutionary potential is threatened by the presence of hallucinations -- textual output that is not consistent with legal facts. We present the first systematic evidence of these hallucinations, documenting LLMs' varying performance across jurisdict…

    Submitted 21 June, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Journal ref: Journal of Legal Analysis 16, no. 1 (2024): 64-93

  5. arXiv:2305.07368  [pdf, other]

    cs.NI cs.LG eess.SY

    Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access

    Authors: Zheng Chen, Martin Dahl, Erik G. Larsson

    Abstract: In this work, we focus on the communication aspect of decentralized learning, which involves multiple agents training a shared machine learning model using decentralized stochastic gradient descent (D-SGD) over distributed data. In particular, we investigate the impact of broadcast transmission and probabilistic random access policy on the convergence performance of D-SGD, considering the broadcas…

    Submitted 7 July, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: 5 pages, 5 figures, accepted in IEEE SPAWC 2023

  6. arXiv:2210.00503  [pdf, other]

    cs.CV

    DARE: A large-scale handwritten date recognition system

    Authors: Christian M. Dahl, Torben S. D. Johansen, Emil N. Sørensen, Christian E. Westermann, Simon F. Wittrock

    Abstract: Handwritten text recognition for historical documents is an important task but it remains difficult due to a lack of sufficient training data in combination with a large variability of writing styles and degradation of historical documents. While recurrent neural network architectures are commonly used for handwritten text recognition, they are often computationally expensive to train and the bene…

    Submitted 2 October, 2022; originally announced October 2022.

  7. arXiv:2102.03239  [pdf, other]

    cs.CV econ.EM stat.ML

    Applications of Machine Learning in Document Digitisation

    Authors: Christian M. Dahl, Torben S. D. Johansen, Emil N. Sørensen, Christian E. Westermann, Simon F. Wittrock

    Abstract: Data acquisition forms the primary step in all empirical research. The availability of data directly impacts the quality and extent of conclusions and insights. In particular, larger and more detailed datasets provide convincing answers even to complex research questions. The main problem is that 'large and detailed' usually implies 'costly and difficult', especially when the data medium is paper…

    Submitted 5 February, 2021; originally announced February 2021.

  8. arXiv:2102.00208  [pdf, other]

    cs.LG econ.EM stat.ME stat.ML

    Time Series (re)sampling using Generative Adversarial Networks

    Authors: Christian M. Dahl, Emil N. Sørensen

    Abstract: We propose a novel bootstrap procedure for dependent data based on Generative Adversarial Networks (GANs). We show that the dynamics of common stationary time series processes can be learned by GANs and demonstrate that GANs trained on a single sample path can be used to generate additional samples from the process. We find that temporal convolutional neural networks provide a suitable design for…

    Submitted 30 January, 2021; originally announced February 2021.

  9. arXiv:2101.10862  [pdf, other]

    cs.CV econ.EM

    HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

    Authors: Christian M. Dahl, Torben Johansen, Emil N. Sørensen, Simon Wittrock

    Abstract: Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Probably the single most important identifier for linking is personal names. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of errors are…

    Submitted 10 March, 2022; v1 submitted 22 January, 2021; originally announced January 2021.

  10. arXiv:2002.04344  [pdf, other]

    cs.CR

    Privacy-preserving collaborative machine learning on genomic data using TensorFlow

    Authors: Cheng Hong, Zhicong Huang, Wen-jie Lu, Hunter Qu, Li Ma, Morten Dahl, Jason Mancuso

    Abstract: Machine learning (ML) methods have been widely used in genomic studies. However, genomic data are often held by different stakeholders (e.g. hospitals, universities, and healthcare companies) who consider the data as sensitive information, even though they desire to collaborate. To address this issue, recent works have proposed solutions using Secure Multi-party Computation (MPC), which train on t…

    Submitted 29 February, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: Description of the winning solution at Track IV of iDASH competition 2019, to be presented at the Trustworthy ML workshop co-located with ICLR2020

  11. arXiv:1905.09654  [pdf]

    cs.RO

    A ROS2 based communication architecture for control in collaborative and intelligent automation systems

    Authors: Endre Erős, Martin Dahl, Kristofer Bengtsson, Atieh Hanna, Petter Falkman

    Abstract: Collaborative robots are becoming part of intelligent automation systems in modern industry. Development and control of such systems differ from traditional automation methods and consequently lead to new challenges. Thankfully, Robot Operating System (ROS) provides a communication platform and a vast variety of tools and utilities that can aid that development. However, it is hard to use ROS in…

    Submitted 23 May, 2019; originally announced May 2019.

    Comments: 9 pages, 4 figures, 3 tables, to be published in the proceedings of 29th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM2019), June 2019

  12. arXiv:1903.05850  [pdf, other]

    cs.RO

    Sequence Planner - Automated Planning and Control for ROS2-based Collaborative and Intelligent Automation Systems

    Authors: Martin Dahl, Endre Erös, Atieh Hanna, Kristofer Bengtsson, Petter Falkman

    Abstract: Systems based on the Robot Operating System (ROS) are easy to extend with new on-line algorithms and devices. However, there is relatively little support for coordinating a large number of heterogeneous sub-systems. In this paper we propose an architecture to model and control collaborative and intelligent automation systems in a hierarchical fashion.

    Submitted 14 March, 2019; originally announced March 2019.

    Comments: Submitted to IROS 2019. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

  13. arXiv:1811.04017  [pdf, other]

    cs.LG cs.CR stat.ML

    A generic framework for privacy preserving deep learning

    Authors: Theo Ryffel, Andrew Trask, Morten Dahl, Bobby Wagner, Jason Mancuso, Daniel Rueckert, Jonathan Passerat-Palmbach

    Abstract: We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Priv…

    Submitted 13 November, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: PPML 2018, 5 pages

  14. arXiv:1810.08130  [pdf, ps, other]

    cs.CR cs.LG

    Private Machine Learning in TensorFlow using Secure Computation

    Authors: Morten Dahl, Jason Mancuso, Yann Dupis, Ben Decoste, Morgan Giraud, Ian Livingstone, Justin Patriquin, Gavin Uhma

    Abstract: We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protoc…

    Submitted 23 October, 2018; v1 submitted 18 October, 2018; originally announced October 2018.

  15. Gamifying the Escape from the Engineering Method Prison - An Innovative Board Game to Teach the Essence Theory to Future Project Managers and Software Engineers

    Authors: Kai-Kristian Kemell, Juhani Risku, Arthur Evensen, Pekka Abrahamsson, Aleksander Madsen Dahl, Lars Henrik Grytten, Agata Jedryszek, Petter Rostrup, Anh Nguyen-Duc

    Abstract: Software Engineering is an engineering discipline but lacks a solid theoretical foundation. One effort in remedying this situation has been the SEMAT Essence specification. Essence consists of a language for modeling Software Engineering (SE) practices and methods and a kernel containing what its authors describe as being elements that are present in every software development project. In practice…

    Submitted 23 September, 2018; originally announced September 2018.

    Comments: This is the author's version of the work. The copyright holder's version can be found at https://dx.doi.org/10.1109/ICE.2018.8436340, 2018 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC), Stuttgart, 2018