Showing 1–9 of 9 results for author: Watson, W

Searching in archive cs.

  1. HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

    Authors: William Watson, Nicole Cho, Tucker Balch, Manuela Veloso

    Abstract: A myriad of different Large Language Models (LLMs) face a common challenge in contextually analyzing table question-answering tasks. These challenges are engendered from (1) finite context windows for large tables, (2) multi-faceted discrepancies amongst tokenization patterns against cell boundaries, and (3) various limitations stemming from data confidentiality in the process of using external mo…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (2023) 7144-7159

  2. Financial Table Extraction in Image Documents

    Authors: William Watson, Bo Liu

    Abstract: Table extraction has long been a pervasive problem in financial services. This is more challenging in the image domain, where content is locked behind cumbersome pixel format. Luckily, advances in deep learning for image segmentation, OCR, and sequence modeling provide the necessary heavy lifting to achieve impressive results. This paper presents an end-to-end pipeline for identifying, extracting…

    Submitted 18 March, 2024; originally announced May 2024.

  3. arXiv:2404.13050

    cs.CL cs.AI

    FlowMind: Automatic Workflow Generation with LLMs

    Authors: Zhen Zeng, William Watson, Nicole Cho, Saba Rahimi, Shayleen Reynolds, Tucker Balch, Manuela Veloso

    Abstract: The rapidly evolving field of Robotic Process Automation (RPA) has made significant strides in automating repetitive processes, yet its effectiveness diminishes in scenarios requiring spontaneous or unpredictable tasks demanded by users. This paper introduces a novel approach, FlowMind, leveraging the capabilities of Large Language Models (LLMs) such as Generative Pretrained Transformer (GPT), to…

    Submitted 16 March, 2024; originally announced April 2024.

    Comments: Published in ACM ICAIF 2023

  4. arXiv:2404.12535

    cs.LG cs.AI cs.CL

    HalluciBot: Is There No Such Thing as a Bad Question?

    Authors: William Watson, Nicole Cho

    Abstract: Hallucination continues to be one of the most critical challenges in the institutional adoption journey of Large Language Models (LLMs). In this context, an overwhelming number of studies have focused on analyzing the post-generation phase - refining outputs via feedback, analyzing logit output values, or deriving clues via the outputs' artifacts. We propose HalluciBot, a model that predicts the p…

    Submitted 18 April, 2024; originally announced April 2024.

  5. arXiv:2404.04003

    cs.CL

    BuDDIE: A Business Document Dataset for Multi-task Information Extraction

    Authors: Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

    Abstract: The field of visually rich document understanding (VRDU) aims to solve a multitude of well-researched NLP tasks in a multi-modal domain. Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia. These datasets cover documents like invoices and receipts with sparse ann…

    Submitted 5 April, 2024; originally announced April 2024.

  6. arXiv:2403.18855

    cs.SI cs.IR cs.LG

    Directed Criteria Citation Recommendation and Ranking Through Link Prediction

    Authors: William Watson, Lawrence Yong

    Abstract: We explore link prediction as a proxy for automatically surfacing documents from existing literature that might be topically or contextually relevant to a new document. Our model uses transformer-based graph embeddings to encode the meaning of each document, presented as a node within a citation network. We show that the semantic representations that our model generates can outperform other conten…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Extended Abstract at the International Conference on AI in Finance (ICAIF '20)

  7. arXiv:2212.08124

    cs.HC

    Elasticity Solver in Minecraft for Learning Mechanics of Materials by Gaming

    Authors: Zachariah P. Beck, Brandon Alpert, Alexander J. Bowman, William R. Watson, Adrian Buganza Tepole

    Abstract: Video games have emerged as a medium for learning by creating engaging environments, encouraging creative and deep thinking, and exposing learners to complex problems. Unfortunately, even though there are increasing examples of video games for many basic science and engineering concepts, similar efforts for higher level engineering concepts such as mechanics of materials are still lacking. Here we…

    Submitted 14 December, 2022; originally announced December 2022.

  8. arXiv:2101.07342

    eess.IV cs.LG q-bio.QM

    Feature Fusion of Raman Chemical Imaging and Digital Histopathology using Machine Learning for Prostate Cancer Detection

    Authors: Trevor Doherty, Susan McKeever, Nebras Al-Attar, Tiarnan Murphy, Claudia Aura, Arman Rahman, Amanda O'Neill, Stephen P Finn, Elaine Kay, William M. Gallagher, R. William G. Watson, Aoife Gowen, Patrick Jackman

    Abstract: The diagnosis of prostate cancer is challenging due to the heterogeneity of its presentations, leading to the over diagnosis and treatment of non-clinically important disease. Accurate diagnosis can directly benefit a patient's quality of life and prognosis. Towards addressing this issue, we present a learning model for the automatic identification of prostate cancer. While many prostate cancer st…

    Submitted 18 January, 2021; originally announced January 2021.

    Comments: 19 pages, 8 tables, 18 figures

  9. arXiv:1702.05376

    cs.AI cs.DM stat.ML

    Towards a Unified Taxonomy of Biclustering Methods

    Authors: Dmitry I. Ignatov, Bruce W. Watson

    Abstract: Being an unsupervised machine learning and data mining technique, biclustering and its multimodal extensions are becoming popular tools for analysing object-attribute data in different domains. Unlike conventional clustering techniques, biclustering searches for homogeneous groups of objects while keeping their common description, e.g., in a binary setting, their shared attributes. In bioinf…

    Submitted 17 February, 2017; originally announced February 2017.

    Comments: http://ceur-ws.org/Vol-1552/

    MSC Class: 06B99; 62H30

    ACM Class: I.5.3; H.2.8; I.2.6; I.2.4

    Journal ref: Russian and South African Workshop on Knowledge Discovery Techniques Based on Formal Concept Analysis (RuZA 2015), November 30 - December 5, 2015, Stellenbosch, South Africa, In CEUR Workshop Proceedings, Vol. 1552, p. 23-39