
Showing 1–50 of 337 results for author: Sophia

Searching in archive cs.
  1. arXiv:2408.17433  [pdf, other]

    cs.CV

    DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model

    Authors: Mona Sheikh Zeinoddin, Chiara Lena, Jiongqi Qu, Luca Carlini, Mattia Magro, Seunghoi Kim, Elena De Momi, Sophia Bano, Matthew Grech-Sollars, Evangelos Mazomenos, Daniel C. Alexander, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Robotic-assisted surgery (RAS) relies on accurate depth estimation for 3D reconstruction and visualization. While foundation models like Depth Anything Models (DAM) show promise, directly applying them to surgery often yields suboptimal results. Fully fine-tuning on limited surgical data can cause overfitting and catastrophic forgetting, compromising model robustness and generalization. Although L…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 11 pages

  2. arXiv:2408.16445  [pdf, other]

    cs.CV

    Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks

    Authors: Sierra Bonilla, Chiara Di Vece, Rema Daher, Xinwei Ju, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano

    Abstract: Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision, with applications ranging from navigation and object tracking to segmentation and three-dimensional modeling. Traditionally, parametric techniques have been employed for this task. However, recent advancements have seen a shift towards learning-based methods. Given the rapid pace of re…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 19 pages, 5 figures

  3. arXiv:2408.13518  [pdf, other]

    cs.CL cs.AI cs.LG

    Selective Preference Optimization via Token-Level Reward Function Estimation

    Authors: Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou

    Abstract: Recent advancements in large language model alignment leverage token-level supervisions to perform fine-grained preference optimization. However, existing token-level alignment methods either optimize on all available tokens, which can be noisy and inefficient, or perform selective training with complex and expensive key token selection strategies. In this work, we propose Selective Preference Opt…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: Work in progress

  4. arXiv:2408.12603  [pdf]

    cs.CY cs.AI

    Sleeper Social Bots: a new generation of AI disinformation bots are already a political threat

    Authors: Jaiv Doshi, Ines Novacic, Curtis Fletcher, Mats Borges, Elea Zhong, Mark C. Marino, Jason Gan, Sophia Mager, Dane Sprague, Melinda Xia

    Abstract: This paper presents a study on the growing threat of "sleeper social bots," AI-driven social bots in the political landscape, created to spread disinformation and manipulate public opinion. We based the name sleeper social bots on their ability to pass as humans on social platforms, where they're embedded like political "sleeper" agents, making them harder to detect and more disruptive. To illustr…

    Submitted 7 August, 2024; originally announced August 2024.

  5. arXiv:2408.12073  [pdf, other]

    cs.AR

    Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency

    Authors: Hansung Kim, Ruohan Yan, Joshua You, Tieliang Vamber Yang, Yakun Sophia Shao

    Abstract: Modern GPUs incorporate specialized matrix units such as Tensor Cores to accelerate GEMM operations central to deep learning workloads. However, existing matrix unit designs are tightly coupled to the SIMT core, limiting the size and energy efficiency of the operation due to capacity and bandwidth constraints from the register file. Such a limitation in scalability makes it difficult to simultaneo…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 13 pages, 13 figures. Under review at ASPLOS 2025

  6. arXiv:2408.11878  [pdf, other]

    cs.CL cs.CE q-fin.CP

    Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

    Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, et al. (14 additional authors not shown)

    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, table…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 33 pages, 13 figures

  7. arXiv:2408.06356  [pdf, other]

    cs.CV

    Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms

    Authors: Sophia J. Abraham, Jin Huang, Brandon RichardWebster, Michael Milford, Jonathan D. Hauenstein, Walter Scheirer

    Abstract: We introduce a unique semantic segmentation dataset of 6,096 high-resolution aerial images capturing indigenous and invasive grass species in Bega Valley, New South Wales, Australia, designed to address the underrepresented domain of ecological data in the computer vision community. This dataset presents a challenging task due to the overlap and distribution of grass species, which is critical for…

    Submitted 25 July, 2024; originally announced August 2024.

  8. arXiv:2408.04678  [pdf, other]

    cs.CL cs.AI cs.DB

    CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding

    Authors: Sophia Ho, Jinsol Park, Patrick Wang

    Abstract: We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding based on retrieving exact n-gram matches of the most recent n tokens generated by the target LLM from a datastore. The key idea of CREST is to only store a subset of the smallest and most common n-grams in the datast…

    Submitted 7 August, 2024; originally announced August 2024.

  9. arXiv:2408.04289  [pdf, other]

    cs.CL

    EMTeC: A Corpus of Eye Movements on Machine-Generated Texts

    Authors: Lena Sophia Bolliger, Patrick Haller, Isabelle Caroline Rose Cretton, David Robert Reich, Tannon Kew, Lena Ann Jäger

    Abstract: The Eye Movements on Machine-Generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts. The texts are generated by three large language models using five different decoding strategies, and they fall into six different text type categories. EMTeC entails the eye movement data at all stages of pre-processing, i.…

    Submitted 8 August, 2024; originally announced August 2024.

  10. arXiv:2408.03408  [pdf, other]

    cs.AR cs.LG cs.PL

    LLM-Aided Compilation for Tensor Accelerators

    Authors: Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Yakun Sophia Shao

    Abstract: Hardware accelerators, in particular accelerators for tensor processing, have many potential application domains. However, they currently lack the software infrastructure to support the majority of domains outside of deep learning. Furthermore, a compiler that can easily be updated to reflect changes at both application and hardware levels would enable more agile development and design space explo…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 4 page workshop paper

  11. arXiv:2408.02927  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection

    Authors: Yuxin Wang, Duanyu Feng, Yongfu Dai, Zhengyu Chen, Jimin Huang, Sophia Ananiadou, Qianqian Xie, Hao Wang

    Abstract: Data serves as the fundamental foundation for advancing deep learning, particularly tabular data presented in a structured format, which is highly conducive to modeling. However, even in the era of LLM, obtaining tabular data from sensitive domains remains a challenge due to privacy or copyright concerns. Hence, exploring how to effectively use models like LLMs to generate realistic and privacy-pr…

    Submitted 5 August, 2024; originally announced August 2024.

  12. arXiv:2407.15992  [pdf, other]

    cs.CL cs.SD eess.AS

    Multimodal Input Aids a Bayesian Model of Phonetic Learning

    Authors: Sophia Zhi, Roger P. Levy, Stephan C. Meylan

    Abstract: One of the many tasks facing the typically-developing child language learner is learning to discriminate between the distinctive sounds that make up words in their native language. Here we investigate whether multimodal information--specifically adult speech coupled with video frames of speakers' faces--benefits a computational model of phonetic learning. We introduce a method for creating high-qu…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 12 pages, 5 figures

  13. arXiv:2407.09468  [pdf, other]

    cs.LG

    Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

    Authors: Sophia Sanborn, Johan Mathe, Mathilde Papillon, Domas Buracas, Hansen J Lillemark, Christian Shewmake, Abby Bertics, Xavier Pennec, Nina Miolane

    Abstract: The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-tim…

    Submitted 12 July, 2024; originally announced July 2024.

  14. arXiv:2407.08877  [pdf, other]

    q-bio.NC cs.HC

    Analyzing Speech Motor Movement using Surface Electromyography in Minimally Verbal Adults with Autism Spectrum Disorder

    Authors: Wazeer Zulfikar, Nishat Protyasha, Camila Canales, Heli Patel, James Williamson, Laura Sarnie, Lisa Nowinski, Nataliya Kosmyna, Paige Townsend, Sophia Yuditskaya, Tanya Talkar, Utkarsh Oggy Sarawgi, Christopher McDougle, Thomas Quatieri, Pattie Maes, Maria Mody

    Abstract: Adults who are minimally verbal with autism spectrum disorder (mvASD) have pronounced speech difficulties linked to impaired motor skills. Existing research and clinical assessments primarily use indirect methods such as standardized tests, video-based facial features, and handwriting tasks, which may not directly target speech-related motor skills. In this study, we measure activity from eight fa…

    Submitted 11 July, 2024; originally announced July 2024.

  15. arXiv:2407.07655  [pdf, other]

    cs.LG

    The Selective G-Bispectrum and its Inversion: Applications to G-Invariant Networks

    Authors: Simon Mataigne, Johan Mathe, Sophia Sanborn, Christopher Hillar, Nina Miolane

    Abstract: An important problem in signal processing and deep learning is to achieve invariance to nuisance factors not relevant for the task. Since many of these factors are describable as the action of a group $G$ (e.g. rotations, translations, scalings), we want methods to be $G$-invariant. The $G$-Bispectrum extracts every characteristic of a given signal up to group action: for example, the sha…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: 68T01; 68T07; 68R01; 20K01

  16. arXiv:2407.05467  [pdf, other]

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  17. arXiv:2407.04667  [pdf, other]

    stat.ME cs.LG

    The diameter of a stochastic matrix: A new measure for sensitivity analysis in Bayesian networks

    Authors: Manuele Leonelli, Jim Q. Smith, Sophia K. Wright

    Abstract: Bayesian networks are one of the most widely used classes of probabilistic models for risk management and decision support because of their interpretability and flexibility in including heterogeneous pieces of information. In any applied modelling, it is critical to assess how robust the inferences on certain target variables are to changes in the model. In Bayesian networks, these analyses fall u…

    Submitted 5 July, 2024; originally announced July 2024.

  18. arXiv:2407.04352  [pdf, other]

    cs.HC cs.LG

    UpStory: the Uppsala Storytelling dataset

    Authors: Marc Fraile, Natalia Calvo-Barajas, Anastasia Sophia Apeiron, Giovanna Varni, Joakim Lindblad, Nataša Sladoje, Ginevra Castellano

    Abstract: Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in educational settings due to their impact on student outcomes. Given the growing interest in automating the analysis of such phenomena through Machine Learning (ML), access to annotated interaction datasets is highly valuable. However, no dataset on dyadic child-child…

    Submitted 5 July, 2024; originally announced July 2024.

  19. arXiv:2406.16192  [pdf, other]

    cs.CV

    HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

    Authors: Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Ahrong Kim, Faisal Mahmood

    Abstract: Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution, depth, and sensitivity. However, costs, rapidly evolving technology, and lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology as reflected by H&E-stained whole slide images (WSIs) encodes r…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review

  20. arXiv:2406.15647  [pdf, other]

    cs.SD cs.LG eess.AS

    Generating Music with Structure Using Self-Similarity as Attention

    Authors: Sophia Hager, Kathleen Hablutzel, Katherine M. Kinnaird

    Abstract: Despite the innovations in deep learning and generative AI, creating long term structure as well as the layers of repeated structure common in musical works remains an open challenge in music generation. We propose an attention layer that uses a novel approach applying user-supplied self-similarity matrices to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator…

    Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  21. arXiv:2406.14949  [pdf, other]

    cs.AI

    CEASEFIRE: An AI-powered system for combatting illicit firearms trafficking

    Authors: Ioannis Mademlis, Jorgen Cani, Marina Mancuso, Caterina Paternoster, Emmanouil Adamakis, George Margetis, Sylvie Chambon, Alain Crouzil, Loubna Lechelek, Georgia Dede, Spyridon Evangelatos, George Lalas, Franck Mignet, Pantelis Linardatos, Konstantinos Kentrotis, Henryk Gierszal, Piotr Tyczka, Sophia Karagiorgou, George Pantelis, Georgios Stavropoulos, Konstantinos Votis, Georgios Th. Papadopoulos

    Abstract: Modern technologies have led illicit firearms trafficking to partially merge with cybercrime, while simultaneously permitting its off-line aspects to become more sophisticated. Law enforcement officers face difficult challenges that require hi-tech solutions. This article presents a real-world system, powered by advanced Artificial Intelligence, for facilitating them in their everyday work.

    Submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.11328  [pdf, other]

    cs.CL

    Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams

    Authors: Zheheng Luo, Chenhan Yuan, Qianqian Xie, Sophia Ananiadou

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated their potential in delivering accurate answers to questions about world knowledge. Despite this, existing benchmarks for evaluating LLMs in healthcare predominantly focus on medical doctors, leaving other critical healthcare professions underrepresented. To fill this research gap, we introduce the Examinations for Medical Person…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  23. arXiv:2406.11186  [pdf, other]

    cs.CY cs.HC

    An Initial Study Review of Designing a Technology Solution for Women in Technologically Deprived Areas or Low Resource Constraint Communities

    Authors: Jones Yeboah, Sophia Bampoh, Annu Sible Prabhakar

    Abstract: In the West African country of Ghana, depression is a significant issue affecting a large number of women. Despite its importance, the issue received insufficient attention during the COVID-19 pandemic. In developed countries, mobile phones serve as a convenient medium for accessing health information and providers. However, in Ghana, women's access to mobile phones is limited by cultural, social,…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 15 pages, 1 figure

  24. arXiv:2406.11093  [pdf, other]

    cs.CL

    RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information

    Authors: Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine de Kock, Sophia Ananiadou, Eduard Hovy

    Abstract: Misinformation is prevalent in various fields such as education, politics, health, etc., causing significant harm to society. However, current methods for cross-domain misinformation detection rely on time- and resource-consuming fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focu…

    Submitted 16 June, 2024; originally announced June 2024.

  25. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  26. arXiv:2406.08216  [pdf, ps, other]

    cs.SE

    A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

    Authors: Sinclair Hudson, Sophia Jit, Boyue Caroline Hu, Marsha Chechik

    Abstract: Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems. To enable usage of LLMs in the high-stakes or safety-critical systems of 2030, they need to undergo rigorous testing. Software Engineering (SE) research on testing Machine Learning (ML) components and ML-based systems has systematically explored many topic…

    Submitted 12 June, 2024; originally announced June 2024.

  27. Mind Mansion: Exploring Metaphorical Interactions to Engage with Negative Thoughts in Virtual Reality

    Authors: Julian Rasch, Michelle Johanna Zender, Sophia Sakel, Nadine Wagener

    Abstract: Recurrent negative thoughts can significantly disrupt daily life and contribute to negative emotional states. Facing, confronting, and noticing such thoughts without support can be challenging. To provide a playful setting and leverage the technical maturation of Virtual Reality (VR), our VR experience, Mind Mansion, places the user in an initially cluttered virtual apartment. Here we utilize esta…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: To appear in Proceedings of the Designing Interactive Systems Conference (DIS '24), July 1-5, 2024, IT University of Copenhagen, Denmark

  28. arXiv:2406.04287  [pdf, other]

    cs.CV cs.RO

    SpectralZoom: Efficient Segmentation with an Adaptive Hyperspectral Camera

    Authors: Jackson Arnold, Sophia Rossi, Chloe Petrosino, Ethan Mitchell, Sanjeev J. Koppal

    Abstract: Hyperspectral image segmentation is crucial for many fields such as agriculture, remote sensing, biomedical imaging, battlefield sensing and astronomy. However, the challenge of hyper and multi spectral imaging is its large data footprint. We propose both a novel camera design and a vision transformer-based (ViT) algorithm that alleviate both the captured data footprint and the computational load…

    Submitted 6 June, 2024; originally announced June 2024.

  29. arXiv:2406.01528  [pdf, other]

    cs.LG

    Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data

    Authors: Mehmet Velioglu, Song Zhai, Sophia Rupprecht, Alexander Mitsos, Andreas Jupke, Manuel Dahmen

    Abstract: In chemical engineering, process data are expensive to acquire, and complex phenomena are difficult to fully model. We explore the use of physics-informed neural networks (PINNs) for dynamic processes with incomplete mechanistic semi-explicit differential-algebraic equation systems and scarce process data. In particular, we focus on estimating states for which neither direct observational data nor…

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: manuscript (32 pages, 9 figures, 11 tables), supporting materials (14 pages, 4 figures, 5 tables)

  30. arXiv:2405.20195  [pdf, other]

    cs.HC

    Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations

    Authors: Zilin Ma, Susannah Su, Nathan Zhao, Linn Bieske, Blake Bullwinkel, Yanyi Zhang, Sophia Yang, Ziqing Luo, Siyao Li, Gekai Liao, Boxiang Wang, Jinglun Gao, Zihan Wen, Claude Bruderlein, Weiwei Pan

    Abstract: Humanitarian negotiations in conflict zones, called frontline negotiation, are often highly adversarial, complex, and high-risk. Several best practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid d…

    Submitted 30 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  31. arXiv:2405.18536  [pdf, other]

    cs.LG

    Data-Driven Simulator for Mechanical Circulatory Support with Domain Adversarial Neural Process

    Authors: Sophia Sun, Wenyuan Chen, Zihao Zhou, Sonia Fereidooni, Elise Jortberg, Rose Yu

    Abstract: Mechanical Circulatory Support (MCS) devices, implemented as a probabilistic deep sequence model. Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior, limiting their applicability to real-world treatment scenarios. To address these shortcomings, our model Domain Adversarial Neural Process (DANP) employs a neural process archit…

    Submitted 28 May, 2024; originally announced May 2024.

  32. arXiv:2405.13949  [pdf, other]

    cs.CV

    PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery

    Authors: Runlong He, Mengya Xu, Adrito Das, Danyal Z. Khan, Sophia Bano, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Visual Question Answering (VQA) within the surgical domain, utilizing Large Language Models (LLMs), offers a distinct opportunity to improve intra-operative decision-making and facilitate intuitive surgeon-AI interaction. However, the development of LLMs for surgical VQA is hindered by the scarcity of diverse and extensive datasets with complex reasoning tasks. Moreover, contextual fusion of the i…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures

  33. arXiv:2405.07111  [pdf, other]

    cs.CL

    Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre

    Authors: Boyd Branch, Piotr Mirowski, Kory Mathewson, Sophia Ppali, Alexandra Covaci

    Abstract: Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world evaluations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures, accepted for publication at the International Conference on Computational Creativity 2024

  34. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  35. arXiv:2405.00146  [pdf, other]

    quant-ph cs.ET

    Averting multi-qubit burst errors in surface code magic state factories

    Authors: Jason D. Chadwick, Christopher Kang, Joshua Viszlai, Sophia Fuhui Lin, Frederic T. Chong

    Abstract: Fault-tolerant quantum computation relies on the assumption of time-invariant, sufficiently low physical error rates. However, current superconducting quantum computers suffer from frequent disruptive noise events, including cosmic ray impacts and shifting two-level system defects. Several methods have been proposed to mitigate these issues in software, but they add large overheads in terms of phy…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 13 pages, 12 figures

  36. arXiv:2404.19264  [pdf, other]

    cs.RO

    DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

    Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

    Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged rob…

    Submitted 30 April, 2024; originally announced April 2024.

  37. arXiv:2404.18796  [pdf, other]

    cs.CL cs.AI

    Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

    Authors: Pat Verga, Sebastian Hofstatter, Sophia Althammer, Yixuan Su, Aleksandra Piktus, Arkady Arkhangorodsky, Minjie Xu, Naomi White, Patrick Lewis

    Abstract: As Large Language Models (LLMs) have become more advanced, they have outpaced our abilities to accurately evaluate their quality. Not only is finding data to adequately probe particular model properties difficult, but evaluating the correctness of a model's freeform generation alone is a challenge. To address this, many evaluations now rely on using LLMs themselves as judges to score the quality o…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  38. arXiv:2404.15236  [pdf, other]

    cs.SE

    Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models

    Authors: Aidan Z. H. Yang, Sophia Kolak, Vincent J. Hellendoorn, Ruben Martins, Claire Le Goues

    Abstract: Language models have improved by orders of magnitude with the recent emergence of Transformer-based Large Language Models (LLMs). LLMs have demonstrated their ability to generate natural code that is highly similar to code written by professional developers. One intermediate value an LLM can emit is entropy, which measures the naturalness of a token of code. We hypothesize that entropy can be used…

    Submitted 23 April, 2024; originally announced April 2024.

  39. arXiv:2404.14789  [pdf, other]

    cs.MA cs.LO

    Opinion Update in a Subjective Logic Model for Social Networks

    Authors: Mário S. Alvim, Sophia Knight, José C. Oliveira

    Abstract: Subjective Logic (SL) is a logic incorporating uncertainty and opinions for agents in dynamic systems. In this work, we investigate the use of subjective logic to model opinions and belief change in social networks. In particular, we work toward the development of a subjective logic belief/opinion update function appropriate for modeling belief change as communication occurs in social networks. We…

    Submitted 23 April, 2024; originally announced April 2024.

  40. arXiv:2404.14040  [pdf, other]

    cs.CV

    Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery

    Authors: Yuyang Sheng, Sophia Bano, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Purpose: The recent Segment Anything Model (SAM) has demonstrated impressive performance with point, text or bounding box prompts, in various applications. However, in safety-critical surgical tasks, prompting is not possible due to (i) the lack of per-frame prompts for supervised learning, (ii) it is unrealistic to prompt frame-by-frame in a real-time tracking application, and (iii) it is expensi…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 8 pages, 2 figures

  41. arXiv:2404.14027  [pdf, other]

    cs.CV cs.LG

    OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

    Authors: Sophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonin Vobecky, Patrick Pérez, Renaud Marlet

    Abstract: We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to th…

    Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

  42. arXiv:2404.09220  [pdf, other]

    cs.CL

    Compass: Large Multilingual Language Model for South-east Asia

    Authors: Sophia Maria

    Abstract: Large language models have exhibited significant proficiency in languages endowed with extensive linguistic resources, such as English and Chinese. Nevertheless, their effectiveness notably diminishes when applied to languages characterized by limited linguistic resources, particularly within the Southeast Asian linguistic landscape, such as Indonesian. The scarcity of linguistic resources for the…

    Submitted 14 April, 2024; originally announced April 2024.

  43. arXiv:2404.06309  [pdf, other]

    cs.CV

    Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

    Authors: David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

    Abstract: Audio-visual zero-shot learning methods commonly build on features extracted from pre-trained models, e.g. video or audio classification models. However, existing benchmarks predate the popularization of large multi-modal models, such as CLIP and CLAP. In this work, we explore such large pre-trained models to obtain features, i.e. CLIP for visual features, and CLAP for audio features. Furthermore,…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPRw 2024 (L3D-IVU)

  44. arXiv:2404.06128  [pdf, other]

    cs.CV

    Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction

    Authors: Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano

    Abstract: Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-ca…

    Submitted 16 August, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures

  45. arXiv:2404.05022  [pdf, other]

    cs.CV cs.LG

    DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology

    Authors: Valentin Koch, Sophia J. Wagner, Salome Kazeminia, Ece Sancar, Matthias Hehr, Julia Schnabel, Tingying Peng, Carsten Marr

    Abstract: In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer lear…

    Submitted 7 April, 2024; originally announced April 2024.

  46. arXiv:2403.17141  [pdf, other]

    cs.CL cs.AI

    MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

    Authors: Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Tianlin Zhang, Sophia Ananiadou

    Abstract: Recent advancements in large language models (LLMs) aim to tackle heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods are parameter-adherent to the policy model, leading to two key limitations: (1) the high-cost repetition of their alignment algorithms for each new target model; (2) they cannot expand to unseen objectives due to their sta…

    Submitted 6 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Work in progress

  47. arXiv:2403.16760  [pdf]

    cs.HC cs.AI cs.SD eess.AS

    As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

    Authors: Di Cooke, Abigail Edwards, Sophia Barkoff, Kathryn Kelly

    Abstract: As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However,…

    Submitted 4 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: For study pre-registration, see https://osf.io/fnhr3

    MSC Class: 68T01 ACM Class: I.2

  48. arXiv:2403.15243  [pdf, other]

    q-fin.CP cs.LG q-fin.MF q-fin.PM

    Robust Utility Optimization via a GAN Approach

    Authors: Florian Krach, Josef Teichmann, Hanna Wutte

    Abstract: Robust utility optimization enables an investor to deal with market uncertainty in a structured way, with the goal of maximizing the worst-case outcome. In this work, we propose a generative adversarial network (GAN) approach to (approximately) solve robust utility optimization problems in general and realistic settings. In particular, we model both the investor and the market by neural networks (…

    Submitted 22 March, 2024; originally announced March 2024.

    MSC Class: 91-08; 68T07; 91G10; 91G60

  49. arXiv:2403.13313  [pdf, other]

    cs.AI cs.CL

    Polaris: A Safety-focused LLM Constellation Architecture for Healthcare

    Authors: Subhabrata Mukherjee, Paul Gamble, Markel Sanz Ausin, Neel Kant, Kriti Aggarwal, Neha Manjunath, Debajyoti Datta, Zhengliang Liu, Jiayuan Ding, Sophia Busacca, Cezanne Bianco, Swapnil Sharma, Rae Lasko, Michelle Voisard, Sanchay Harneja, Darya Filippova, Gerry Meixiong, Kevin Cha, Amir Youssefi, Meyhaa Buvanesh, Howard Weingram, Sebastian Bierman-Lytle, Harpreet Singh Mangat, Kim Parikh, Saad Godil, et al. (1 additional author not shown)

    Abstract: We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful pr…

    Submitted 20 March, 2024; originally announced March 2024.

  50. arXiv:2403.11743  [pdf, other]

    cs.LG stat.ML

    PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks

    Authors: Philip Matthias Winter, Maria Wimmer, David Major, Dimitrios Lenis, Astrid Berg, Theresa Neubauer, Gaia Romana De Paolis, Johannes Novotny, Sophia Ulonska, Katja Bühler

    Abstract: This work addresses flexibility in deep learning by means of transductive reasoning. For adaptation to new data and tasks, e.g., in continual learning, existing methods typically involve tuning learnable parameters or complete re-training from scratch, rendering such approaches inflexible in practice. We argue that the notion of separating computation from memory by the means of transduction can a…

    Submitted 18 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: preprint, 25 pages, 7 figures