
Showing 1–18 of 18 results for author: Harris, N

Searching in archive cs.
  1. arXiv:2404.12283 [pdf, ps, other]

    cs.CL

    Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting

    Authors: Nicholas Harris, Anand Butani, Syed Hashmy

    Abstract: Embedding models are crucial for various natural language processing tasks but can be limited by factors such as limited vocabulary, lack of context, and grammatical errors. This paper proposes a novel approach to improve embedding performance by leveraging large language models (LLMs) to enrich and rewrite input text before the embedding process. By utilizing ChatGPT 3.5 to provide additional con…

    Submitted 18 April, 2024; originally announced April 2024.

  2. arXiv:2404.03044 [pdf]

    cs.LG cs.AI

    The Artificial Intelligence Ontology: LLM-assisted construction of AI concept hierarchies

    Authors: Marcin P. Joachimiak, Mark A. Miller, J. Harry Caufield, Ryan Ly, Nomi L. Harris, Andrew Tritt, Christopher J. Mungall, Kristofer E. Bouchard

    Abstract: The Artificial Intelligence Ontology (AIO) is a systematization of artificial intelligence (AI) concepts, methodologies, and their interrelations. Developed via manual curation, with the additional assistance of large language models (LLMs), AIO aims to address the rapidly evolving landscape of AI by providing a comprehensive framework that encompasses both technical and ethical aspects of AI tech…

    Submitted 3 April, 2024; originally announced April 2024.

  3. arXiv:2312.16529 [pdf, other]

    cs.LG math.CT

    Using Enriched Category Theory to Construct the Nearest Neighbour Classification Algorithm

    Authors: Matthew Pugh, Jo Grundy, Corina Cirstea, Nick Harris

    Abstract: Exploring whether Enriched Category Theory could provide the foundation of an alternative approach to Machine Learning. This paper is the first to construct and motivate a Machine Learning algorithm solely with Enriched Category Theory. In order to supplement evidence that Category Theory can be used to motivate robust and explainable algorithms, it is shown that a series of reasonable assumptions…

    Submitted 27 December, 2023; originally announced December 2023.

  4. arXiv:2312.10904 [pdf]

    cs.AI

    Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI)

    Authors: Sabrina Toro, Anna V Anagnostopoulos, Sue Bello, Kai Blumberg, Rhiannon Cameron, Leigh Carmody, Alexander D Diehl, Damion Dooley, William Duncan, Petra Fey, Pascale Gaudet, Nomi L Harris, Marcin Joachimiak, Leila Kiani, Tiago Lubiana, Monica C Munoz-Torres, Shawn O'Neil, David Osumi-Sutherland, Aleix Puig, Justin P Reese, Leonore Reiser, Sofia Robb, Troy Ruemping, James Seager, Eric Sid, et al. (5 additional authors not shown)

    Abstract: Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dyna…

    Submitted 12 June, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

  5. arXiv:2310.03666 [pdf]

    cs.CL cs.AI

    MapperGPT: Large Language Models for Linking and Mapping Entities

    Authors: Nicolas Matentzoglu, J. Harry Caufield, Harshad B. Hegde, Justin T. Reese, Sierra Moxon, Hyeongsik Kim, Nomi L. Harris, Melissa A Haendel, Christopher J. Mungall

    Abstract: Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is the process of determining correspondences between entities across these resources, such as gene identifiers, disease concepts, or chemical entity identifiers. Ma…

    Submitted 5 October, 2023; originally announced October 2023.

  6. arXiv:2305.13338 [pdf]

    q-bio.GN cs.AI cs.CL q-bio.QM

    Gene Set Summarization using Large Language Models

    Authors: Marcin P. Joachimiak, J. Harry Caufield, Nomi L. Harris, Hyeongsik Kim, Christopher J. Mungall

    Abstract: Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpretin…

    Submitted 3 July, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

  7. arXiv:2304.02711 [pdf, other]

    cs.AI cs.LG

    Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning

    Authors: J. Harry Caufield, Harshad Hegde, Vincent Emonet, Nomi L. Harris, Marcin P. Joachimiak, Nicolas Matentzoglu, HyeongSik Kim, Sierra A. T. Moxon, Justin T. Reese, Melissa A. Haendel, Peter N. Robinson, Christopher J. Mungall

    Abstract: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (S…

    Submitted 22 December, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: Updated 2023-12-22

  8. arXiv:2302.10800 [pdf]

    q-bio.QM cs.AI cs.LG

    KG-Hub -- Building and Exchanging Biological Knowledge Graphs

    Authors: J Harry Caufield, Tim Putman, Kevin Schaper, Deepak R Unni, Harshad Hegde, Tiffany J Callahan, Luca Cappelletti, Sierra AT Moxon, Vida Ravanmehr, Seth Carbon, Lauren E Chan, Katherina Cortes, Kent A Shefchek, Glass Elsarboukh, James P Balhoff, Tommaso Fontana, Nicolas Matentzoglu, Richard M Bruskiewich, Anne E Thessen, Nomi L Harris, Monica C Munoz-Torres, Melissa A Haendel, Peter N Robinson, Marcin P Joachimiak, Christopher J Mungall, et al. (1 additional authors not shown)

    Abstract: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simp…

    Submitted 31 January, 2023; originally announced February 2023.

  9. arXiv:2212.03250 [pdf, other]

    cs.CV

    Neural Cell Video Synthesis via Optical-Flow Diffusion

    Authors: Manuel Serna-Aguilera, Khoa Luu, Nathaniel Harris, Min Zou

    Abstract: The biomedical imaging world is notorious for working with small amounts of data, frustrating state-of-the-art efforts in the computer vision and deep learning worlds. With large datasets, it is easier to make progress, as we have seen with the natural image distribution. It is the same with microscopy videos of neuron cells moving in a culture. This problem presents several challenges as it can be di…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 9 pages, 2 tables, 7 figures

  10. arXiv:2208.01623 [pdf, other]

    cs.ET physics.optics

    Single chip photonic deep neural network with accelerated training

    Authors: Saumil Bandyopadhyay, Alexander Sludds, Stefan Krastanov, Ryan Hamerly, Nicholas Harris, Darius Bunandar, Matthew Streshinsky, Michael Hochberg, Dirk Englund

    Abstract: As deep neural networks (DNNs) revolutionize machine learning, energy consumption and throughput are emerging as fundamental limitations of CMOS electronics. This has motivated a search for new hardware architectures optimized for artificial intelligence, such as electronic systolic arrays, memristor crossbar arrays, and optical accelerators. Optical systems can perform linear matrix operations at…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 21 pages, 10 figures. Comments welcome

  11. arXiv:2207.02941 [pdf, other]

    cs.LG cs.AI

    Boosting the interpretability of clinical risk scores with intervention predictions

    Authors: Eric Loreaux, Ke Yu, Jonas Kemp, Martin Seneviratne, Christina Chen, Subhrajit Roy, Ivan Protsyuk, Natalie Harris, Alexander D'Amour, Steve Yadlowsky, Ming-Jun Chen

    Abstract: Machine learning systems show significant promise for forecasting patient adverse events via risk scores. However, these risk scores implicitly encode assumptions about future interventions that the patient is likely to receive, based on the intervention policy present in the training data. Without this important context, predictions from such systems are less interpretable for clinicians. We prop…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted by DSHealth on KDD 2022

  12. Ontology Development Kit: a toolkit for building, maintaining, and standardising biomedical ontologies

    Authors: Nicolas Matentzoglu, Damien Goutte-Gattat, Shawn Zheng Kai Tan, James P. Balhoff, Seth Carbon, Anita R. Caron, William D. Duncan, Joe E. Flack, Melissa Haendel, Nomi L. Harris, William R Hogan, Charles Tapley Hoyt, Rebecca C. Jackson, HyeongSik Kim, Huseyin Kir, Martin Larralde, Julie A. McMurry, James A. Overton, Bjoern Peters, Clare Pilgrim, Ray Stefancsik, Sofia MC Robb, Sabrina Toro, Nicole A Vasilevsky, Ramona Walls, et al. (2 additional authors not shown)

    Abstract: Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows such as preparing releases, continuous quality control checking, and dependency management. To manage these processes, a diverse set of tools is required, from command line utilities to powerful ontology engineering environments such as ROBOT. Particularly in the biomedical domain, which has…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: 19 pages, 2 supplementary tables, 1 supplementary figure

  13. arXiv:2205.06287 [pdf, other]

    cs.LG cs.AR

    Adaptive Block Floating-Point for Analog Deep Learning Hardware

    Authors: Ayon Basumallik, Darius Bunandar, Nicholas Dronen, Nicholas Harris, Ludmila Levkova, Calvin McCarter, Lakshmi Nair, David Walter, David Widemann

    Abstract: Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an accuracy penalty because of precision loss. To mitigate this penalty, we present a novel AMS-compatible adaptive block floating-point (ABFP) number representation. We…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 13 pages including Appendix, 7 figures, under submission at IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

  14. arXiv:2204.03969 [pdf, other]

    cs.LG

    Disability prediction in multiple sclerosis using performance outcome measures and demographic data

    Authors: Subhrajit Roy, Diana Mincu, Lev Proleev, Negar Rostamzadeh, Chintan Ghate, Natalie Harris, Christina Chen, Jessica Schrouff, Nenad Tomasev, Fletcher Lee Hartsell, Katherine Heller

    Abstract: Literature on machine learning for multiple sclerosis has primarily focused on the use of neuroimaging data such as magnetic resonance imaging and clinical laboratory tests for disease identification. However, studies have shown that these modalities are not consistent with disease activity such as symptoms or disease progression. Furthermore, the cost of collecting data from these modalities is h…

    Submitted 8 April, 2022; originally announced April 2022.

  15. Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science

    Authors: Deepak R. Unni, Sierra A. T. Moxon, Michael Bada, Matthew Brush, Richard Bruskiewich, Paul Clemons, Vlado Dancik, Michel Dumontier, Karamarie Fecho, Gustavo Glusman, Jennifer J. Hadlock, Nomi L. Harris, Arpita Joshi, Tim Putman, Guangrong Qin, Stephen A. Ramsey, Kent A. Shefchek, Harold Solbrig, Karthik Soman, Anne T. Thessen, Melissa A. Haendel, Chris Bizon, Christopher J. Mungall, the Biomedical Data Translator Consortium

    Abstract: Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness between core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge…

    Submitted 25 March, 2022; originally announced March 2022.

  16. arXiv:2202.01034 [pdf, other]

    cs.LG cs.CY stat.ML

    Diagnosing failures of fairness transfer across distribution shift in real-world medical settings

    Authors: Jessica Schrouff, Natalie Harris, Oluwasanmi Koyejo, Ibrahim Alabdulmohsin, Eva Schnider, Krista Opsahl-Ong, Alex Brown, Subhrajit Roy, Diana Mincu, Christina Chen, Awa Dieng, Yuan Liu, Vivek Natarajan, Alan Karthikesalingam, Katherine Heller, Silvia Chiappa, Alexander D'Amour

    Abstract: Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is enco…

    Submitted 10 February, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  17. A Simple Standard for Sharing Ontological Mappings (SSSOM)

    Authors: Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler, et al. (19 additional authors not shown)

    Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, ar…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: Corresponding author: Christopher J. Mungall

  18. arXiv:2109.01126 [pdf, other]

    cs.AR cs.ET

    An Electro-Photonic System for Accelerating Deep Neural Networks

    Authors: Cansu Demirkiran, Furkan Eris, Gongyu Wang, Jonathan Elmhurst, Nick Moore, Nicholas C. Harris, Ayon Basumallik, Vijay Janapa Reddi, Ajay Joshi, Darius Bunandar

    Abstract: The number of parameters in deep neural networks (DNNs) is scaling at about 5× the rate of Moore's Law. To sustain this growth, photonic computing is a promising avenue, as it enables higher throughput in dominant general matrix-matrix multiplication (GEMM) operations in DNNs than their electrical counterpart. However, purely photonic systems face several challenges including lack of photon…

    Submitted 16 December, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Journal ref: J. Emerg. Technol. Comput. Syst. 19, 4, Article 30 (October 2023)