-
LLMs in Biomedicine: A study on clinical Named Entity Recognition
Authors:
Masoud Monajatipoor,
Jiaxin Yang,
Joel Stremmel,
Melika Emami,
Fazlolah Mohaghegh,
Mozhdeh Rouhsedaghat,
Kai-Wei Chang
Abstract:
Large Language Models (LLMs) demonstrate remarkable versatility in various NLP tasks but encounter distinct challenges in the biomedical domain due to the complexities of language and data scarcity. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to enhance their performance on the NER task. Our study reveals the importance of meticulously designed prompts in the biomedical domain. Strategic selection of in-context examples yields a marked improvement, offering a ~15-20% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs to meet the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), can boost the zero-shot F1 score of LLMs for biomedical NER. Code is released at https://github.com/masoud-monajati/LLM_Bio_NER
Submitted 11 July, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
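The abstract above does not say how in-context examples are selected. As a minimal sketch, assuming a similarity-based strategy (pick the pool sentences closest to the query), the idea can be illustrated with a bag-of-words cosine similarity. The function names `bow`, `cosine`, and `select_examples` and the toy sentences are hypothetical and are not taken from the paper or its repository.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a Counter (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counters; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_examples(query, pool, k=2):
    """Return the k pool sentences most similar to the query."""
    q = bow(query)
    return sorted(pool, key=lambda s: cosine(q, bow(s)), reverse=True)[:k]


# Toy pool of candidate demonstrations (hypothetical sentences).
pool = [
    "The patient was given aspirin for chest pain",
    "MRI showed no lesion in the liver",
    "The weather was sunny during the visit",
]
selected = select_examples("Patient reports chest pain after aspirin", pool, k=2)
```

In practice the selected sentences would be placed in the prompt together with their gold entity annotations; a dense sentence encoder could replace the bag-of-words similarity without changing the structure.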
-
Parendi: Thousand-Way Parallel RTL Simulation
Authors:
Mahyar Emami,
Thomas Bourgeat,
James Larus
Abstract:
Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousands of parallel cores. However, existing simulators can only exploit tens of cores.
This paper studies the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine (the Graphcore IPU, a 1472-core machine). Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each metric and analyze how it affects parallel simulation speed, drawing on contrasts between the large-scale IPU and smaller but faster x86 systems.
Using this analysis, we build Parendi, an RTL simulator for the IPU. It distributes RTL simulation across 5888 cores on 4 IPU sockets. Parendi runs large RTL designs up to 4x faster than a powerful, state-of-the-art x86 multicore system.
Submitted 7 March, 2024;
originally announced March 2024.
-
Estimation of embedding vectors in high dimensions
Authors:
Golara Ahmadi Azar,
Melika Emami,
Alyson Fletcher,
Sundeep Rangan
Abstract:
Embeddings are a basic initial feature extraction step in many machine learning models, particularly in natural language processing. An embedding attempts to map data tokens to a low-dimensional space where similar tokens are mapped to vectors that are close to one another by some metric in the embedding space. A basic question is how well such an embedding can be learned. To study this problem, we consider a simple probability model for discrete data in which there is some "true" but unknown embedding and the correlation of random variables is related to the similarity of the embeddings. Under this model, it is shown that the embeddings can be learned by a variant of the low-rank approximate message passing (AMP) method. The AMP approach enables precise predictions of the accuracy of the estimation in certain high-dimensional limits. In particular, the methodology provides insight into the roles of key parameters such as the number of samples per value, the frequency of the terms, and the strength of the embedding correlation on the probability distribution. Our theoretical findings are validated by simulations on both synthetic data and real text data.
Submitted 12 December, 2023;
originally announced December 2023.
-
A Deep Learning Sequential Decoder for Transient High-Density Electromyography in Hand Gesture Recognition Using Subject-Embedded Transfer Learning
Authors:
Golara Ahmadi Azar,
Qin Hu,
Melika Emami,
Alyson Fletcher,
Sundeep Rangan,
S. Farokh Atashzar
Abstract:
Hand gesture recognition (HGR) has gained significant attention due to the increasing use of AI-powered human-computer interfaces that can interpret the deep spatiotemporal dynamics of biosignals from the peripheral nervous system, such as surface electromyography (sEMG). These interfaces have a range of applications, including the control of extended reality, agile prosthetics, and exoskeletons. However, the natural variability of sEMG among individuals has led researchers to focus on subject-specific solutions. Deep learning methods, which often have complex structures, are particularly data-hungry and can be time-consuming to train, making them less practical for subject-specific applications. In this paper, we propose and develop a generalizable, sequential decoder of transient high-density sEMG (HD-sEMG) that achieves 73% average accuracy on 65 gestures for partially-observed subjects through subject-embedded transfer learning, leveraging pre-knowledge of HGR acquired during pre-training. The use of transient HD-sEMG before gesture stabilization allows us to predict gestures with the ultimate goal of counterbalancing system control delays. The results show that the proposed generalized models significantly outperform subject-specific approaches, especially when the training data is limited, and there is a significant number of gesture classes. By building on pre-knowledge and incorporating a multiplicative subject-embedded structure, our method comparatively achieves more than 13% average accuracy across partially observed subjects with minimal data availability. This work highlights the potential of HD-sEMG and demonstrates the benefits of modeling common patterns across users to reduce the need for large amounts of data for new users, enhancing practicality.
Submitted 23 September, 2023;
originally announced October 2023.
-
Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism
Authors:
Mahyar Emami,
Sahand Kashani,
Keisuke Kamahori,
Mohammad Sepehr Pourghannad,
Ritik Raj,
James R. Larus
Abstract:
The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1-1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved simulators can increase designers' productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to exploit low-level parallelism, as RTL expresses considerable fine-grain concurrency. Unfortunately, state-of-the-art RTL simulators often perform best on a single core since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate fine-grain synchronization overhead. It relies entirely on a compiler to schedule resources and communication, which is feasible since RTL code contains few divergent execution paths. With static scheduling, communication and synchronization no longer incur runtime overhead, making fine-grain parallelism practical. Moreover, static scheduling dramatically simplifies processor implementation, significantly increasing the number of cores that fit on a chip. Our 225-core FPGA implementation running at 475 MHz outperforms a state-of-the-art RTL simulator running on desktop and server computers in 8 out of 9 benchmarks.
Submitted 20 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
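The bulk-synchronous execution model mentioned in the abstract above can be illustrated with a small sketch: in each superstep every process computes from the same snapshot of state, and all results are committed together after a barrier. This is only a toy model of BSP applied to register updates, under the assumption that each process owns one register; it does not reflect Manticore's compiler, instruction set, or scheduling. The names `bsp_step` and `processes` are hypothetical.

```python
def bsp_step(state, processes):
    """One superstep: all processes read one snapshot, then all writes commit together."""
    snapshot = dict(state)  # values visible to every process during the compute phase
    updates = {name: fn(snapshot) for name, fn in processes.items()}  # compute phase
    state.update(updates)   # exchange phase, conceptually after the barrier
    return state


# Each "process" owns one register and computes its next value from the snapshot.
processes = {
    "a": lambda s: s["b"],            # a and b swap every cycle
    "b": lambda s: s["a"],
    "count": lambda s: s["count"] + 1,
}
state = {"a": 1, "b": 2, "count": 0}
for _ in range(3):
    bsp_step(state, processes)
```

Because every process reads the snapshot rather than partially updated state, the swap of `a` and `b` is order-independent, which mirrors how registers in an RTL design update simultaneously on a clock edge.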
-
Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions
Authors:
Mojtaba Sahraee-Ardakan,
Melikasadat Emami,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e. proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a kernel model where the relationship between input and the response could be very nonlinear, we show that linear models are in fact optimal, i.e. linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that more complex models for the data other than independent features are needed for high-dimensional analysis.
Submitted 20 January, 2022;
originally announced January 2022.
-
Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations
Authors:
Melikasadat Emami,
Dung Tran,
Kazuhito Koishida
Abstract:
Improving generalization is a major challenge in audio classification due to labeled data scarcity. Self-supervised learning (SSL) methods tackle this by leveraging unlabeled data to learn useful features for downstream classification tasks. In this work, we propose an augmented contrastive SSL framework to learn invariant representations from unlabeled data. Our method applies various perturbations to the unlabeled input data and utilizes contrastive learning to learn representations robust to such perturbations. Experimental results on the Audioset and DESED datasets show that our framework significantly outperforms state-of-the-art SSL and supervised learning methods on sound/event classification tasks.
Submitted 20 December, 2021;
originally announced December 2021.
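The abstract above does not state which contrastive objective is used. As a minimal sketch, assuming an InfoNCE-style loss over cosine similarities (a common choice, not confirmed by the abstract), the loss for one anchor and its perturbed positive view could look as follows. The function names and the two-dimensional toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


anchor = [1.0, 0.0]                      # embedding of the original clip (toy values)
negatives = [[0.0, 1.0], [-1.0, 0.0]]    # embeddings of other clips
loss_aligned = info_nce(anchor, [0.9, 0.1], negatives)     # perturbed view stays close
loss_misaligned = info_nce(anchor, [0.1, 0.9], negatives)  # perturbed view drifts away
```

The loss is smaller when the embedding of the perturbed view stays close to the anchor, which is the sense in which minimizing it encourages representations that are robust to the perturbations.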
-
StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)
Authors:
Endri Bezati,
Mahyar Emami,
Jörn Janneck,
James Larus
Abstract:
To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be difficult to predict in advance and require experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can retard the study of different configurations. A single-language system with an appropriate programming model and compiler that targets both platforms simplifies this exploration to a simple recompile with new compiler directives.
This work introduces StreamBlocks, an open-source compiler and runtime that uses the CAL dataflow programming language to partition computations across heterogeneous (CPU/accelerator) platforms. Because of the dataflow model's semantics and the CAL language, StreamBlocks can exploit both thread parallelism in multi-core CPUs and the inherent parallelism of FPGAs. StreamBlocks supports exploring the design space with a profile-guided tool that helps identify the best hardware-software partitions.
Submitted 20 July, 2021;
originally announced July 2021.
-
Implicit Bias of Linear RNNs
Authors:
Melikasadat Emami,
Mojtaba Sahraee-Ardakan,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, precise reasoning for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently-developed kernel regime analysis, our main result shows that linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias to elements with smaller time lags in the convolution and hence, shorter memory. The degree of this bias depends on the variance of the transition kernel matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated in both synthetic and real data experiments.
Submitted 19 January, 2021;
originally announced January 2021.
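The link between linear RNNs and convolutions in the abstract above rests on a standard unrolling identity: for h_t = W h_{t-1} + f x_t with h = 0 initially and y_t = c · h_t, the output equals a causal convolution of the input with taps l_k = c · W^k f. The sketch below checks this identity numerically for a fixed, hand-picked RNN; it does not reproduce the paper's kernel-regime analysis of training or its implicit weighting result. The matrices and helper names are hypothetical.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rnn_output(W, f, c, xs):
    """Run h_t = W h_{t-1} + f x_t, y_t = c . h_t, starting from h = 0."""
    h = [0.0] * len(f)
    ys = []
    for x in xs:
        Wh = matvec(W, h)
        h = [a + b * x for a, b in zip(Wh, f)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def impulse_response(W, f, c, length):
    """Convolution taps l_k = c . W^k f for k = 0 .. length-1."""
    taps, v = [], list(f)
    for _ in range(length):
        taps.append(sum(ci * vi for ci, vi in zip(c, v)))
        v = matvec(W, v)
    return taps


def causal_conv(taps, xs):
    """y_t = sum over k = 0..t of taps[k] * xs[t-k]."""
    return [sum(taps[k] * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]


W = [[0.5, 0.1], [0.0, 0.3]]   # toy transition matrix
f = [1.0, 2.0]                 # toy input weights
c = [1.0, -1.0]                # toy output weights
xs = [1.0, 0.0, 2.0, -1.0]
ys_rnn = rnn_output(W, f, c, xs)
ys_conv = causal_conv(impulse_response(W, f, c, len(xs)), xs)
```

Both computations give the same output sequence up to floating-point error, so any statement about how the taps l_k depend on the lag k translates directly into a statement about the memory of the RNN.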
-
Low-Rank Nonlinear Decoding of μ-ECoG from the Primary Auditory Cortex
Authors:
Melikasadat Emami,
Mojtaba Sahraee-Ardakan,
Parthe Pandit,
Alyson K. Fletcher,
Sundeep Rangan,
Michael Trumpis,
Brinnae Bent,
Chia-Han Chiang,
Jonathan Viventi
Abstract:
This paper considers the problem of neural decoding from parallel neural measurement systems such as micro-electrocorticography (μ-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples is limited. To address this challenge, this work presents a novel neural network decoder with a low-rank structure in the first hidden layer. The low-rank constraints dramatically reduce the number of parameters in the decoder while still enabling a rich class of nonlinear decoder maps. The low-rank decoder is illustrated on μ-ECoG data from the primary auditory cortex (A1) of awake rats. This decoding problem is particularly challenging due to the complexity of neural responses in the auditory cortex and the presence of confounding signals in awake animals. It is shown that the proposed low-rank decoder significantly outperforms models using standard dimensionality reduction techniques such as principal component analysis (PCA).
Submitted 6 May, 2020;
originally announced May 2020.
-
Generalization Error of Generalized Linear Models in High Dimensions
Authors:
Melikasadat Emami,
Mojtaba Sahraee-Ardakan,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
At the heart of machine learning lies the question of generalizability of learned rules over previously unseen data. While over-parameterized models based on neural networks are now ubiquitous in machine learning applications, our understanding of their generalization capabilities is incomplete. This task is made harder by the non-convexity of the underlying learning problems. We provide a general framework to characterize the asymptotic generalization error for single-layer neural networks (i.e., generalized linear models) with arbitrary non-linearities, making it applicable to regression as well as classification problems. This framework enables analyzing the effect of (i) over-parameterization and non-linearity during modeling; and (ii) choices of loss function, initialization, and regularizer during learning. Our model also captures mismatch between training and test distributions. As examples, we analyze a few special cases, namely linear regression and logistic regression. We are also able to rigorously and analytically explain the double descent phenomenon in generalized linear models.
Submitted 30 April, 2020;
originally announced May 2020.
-
Input-Output Equivalence of Unitary and Contractive RNNs
Authors:
M. Emami,
M. Sahraee-Ardakan,
S. Rangan,
A. K. Fletcher
Abstract:
Unitary recurrent neural networks (URNNs) have been proposed as a method to overcome the vanishing and exploding gradient problem in modeling data with long-term dependencies. A basic question is how restrictive is the unitary constraint on the possible input-output mappings of such a network? This work shows that for any contractive RNN with ReLU activations, there is a URNN with at most twice the number of hidden states and the identical input-output mapping. Hence, with ReLU activations, URNNs are as expressive as general RNNs. In contrast, for certain smooth activations, it is shown that the input-output mapping of an RNN cannot be matched with a URNN, even with an arbitrary number of states. The theoretical results are supported by experiments on modeling of slowly-varying dynamical systems.
Submitted 30 October, 2019;
originally announced October 2019.
-
Energy-Aware Scheduling using Dynamic Voltage-Frequency Scaling
Authors:
Masnida Emami,
Yashar Ghiasi,
Nasrin Jaberi
Abstract:
The energy consumption issue in distributed computing systems has become quite critical due to environmental concerns. In response, many energy-aware scheduling algorithms have been developed, primarily using the dynamic voltage-frequency scaling (DVFS) capability incorporated in recent commodity processors. The majority of these algorithms involve two passes: schedule generation and slack reclamation. The latter is typically achieved by lowering the processor frequency for tasks with slack. In this article, we study the latest papers in this area and build on them. The study is evaluated using results from experiments with 1,500 randomly generated task graphs.
Submitted 9 June, 2012;
originally announced June 2012.
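The slack reclamation pass described in the abstract above can be sketched as follows: a task that takes `exec_time` at the maximum frequency and has `slack` before its successor needs the result can run at a lower frequency and still finish in time. The sketch assumes a discrete set of frequency levels and the common simplified model in which supply voltage scales linearly with frequency, so dynamic energy for a fixed number of cycles scales with the square of the frequency. Both the model and the names `reclaim_slack`, `relative_energy`, and `levels` are assumptions for illustration, not the article's algorithm.

```python
def reclaim_slack(exec_time, slack, f_max, levels):
    """Pick the lowest available frequency that finishes within exec_time + slack."""
    required = f_max * exec_time / (exec_time + slack)  # slowest frequency that still fits
    feasible = [f for f in levels if f >= required]
    return min(feasible) if feasible else f_max


def relative_energy(f, f_max):
    """Dynamic energy relative to running at f_max, assuming voltage proportional to f."""
    return (f / f_max) ** 2


levels = [1.0, 1.5, 2.0]  # hypothetical frequency levels in GHz
f = reclaim_slack(exec_time=4.0, slack=2.0, f_max=2.0, levels=levels)
saving = 1.0 - relative_energy(f, 2.0)
```

With these toy numbers the required frequency is about 1.33 GHz, so the 1.5 GHz level is chosen; the task then runs for roughly 5.33 time units, which still fits in the 6 available.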
-
Distributed computing of Seismic Imaging Algorithms
Authors:
Masnida Emami,
Ali Setayesh,
Nasrin Jaberi
Abstract:
The primary use of technical computing in the oil and gas industries is for seismic imaging of the earth's subsurface, driven by the business need for making well-informed drilling decisions during petroleum exploration and production. Since each oil/gas well in exploration areas costs several tens of millions of dollars, producing high-quality seismic images in a reasonable time can significantly reduce the risk of drilling a "dry hole". Similarly, these images are important as they can improve the position of wells in a billion-dollar producing oil field. However, seismic imaging is very data- and compute-intensive: it must process terabytes of data and requires Gflop-years of computation (using "flop" to mean floating point operation per second). Because of this data- and compute-intensive nature, parallel computing is used to reduce the computation time.
With the introduction of cloud computing, the MapReduce programming model has attracted a lot of attention in parallel and distributed systems [1, 2] for executing massive processing workloads in fields such as bioinformatics [3], astronomy [4], geology [5] and so on. In this report, we investigate and discuss current approaches to fitting seismic algorithms to the MapReduce programming model.
Submitted 5 April, 2012;
originally announced April 2012.
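The abstract above does not name the seismic algorithms that are mapped to MapReduce. As a minimal single-process sketch of the programming model, assuming common-midpoint stacking as the example workload (traces sharing a midpoint are averaged sample by sample), the map, shuffle, and reduce phases could be arranged as follows. The record layout and function names are hypothetical, and a real deployment would use a distributed framework rather than an in-memory dictionary.

```python
from collections import defaultdict


def map_trace(trace):
    """Map: emit a (midpoint, samples) pair for one recorded trace."""
    return trace["midpoint"], trace["samples"]


def reduce_stack(samples_list):
    """Reduce: average all traces that share a midpoint, sample by sample."""
    n = len(samples_list)
    return [sum(col) / n for col in zip(*samples_list)]


def map_reduce(traces):
    """Run map, group values by key (shuffle), then reduce each group."""
    groups = defaultdict(list)
    for key, value in map(map_trace, traces):  # map phase
        groups[key].append(value)              # shuffle: group by key
    return {key: reduce_stack(vals) for key, vals in groups.items()}  # reduce phase


# Toy input: three traces, two of which share midpoint 100.
traces = [
    {"midpoint": 100, "samples": [1.0, 2.0, 3.0]},
    {"midpoint": 100, "samples": [3.0, 2.0, 1.0]},
    {"midpoint": 200, "samples": [0.0, 4.0, 0.0]},
]
stacked = map_reduce(traces)
```

The map and reduce functions are independent per trace and per midpoint respectively, which is what allows a framework to distribute them across many machines.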