Search | arXiv e-print repository

The Power of External Memory in Increasing Predictive Model Capacity

Authors: Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang

Abstract: One way of introducing sparsity into deep networks is by attaching an external table of parameters that is sparsely looked up at different layers of the network. By storing the bulk of the parameters in the external table, one can increase the capacity of the model without necessarily increasing the inference time. Two crucial questions in this setting are then: what is the lookup function for accessing the table and how are the contents of the table consumed? Prominent methods for accessing the table include 1) using words/wordpieces token-ids as table indices, 2) LSH hashing the token vector in each layer into a table of buckets, and 3) learnable softmax style routing to a table entry. The ways to consume the contents include adding/concatenating to input representation, and using the contents as expert networks that specialize to different inputs. In this work, we conduct rigorous experimental evaluations of existing ideas and their combinations. We also introduce a new method, alternating updates, that enables access to an increased token dimension without increasing the computation time, and demonstrate its effectiveness in language modeling.

Submitted 30 January, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2301.13310

arXiv:2301.13310 [pdf, other]

Alternating Updates for Efficient Transformers

Authors: Cenk Baykal, Dylan Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang

Abstract: It has been well established that increasing scale in deep transformer networks leads to improved quality and performance. However, this increase in scale often comes with prohibitive increases in compute cost and inference latency. We introduce Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables the widening of the learned representation, i.e., the token embedding, while only incurring a negligible increase in latency. AltUp achieves this by working on a subblock of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks. We present extensions of AltUp, such as its applicability to the sequence dimension, and demonstrate how AltUp can be synergistically combined with existing approaches, such as Sparse Mixture-of-Experts models, to obtain efficient models with even higher capacity. Our experiments on benchmark transformer models and language tasks demonstrate the consistent effectiveness of AltUp on a diverse set of scenarios. Notably, on SuperGLUE and SQuAD benchmarks, AltUp enables up to $87\%$ speedup relative to the dense baselines at the same accuracy.

Submitted 3 October, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2204.10836 [pdf, other]

doi 10.1038/s41467-022-33407-5

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, et al. (254 additional authors not shown)

Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.

Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

arXiv:2111.00917 [pdf, other]

doi 10.1002/jrs.6316

Adaptive Modeling Powers Fast Multi-parameter Fitting of CARS Spectra

Authors: Gregory J. Hunt, Cody R. Ground, Andrew D. Cutler

Abstract: Coherent anti-Stokes Raman Spectroscopy (CARS) is a laser-based measurement technique widely applied across many science and engineering disciplines to perform non-intrusive gas diagnostics. CARS is often used to study combustion, where the measured spectra can be used to simultaneously recover multiple flow parameters from the reacting gas such as temperature and relative species mole fractions. This is typically done by using numerical optimization to find the flow parameters for which a theoretical model of the CARS spectra best matches the actual measurements. The most commonly used theoretical model is the CARSFT spectrum calculator. Unfortunately, this CARSFT spectrum generator is computationally expensive and using it to recover multiple flow parameters can be prohibitively time-consuming, especially when experiments have hundreds or thousands of measurements distributed over time or space. To overcome these issues, several methods have been developed to approximate CARSFT using a library of pre-computed theoretical spectra. In this work we present a new approach that leverages ideas from the machine learning literature to build an adaptively smoothed kernel-based approximator. In application on a simulated dual-pump CARS experiment probing a $H_2$/air flame, we show that the approach can use a small number of library spectra to quickly and accurately recover temperature and four gas species' mole fractions. The method's flexibility allows fine-tuned navigation of the trade-off between speed and accuracy, and makes the approach suitable for a wide range of problems and flow regimes.

Submitted 26 October, 2021; originally announced November 2021.

Comments: 14 pages, 6 figures

arXiv:2008.05873 [pdf]

doi 10.1007/s12667-021-00446-8

Computational Framework for Behind-The-Meter DER Techno-Economic Modeling and Optimization -- REopt Lite

Authors: Sakshi Mishra, Josiah Pohl, Nick Laws, Dylan Cutler, Ted Kwasnik, William Becker, Alex Zolan, Kate Anderson, Dan Olis, Emma Elgqvist

Abstract: The global energy system is undergoing a major transformation. Renewable energy generation is growing and is projected to accelerate further with the global emphasis on decarbonization. Furthermore, distributed generation is projected to play a significant role in the new energy system, and energy models are playing a key role in understanding how distributed generation can be integrated reliably and economically. The deployment of massive amounts of distributed generation requires understanding the interface of technology, economics, and policy in the energy modeling process. In this work, we present an end-to-end computational framework for distributed energy resource (DER) modeling, REopt Lite, which addresses this need effectively. We describe the problem space, the building blocks of the model, the scaling capabilities of the design, the optimization formulation, and the accessibility of the model. We present a framework for accelerating the techno-economic analysis of behind-the-meter distributed energy resources to enable rapid planning and decision-making, thereby significantly boosting the rate of renewable energy deployment. Lastly, but equally importantly, this computation framework is open-sourced to facilitate transparency, flexibility, and wider collaboration opportunities within the worldwide energy modeling community.

Submitted 12 August, 2020; originally announced August 2020.

Comments: 18 pages, 6 figures, under journal review

arXiv:2003.07690 [pdf]

doi 10.1016/j.autcon.2020.103411

A Unified Architecture for Data-Driven Metadata Tagging of Building Automation Systems

Authors: Sakshi Mishra, Andrew Glaws, Dylan Cutler, Stephen Frank, Muhammad Azam, Farzam Mohammadi, Jean-Simon Venne

Abstract: This article presents a Unified Architecture (UA) for automated point tagging of Building Automation System (BAS) data, based on a combination of data-driven approaches. Advanced energy analytics applications, including fault detection and diagnostics and supervisory control, have emerged as a significant opportunity for improving the performance of our built environment. Effective application of these analytics depends on harnessing structured data from the various building control and monitoring systems, but typical BAS implementations do not employ any standardized metadata schema. While standards such as Project Haystack and Brick Schema have been developed to address this issue, the process of structuring the data, i.e., tagging the points to apply a standard metadata schema, has, to date, been a manual process. This process is typically costly, labor-intensive, and error-prone. In this work we address this gap by proposing a UA that automates the process of point tagging by leveraging the data accessible through connection to the BAS, including time series data and the raw point names. The UA intertwines supervised classification and unsupervised clustering techniques from machine learning and leverages both their deterministic and probabilistic outputs to inform the point tagging process. Furthermore, we extend the UA to embed additional input and output data-processing modules that are designed to address the challenges associated with the real-time deployment of this automation solution. We test the UA on two datasets for real-life buildings: (1) commercial retail buildings and (2) office buildings from the National Renewable Energy Laboratory campus. The proposed methodology correctly applied 85-90 percent and 70-75 percent of the tags in each of these test scenarios, respectively.

Submitted 11 September, 2020; v1 submitted 26 February, 2020; originally announced March 2020.

Comments: 19 pages, 9 figures, accepted for publication in Automation in Construction

arXiv:1712.00644 [pdf]

Short-term Mortality Prediction for Elderly Patients Using Medicare Claims Data

Authors: Maggie Makar, Marzyeh Ghassemi, David Cutler, Ziad Obermeyer

Abstract: Risk prediction is central to both clinical medicine and public health. While many machine learning models have been developed to predict mortality, they are rarely applied in the clinical literature, where classification tasks typically rely on logistic regression. One reason for this is that existing machine learning models often seek to optimize predictions by incorporating features that are not present in the databases readily available to providers and policy makers, limiting generalizability and implementation. Here we tested a number of machine learning classifiers for prediction of six-month mortality in a population of elderly Medicare beneficiaries, using an administrative claims database of the kind available to the majority of health care payers and providers. We show that machine learning classifiers substantially outperform current widely-used methods of risk prediction but only when used with an improved feature set incorporating insights from clinical medicine, developed for this study. Our work has applications to supporting patient and provider decision making at the end of life, as well as population health-oriented efforts to identify patients at high risk of poor outcomes.

Submitted 2 December, 2017; originally announced December 2017.

Showing 1–7 of 7 results for author: Cutler, D