Search | arXiv e-print repository

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

arXiv:2403.02078 [pdf, other]

Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5

Authors: Qiao Wang, Ralph Rose, Naho Orita, Ayaka Sugawara

Abstract: A common way of assessing language learners' mastery of vocabulary is via multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of test items can be laborious for individual teachers or in large-scale language programs. In this paper, we evaluate a new method for automatically generating these types of questions using large language models (LLM). The VocaTT (vocabulary teachi… ▽ More A common way of assessing language learners' mastery of vocabulary is via multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of test items can be laborious for individual teachers or in large-scale language programs. In this paper, we evaluate a new method for automatically generating these types of questions using large language models (LLM). The VocaTT (vocabulary teaching and training) engine is written in Python and comprises three basic steps: pre-processing target word lists, generating sentences and candidate word options using GPT, and finally selecting suitable word options. To test the efficiency of this system, 60 questions were generated targeting academic words. The generated items were reviewed by expert reviewers who judged the well-formedness of the sentences and word options, adding comments to items judged not well-formed. Results showed a 75% rate of well-formedness for sentences and 66.85% rate for suitable word options. This is a marked improvement over the generator used earlier in our research which did not take advantage of GPT's capabilities. Post-hoc qualitative analysis reveals several points for improvement in future work including cross-referencing part-of-speech tagging, better sentence validation, and improving GPT prompts. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Journal ref: Mika Hämäläinen, Emily Öhman, Flammie Pirinen, Khalid Alnajjar, So Miyagawa, Yuri Bizzoni, Niko Partanen, and Jack Rueter. 2023. Proc. of the Joint 3rd International Conference on NLP4DH and 8th IWCLUL. ACL, Tokyo, Japan, edition

arXiv:2308.02265 [pdf]

doi 10.1016/j.sse.2023.108701

Experimental analysis of variability in WS$_2$-based devices for hardware security

Authors: M. Vatalaro, H. Neill, F. Gity, P. Magnone, V. Maccaronio, C. Márquez, J. C. Galdon, F. Gamiz, F. Crupi, P. Hurley, R. De Rose

Abstract: This work investigates the variability of tungsten disulfide (WS$_2$)-based devices by experimental characterization in view of possible application in the field of hardware security. To this aim, a preliminary analysis was performed by measurements across voltages and temperatures on a set of seven Si/SiO$_2$/WS$_2$ back-gated devices, also considering the effect of different stabilization condit… ▽ More This work investigates the variability of tungsten disulfide (WS$_2$)-based devices by experimental characterization in view of possible application in the field of hardware security. To this aim, a preliminary analysis was performed by measurements across voltages and temperatures on a set of seven Si/SiO$_2$/WS$_2$ back-gated devices, also considering the effect of different stabilization conditions on their conductivity. Obtained results show appreciable variability in the conductivity, while also revealing similar dependence on bias and temperature across tested devices. Overall, our analysis demonstrates that WS$_2$-based devices can be potentially exploited to ensure adequate randomness and robustness against environmental variations and then used as building blocks for hardware security primitives. △ Less

Submitted 4 August, 2023; originally announced August 2023.

Journal ref: Solid-State Electronics 2023

arXiv:2306.16398 [pdf, other]

Cascaded encoders for fine-tuning ASR models on overlapped speech

Authors: Richard Rose, Oscar Chang, Olivier Siohan

Abstract: Multi-talker speech recognition (MT-ASR) has been shown to improve ASR performance on speech containing overlapping utterances from more than one speaker. Multi-talker models have typically been trained from scratch using simulated or actual overlapping speech datasets. On the other hand, the trend in ASR has been to train foundation models using massive datasets collected from a wide variety of t… ▽ More Multi-talker speech recognition (MT-ASR) has been shown to improve ASR performance on speech containing overlapping utterances from more than one speaker. Multi-talker models have typically been trained from scratch using simulated or actual overlapping speech datasets. On the other hand, the trend in ASR has been to train foundation models using massive datasets collected from a wide variety of task domains. Given the scale of these models and their ability to generalize well across a variety of domains, it makes sense to consider scenarios where a foundation model is augmented with multi-talker capability. This paper presents an MT-ASR model formed by combining a well-trained foundation model with a multi-talker mask model in a cascaded RNN-T encoder configuration. Experimental results show that the cascade configuration provides improved WER on overlapping speech utterances with respect to a baseline multi-talker model without sacrificing performance achievable by the foundation model on non-overlapping utterances. △ Less

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2211.07357 [pdf, other]

Controlling Commercial Cooling Systems Using Reinforcement Learning

Authors: Jerry Luo, Cosmin Paduraru, Octavian Voicu, Yuri Chervonyi, Scott Munns, Jerry Li, Crystal Qian, Praneet Dutta, Jared Quincy Davis, Ningjia Wu, Xingwei Yang, Chu-Ming Chang, Ted Li, Rob Rose, Mingyan Fan, Hootan Nakhost, Tinglin Liu, Brian Kirkman, Frank Altamura, Lee Cline, Patrick Tonker, Joel Gouker, Dave Uden, Warren Buddy Bryan, Jason Law , et al. (11 additional authors not shown)

Abstract: This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments ha… ▽ More This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites. △ Less

Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 27 pages, 11 figures

arXiv:2205.09388 [pdf]

doi 10.1016/j.sse.2022.108390

Smart Material Implication Using Spin-Transfer Torque Magnetic Tunnel Junctions for Logic-in-Memory Computing

Authors: Raffaele De Rose, Tommaso Zanotti, Francesco Maria Puglisi, Felice Crupi, Paolo Pavan, Marco Lanuzza

Abstract: Smart material implication (SIMPLY) logic has been recently proposed for the design of energy-efficient Logic-in-Memory (LIM) architectures based on non-volatile resistive memory devices. The SIMPLY logic is enabled by adding a comparator to the conventional IMPLY scheme. This allows performing a preliminary READ operation and hence the SET operation only in the case it is actually required. This… ▽ More Smart material implication (SIMPLY) logic has been recently proposed for the design of energy-efficient Logic-in-Memory (LIM) architectures based on non-volatile resistive memory devices. The SIMPLY logic is enabled by adding a comparator to the conventional IMPLY scheme. This allows performing a preliminary READ operation and hence the SET operation only in the case it is actually required. This work explores the SIMPLY logic scheme using nanoscale spin-transfer torque magnetic tunnel junction (STT-MTJ) devices. The performance of the STT-MTJ based SIMPLY architecture is analyzed by varying the load resistor and applied voltages to implement both READ and SET operations, while also investigating the effect of temperature on circuit operation. Obtained results show an existing tradeoff between error rate and energy consumption, which can be effectively managed by properly setting the values of load resistor and applied voltages. In addition, our analysis proves that tracking the temperature dependence of the MTJ properties through a proportional to absolute temperature (PTAT) reference voltage at the input of the comparator is beneficial to mitigate the reliability degradation under temperature variations. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Journal ref: Solid-State Electronics 2022

arXiv:2204.09395 [pdf, other]

doi 10.1016/j.sse.2022.108315

Adjusting Thermal Stability in Double-Barrier MTJ for Energy Improvement in Cryogenic STT-MRAMs

Authors: Esteban Garzón, Raffaele De Rose, Felice Crupi, Lionel Trojman, Adam Teman, Marco Lanuzza

Abstract: This paper investigates the impact of thermal stability relaxation in double-barrier magnetic tunnel junctions (DMTJs) for energy-efficient spin-transfer torque magnetic random access memories (STT-MRAMs) operating at the liquid nitrogen boiling point (77K). Our study is carried out through a macrospin-based Verilog-A compact model of DMTJ, along with a 65nm commercial process design kit (PDK) cal… ▽ More This paper investigates the impact of thermal stability relaxation in double-barrier magnetic tunnel junctions (DMTJs) for energy-efficient spin-transfer torque magnetic random access memories (STT-MRAMs) operating at the liquid nitrogen boiling point (77K). Our study is carried out through a macrospin-based Verilog-A compact model of DMTJ, along with a 65nm commercial process design kit (PDK) calibrated down to 77K under silicon measurements. Comprehensive bitcell-level electrical characterization is used to estimate the energy/latency per operation and leakage power at the memory architecture-level. As a main result of our analysis, we show that energy-efficient small-to-large embedded memories can be obtained by significantly relaxing the non-volatility requirement of DMTJ devices at room temperature (i.e., by reducing the cross-section area), while maintaining the typical 10-years retention time at cryogenic temperatures. This makes DMTJ-based STT-MRAM operating at 77K more energy-efficient than six-transistors static random-access memory (6T-SRAM) under both read and write accesses (-56% and -37% on average, respectively). Obtained results thus prove that DMTJ-based STT-MRAM with relaxed retention time is a promising alternative for the realization of reliable and energy-efficient embedded memories operating at cryogenic temperatures. △ Less

Submitted 20 April, 2022; originally announced April 2022.

Journal ref: Solid-State Electronics, 2022

arXiv:2204.00652 [pdf, other]

End-to-end multi-talker audio-visual ASR using an active speaker attention module

Authors: Richard Rose, Olivier Siohan

Abstract: This paper presents a new approach for end-to-end audio-visual multi-talker speech recognition. The approach, referred to here as the visual context attention model (VCAM), is important because it uses the available video information to assign decoded text to one of multiple visible faces. This essentially resolves the label ambiguity issue associated with most multi-talker modeling approaches whi… ▽ More This paper presents a new approach for end-to-end audio-visual multi-talker speech recognition. The approach, referred to here as the visual context attention model (VCAM), is important because it uses the available video information to assign decoded text to one of multiple visible faces. This essentially resolves the label ambiguity issue associated with most multi-talker modeling approaches which can decode multiple label strings but cannot assign the label strings to the correct speakers. This is implemented as a transformer-transducer based end-to-end model and evaluated using a two speaker audio-visual overlapping speech dataset created from YouTube videos. It is shown in the paper that the VCAM model improves performance with respect to previously reported audio-only and audio-visual multi-talker ASR systems. △ Less

Submitted 1 April, 2022; originally announced April 2022.

Comments: 5 pages, 3 figures, 3 tables, 28 citations

arXiv:2001.05863 [pdf, other]

Establishing Human-Robot Trust through Music-Driven Robotic Emotion Prosody and Gesture

Authors: Richard Savery, Ryan Rose, Gil Weinberg

Abstract: As human-robot collaboration opportunities continue to expand, trust becomes ever more important for full engagement and utilization of robots. Affective trust, built on emotional relationship and interpersonal bonds is particularly critical as it is more resilient to mistakes and increases the willingness to collaborate. In this paper we present a novel model built on music-driven emotional proso… ▽ More As human-robot collaboration opportunities continue to expand, trust becomes ever more important for full engagement and utilization of robots. Affective trust, built on emotional relationship and interpersonal bonds is particularly critical as it is more resilient to mistakes and increases the willingness to collaborate. In this paper we present a novel model built on music-driven emotional prosody and gestures that encourages the perception of a robotic identity, designed to avoid uncanny valley. Symbolic musical phrases were generated and tagged with emotional information by human musicians. These phrases controlled a synthesis engine playing back pre-rendered audio samples generated through interpolation of phonemes and electronic instruments. Gestures were also driven by the symbolic phrases, encoding the emotion from the musical phrase to low degree-of-freedom movements. Through a user study we showed that our system was able to accurately portray a range of emotions to the user. We also showed with a significant result that our non-linguistic audio generation achieved an 8% higher mean of average trust than using a state-of-the-art text-to-speech system. △ Less

Submitted 11 January, 2020; originally announced January 2020.

Journal ref: The 28th IEEE International Conference on Robot & Human Interactive Communication 2019

arXiv:1808.09016 [pdf, other]

Review Helpfulness Assessment based on Convolutional Neural Network

Authors: Xianshan Qu, Xiaopeng Li, John R. Rose

Abstract: In this paper we describe the implementation of a convolutional neural network (CNN) used to assess online review helpfulness. To our knowledge, this is the first use of this architecture to address this problem. We explore the impact of two related factors impacting CNN performance: different word embedding initializations and different input review lengths. We also propose an approach to combini… ▽ More In this paper we describe the implementation of a convolutional neural network (CNN) used to assess online review helpfulness. To our knowledge, this is the first use of this architecture to address this problem. We explore the impact of two related factors impacting CNN performance: different word embedding initializations and different input review lengths. We also propose an approach to combining rating star information with review text to further improve prediction accuracy. We demonstrate that this can improve the overall accuracy by 2%. Finally, we evaluate the method on a benchmark dataset and show an improvement in accuracy relative to published results for traditional methods of 2.5% for a model trained using only review text and 4.24% for a model trained on a combination of rating star information and review text. △ Less

Submitted 27 August, 2018; originally announced August 2018.

arXiv:1702.01090 [pdf, other]

doi 10.1371/journal.pone.0184188

Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library

Authors: Jaimie Murdock, Colin Allen, Katy Börner, Robert Light, Simon McAlister, Andrew Ravenscroft, Robert Rose, Doori Rose, Jun Otsuka, David Bourget, John Lawrence, Chris Reed

Abstract: We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other researc… ▽ More We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of "drill-down" topic modeling---simultaneously reducing both the size of the corpus and the unit of analysis---we are able to reduce a large collection of fulltext volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library and the pages bear the kind of "close reading" needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted. △ Less

Submitted 7 June, 2017; v1 submitted 3 February, 2017; originally announced February 2017.

Comments: revised, 29 pages, 3 figures

arXiv:1608.00027 [pdf, ps, other]

gLOP: the global and Local Penalty for Capturing Predictive Heterogeneity

Authors: Rhiannon V. Rose, Daniel J. Lizotte

Abstract: When faced with a supervised learning problem, we hope to have rich enough data to build a model that predicts future instances well. However, in practice, problems can exhibit predictive heterogeneity: most instances might be relatively easy to predict, while others might be predictive outliers for which a model trained on the entire dataset does not perform well. Identifying these can help focus… ▽ More When faced with a supervised learning problem, we hope to have rich enough data to build a model that predicts future instances well. However, in practice, problems can exhibit predictive heterogeneity: most instances might be relatively easy to predict, while others might be predictive outliers for which a model trained on the entire dataset does not perform well. Identifying these can help focus future data collection. We present gLOP, the global and Local Penalty, a framework for capturing predictive heterogeneity and identifying predictive outliers. gLOP is based on penalized regression for multitask learning, which improves learning by leveraging training signal information from related tasks. We give two optimization algorithms for gLOP, one space-efficient, and another giving the full regularization path. We also characterize uniqueness in terms of the data and tuning parameters, and present empirical results on synthetic data and on two health research problems. △ Less

Submitted 29 July, 2016; originally announced August 2016.

Comments: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA

arXiv:1606.05925 [pdf, other]

Graph based manifold regularized deep neural networks for automatic speech recognition

Authors: Vikrant Singh Tomar, Richard C. Rose

Abstract: Deep neural networks (DNNs) have been successfully applied to a wide variety of acoustic modeling tasks in recent years. These include the applications of DNNs either in a discriminative feature extraction or in a hybrid acoustic modeling scenario. Despite the rapid progress in this area, a number of challenges remain in training DNNs. This paper presents an effective way of training DNNs using a… ▽ More Deep neural networks (DNNs) have been successfully applied to a wide variety of acoustic modeling tasks in recent years. These include the applications of DNNs either in a discriminative feature extraction or in a hybrid acoustic modeling scenario. Despite the rapid progress in this area, a number of challenges remain in training DNNs. This paper presents an effective way of training DNNs using a manifold learning based regularization framework. In this framework, the parameters of the network are optimized to preserve underlying manifold based relationships between speech feature vectors while minimizing a measure of loss between network outputs and targets. This is achieved by incorporating manifold based locality constraints in the objective criterion of DNNs. Empirical evidence is provided to demonstrate that training a network with manifold constraints preserves structural compactness in the hidden layers of the network. Manifold regularization is applied to train bottleneck DNNs for feature extraction in hidden Markov model (HMM) based speech recognition. The experiments in this work are conducted on the Aurora-2 spoken digits and the Aurora-4 read news large vocabulary continuous speech recognition tasks. The performance is measured in terms of word error rate (WER) on these tasks. It is shown that the manifold regularized DNNs result in up to 37% reduction in WER relative to standard DNNs. △ Less

Submitted 19 June, 2016; originally announced June 2016.

Comments: 12 pages including citations, 2 figures

Showing 1–13 of 13 results for author: Rose, R