Search | arXiv e-print repository

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

Authors: Owen Dugan, Donato Manuel Jimenez Beneto, Charlotte Loh, Zhuo Chen, Rumen Dangovski, Marin Soljačić

Abstract: Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. To achieve accurate calculations, language model systems often enable LLMs to generate code for arithmetic operations. However, this approach compromises speed and security and, if finetuning is involved, risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in a single autoregressive step, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture which performs arithmetic. Our implementation using Llama 3 8B Instruct with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations ($+,-,\times,\div,\sin{},\cos{},\log{},\exp{},\sqrt{}$), outperforming GPT 4o and on par with GPT 4o using a code interpreter. OccamLlama also outperforms GPT 4o both with and without a code interpreter on mathematical problem solving benchmarks involving challenging arithmetic, thus enabling small LLMs to match the arithmetic performance of even much larger models. We will make our code public shortly.

Submitted 29 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00132 [pdf, other]

QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić

Abstract: We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement, fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA), whose low-rank approximation may fail for complicated downstream tasks. Our approach is theoretically supported by the universality theorem and the rank representation theorem to achieve efficient high-rank adaptations. Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. Furthermore, QuanTA shows superior performance with fewer trainable parameters compared to other approaches and can be designed to integrate with existing fine-tuning algorithms for further improvement, providing a scalable and efficient solution for fine-tuning large language models and advancing state-of-the-art in natural language processing.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2312.00111 [pdf, other]

Multimodal Learning for Materials

Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić

Abstract: Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning efforts in materials science focus primarily on single-modality tasks, i.e., relationships between materials and a single physical property, thus not taking advantage of the rich and multimodal set of material properties. Here, we introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes: (i) MultiMat achieves state-of-the-art performance for challenging material property prediction tasks; (ii) MultiMat enables novel and accurate material discovery via latent space similarity, enabling screening for stable materials with desired properties; and (iii) MultiMat encodes interpretable emergent features that may provide novel scientific insights.

Submitted 12 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: 11 pages, 4 figures

arXiv:2311.17066 [pdf]

Cluster trajectory of SOFA score in predicting mortality in sepsis

Authors: Yuhe Ke, Matilda Swee Sun Tang, Celestine Jia Ling Loh, Hairil Rizal Abdullah, Nicholas Brian Shannon

Abstract: Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python. Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken. Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays, highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward. Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 26 pages, 4 figures, 2 tables

arXiv:2304.00601 [pdf, other]

Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

Authors: Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava

Abstract: Transformations based on domain expertise (expert transformations), such as random-resized-crop and color-jitter, have proven critical to the success of contrastive learning techniques such as SimCLR. Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned. However, for imagery data, so far none of these view-generation methods has been able to outperform expert transformations. In this work, we tackle a different question: instead of replacing expert transformations with generated views, can we constructively assimilate generated views with expert transformations? We answer this question in the affirmative and propose a view generation method and a simple, effective assimilation method that together improve the state-of-the-art by up to ~3.6% on three different datasets. Importantly, we conduct a detailed empirical study that systematically analyzes a range of view generation and assimilation methods and provides a holistic picture of the efficacy of learned views in contrastive representation learning.

Submitted 8 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: Accepted at Generative Models for Computer Vision Workshop 2023

arXiv:2303.02484 [pdf, other]

Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries

Authors: Charlotte Loh, Seungwook Han, Shivchander Sudalairaj, Rumen Dangovski, Kai Xu, Florian Wenzel, Marin Soljacic, Akash Srivastava

Abstract: Deep ensembles (DE) have been successful in improving model performance by learning diverse members via the stochasticity of random initialization. While recent works have attempted to promote further diversity in DE via hyperparameters or regularizing loss functions, these methods primarily still rely on a stochastic approach to explore the hypothesis space. In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters. We leverage recent advances in contrastive representation learning to create models that separately capture opposing hypotheses of invariant and equivariant functional classes and present a simple ensembling approach to efficiently combine appropriate hypotheses for a given task. We show that MSE effectively captures the multiplicity of conflicting hypotheses that is often required in large, diverse datasets like ImageNet. As a result of their inherent diversity, MSE improves classification performance, uncertainty quantification, and generalization across a series of transfer tasks.

Submitted 19 June, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: Camera Ready Revision. ICML 2023

arXiv:2301.11756 [pdf, ps, other]

A comment on the structure of graded modules over graded principal ideal domains in the context of persistent homology

Authors: Clara Loeh

Abstract: The literature in persistent homology often refers to a "structure theorem for finitely generated graded modules over a graded principal ideal domain". We clarify the nature of this structure theorem in this context.

Submitted 5 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: 10 pages; v2: small improvements to exposition

arXiv:2210.04783 [pdf, other]

On the Importance of Calibration in Semi-supervised Learning

Authors: Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj, Seungwook Han, Ligong Han, Leonid Karlinsky, Marin Soljacic, Akash Srivastava

Abstract: State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining techniques of consistency regularization and pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training and thus, model calibration is important in mitigating confirmation bias. Yet, many SOTA methods are optimized for model performance, with little focus directed to improve model calibration. In this work, we empirically demonstrate that model calibration is strongly correlated with model performance and propose to improve calibration via approximate Bayesian techniques. We introduce a family of new SSL models that optimizes for calibration and demonstrate their effectiveness across standard vision benchmarks of CIFAR-10, CIFAR-100 and ImageNet, giving up to 15.9% improvement in test accuracy. Furthermore, we also demonstrate their effectiveness in additional realistic and challenging problems, such as class-imbalanced datasets and in photonics science.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 24 pages

arXiv:2202.03159 [pdf, ps, other]

$L^2$-Betti numbers and computability of reals

Authors: Clara Loeh, Matthias Uschold

Abstract: We study the computability degree of real numbers arising as $L^2$-Betti numbers or $L^2$-torsion of groups, parametrised over the Turing degree of the word problem.

Submitted 7 March, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: 33 pages; To appear in Computability; v2: clarified Theorem 1.5; v3: removed Section 9, minor corrections; v4: added Appendix B and Remark 1.5; Lean implementation available at https://gitlab.com/L2-comp/l2-comp-lean

arXiv:2111.00899 [pdf, other]

Equivariant Contrastive Learning

Authors: Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljačić

Abstract: In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge. In fact, the property of invariance is a trivial instance of a broader class called equivariance, which can be intuitively understood as the property that representations transform according to the way the inputs transform. Here, we show that rather than using only invariance, pre-training that encourages non-trivial equivariance to some transformations, while maintaining invariance to other transformations, can be used to improve the semantic quality of representations. Specifically, we extend popular SSL methods to a more general framework which we name Equivariant Self-Supervised Learning (E-SSL). In E-SSL, a simple additional pre-training objective encourages equivariance by predicting the transformations applied to the input. We demonstrate E-SSL's effectiveness empirically on several popular computer vision benchmarks, e.g. improving SimCLR to 72.5% linear probe accuracy on ImageNet. Furthermore, we demonstrate usefulness of E-SSL for applications beyond computer vision; in particular, we show its utility on regression problems in photonics science. Our code, datasets and pre-trained models are available at https://github.com/rdangovs/essl to aid further research in E-SSL.

Submitted 14 March, 2022; v1 submitted 28 October, 2021; originally announced November 2021.

Comments: Camera Ready Revision. ICLR 2022. Discussion: https://openreview.net/forum?id=gKLAAfiytI Code: https://github.com/rdangovs/essl

arXiv:2110.08406 [pdf, other]

doi 10.1038/s41467-022-31915-y

Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Authors: Charlotte Loh, Thomas Christensen, Rumen Dangovski, Samuel Kim, Marin Soljacic

Abstract: Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three "inexpensive" and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of symmetries or invariances and 3) surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.

Submitted 15 October, 2021; originally announced October 2021.

Comments: 21 pages, 10 figures

arXiv:2104.11667 [pdf, other]

Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure

Authors: Samuel Kim, Peter Y. Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, Marin Soljačić

Abstract: Bayesian optimization (BO) is a popular paradigm for global optimization of expensive black-box functions, but there are many domains where the function is not completely a black-box. The data may have some known structure (e.g. symmetries) and/or the data generation process may be a composite process that yields useful intermediate or auxiliary information in addition to the value of the optimization objective. However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and do not easily accommodate known structure. Instead, we use Bayesian neural networks, a class of scalable and flexible surrogate models with inductive biases, to extend BO to complex, structured problems with high dimensionality. We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization of molecules using graph neural networks. On these complex tasks, we show that neural networks often outperform GPs as surrogate models for BO in terms of both sampling efficiency and computational cost.

Submitted 6 December, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: 32 pages, 16 figures; published in TMLR

Journal ref: Transactions on Machine Learning Research (TMLR) September 2022

arXiv:1908.10673 [pdf, other]

A Search for the Underlying Equation Governing Similar Systems

Authors: Changwei Loh, Daniel Schneegass, Pengwei Tian

Abstract: We show a data-driven approach to discover the underlying structural form of the mathematical equation governing the dynamics of multiple but similar systems induced by the same mechanisms. This approach hinges on theories that we lay out involving arguments based on the nature of physical systems. In the same vein, we also introduce a metric to search for the best candidate equation using the datasets generated from the systems. This approach involves symbolic regression by means of genetic programming and regressions to compute the strength of the interplay between the extrinsic parameters in a candidate equation. We relate these extrinsic parameters to the hidden properties of the data-generating systems. The behavior of a new similar system can be predicted easily by utilizing the discovered structural form of the general equation. As illustrations, we apply the approach to identify candidate structural forms of the underlying equation governing two cases: the changes in a sensor measurement of degrading engines; and the search for the governing equation of systems with known variations of an intrinsic parameter.

Submitted 27 August, 2019; originally announced August 2019.

arXiv:1812.02824 [pdf, ps, other]

Structural Damage Detection and Localization with Unknown Post-Damage Feature Distribution Using Sequential Change-Point Detection Method

Authors: Yizheng Liao, Anne S. Kiremidjian, Ram Rajagopal, Chin-Hsuing Loh

Abstract: The high structural deficient rate poses serious risks to the operation of many bridges and buildings. To prevent critical damage and structural collapse, a quick structural health diagnosis tool is needed during normal operation or immediately after extreme events. In structural health monitoring (SHM), many existing works will have limited performance in the quick damage identification process because 1) the damage event needs to be identified with short delay and 2) the post-damage information is usually unavailable. To address these drawbacks, we propose a new damage detection and localization approach based on stochastic time series analysis. Specifically, the damage sensitive features are extracted from vibration signals and follow different distributions before and after a damage event. Hence, we use the optimal change point detection theory to find damage occurrence time. As the existing change point detectors require the post-damage feature distribution, which is unavailable in SHM, we propose a maximum likelihood method to learn the distribution parameters from the time-series data. The proposed damage detection using estimated parameters also achieves the optimal performance. Also, we utilize the detection results to find damage location without any further computation. Validation results show highly accurate damage identification in American Society of Civil Engineers benchmark structure and two shake table experiments.

Submitted 14 November, 2018; originally announced December 2018.

Comments: 20 pages

arXiv:1507.05924 [pdf, ps, other]

doi 10.1109/JSAC.2016.2566138

Multiuser Communication through Power Talk in DC MicroGrids

Authors: Marko Angjelichinoski, Cedomir Stefanovic, Petar Popovski, Hongpeng Liu, Poh Chiang Loh, Frede Blaabjerg

Abstract: Power talk is a novel concept for communication among control units in MicroGrids (MGs), carried out without a dedicated modem, but by using power electronics that interface the common bus. The information is transmitted by modulating the parameters of the primary control, incurring subtle power deviations that can be detected by other units. In this paper, we develop power talk communication strategies for DC MG systems with an arbitrary number of control units that carry out all-to-all communication. We investigate two multiple access strategies: 1) TDMA, where only one unit transmits at a time, and 2) full duplex, where all units transmit and receive simultaneously. We introduce the notions of signaling space, where the power talk symbol constellations are constructed, and detection space, where the demodulation of the symbols is performed. The proposed communication technique is challenged by the random changes of the bus parameters due to load variations in the system. To this end, we employ a solution based on training sequences, which re-establishes the signaling and detection spaces and thus enables reliable information exchange. The presented results show that power talk is an effective solution for reliable communication among units in DC MG systems.

Submitted 21 July, 2015; originally announced July 2015.

Comments: Multiuser extension of the power talk concept. Submitted to IEEE JSAC

arXiv:1504.03016 [pdf, ps, other]

Power Talk: How to Modulate Data over a DC Micro Grid Bus using Power Electronics

Authors: Marko Angjelichinoski, Cedomir Stefanovic, Petar Popovski, Hongpeng Liu, Poh Chiang Loh, Frede Blaabjerg

Abstract: We introduce a novel communication strategy for DC Micro Grids (MGs), termed power talk, in which the devices communicate by modulating the power levels in the DC bus. The information is transmitted by varying the parameters that the MG units use to control the level of the common bus voltage, while it is received by processing the bus measurements that units perform. This communication is challenged by the fact that the voltage level is subject to random disturbances, as the state of the MG changes with random load variations. We develop a corresponding communication model and address the random voltage fluctuations by using coding strategies that transform the MG into some well-known communication channels. The performance analysis shows that it is possible to mitigate the random voltage level variations and communicate reliably over the MG bus.

Submitted 12 April, 2015; originally announced April 2015.

Comments: IEEE GLOBECOM 2015

Showing 1–16 of 16 results for author: Loh, C