Search | arXiv e-print repository

DIEKAE: Difference Injection for Efficient Knowledge Augmentation and Editing of Large Language Models

Authors: Alessio Galatolo, Meriem Beloucif, Katie Winkle

Abstract: Pretrained Language Models (PLMs) store extensive knowledge within their weights, enabling them to recall vast amount of information. However, relying on this parametric knowledge brings some limitations such as outdated information or gaps in the training data. This work addresses these problems by distinguish between two separate solutions: knowledge editing and knowledge augmentation. We introd… ▽ More Pretrained Language Models (PLMs) store extensive knowledge within their weights, enabling them to recall vast amount of information. However, relying on this parametric knowledge brings some limitations such as outdated information or gaps in the training data. This work addresses these problems by distinguish between two separate solutions: knowledge editing and knowledge augmentation. We introduce Difference Injection for Efficient Knowledge Augmentation and Editing (DIEKÆ), a new method that decouples knowledge processing from the PLM (LLaMA2-7B, in particular) by adopting a series of encoders. These encoders handle external knowledge and inject it into the PLM layers, significantly reducing computational costs and improving performance of the PLM. We propose a novel training technique for these encoders that does not require back-propagation through the PLM, thus greatly reducing the memory and time required to train them. Our findings demonstrate how our method is faster and more efficient compared to multiple baselines in knowledge augmentation and editing during both training and inference. We have released our code and data at https://github.com/alessioGalatolo/DIEKAE. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: WIP

arXiv:2406.04116 [pdf, ps, other]

Promoting Fairness and Diversity in Speech Datasets for Mental Health and Neurological Disorders Research

Authors: Eleonora Mancini, Ana Tanevska, Andrea Galassi, Alessio Galatolo, Federico Ruggeri, Paolo Torroni

Abstract: Current research in machine learning and artificial intelligence is largely centered on modeling and performance evaluation, less so on data collection. However, recent research demonstrated that limitations and biases in data may negatively impact trustworthiness and reliability. These aspects are particularly impactful on sensitive domains such as mental health and neurological disorders, where… ▽ More Current research in machine learning and artificial intelligence is largely centered on modeling and performance evaluation, less so on data collection. However, recent research demonstrated that limitations and biases in data may negatively impact trustworthiness and reliability. These aspects are particularly impactful on sensitive domains such as mental health and neurological disorders, where speech data are used to develop AI applications aimed at improving the health of patients and supporting healthcare providers. In this paper, we chart the landscape of available speech datasets for this domain, to highlight possible pitfalls and opportunities for improvement and promote fairness and diversity. We present a comprehensive list of desiderata for building speech datasets for mental health and neurological disorders and distill it into a checklist focused on ethical concerns to foster more responsible research. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: 34 pages

arXiv:2311.15698 [pdf, other]

Cerbero-7B: A Leap Forward in Language-Specific LLMs Through Enhanced Chat Corpus Generation and Evaluation

Authors: Federico A. Galatolo, Mario G. C. A. Cimino

Abstract: This study introduces a novel approach for generating high-quality, language-specific chat corpora using a self-chat mechanism. We combine a generator LLM for creating new samples and an embedder LLM to ensure diversity. A new Masked Language Modelling (MLM) model-based quality assessment metric is proposed for evaluating and filtering the corpora. Utilizing the llama2-70b as the generator and a m… ▽ More This study introduces a novel approach for generating high-quality, language-specific chat corpora using a self-chat mechanism. We combine a generator LLM for creating new samples and an embedder LLM to ensure diversity. A new Masked Language Modelling (MLM) model-based quality assessment metric is proposed for evaluating and filtering the corpora. Utilizing the llama2-70b as the generator and a multilingual sentence transformer as embedder, we generate an Italian chat corpus and refine the Fauno corpus, which is based on translated English ChatGPT self-chat data. The refinement uses structural assertions and Natural Language Processing techniques. Both corpora undergo a comprehensive quality evaluation using the proposed MLM model-based quality metric. The Italian LLM fine-tuned with these corpora demonstrates significantly enhanced language comprehension and question-answering skills. The resultant model, cerbero-7b, establishes a new state-of-the-art for Italian LLMs. This approach marks a substantial advancement in the development of language-specific LLMs, with a special emphasis on augmenting corpora for underrepresented languages like Italian. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2212.07839 [pdf, other]

TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models

Authors: Federico A. Galatolo, Mario G. C. A. Cimino, Edoardo Cogotti

Abstract: Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper a novel evaluation approach is experimented, on the basis of: (i) a curated data set, made by high-qua… ▽ More Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper a novel evaluation approach is experimented, on the basis of: (i) a curated data set, made by high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score, (iii) a human evaluation task to distinguish, for a given text, the real and the generated images. The proposed method has been applied to the most recent models, i.e., DALLE2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully coherent with the CLIP-score. The dataset has been made available to the public. △ Less

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2105.13244 [pdf, other]

Using Early-Learning Regularization to Classify Real-World Noisy Data

Authors: Alessio Galatolo, Alfred Nilsson, Roderick Karlemstrand, Yineng Wang

Abstract: The memorization problem is well-known in the field of computer vision. Liu et al. propose a technique called Early-Learning Regularization, which improves accuracy on the CIFAR datasets when label noise is present. This project replicates their experiments and investigates the performance on a real-world dataset with intrinsic noise. Results show that their experimental results are consistent. We… ▽ More The memorization problem is well-known in the field of computer vision. Liu et al. propose a technique called Early-Learning Regularization, which improves accuracy on the CIFAR datasets when label noise is present. This project replicates their experiments and investigates the performance on a real-world dataset with intrinsic noise. Results show that their experimental results are consistent. We also explore Sharpness-Aware Minimization in addition to SGD and observed a further 14.6 percentage points improvement. Future work includes using all 6 million images and manually clean a fraction of the images to fine-tune a transfer learning model. Last but not the least, having access to clean data for testing would also improve the measurement of accuracy. △ Less

Submitted 1 June, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

arXiv:2102.01645 [pdf, other]

doi 10.5220/0010503701660174

Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

Authors: Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini

Abstract: In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). CLIP-GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. Differently, CLIP-GLaSS takes a caption (or an image) as an input, and generates the image (or the caption) whose CLIP e… ▽ More In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). CLIP-GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. Differently, CLIP-GLaSS takes a caption (or an image) as an input, and generates the image (or the caption) whose CLIP embedding is the most similar to the input one. This optimal image (or caption) is produced via a generative network, after an exploration by a genetic algorithm. Promising results are shown, based on the experimentation of the image Generators BigGAN and StyleGAN2, and of the text Generator GPT2 △ Less

Submitted 1 October, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Journal ref: IMPROVE, ISBN 978-989-758-511-1, pages 166-174 (2021)

arXiv:2004.04120 [pdf, other]

doi 10.1016/j.compeleceng.2021.107117

Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms

Authors: Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini

Abstract: In this research, some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm are investigated. The paper shows how a naive scalarization can lead to gradients overlapping. Furthermore, the possibility that the entropy regularization term can be a source of uncontrolled noise is discussed. Wit… ▽ More In this research, some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm are investigated. The paper shows how a naive scalarization can lead to gradients overlapping. Furthermore, the possibility that the entropy regularization term can be a source of uncontrolled noise is discussed. With respect to the above issues, a technique to avoid gradient overlapping is proposed, while keeping the same loss formulation. Moreover, a method to avoid the uncontrolled noise, by sampling the actions from distributions with a desired minimum entropy, is investigated. Pilot experiments have been carried out to show how the proposed method speeds up the training. The proposed approach can be applied to any Advantage-based Reinforcement Learning algorithm. △ Less

Submitted 1 October, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

Journal ref: Computers & Electrical Engineering, 92, 107117 (2021)

arXiv:1905.06684 [pdf, other]

doi 10.1007/s11063-021-10490-1

Formal derivation of Mesh Neural Networks with their Forward-Only gradient Propagation

Authors: Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini

Abstract: This paper proposes the Mesh Neural Network (MNN), a novel architecture which allows neurons to be connected in any topology, to efficiently route information. In MNNs, information is propagated between neurons throughout a state transition function. State and error gradients are then directly computed from state updates without backward computation. The MNN architecture and the error propagation… ▽ More This paper proposes the Mesh Neural Network (MNN), a novel architecture which allows neurons to be connected in any topology, to efficiently route information. In MNNs, information is propagated between neurons throughout a state transition function. State and error gradients are then directly computed from state updates without backward computation. The MNN architecture and the error propagation schema is formalized and derived in tensor algebra. The proposed computational model can fully supply a gradient descent process, and is potentially suitable for very large scale sparse NNs, due to its expressivity and training efficiency, with respect to NNs based on back-propagation and computational graphs. △ Less

Submitted 30 September, 2021; v1 submitted 16 May, 2019; originally announced May 2019.

Journal ref: Galatolo, F. A., Cimino, M. G., & Vaglini, G. (2021). Formal Derivation of Mesh Neural Networks with Their Forward-Only Gradient Propagation. Neural Processing Letters, 1-16

arXiv:1903.01341 [pdf]

doi 10.5220/0007581508300836

Using stigmergy as a computational memory in the design of recurrent neural networks

Authors: Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini

Abstract: In this paper, a novel architecture of Recurrent Neural Network (RNN) is designed and experimented. The proposed RNN adopts a computational memory based on the concept of stigmergy. The basic principle of a Stigmergic Memory (SM) is that the activity of deposit/removal of a quantity in the SM stimulates the next activities of deposit/removal. Accordingly, subsequent SM activities tend to reinforce… ▽ More In this paper, a novel architecture of Recurrent Neural Network (RNN) is designed and experimented. The proposed RNN adopts a computational memory based on the concept of stigmergy. The basic principle of a Stigmergic Memory (SM) is that the activity of deposit/removal of a quantity in the SM stimulates the next activities of deposit/removal. Accordingly, subsequent SM activities tend to reinforce/weaken each other, generating a coherent coordination between the SM activities and the input temporal stimulus. We show that, in a problem of supervised classification, the SM encodes the temporal input in an emergent representational model, by coordinating the deposit, removal and classification activities. This study lays down a basic framework for the derivation of a SM-RNN. A formal ontology of SM is discussed, and the SM-RNN architecture is detailed. To appreciate the computational power of an SM-RNN, comparative NNs have been selected and trained to solve the MNIST handwritten digits recognition benchmark in its two variants: spatial (sequences of bitmap rows) and temporal (sequences of pen strokes). △ Less

Submitted 9 January, 2019; originally announced March 2019.

arXiv:1811.10574 [pdf]

doi 10.1007/978-3-030-05918-7_22

Using stigmergy to incorporate the time into artificial neural networks

Authors: Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini

Abstract: A current research trend in neurocomputing involves the design of novel artificial neural networks incorporating the concept of time into their operating model. In this paper, a novel architecture that employs stigmergy is proposed. Computational stigmergy is used to dynamically increase (or decrease) the strength of a connection, or the activation level, of an artificial neuron when stimulated (o… ▽ More A current research trend in neurocomputing involves the design of novel artificial neural networks incorporating the concept of time into their operating model. In this paper, a novel architecture that employs stigmergy is proposed. Computational stigmergy is used to dynamically increase (or decrease) the strength of a connection, or the activation level, of an artificial neuron when stimulated (or released). This study lays down a basic framework for the derivation of a stigmergic NN with a related training algorithm. To show its potential, some pilot experiments have been reported. The XOR problem is solved by using only one single stigmergic neuron with one input and one output. A static NN, a stigmergic NN, a recurrent NN and a long short-term memory NN have been trained to solve the MNIST digits recognition benchmark. △ Less

Submitted 25 October, 2018; originally announced November 2018.

Showing 1–10 of 10 results for author: Galatolo, A