-
Diffusion Guided Language Modeling
Authors:
Justin Lovelace,
Varsha Kishore,
Yiwei Chen,
Kilian Q. Weinberger
Abstract:
Current language models demonstrate remarkable proficiency in text generation. However, for many applications it is desirable to control attributes, such as sentiment, or toxicity, of the generated language -- ideally tailored towards each specific use case and target audience. For auto-regressive language models, existing guidance methods are prone to decoding errors that cascade during generation and degrade performance. In contrast, text diffusion models can easily be guided with, for example, a simple linear sentiment classifier -- however they do suffer from significantly higher perplexity than auto-regressive alternatives. In this paper we use a guided diffusion model to produce a latent proposal that steers an auto-regressive language model to generate text with desired properties. Our model inherits the unmatched fluency of the auto-regressive approach and the plug-and-play flexibility of diffusion. We show that it outperforms previous plug-and-play guidance methods across a wide range of benchmark data sets. Further, controlling a new attribute in our framework is reduced to training a single logistic regression classifier.
Submitted 8 August, 2024;
originally announced August 2024.
-
Correction with Backtracking Reduces Hallucination in Summarization
Authors:
Zhenzhen Liu,
Chao Wan,
Varsha Kishore,
Jin Peng Zhou,
Minmin Chen,
Kilian Q. Weinberger
Abstract:
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straight-forward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.
Submitted 31 October, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
IncDSI: Incrementally Updatable Document Retrieval
Authors:
Varsha Kishore,
Chao Wan,
Justin Lovelace,
Yoav Artzi,
Kilian Q. Weinberger
Abstract:
Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with re-training the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real-time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
Submitted 19 August, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Learning Iterative Neural Optimizers for Image Steganography
Authors:
Xiangyu Chen,
Varsha Kishore,
Kilian Q Weinberger
Abstract:
Image steganography is the process of concealing secret information in images through imperceptible changes. Recent work has formulated this task as a classic constrained optimization problem. In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose an iterative neural network trained to perform the optimization steps. In contrast to classical optimization methods like L-BFGS or projected gradient descent, we train the neural network to also stay close to the manifold of natural images throughout the optimization. We show that our learned neural optimization is faster and more reliable than classical optimization approaches. In comparison to previous state-of-the-art encoder-decoder-based steganography methods, it reduces the recovery error rate by multiple orders of magnitude and achieves zero error up to 3 bits per pixel (bpp) without the need for error-correcting codes.
Submitted 27 March, 2023;
originally announced March 2023.
-
Latent Diffusion for Language Generation
Authors:
Justin Lovelace,
Varsha Kishore,
Chao Wan,
Eliot Shekhtman,
Kilian Q. Weinberger
Abstract:
Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing pretrained language models. We view diffusion and existing language models as complementary. We demonstrate that encoder-decoder language models can be utilized to efficiently learn high-quality language autoencoders. We then demonstrate that continuous diffusion models can be learned in the latent space of the language autoencoder, enabling us to sample continuous latent representations that can be decoded into natural language with the pretrained decoder. We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. We demonstrate across multiple diverse data sets that our latent language diffusion models are significantly more effective than previous diffusion language models.
Submitted 7 November, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
BERTScore: Evaluating Text Generation with BERT
Authors:
Tianyi Zhang,
Varsha Kishore,
Felix Wu,
Kilian Q. Weinberger,
Yoav Artzi
Abstract:
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when compared to existing metrics.
Submitted 24 February, 2020; v1 submitted 21 April, 2019;
originally announced April 2019.
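
The token-level comparison described in the BERTScore abstract above can be illustrated with a minimal sketch. It assumes greedy matching: each token is paired with its most similar counterpart by cosine similarity, and the results are averaged into precision, recall, and F1. The function names and the two-dimensional toy vectors are hypothetical stand-ins; a real system would use contextual embeddings from a pretrained model, which this sketch does not compute.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_match_scores(candidate, reference):
    """Return (precision, recall, f1) from greedy token matching.

    candidate, reference: lists of token embedding vectors.
    These are toy inputs here (assumption), not BERT outputs.
    """
    # Pairwise similarity: rows are candidate tokens, columns are reference tokens.
    sim = [[cosine(c, r) for r in reference] for c in candidate]
    # Precision: each candidate token takes its best match in the reference.
    precision = sum(max(row) for row in sim) / len(candidate)
    # Recall: each reference token takes its best match in the candidate.
    recall = sum(
        max(sim[i][j] for i in range(len(candidate)))
        for j in range(len(reference))
    ) / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: the reference has one extra token that the candidate only partly covers.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
precision, recall, f1 = greedy_match_scores(candidate, reference)
```

In this toy case every candidate token has an exact match, so precision is 1, while the unmatched third reference token pulls recall below 1.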
-
Identifying Diabetic Patients with High Risk of Readmission
Authors:
Malladihalli S Bhuvan,
Ankit Kumar,
Adil Zafar,
Vinith Kishore
Abstract:
Hospital readmissions are expensive and reflect the inadequacies in the healthcare system. In the United States alone, treatment of readmitted diabetic patients exceeds 250 million dollars per year. Early identification of patients facing a high risk of readmission can enable healthcare providers to conduct additional investigations and possibly prevent future readmissions. This not only improves the quality of care but also reduces the medical expenses on readmission. Machine learning methods have been leveraged on public health data to build a system for identifying diabetic patients facing a high risk of future readmission. Number of inpatient visits, discharge disposition and admission type were identified as strong predictors of readmission. Further, it was found that the number of laboratory tests and discharge disposition together predict whether the patient will be readmitted shortly after being discharged from the hospital (i.e. <30 days) or after a longer period of time (i.e. >30 days). These insights can help healthcare providers to improve inpatient diabetic care. Finally, the cost analysis suggests that $252.76 million can be saved across 98,053 diabetic patient encounters by incorporating the proposed cost sensitive analysis model.
Submitted 12 February, 2016;
originally announced February 2016.
-
Extreme events and event size fluctuations in biased random walks on networks
Authors:
Vimal Kishore,
M. S. Santhanam,
R. E. Amritkar
Abstract:
Random walk on discrete lattice models is important to understand various types of transport processes. The extreme events, defined as exceedances of the flux of walkers above a prescribed threshold, have been studied recently in the context of complex networks. This was motivated by the occurrence of rare events such as traffic jams, floods, and power black-outs which take place on networks. In this work, we study extreme events in a generalized random walk model in which the walk is preferentially biased by the network topology. The walkers preferentially choose to hop toward the hubs or small degree nodes. In this setting, we show that extremely large fluctuations in event-sizes are possible on small degree nodes when the walkers are biased toward the hubs. In particular, we obtain the distribution of event-sizes on the network. Further, the probability for the occurrence of extreme events on any node in the network depends on its 'generalized strength', a measure of the ability of a node to attract walkers. The 'generalized strength' is a function of the degree of the node and that of its nearest neighbors. We obtain analytical and simulation results for the probability of occurrence of extreme events on the nodes of a network using a generalized random walk model. The result reveals that the nodes with a larger value of 'generalized strength', on average, display lower probability for the occurrence of extreme events compared to the nodes with lower values of 'generalized strength'.
Submitted 30 May, 2012; v1 submitted 9 December, 2011;
originally announced December 2011.
-
Extreme events on complex networks
Authors:
Vimal Kishore,
M. S. Santhanam,
R. E. Amritkar
Abstract:
We study the extreme events taking place on complex networks. The transport on networks is modelled using random walks and we compute the probability for the occurrence and recurrence of extreme events on the network. We show that the nodes with a smaller number of links are more prone to extreme events than the ones with a larger number of links. We obtain analytical estimates and verify them with numerical simulations. They are shown to be robust even when random walkers follow the shortest path on the network. The results suggest a revision of design principles and can be used as an input for designing the nodes of a network so as to smoothly handle an extreme event.
Submitted 9 February, 2011;
originally announced February 2011.
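
The degree dependence claimed in the abstract above can be illustrated with a rough sketch under stated assumptions, none of which are spelled out in the listing itself: walkers are independent and unbiased on a connected undirected network, so each walker sits on a node with stationary probability degree / (sum of all degrees); the walker count on a node is then binomial; and an extreme event is taken to mean that count reaching the node's own mean plus `m` standard deviations. The function names and the default `m = 4` are hypothetical choices for illustration.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability mass P(X = k) for X ~ Bin(n, p), 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )

def extreme_event_probability(degree, total_degree, walkers, m=4.0):
    """Probability that a node holds at least mean + m * sigma walkers.

    degree:        number of links of the node
    total_degree:  sum of degrees over all nodes (twice the edge count)
    walkers:       number of independent random walkers on the network
    m:             threshold in units of the node's own standard deviation
                   (an assumed definition of "extreme")
    """
    p = degree / total_degree                  # stationary occupation probability
    mean = walkers * p
    sigma = math.sqrt(walkers * p * (1 - p))
    k_min = math.ceil(mean + m * sigma)        # smallest count that is "extreme"
    # Sum the binomial upper tail in log space to avoid underflow in the terms.
    return sum(
        math.exp(log_binom_pmf(k, walkers, p))
        for k in range(k_min, walkers + 1)
    )

# Compare a small-degree node with a hub on the same hypothetical network.
low_degree = extreme_event_probability(degree=2, total_degree=2000, walkers=1000)
hub = extreme_event_probability(degree=200, total_degree=2000, walkers=1000)
```

Under these assumptions the small-degree node has the larger tail probability, because its occupation count is strongly skewed relative to its own threshold, which matches the qualitative statement in the abstract.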