-
Harm Mitigation in Recommender Systems under User Preference Dynamics
Authors:
Jerry Chee,
Shankar Kalyanaraman,
Sindhu Kiranmai Ernala,
Udi Weinsberg,
Sarah Dean,
Stratis Ioannidis
Abstract:
We consider a recommender system that takes into account the interplay between recommendations, the evolution of user interests, and harmful content. We model the impact of recommendations on user behavior, particularly the tendency to consume harmful content. We seek recommendation policies that establish a tradeoff between maximizing click-through rate (CTR) and mitigating harm. We establish con…
▽ More
We consider a recommender system that takes into account the interplay between recommendations, the evolution of user interests, and harmful content. We model the impact of recommendations on user behavior, particularly the tendency to consume harmful content. We seek recommendation policies that establish a tradeoff between maximizing click-through rate (CTR) and mitigating harm. We establish conditions under which the user profile dynamics have a stationary point, and propose algorithms for finding an optimal recommendation policy at stationarity. We experiment on a semi-synthetic movie recommendation setting initialized with real data and observe that our policies outperform baselines at simultaneously maximizing CTR and mitigating harm.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Authors:
Albert Tseng,
Jerry Chee,
Qingyao Sun,
Volodymyr Kuleshov,
Christopher De Sa
Abstract:
Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves QuIP's (Chee et al., 2023) incoherence processing by using the randomized Had…
▽ More
Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves QuIP's (Chee et al., 2023) incoherence processing by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimension unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference. Our code can be found at https://github.com/Cornell-RelaxML/quip-sharp.
△ Less
Submitted 4 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Designing and Testing a Mobile Application for Collecting WhatsApp Chat Data While Preserving Privacy
Authors:
Brennan Schaffner,
Archie Brohn,
Jason Chee,
K. J. Feng,
Marshini Chetty
Abstract:
It is common practice for researchers to join public WhatsApp chats and scrape their contents for analysis. However, research shows collecting data this way contradicts user expectations and preferences, even if the data is effectively public. To overcome these issues, we outline design considerations for collecting WhatsApp chat data with improved user privacy by heightening user control and over…
▽ More
It is common practice for researchers to join public WhatsApp chats and scrape their contents for analysis. However, research shows collecting data this way contradicts user expectations and preferences, even if the data is effectively public. To overcome these issues, we outline design considerations for collecting WhatsApp chat data with improved user privacy by heightening user control and oversight of data collection and taking care to minimize the data researchers collect and process off a user's device. We refer to these design principles as User-Centered Data Sharing (UCDS). To evaluate our UCDS principles, we implemented a mobile application representing one possible instance of these improved data collection techniques and evaluated the viability of using the app to collect WhatsApp chat data. Second, we surveyed WhatsApp users to gather user perceptions on common existing WhatsApp data collection methods as well as UCDS methods. Our results show that we were able to glean similar informative insights into WhatsApp chats using UCDS principles in our prototype app to common, less privacy-preserving methods. Our survey showed that methods following the UCDS principles are preferred by users because they offered users more control over the data collection process. Future user studies could further expand upon UCDS principles to overcome complications of researcher-to-group communication in research on WhatsApp chats and evaluate these principles in other data sharing contexts.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Authors:
Jerry Chee,
Yaohui Cai,
Volodymyr Kuleshov,
Christopher De Sa
Abstract:
This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$ weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned w…
▽ More
This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$ weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/Cornell-RelaxML/QuIP.
△ Less
Submitted 15 January, 2024; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Performance optimizations on deep noise suppression models
Authors:
Jerry Chee,
Sebastian Braun,
Vishak Gopal,
Ross Cutler
Abstract:
We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model. While deep learning approaches have been remarkably successful in enhancing audio quality, their increased complexity inhibits their deployment in real-time applications. We achieve up to a 7.25X inference speedup over the baseline, with a smooth model…
▽ More
We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model. While deep learning approaches have been remarkably successful in enhancing audio quality, their increased complexity inhibits their deployment in real-time applications. We achieve up to a 7.25X inference speedup over the baseline, with a smooth model performance degradation. Ablation studies indicate that our proposed network re-parameterization (i.e., size per layer) is the major driver of the speedup, and that magnitude structured pruning does comparably to directly training a model in the smaller size. We report inference speed because a parameter reduction does not necessitate speedup, and we measure model quality using an accurate non-intrusive objective speech quality metric.
△ Less
Submitted 8 October, 2021;
originally announced October 2021.
-
Model Preserving Compression for Neural Networks
Authors:
Jerry Chee,
Megan Renz,
Anil Damle,
Christopher De Sa
Abstract:
After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands. When compressing, it is desirable to preserve the original model's per-example decisions (e.g., to go beyond top-1 accuracy or preserve robustness), maintain the network's structure, automatically determine per-layer compression levels, and eliminate the need for fine tuning.…
▽ More
After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands. When compressing, it is desirable to preserve the original model's per-example decisions (e.g., to go beyond top-1 accuracy or preserve robustness), maintain the network's structure, automatically determine per-layer compression levels, and eliminate the need for fine tuning. No existing compression methods simultaneously satisfy these criteria $\unicode{x2014}$ we introduce a principled approach that does by leveraging interpolative decompositions. Our approach simultaneously selects and eliminates channels (analogously, neurons), then constructs an interpolation matrix that propagates a correction into the next layer, preserving the network's structure. Consequently, our method achieves good performance even without fine tuning and admits theoretical analysis. Our theoretical generalization bound for a one layer network lends itself naturally to a heuristic that allows our method to automatically choose per-layer sizes for deep networks. We demonstrate the efficacy of our approach with strong empirical performance on a variety of tasks, models, and datasets $\unicode{x2014}$ from simple one-hidden-layer networks to deep networks on ImageNet.
△ Less
Submitted 14 October, 2022; v1 submitted 30 July, 2021;
originally announced August 2021.
-
How Low Can We Go: Trading Memory for Error in Low-Precision Training
Authors:
Chengrun Yang,
Ziyang Wu,
Jerry Chee,
Christopher De Sa,
Madeleine Udell
Abstract:
Low-precision arithmetic trains deep learning models using less energy, less memory and less time. However, we pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error. As applications proliferate, users must choose which precision to use to train a new model, and chip manufacturers must decide which precisions to manufacture. We view these pr…
▽ More
Low-precision arithmetic trains deep learning models using less energy, less memory and less time. However, we pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error. As applications proliferate, users must choose which precision to use to train a new model, and chip manufacturers must decide which precisions to manufacture. We view these precision choices as a hyperparameter tuning problem, and borrow ideas from meta-learning to learn the tradeoff between memory and error. In this paper, we introduce Pareto Estimation to Pick the Perfect Precision (PEPPP). We use matrix factorization to find non-dominated configurations (the Pareto frontier) with a limited number of network evaluations. For any given memory budget, the precision that minimizes error is a point on this frontier. Practitioners can use the frontier to trade memory for error and choose the best precision for their goals.
△ Less
Submitted 17 March, 2022; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum
Authors:
Jerry Chee,
Ping Li
Abstract:
Convergence detection of iterative stochastic optimization methods is of great practical interest. This paper considers stochastic gradient descent (SGD) with a constant learning rate and momentum. We show that there exists a transient phase in which iterates move towards a region of interest, and a stationary phase in which iterates remain bounded in that region around a minimum point. We constru…
▽ More
Convergence detection of iterative stochastic optimization methods is of great practical interest. This paper considers stochastic gradient descent (SGD) with a constant learning rate and momentum. We show that there exists a transient phase in which iterates move towards a region of interest, and a stationary phase in which iterates remain bounded in that region around a minimum point. We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients and demonstrate that the proposed diagnostic works well. We theoretically and empirically characterize how momentum can affect the test statistic of the diagnostic, and how the test statistic captures a relatively sparse signal within the gradients in convergence. Finally, we demonstrate an application to automatically tune the learning rate by reducing it each time stationarity is detected, and show the procedure is robust to mis-specified initial rates.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.
-
Convergence diagnostics for stochastic gradient descent with constant step size
Authors:
Jerry Chee,
Panos Toulis
Abstract:
Many iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transition in the context…
▽ More
Many iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transition in the context of stochastic gradient descent with constant learning rate. We present theory and experiments suggesting that the region where the proposed diagnostic is activated coincides with the convergence region. For a class of loss functions, we derive a closed-form solution describing such region. Finally, we suggest an application to speed up convergence of stochastic gradient descent by halving the learning rate each time stationarity is detected. This leads to a new variant of stochastic gradient descent, which in many settings is comparable to state-of-art.
△ Less
Submitted 22 February, 2018; v1 submitted 17 October, 2017;
originally announced October 2017.