Showing 1–12 of 12 results for author: Cheng, D Z

Searching in archive cs.
  1. arXiv:2402.09668  [pdf, other]

    cs.LG cs.AI cs.CL

    How to Train Data-Efficient LLMs

    Authors: Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, Derek Zhiyuan Cheng

    Abstract: The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximizati…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Under review. 44 pages, 30 figures

  2. arXiv:2310.09983  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    Farzi Data: Autoregressive Data Distillation

    Authors: Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley

    Abstract: We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences -- Farzi Data -- which are optimized to maintain (if not improve) model performance compared to training on the full dataset. Under t…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Under review. 23 pages, 9 figures

  3. arXiv:2305.17386  [pdf, other]

    cs.IR cs.LG

    HyperFormer: Learning Expressive Sparse Feature Representations via Hypergraph Transformer

    Authors: Kaize Ding, Albert Jiongqian Liang, Bryan Perozzi, Ting Chen, Ruoxi Wang, Lichan Hong, Ed H. Chi, Huan Liu, Derek Zhiyuan Cheng

    Abstract: Learning expressive representations for high-dimensional yet sparse features has been a longstanding problem in information retrieval. Though recent deep learning methods can partially solve the problem, they often fail to handle the numerous sparse features, particularly those tail feature values with infrequent occurrences in the training data. Worse still, existing methods cannot explicitly lev…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted by SIGIR 2023

  4. arXiv:2305.12102  [pdf, other]

    cs.LG cs.IR

    Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

    Authors: Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

    Abstract: Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely h…

    Submitted 14 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: NeurIPS'23 Spotlight

    Journal ref: Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023) 56234-56255

  5. arXiv:2305.06474  [pdf, other]

    cs.IR cs.LG

    Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

    Authors: Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed Chi, Derek Zhiyuan Cheng

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior remains an emerging and still unclear research question. Traditionally, Collaborative Filtering (CF) has been the most effective method for these tasks, predominantl…

    Submitted 10 May, 2023; originally announced May 2023.

  6. arXiv:2210.14309  [pdf, other]

    cs.IR

    Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)

    Authors: Yin Zhang, Ruoxi Wang, Tiansheng Yao, Xinyang Yi, Lichan Hong, James Caverlee, Ed H. Chi, Derek Zhiyuan Cheng

    Abstract: Industry recommender systems usually suffer from highly-skewed long-tail item distributions where a small fraction of the items receives most of the user feedback. This skew hurts recommender quality especially for the item slices without much user feedback. While there have been many research advances made in academia, deploying these methods in production is very difficult and very few improveme…

    Submitted 3 September, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted by KDD 2023 Applied Data Science (ADS) track

  7. arXiv:2010.15982  [pdf, other]

    cs.IR

    A Model of Two Tales: Dual Transfer Learning Framework for Improved Long-tail Item Recommendation

    Authors: Yin Zhang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Lichan Hong, Ed H. Chi

    Abstract: Highly skewed long-tail item distribution is very common in recommendation systems. It significantly hurts model performance on tail items. To improve tail-item recommendation, we conduct research to transfer knowledge from head items to tail items, leveraging the rich user feedback in head items and the semantic connections between head and tail items. Specifically, we propose a novel dual transf…

    Submitted 7 March, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

    Comments: Accepted by WWW 2021 as a long paper

  8. arXiv:2010.10784  [pdf, other]

    cs.LG cs.IR

    Learning to Embed Categorical Features without Embedding Tables for Recommendation

    Authors: Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, Ed H. Chi

    Abstract: Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen…

    Submitted 7 June, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted to KDD'21, Research Track

  9. arXiv:2008.13535  [pdf, other]

    cs.IR cs.LG stat.ML

    DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems

    Authors: Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi

    Abstract: Learning effective feature crosses is the key behind building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples…

    Submitted 20 October, 2020; v1 submitted 19 August, 2020; originally announced August 2020.

    Journal ref: In Proceedings of the Web Conference 2021 (WWW '21)

  10. arXiv:2008.07032  [pdf, other]

    cs.LG stat.ML

    Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

    Authors: Zhe Chen, Yuyan Wang, Dong Lin, Derek Zhiyuan Cheng, Lichan Hong, Ed H. Chi, Claire Cui

    Abstract: Despite deep neural network (DNN)'s impressive prediction performance in various domains, it is well known now that a set of DNN models trained with the same model specification and the same data can produce very different prediction results. Ensemble method is one state-of-the-art benchmark for prediction uncertainty estimation. However, ensembles are expensive to train and serve for web-scale tr…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: 9 pages

  11. arXiv:2007.12865  [pdf, other]

    cs.LG cs.IR stat.ML

    Self-supervised Learning for Large-scale Item Recommendations

    Authors: Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger

    Abstract: Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the…

    Submitted 24 February, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

  12. arXiv:2002.08530  [pdf, other]

    cs.IR

    Learning Multi-granular Quantized Embeddings for Large-Vocab Categorical Features in Recommender Systems

    Authors: Wang-Cheng Kang, Derek Zhiyuan Cheng, Ting Chen, Xinyang Yi, Dong Lin, Lichan Hong, Ed H. Chi

    Abstract: Recommender system models often represent various sparse features like users, items, and categorical features via embeddings. A standard approach is to map each unique feature value to an embedding vector. The size of the produced embedding table grows linearly with the size of the vocabulary. Therefore, a large vocabulary inevitably leads to a gigantic embedding table, creating two severe problem…

    Submitted 24 August, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: longer version