Showing 1–19 of 19 results for author: Rasul, K

  1. arXiv:2406.06424  [pdf, other]

    cs.CV

    Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

    Authors: Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

    Abstract: Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the al…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2405.07836  [pdf, other]

    cs.LG stat.ME

    Forecasting with Hyper-Trees

    Authors: Alexander März, Kashif Rasul

    Abstract: This paper introduces the concept of Hyper-Trees and offers a new direction in applying tree-based models to time series data. Unlike conventional applications of decision trees that forecast time series directly, Hyper-Trees are designed to learn the parameters of a target time series model. Our framework leverages the gradient-based nature of boosted trees, which allows us to extend the concept…

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Forecasting, Gradient Boosting, Hyper-Networks, LightGBM, Parameter Non-Stationarity, Time Series, XGBoost

  3. arXiv:2404.07377  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.IT

    Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI

    Authors: Sahil Garg, Anderson Schneider, Anant Raj, Kashif Rasul, Yuriy Nevmyvaka, Sneihil Gopal, Amit Dhurandhar, Guillermo Cecchi, Irina Rish

    Abstract: Building on the remarkable achievements in generative sampling of natural images, we propose an innovative challenge, potentially overly ambitious, which involves generating samples of entire multivariate time series that resemble images. However, the statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects. This issue is especially problematic for deep g…

    Submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2403.17031  [pdf, other]

    cs.LG

    The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

    Authors: Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

    Abstract: This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work. We create an RLHF pipeline from scratch, enumerate over 20 key implementation details, and share key insights during the reproduction. Our RLHF-trained Pythia models demonstrate significant gains in response quality that scale wit…

    Submitted 23 March, 2024; originally announced March 2024.

  5. arXiv:2402.12722  [pdf, other]

    cs.LG

    Structural Knowledge Informed Continual Multivariate Time Series Forecasting

    Authors: Zijie Pan, Yushan Jiang, Dongjin Song, Sahil Garg, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka

    Abstract: Recent studies in multivariate time series (MTS) forecasting reveal that explicitly modeling the hidden dependencies among different time series can yield promising forecasting performance and reliable explanations. However, modeling variable dependencies remains underexplored when MTS is continuously accumulated under different regimes (stages). Due to the potential distribution and dependency di…

    Submitted 20 February, 2024; originally announced February 2024.

  6. arXiv:2310.16944  [pdf, other]

    cs.LG cs.CL

    Zephyr: Direct Distillation of LM Alignment

    Authors: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

    Abstract: We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Start…

    Submitted 25 October, 2023; originally announced October 2023.

  7. arXiv:2310.08278  [pdf, other]

    cs.LG cs.AI

    Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

    Authors: Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, Irina Rish

    Abstract: Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a…

    Submitted 8 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: First two authors contributed equally. All data, models and code used are open-source. GitHub: https://github.com/time-series-foundation-models/lag-llama

  8. arXiv:2305.14406  [pdf, other]

    cs.LG cs.AI

    Deep Learning based Forecasting: a case study from the online fashion industry

    Authors: Manuel Kunz, Stefan Birr, Mones Raslan, Lei Ma, Zhen Li, Adele Gouttes, Mateusz Koren, Tofigh Naghibi, Johannes Stephan, Mariia Bulycheva, Matthias Grzeschik, Armin Kekić, Michael Narodovitch, Kashif Rasul, Julian Sieber, Tim Januschowski

    Abstract: Demand forecasting in the online fashion industry is particularly amenable to global, data-driven forecasting models because of the industry's set of particular challenges. These include the volume of data, the irregularity, the high amount of turn-over in the catalog and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed invento…

    Submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2305.07247  [pdf, other]

    cs.LG

    Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation

    Authors: Yu Chen, Wei Deng, Shikai Fang, Fengpei Li, Nicole Tianjiao Yang, Yikai Zhang, Kashif Rasul, Shandian Zhe, Anderson Schneider, Yuriy Nevmyvaka

    Abstract: The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with the score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which conducts projections onto every other marginal alternatingly. However, in practice, only approximated projections are accessible…

    Submitted 10 September, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  10. arXiv:2211.02590  [pdf, other]

    cs.LG

    Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion

    Authors: Marin Biloš, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Stephan Günnemann

    Abstract: Temporal data such as time series can be viewed as discretized measurements of the underlying function. To build a generative model for such data we have to model the stochastic process that governs it. We propose a solution by defining the denoising diffusion model in the function space which also allows us to naturally handle irregularly-sampled observations. The forward process gradually adds n…

    Submitted 19 May, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: International Conference on Machine Learning (ICML), 2023

  11. arXiv:2206.14342  [pdf, other]

    cs.LG stat.ML

    Intrinsic Anomaly Detection for Multi-Variate Time Series

    Authors: Stephan Rabanser, Tim Januschowski, Kashif Rasul, Oliver Borchert, Richard Kurle, Jan Gasthaus, Michael Bohlke-Schneider, Nicolas Papernot, Valentin Flunkert

    Abstract: We introduce a novel, practically relevant variation of the anomaly detection problem in multi-variate time series: intrinsic anomaly detection. It appears in diverse practical scenarios ranging from DevOps to IoT, where we want to recognize failures of a system that operates under the influence of a surrounding environment. Intrinsic anomalies are changes in the functional dependency structure be…

    Submitted 28 June, 2022; originally announced June 2022.

  12. arXiv:2205.15894  [pdf, other]

    cs.LG cs.AI

    VQ-AR: Vector Quantized Autoregressive Probabilistic Time Series Forecasting

    Authors: Kashif Rasul, Young-Jin Park, Max Nihlén Ramström, Kyung-Min Kim

    Abstract: Time series models aim for accurate predictions of the future given the past, where the forecasts are used for important downstream tasks like business decision making. In practice, deep learning based time series models come in many forms, but at a high level learn some continuous representation of the past and use it to output point or probabilistic forecasts. In this paper, we introduce a novel…

    Submitted 31 May, 2022; originally announced May 2022.

  13. arXiv:2107.03743  [pdf, other]

    cs.LG cs.AI

    Probabilistic Time Series Forecasting with Implicit Quantile Networks

    Authors: Adèle Gouttes, Kashif Rasul, Mateusz Koren, Johannes Stephan, Tofigh Naghibi

    Abstract: Here, we propose a general method for probabilistic time series forecasting. We combine an autoregressive recurrent neural network to model temporal dynamics with Implicit Quantile Networks to learn a large class of distributions over a time-series target. When compared to other probabilistic neural forecasting models on real- and simulated data, our approach is favorable in terms of point-wise pr…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: Accepted at the ICML 2021 Time Series Workshop

  14. arXiv:2101.12072  [pdf, other]

    cs.LG cs.AI

    Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting

    Authors: Kashif Rasul, Calvin Seward, Ingmar Schuster, Roland Vollgraf

    Abstract: In this work, we propose TimeGrad, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient. To this end, we use diffusion probabilistic models, a class of latent variable models closely connected to score matching and energy-based methods. Our model learns gradients by optimizing a…

    Submitted 2 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8857-8868, 2021

  15. arXiv:2002.06103  [pdf, other]

    cs.LG stat.ML

    Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows

    Authors: Kashif Rasul, Abdul-Saboor Sheikh, Ingmar Schuster, Urs Bergmann, Roland Vollgraf

    Abstract: Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited fo…

    Submitted 14 January, 2021; v1 submitted 14 February, 2020; originally announced February 2020.

  16. arXiv:1909.02775  [pdf, other]

    cs.LG stat.ML

    Set Flow: A Permutation Invariant Normalizing Flow

    Authors: Kashif Rasul, Ingmar Schuster, Roland Vollgraf, Urs Bergmann

    Abstract: We present a generative model that is defined on finite sets of exchangeable, potentially high dimensional, data. As the architecture is an extension of RealNVPs, it inherits all its favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, learn statistical dependencies…

    Submitted 6 September, 2019; originally announced September 2019.

  17. arXiv:1902.03657  [pdf, other]

    cs.LG stat.ML

    A Bandit Framework for Optimal Selection of Reinforcement Learning Agents

    Authors: Andreas Merentitis, Kashif Rasul, Roland Vollgraf, Abdul-Saboor Sheikh, Urs Bergmann

    Abstract: Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the environment are costly and a good simulator of the environment is not available. Further, as environments differ by application, the optimal inductive bias (architectur…

    Submitted 10 February, 2019; originally announced February 2019.

    Comments: Published at the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montreal, Canada. Deep Reinforcement Learning Workshop

  18. arXiv:1712.01141  [pdf, other]

    stat.ML cs.LG

    Stochastic Maximum Likelihood Optimization via Hypernetworks

    Authors: Abdul-Saboor Sheikh, Kashif Rasul, Andreas Merentitis, Urs Bergmann

    Abstract: This work explores maximum likelihood optimization of neural networks through hypernetworks. A hypernetwork initializes the weights of another network, which in turn can be employed for typical functional tasks such as regression and classification. We optimize hypernetworks to directly maximize the conditional likelihood of target variables given input. Using this approach we obtain competitive e…

    Submitted 12 January, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

    Comments: To appear at NIPS 2017 Workshop on Bayesian Deep Learning

  19. arXiv:1708.07747  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

    Authors: Han Xiao, Kashif Rasul, Roland Vollgraf

    Abstract: We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image s…

    Submitted 15 September, 2017; v1 submitted 25 August, 2017; originally announced August 2017.

    Comments: Dataset is freely available at https://github.com/zalandoresearch/fashion-mnist. Benchmark is available at http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/