
Showing 1–50 of 123 results for author: Oseledets, I

Searching in archive cs.
  1. arXiv:2408.16414

    cs.LG cs.AI math.NA physics.comp-ph

    Fourier Spectral Physics Informed Neural Network: An Efficient and Low-Memory PINN

    Authors: Tianchi Yu, Yiming Qi, Ivan Oseledets, Shiyi Chen

    Abstract: With growing investigations into solving partial differential equations by physics-informed neural networks (PINNs), more accurate and efficient PINNs are required to meet the practical demands of scientific computing. One bottleneck of current PINNs is computing the high-order derivatives via automatic differentiation which often necessitates substantial computing resources. In this paper, we foc…

    Submitted 29 August, 2024; originally announced August 2024.

  2. RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

    Authors: Danil Gusak, Gleb Mezentsev, Ivan Oseledets, Evgeny Frolov

    Abstract: Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating large tensor of logits, this paper introduces a novel RECE…

    Submitted 14 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: 5 pages, accepted for CIKM'24

  3. arXiv:2407.15545

    cs.LG

    Inverted Activations

    Authors: Georgii Novikov, Ivan Oseledets

    Abstract: The scaling of neural networks with increasing data and model sizes necessitates more efficient deep learning algorithms. This paper addresses the memory footprint challenge in neural network training by proposing a modification to the handling of activation tensors in pointwise nonlinearity layers. Traditionally, these layers save the entire input tensor for the backward pass, leading to substant…

    Submitted 22 July, 2024; originally announced July 2024.

  4. arXiv:2406.04709

    cs.LG

    ConDiff: A Challenging Dataset for Neural Solvers of Partial Differential Equations

    Authors: Vladislav Trifonov, Alexander Rudikov, Oleg Iliev, Ivan Oseledets, Ekaterina Muravleva

    Abstract: We present ConDiff, a novel dataset for scientific machine learning. ConDiff focuses on the diffusion equation with varying coefficients, a fundamental problem in many applications of parametric partial differential equations (PDEs). The main novelty of the proposed dataset is that we consider discontinuous coefficients with high contrast. These coefficient functions are sampled from a selected se…

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2406.02645

    physics.comp-ph cs.AI cs.LG math.NA

    Astral: training physics-informed neural networks with error majorants

    Authors: Vladimir Fanaskov, Tianchi Yu, Alexander Rudikov, Ivan Oseledets

    Abstract: The primal approach to physics-informed learning is a residual minimization. We argue that residual is, at best, an indirect measure of the error of approximate solution and propose to train with error majorant instead. Since error majorant provides a direct upper bound on error, one can reliably estimate how close PiNN is to the exact solution and stop the optimization process when the desired ac…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2405.15557

    cs.LG math.NA

    Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers

    Authors: Vladislav Trifonov, Alexander Rudikov, Oleg Iliev, Ivan Oseledets, Ekaterina Muravleva

    Abstract: Large linear systems are ubiquitous in modern computational science. The main recipe for solving them is iterative solvers with well-designed preconditioners. Deep learning models may be used to precondition residuals during iteration of such linear solvers as the conjugate gradient (CG) method. Neural network models require an enormous number of parameters to approximate well in this setup. Anoth…

    Submitted 24 May, 2024; originally announced May 2024.

  7. arXiv:2405.12250

    cs.LG cs.AI cs.CL

    Your Transformer is Secretly Linear

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 9 pages, 9 figures

  8. arXiv:2405.07562

    cs.LG cs.AI

    GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation

    Authors: Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets

    Abstract: While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the…

    Submitted 13 May, 2024; originally announced May 2024.

  9. arXiv:2404.18791

    cs.SD cs.AI eess.AS

    Certification of Speaker Recognition Models to Additive Perturbations

    Authors: Dmitrii Korzh, Elvir Karimov, Mikhail Pautov, Oleg Y. Rogov, Ivan Oseledets

    Abstract: Speaker recognition technology is applied in various tasks ranging from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly to additive perturbations, remains a significant challenge. In this paper, we pioneer applying robustness certification techniques to speaker recognition, originally developed for the image d…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 9 pages, 9 figures

  10. arXiv:2404.09737

    cs.LG cs.CL

    Quantization of Large Language Models with an Overdetermined Basis

    Authors: Daniil Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan Oseledets, Ekaterina Muravleva, Aleksandr Mikhalev, Boris Kashin

    Abstract: In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor maintains a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, the entries of factors after decompos…

    Submitted 15 April, 2024; originally announced April 2024.

  11. arXiv:2404.06212

    cs.CV cs.AI cs.LG

    OmniFusion Technical Report

    Authors: Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: Last year, multimodal architectures served up a revolution in AI-based approaches and solutions, extending the capabilities of large language models (LLM). We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality. We evaluated and compared several architecture design principles for better text and visual data coupling: MLP and transformer adapters, various…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures, 9 tables, 2 appendices

    MSC Class: 68-04; 68T50 (Primary)

    ACM Class: I.2.7; I.2.10; I.4.9

  12. arXiv:2402.03232

    cs.LG

    Explicit Flow Matching: On The Theory of Flow Matching Algorithms with Applications

    Authors: Gleb Ryzhakov, Svetlana Pavlova, Egor Sevriugov, Ivan Oseledets

    Abstract: This paper proposes a novel method, Explicit Flow Matching (ExFM), for training and analyzing flow-based generative models. ExFM leverages a theoretically grounded loss function, ExFM loss (a tractable form of Flow Matching (FM) loss), to demonstrably reduce variance during training, leading to faster convergence and more stable learning. Based on theoretical analysis of these formulas, we derived…

    Submitted 1 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  13. arXiv:2402.02890

    cs.LG math.OC

    Black-Box Approximation and Optimization with Hierarchical Tucker Decomposition

    Authors: Gleb Ryzhakov, Andrei Chertkov, Artem Basharin, Ivan Oseledets

    Abstract: We develop a new method HTBB for the multidimensional black-box approximation and gradient-free optimization, which is based on the low-rank hierarchical Tucker decomposition with the use of the MaxVol indices selection procedure. Numerical experiments for 14 complex model problems demonstrate the robustness of the proposed method for dimensions up to 1000, while it shows significantly more accura…

    Submitted 5 February, 2024; originally announced February 2024.

  14. arXiv:2402.01376

    cs.CL cs.AI cs.LG

    LoTR: Low Tensor Rank Weight Adaptation

    Authors: Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets

    Abstract: In this paper we generalize and extend an idea of low-rank adaptation (LoRA) of large language models (LLMs) based on Transformer architecture. Widely used LoRA-like methods of fine-tuning LLMs are based on matrix factorization of gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs which represents a gradient update to parameters in a form of tensor dec…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Submitted; missing author and sections were added;

  15. arXiv:2401.16367

    cs.LG cs.AI cs.CL

    TQCompressor: improving tensor decomposition methods in neural networks via permutations

    Authors: V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, I. Oseledets, S. Dolgov, R. Brasher, M. Perelshtein

    Abstract: We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement makes it possible to reduce loss in model expressivity which is usually associ…

    Submitted 29 January, 2024; originally announced January 2024.

  16. arXiv:2401.14031

    cs.LG cs.CR cs.CV

    Sparse and Transferable Universal Singular Vectors Attack

    Authors: Kseniia Kuvshinova, Olga Tsymboi, Ivan Oseledets

    Abstract: The research in the field of adversarial attacks and models' vulnerability is one of the fundamental directions in modern machine learning. Recent studies reveal the vulnerability phenomenon, and understanding the mechanisms behind this is essential for improving neural network characteristics and interpretability. In this paper, we propose a novel sparse universal white-box adversarial attack. Ou…

    Submitted 25 January, 2024; originally announced January 2024.

  17. arXiv:2401.10748

    cs.NE cs.LG

    Fast gradient-free activation maximization for neurons in spiking neural networks

    Authors: Nikita Pospelov, Andrei Chertkov, Maxim Beketov, Ivan Oseledets, Konstantin Anokhin

    Abstract: Elements of neural networks, both biological and artificial, can be described by their selectivity for specific cognitive features. Understanding these features is important for understanding the inner workings of neural networks. For a living system, such as a neuron, whose response to a stimulus is unknown and not differentiable, the only way to reveal these features is through a feedback loop t…

    Submitted 25 June, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

  18. arXiv:2401.08261

    cs.CR cs.AI

    Probabilistically Robust Watermarking of Neural Networks

    Authors: Mikhail Pautov, Nikita Bogdanov, Stanislav Pyatkin, Oleg Rogov, Ivan Oseledets

    Abstract: As deep learning (DL) models are widely and effectively used in Machine Learning as a Service (MLaaS) platforms, there is a rapidly growing interest in DL watermarking techniques that can be used to confirm the ownership of a particular model. Unfortunately, these methods usually produce watermarks susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based water…

    Submitted 16 January, 2024; originally announced January 2024.

  19. arXiv:2312.10064

    cs.IR cs.AI

    Dynamic Collaborative Filtering for Matrix- and Tensor-based Recommender Systems

    Authors: Albert Saiapin, Ivan Oseledets, Evgeny Frolov

    Abstract: In production applications of recommender systems, a continuous data flow is employed to update models in real-time. Many recommender models often require complete retraining to adapt to new data. In this work, we introduce a novel collaborative filtering model for sequential problems known as Tucker Integrator Recommender - TIRecA. TIRecA efficiently updates its parameters using only the new data…

    Submitted 4 December, 2023; originally announced December 2023.

  20. arXiv:2312.08075

    cs.LG stat.ML

    TERM Model: Tensor Ring Mixture Model for Density Estimation

    Authors: Ruituo Wu, Jiani Liu, Ce Zhu, Anh-Huy Phan, Ivan V. Oseledets, Yipeng Liu

    Abstract: Efficient probability density estimation is a core challenge in statistical machine learning. Tensor-based probabilistic graph methods address interpretability and stability concerns encountered in neural network approaches. However, a substantial number of potential tensor permutations can lead to a tensor network with the same structure but varying expressive capabilities. In this paper, we take…

    Submitted 13 December, 2023; originally announced December 2023.

  21. arXiv:2312.03415

    cs.LG

    Run LoRA Run: Faster and Lighter LoRA Implementations

    Authors: Daria Cherniuk, Aleksandr Mikhalev, Ivan Oseledets

    Abstract: LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers. This technique is used both for fine-tuning and full training of large language models. This paper presents the RunLoRA framework for efficient implementations of LoRA that significantly improves the speed of neural network training and fine-tuning using low-ra…

    Submitted 14 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  22. arXiv:2311.05928

    cs.CL cs.AI cs.IT cs.LG math.GN

    The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from…

    Submitted 26 February, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted to EACL-2024

  23. arXiv:2310.01595

    cs.RO cs.AI

    Memory-efficient particle filter recurrent neural network for object localization

    Authors: Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

    Abstract: This study proposes a novel memory-efficient recurrent neural network (RNN) architecture specified to solve the object localization problem. This problem is to recover the object states along with its movement in a noisy environment. We take the idea of the classical particle filter and combine it with GRU RNN architecture. The key feature of the resulting memory-efficient particle filter RNN mode…

    Submitted 2 October, 2023; originally announced October 2023.

  24. arXiv:2309.16710

    cs.CV cs.AI

    General Lipschitz: Certified Robustness Against Resolvable Semantic Transformations via Transformation-Dependent Randomized Smoothing

    Authors: Dmitrii Korzh, Mikhail Pautov, Olga Tsymboi, Ivan Oseledets

    Abstract: Randomized smoothing is the state-of-the-art approach to construct image classifiers that are provably robust against additive adversarial perturbations of bounded magnitude. However, it is more complicated to construct reasonable certificates against semantic transformation (e.g., image blurring, translation, gamma correction) and their compositions. In this work, we propose General Lipschi…

    Submitted 9 August, 2024; v1 submitted 17 August, 2023; originally announced September 2023.

  25. arXiv:2309.00107

    cs.CV

    Unsupervised evaluation of GAN sample quality: Introducing the TTJac Score

    Authors: Egor Sevriugov, Ivan Oseledets

    Abstract: Evaluation metrics are essential for assessing the performance of generative models in image synthesis. However, existing metrics often involve high memory and time consumption as they compute the distance between generated samples and real data points. In our study, the new evaluation metric called the "TTJac score" is proposed to measure the fidelity of individual synthesized images in a data-fr…

    Submitted 31 August, 2023; originally announced September 2023.

    Comments: 11 pages, 7 figures

  26. arXiv:2308.16510

    cs.CV eess.IV

    Robust GAN inversion

    Authors: Egor Sevriugov, Ivan Oseledets

    Abstract: Recent advancements in real image editing have been attributed to the exploration of Generative Adversarial Networks (GANs) latent space. However, the main challenge of this procedure is GAN inversion, which aims to map the image to the latent space accurately. Existing methods that work on extended latent space $W+$ are unable to achieve low distortion and high editability simultaneously. To addr…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 22 pages, 28 figures

  27. arXiv:2308.04595

    cs.LG

    Quantization Aware Factorization for Deep Neural Network Compression

    Authors: Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak

    Abstract: Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a d…

    Submitted 8 August, 2023; originally announced August 2023.

  28. arXiv:2306.02697

    cs.AI

    Efficient GPT Model Pre-training using Tensor Train Matrix Representation

    Authors: Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko

    Abstract: Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the number of the parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM)…

    Submitted 5 June, 2023; originally announced June 2023.

  29. arXiv:2305.19818

    cs.LG

    Spectral Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning

    Authors: Marina Munkhoeva, Ivan Oseledets

    Abstract: Self-supervised methods received tremendous attention thanks to their seemingly heuristic approach to learning representations that respect the semantics of the data without any apparent supervision in the form of labels. A growing body of literature is already being published in an attempt to build a coherent and theoretically grounded understanding of the workings of a zoo of losses used in mode…

    Submitted 30 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 12 pages, 3 figures

  30. arXiv:2304.05342

    cs.RO

    TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain

    Authors: Alexey I. Boyko, Anastasiia Kornilova, Rahim Tariverdizadeh, Mirfarid Musavian, Larisa Markeeva, Ivan Oseledets, Gonzalo Ferrer

    Abstract: This paper addresses the following research question: "can one compress a detailed 3D representation and use it directly for point cloud registration?". Map compression of the scene can be achieved by the tensor train (TT) decomposition of the signed distance function (SDF) representation. It regulates the amount of data reduced by the so-called TT-ranks. Using this representation we have prop…

    Submitted 11 April, 2023; originally announced April 2023.

  31. arXiv:2303.10974

    cs.CL cs.AI

    Translate your gibberish: black-box adversarial attack on machine translation systems

    Authors: Andrei Chertkov, Olga Tsymboi, Mikhail Pautov, Ivan Oseledets

    Abstract: Neural networks are deployed widely in natural language processing tasks on the industrial scale, and perhaps the most often they are used as compounds of automatic machine translation systems. In this work, we present a simple approach to fool state-of-the-art machine translation tools in the task of translation from Russian to English and vice versa. Using a novel black-box gradient-free tensor-…

    Submitted 23 May, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  32. arXiv:2303.07897

    cs.RO cs.AI

    Multiparticle Kalman filter for object localization in symmetric environments

    Authors: Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

    Abstract: This study considers the object localization problem and proposes a novel multiparticle Kalman filter to solve it in complex and symmetric environments. Two well-known classes of filtering algorithms to solve the localization problem are Kalman filter-based methods and particle filter-based methods. We consider these classes, demonstrate their complementary properties, and propose a novel filterin…

    Submitted 14 March, 2023; originally announced March 2023.

  33. arXiv:2303.04744

    cs.IR cs.CR cs.HC cs.LG

    Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction

    Authors: Albert Sayapin, Gleb Balitskiy, Daniel Bershatsky, Aleksandr Katrutsa, Evgeny Frolov, Alexey Frolov, Ivan Oseledets, Vitaliy Kharin

    Abstract: In this study, we propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage. Although this problem can be represented as a classical collaborative filtering problem, it requires proper modification since the data are sequential, the user feedback is distributed among devices and the transmission of users' data to aggregate common patterns must be…

    Submitted 5 February, 2023; originally announced March 2023.

  34. arXiv:2301.04998

    physics.flu-dyn cs.LG

    Machine learning methods for prediction of breakthrough curves in reactive porous media

    Authors: Daria Fokina, Pavel Toktaliev, Oleg Iliev, Ivan Oseledets

    Abstract: Reactive flows in porous media play an important role in our life and are crucial for many industrial, environmental and biomedical applications. Very often the concentration of the species at the inlet is known, and the so-called breakthrough curves, measured at the outlet, are the quantities which could be measured or computed numerically. The measurements and the simulations could be time-consu…

    Submitted 12 January, 2023; originally announced January 2023.

    MSC Class: 68T99; 76S05

  35. arXiv:2301.03025

    cs.AI cs.HC

    Mitigating Human and Computer Opinion Fraud via Contrastive Learning

    Authors: Yuliya Tukmacheva, Ivan Oseledets, Evgeny Frolov

    Abstract: We introduce the novel approach towards fake text reviews detection in collaborative filtering recommender systems. The existing algorithms concentrate on detecting the fake reviews, generated by language models and ignore the texts, written by dishonest users, mostly for monetary gains. We propose the contrastive learning-based architecture, which utilizes the user demographic characteristics, al…

    Submitted 8 January, 2023; originally announced January 2023.

    Comments: 15 pages, 3 figures, 1 table

  36. arXiv:2212.12899

    math.NA cs.AI cs.LG

    FMM-Net: neural network architecture based on the Fast Multipole Method

    Authors: Daria Sushnikova, Pavel Kharyuk, Ivan Oseledets

    Abstract: In this paper, we propose a new neural network architecture based on the H2 matrix. Even though networks with H2-inspired architecture already exist, and our approach is designed to reduce memory costs and improve performance by taking into account the sparsity template of the H2 matrix. In numerical comparison with alternative neural networks, including the known H2-based ones, our architecture s…

    Submitted 25 December, 2022; originally announced December 2022.

  37. arXiv:2212.05720

    cs.LG cs.AI cs.IR stat.ML

    Tensor-based Sequential Learning via Hankel Matrix Representation for Next Item Recommendations

    Authors: Evgeny Frolov, Ivan Oseledets

    Abstract: Self-attentive transformer models have recently been shown to solve the next item recommendation task very efficiently. The learned attention weights capture sequential dynamics in user behavior and generalize well. Motivated by the special structure of learned parameter space, we question if it is possible to mimic it with an alternative and more lightweight approach. We develop a new tensor fact…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: 15 pages, 6 figures, submitted to IEEE Access

  38. arXiv:2209.14937

    math.OC cs.LG math.NA

    NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

    Authors: Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets

    Abstract: Classical machine learning models such as deep neural networks are usually trained by using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differentia…

    Submitted 30 September, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: We study Nesterov acceleration for the Stochastic Differential Equation

  39. arXiv:2209.14782

    cs.LG cs.CV math.NA physics.ao-ph stat.ML

    A case study of spatiotemporal forecasting techniques for weather forecasting

    Authors: Shakir Showkat Sofi, Ivan Oseledets

    Abstract: The majority of real-world processes are spatiotemporal, and the data generated by them exhibits both spatial and temporal evolution. Weather is one of the most essential processes in this domain, and weather forecasting has become a crucial part of our daily routine. Weather data analysis is considered the most complex and challenging task. Although numerical weather prediction models are current…

    Submitted 8 June, 2024; v1 submitted 29 September, 2022; originally announced September 2022.

  40. arXiv:2208.01421

    cs.CV

    T4DT: Tensorizing Time for Learning Temporal 3D Visual Data

    Authors: Mikhail Usvyatsov, Rafael Ballester-Ripoll, Lina Bashaeva, Konrad Schindler, Gonzalo Ferrer, Ivan Oseledets

    Abstract: Unlike 2D raster images, there is no single dominant representation for 3D visual data processing. Different formats like point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions have attractive properties also in 3D. In particular, they offer constant-time random access and are eminently suitable for modern…

    Submitted 5 October, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

  41. arXiv:2208.00406

    cs.LG cs.AI cs.CE cs.CY

    Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI

    Authors: Semen Budennyy, Vladimir Lazarev, Nikita Zakharenko, Alexey Korovin, Olga Plosskaya, Denis Dimitrov, Vladimir Arkhipkin, Ivan Oseledets, Ivan Barsola, Ilya Egorov, Aleksandra Kosterina, Leonid Zhukov

    Abstract: The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers to track energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on accuracy of energy co…

    Submitted 3 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

    Comments: Source code for eco2AI package (energy consumption and carbon emission tracker of code in python) is available at: https://github.com/sb-ai-lab/Eco2AI , the package is also available at PyPi: https://pypi.org/project/eco2ai/

  42. arXiv:2207.02851

    quant-ph cond-mat.dis-nn cs.AI cs.LG

    Tensor networks in machine learning

    Authors: Richik Sengupta, Soumik Adhikary, Ivan Oseledets, Jacob Biamonte

    Abstract: A tensor network is a type of decomposition used to express and approximate large arrays of data. A given data-set, quantum state or higher dimensional multi-linear map is factored and approximated by a composition of smaller multi-linear maps. This is reminiscent to how a Boolean function might be decomposed into a gate array: this represents a special case of tensor decomposition, in which the t…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: 7 pages

  43. arXiv:2205.05070

    cs.IR cs.LG

    Tensor-based Collaborative Filtering With Smooth Ratings Scale

    Authors: Nikita Marin, Elizaveta Makhneva, Maria Lysyuk, Vladimir Chernyy, Ivan Oseledets, Evgeny Frolov

    Abstract: Conventional collaborative filtering techniques don't take into consideration the effect of discrepancy in users' rating perception. Some users may rarely give 5 stars to items while others almost always assign 5 stars to the chosen item. Even if they had experience with the same items this systematic discrepancy in their evaluation style will lead to the systematic errors in the ability of recomm…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: Draft version, submitted for review; 14 pages, 3 tables, 2 figures

  44. arXiv:2205.04490  [pdf, other]

    cs.IR cs.LG

    Are Quantum Computers Practical Yet? A Case for Feature Selection in Recommender Systems using Tensor Networks

    Authors: Artyom Nikitin, Andrei Chertkov, Rafael Ballester-Ripoll, Ivan Oseledets, Evgeny Frolov

    Abstract: Collaborative filtering models generally perform better than content-based filtering models and do not require careful feature engineering. However, in the cold-start scenario collaborative information may be scarce or even unavailable, whereas the content information may be abundant, but also noisy and expensive to acquire. Thus, selection of particular features that improve cold-start recommenda…

    Submitted 12 May, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Added affiliation. Fixed table references

  45. arXiv:2205.00293  [pdf, other]

    cs.LG cs.NE math.OC

    TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning

    Authors: Konstantin Sozykin, Andrei Chertkov, Roman Schutski, Anh-Huy Phan, Andrzej Cichocki, Ivan Oseledets

    Abstract: We present a novel procedure for optimization based on the combination of efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method for various tasks, ranging from minimization of multidimensional functions to reinforcement learning. Our algorithm compares favorably to popular…

    Submitted 28 September, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

    Comments: 26 pages, 8 figures, accepted to Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022). Pre-camera-ready version

  46. arXiv:2204.11719  [pdf]

    physics.flu-dyn cs.LG physics.comp-ph

    On the Performance of Machine Learning Methods for Breakthrough Curve Prediction

    Authors: Daria Fokina, Oleg Iliev, Pavel Toktaliev, Ivan Oseledets, Felix Schindler

    Abstract: Reactive flows are an important part of numerous technical and environmental processes. Often, monitoring the flow and species concentrations within the domain is not possible or is expensive; in contrast, the outlet concentration is straightforward to measure. In connection with reactive flows in porous media, the term breakthrough curve is used to denote the time dependency of the outlet concentration w…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: Submitted to NAFEMS seminar "Machine Learning und Artificial Intelligence in der Strömungsmechanik und der Strukturanalyse" (Machine Learning and Artificial Intelligence in Fluid Mechanics and Structural Analysis)

  47. arXiv:2203.10833  [pdf, other]

    cs.CV cs.LG

    Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

    Authors: Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov, Nicu Sebe, Ivan Oseledets

    Abstract: Metric learning aims to learn a highly discriminative model encouraging the embeddings of similar classes to be close in the chosen metrics and pushed apart for dissimilar ones. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations -- usually, the Euclidean distance is utilized. An emerging interest in learning hyperbolic data…

    Submitted 22 March, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  48. Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction

    Authors: Aleksandr Katrutsa, Sergey Utyuzhnikov, Ivan Oseledets

    Abstract: The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data. This is an entirely data-driven approach that extracts all necessary information from data snapshots which are commonly supposed to be sampled from measurements. The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale either missi…

    Submitted 23 February, 2022; originally announced February 2022.

  49. arXiv:2202.10435  [pdf, ps, other]

    cs.LG cs.AI

    Survey on Large Scale Neural Network Training

    Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

    Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models do not fit on one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make good us…

    Submitted 21 February, 2022; originally announced February 2022.

  50. arXiv:2202.07477  [pdf, other]

    stat.ML cs.AI cs.LG math.AP math.NA

    Understanding DDPM Latent Codes Through Optimal Transport

    Authors: Valentin Khrulkov, Gleb Ryzhakov, Andrei Chertkov, Ivan Oseledets

    Abstract: Diffusion models have recently outperformed alternative approaches to modeling the distribution of natural images, such as GANs. Such diffusion models allow for deterministic sampling via the probability flow ODE, giving rise to a latent space and an encoder map. While having important practical applications, such as estimation of the likelihood, the theoretical properties of this map are not yet ful…

    Submitted 5 December, 2022; v1 submitted 14 February, 2022; originally announced February 2022.