
Showing 1–9 of 9 results for author: Troshin, S

  1. arXiv:2407.04615  [pdf, other]

    cs.CL

    ARM: Efficient Guided Decoding with Autoregressive Reward Models

    Authors: Sergey Troshin, Vlad Niculae, Antske Fokkens

    Abstract: Language models trained on large amounts of data require careful tuning to be safely deployed in real world. We revisit the guided decoding paradigm, where the goal is to augment the logits of the base language model using the scores from a task-specific reward model. We propose a simple but efficient parameterization of the autoregressive reward model enabling fast and effective guided decoding.…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2308.00683  [pdf, other]

    cs.LG cs.CL cs.SE

    CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

    Authors: Nadezhda Chirkova, Sergey Troshin

    Abstract: Recent works have widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models, namely the effect of different subtokenization options, and aims at identifying most effe…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: Published at ICLR 2023

  3. arXiv:2301.03988  [pdf, other]

    cs.SE cs.AI cs.LG

    SantaCoder: don't reach for the stars!

    Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

    Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…

    Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

  4. arXiv:2202.08975  [pdf, other]

    cs.SE cs.CL cs.LG

    Probing Pretrained Models of Source Code

    Authors: Sergey Troshin, Nadezhda Chirkova

    Abstract: Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully built to solve a particular code processing task. However, recently general pretrained models such as CodeBERT or CodeT5 have been shown to outperform task-specific models in many applications. While pretrained…

    Submitted 17 November, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  5. arXiv:2112.14423  [pdf, other]

    eess.SP cs.LG cs.NI

    Machine Learning Methods for Spectral Efficiency Prediction in Massive MIMO Systems

    Authors: Evgeny Bobrov, Sergey Troshin, Nadezhda Chirkova, Ekaterina Lobacheva, Sviatoslav Panchenko, Dmitry Vetrov, Dmitry Kropotov

    Abstract: Channel decoding, channel detection, channel assessment, and resource management for wireless multiple-input multiple-output (MIMO) systems are all examples of problems where machine learning (ML) can be successfully applied. In this paper, we study several ML approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: To appear in Optimization Methods & Software, 22 pages, 10 figures, 2 tables

  6. Study on Precoding Optimization Algorithms in Massive MIMO System with Multi-Antenna Users

    Authors: Evgeny Bobrov, Dmitry Kropotov, Sergey Troshin, Danila Zaev

    Abstract: The paper studies the multi-user precoding problem as a non-convex optimization problem for wireless multiple input and multiple output (MIMO) systems. In our work, we approximate the target Spectral Efficiency function with a novel computationally simpler function. Then, we reduce the precoding problem to an unconstrained optimization task using a special differential projection method and solve…

    Submitted 20 June, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 16 pages, 6 figures, 6 tables, the work has been accepted for publication in Optimization Methods and Software, comments are welcome

  7. arXiv:2107.00853  [pdf, other]

    cs.IT cs.NI

    Adaptive Regularized Zero-Forcing Beamforming in Massive MIMO with Multi-Antenna Users

    Authors: Evgeny Bobrov, Boris Chinyaev, Viktor Kuznetsov, Hao Lu, Dmitrii Minenkov, Sergey Troshin, Daniil Yudakov, Danila Zaev

    Abstract: Modern wireless cellular networks use massive multiple-input multiple-output (MIMO) technology. This technology involves operations with an antenna array at a base station that simultaneously serves multiple mobile devices which also use multiple antennas on their side. For this, various precoding and detection techniques are used, allowing each user to receive the signal intended for him from the…

    Submitted 25 July, 2023; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: 30 pages, 9 figures, 6 tables, prepared for the Wireless Networks, comments are welcome

  8. arXiv:2010.12663  [pdf, other]

    cs.SE cs.LG

    A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

    Authors: Nadezhda Chirkova, Sergey Troshin

    Abstract: There is an emerging interest in the application of natural language processing models to source code processing tasks. One of the major problems in applying deep learning to software engineering is that source code often contains a lot of rare identifiers, resulting in huge vocabularies. We propose a simple, yet effective method, based on identifier anonymization, to handle out-of-vocabulary (OOV…

    Submitted 27 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Published at the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021)

  9. arXiv:2010.07987  [pdf, other]

    cs.LG cs.CL cs.SE

    Empirical Study of Transformers for Source Code

    Authors: Nadezhda Chirkova, Sergey Troshin

    Abstract: Initially developed for natural language processing (NLP), Transformers are now widely used for source code processing, due to the format similarity between source code and text. In contrast to natural language, source code is strictly structured, i.e., it follows the syntax of the programming language. Several recent works develop Transformer modifications for capturing syntactic information in s…

    Submitted 24 June, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Published at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2021 (ESEC/FSE'21)