Search | arXiv e-print repository

Towards Foundation Time Series Model: To Synthesize Or Not To Synthesize?

Authors: Kseniia Kuvshinova, Olga Tsymboi, Alina Kostromina, Dmitry Simakov, Elizaveta Kovtun

Abstract: The industry is rich in cases when we are required to make forecasting for large amounts of time series at once. However, we might be in a situation where we can not afford to train a separate model for each of them. Such issue in time series modeling remains without due attention. The remedy for this setting is the establishment of a foundation model. Such a model is expected to work in zero-shot… ▽ More The industry is rich in cases when we are required to make forecasting for large amounts of time series at once. However, we might be in a situation where we can not afford to train a separate model for each of them. Such issue in time series modeling remains without due attention. The remedy for this setting is the establishment of a foundation model. Such a model is expected to work in zero-shot and few-shot regimes. However, what should we take as a training dataset for such kind of model? Witnessing the benefits from the enrichment of NLP datasets with artificially-generated data, we might want to adopt their experience for time series. In contrast to natural language, the process of generation of synthetic time series data is even more favorable because it provides full control of series patterns, time horizons, and number of samples. In this work, we consider the essential question if it is advantageous to train a foundation model on synthetic data or it is better to utilize only a limited number of real-life examples. Our experiments are conducted only for regular time series and speak in favor of leveraging solely the real time series. Moreover, the choice of the proper source dataset strongly influences the performance during inference. When provided access even to a limited quantity of short time series data, employing it within a supervised framework yields more favorable results than training on a larger volume of synthetic data. The code for our experiments is publicly available on Github \url{https://github.com/sb-ai-lab/synthesize_or_not}. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2308.10201 [pdf, other]

Hiding Backdoors within Event Sequence Data via Poisoning Attacks

Authors: Alina Ermilova, Elizaveta Kovtun, Dmitry Berestnev, Alexey Zaytsev

Abstract: The financial industry relies on deep learning models for making important decisions. This adoption brings new danger, as deep black-box models are known to be vulnerable to adversarial attacks. In computer vision, one can shape the output during inference by performing an adversarial attack called poisoning via introducing a backdoor into the model during training. For sequences of financial tran… ▽ More The financial industry relies on deep learning models for making important decisions. This adoption brings new danger, as deep black-box models are known to be vulnerable to adversarial attacks. In computer vision, one can shape the output during inference by performing an adversarial attack called poisoning via introducing a backdoor into the model during training. For sequences of financial transactions of a customer, insertion of a backdoor is harder to perform, as models operate over a more complex discrete space of sequences, and systematic checks for insecurities occur. We provide a method to introduce concealed backdoors, creating vulnerabilities without altering their functionality for uncontaminated data. To achieve this, we replace a clean model with a poisoned one that is aware of the availability of a backdoor and utilize this knowledge. Our most difficult for uncovering attacks include either additional supervised detection step of poisoned data activated during the test or well-hidden model weight modifications. The experimental study provides insights into how these effects vary across different datasets, architectures, and model components. Alternative methods and baselines, such as distillation-type regularization, are also explored but found to be less efficient. Conducted on three open transaction datasets and architectures, including LSTM, CNN, and Transformer, our findings not only illuminate the vulnerabilities in contemporary models but also can drive the construction of more robust systems. △ Less

Submitted 25 August, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2303.14221 [pdf, other]

The Battle of Information Representations: Comparing Sentiment and Semantic Features for Forecasting Market Trends

Authors: Andrei Zaichenko, Aleksei Kazakov, Elizaveta Kovtun, Semen Budennyy

Abstract: The study of the stock market with the attraction of machine learning approaches is a major direction for revealing hidden market regularities. This knowledge contributes to a profound understanding of financial market dynamics and getting behavioural insights, which could hardly be discovered with traditional analytical methods. Stock prices are inherently interrelated with world events and socia… ▽ More The study of the stock market with the attraction of machine learning approaches is a major direction for revealing hidden market regularities. This knowledge contributes to a profound understanding of financial market dynamics and getting behavioural insights, which could hardly be discovered with traditional analytical methods. Stock prices are inherently interrelated with world events and social perception. Thus, in constructing the model for stock price prediction, the critical stage is to incorporate such information on the outside world, reflected through news and social media posts. To accommodate this, researchers leverage the implicit or explicit knowledge representations: (1) sentiments extracted from the texts or (2) raw text embeddings. However, there is too little research attention to the direct comparison of these approaches in terms of the influence on the predictive power of financial models. In this paper, we aim to close this gap and figure out whether the semantic features in the form of contextual embeddings are more valuable than sentiment attributes for forecasting market trends. We consider the corpus of Twitter posts related to the largest companies by capitalization from NASDAQ and their close prices. To start, we demonstrate the connection of tweet sentiments with the volatility of companies' stock prices. Convinced of the existing relationship, we train Temporal Fusion Transformer models for price prediction supplemented with either tweet sentiments or tweet embeddings. Our results show that in the substantially prevailing number of cases, the use of sentiment features leads to higher metrics. Noteworthy, the conclusions are justifiable within the considered scenario involving Twitter posts and stocks of the biggest tech companies. △ Less

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.00280 [pdf, other]

Label Attention Network for sequential multi-label classification: you were looking at a wrong self-attention

Authors: Elizaveta Kovtun, Galina Boeva, Artem Zabolotnyi, Evgeny Burnaev, Martin Spindler, Alexey Zaytsev

Abstract: Most of the available user information can be represented as a sequence of timestamped events. Each event is assigned a set of categorical labels whose future structure is of great interest. For instance, our goal is to predict a group of items in the next customer's purchase or tomorrow's client transactions. This is a multi-label classification problem for sequential data. Modern approaches focu… ▽ More Most of the available user information can be represented as a sequence of timestamped events. Each event is assigned a set of categorical labels whose future structure is of great interest. For instance, our goal is to predict a group of items in the next customer's purchase or tomorrow's client transactions. This is a multi-label classification problem for sequential data. Modern approaches focus on transformer architecture for sequential data introducing self-attention for the elements in a sequence. In that case, we take into account events' time interactions but lose information on label inter-dependencies. Motivated by this shortcoming, we propose leveraging a self-attention mechanism over labels preceding the predicted step. As our approach is a Label-Attention NETwork, we call it LANET. Experimental evidence suggests that LANET outperforms the established models' performance and greatly captures interconnections between labels. For example, the micro-AUC of our approach is $0.9536$ compared to $0.7501$ for a vanilla transformer. We provide an implementation of LANET to facilitate its wider usage. △ Less

Submitted 4 April, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

arXiv:2210.09611 [pdf, other]

Relationships between patenting trends and research activity for green energy technologies

Authors: Regina Tuganova, Anna Permyakova, Anna Kuznetsova, Karina Rakhmanova, Natalia Monzul, Roman Uvarov, Elizaveta Kovtun, Semen Budennyy

Abstract: Green technology is viewed as a means of creating a sustainable society and a catalyst for sustainable development by the global community. It is responsible for both the potential reduction of production waste and the reduction of carbon footprint and CO2 emissions. However, alongside with the growing popularity of green technologies, there is an emerging skepticism about their contribution to so… ▽ More Green technology is viewed as a means of creating a sustainable society and a catalyst for sustainable development by the global community. It is responsible for both the potential reduction of production waste and the reduction of carbon footprint and CO2 emissions. However, alongside with the growing popularity of green technologies, there is an emerging skepticism about their contribution to solving environmental challenges. This article focuses on three areas of eco-innovation in green technology: renewable energy, hydrogen power, and decarbonization. Our main goal is to analyze the relationship between publication activity and the number of patented research results, thus shedding light on the real-world applicability of scientific outcomes. We used several bibliometric methods for analyzing global publication and patent activity, applied to the Scopus citation database and the European Patent Office's patent database. Our results show that the advancement of research in all three areas of eco-innovation does not automatically lead to the increase in the number of patents. We offer possible reasons for such dependency based on the observations of the worldwide tendencies in green innovation sphere. △ Less

Submitted 18 October, 2022; originally announced October 2022.

Comments: 11 pages, 3 figures

arXiv:2208.07248 [pdf, other]

New drugs and stock market: how to predict pharma market reaction to clinical trial announcements

Authors: Semen Budennyy, Alexey Kazakov, Elizaveta Kovtun, Leonid Zhukov

Abstract: Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence for the result promulgation influence on the public… ▽ More Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence for the result promulgation influence on the public pharma market value. Whereas most works focus on retrospective impact analysis, the present research aims to predict the numerical values of announcement-induced changes in stock prices. For this purpose, we develop a pipeline that includes a BERT-based model for extracting sentiment polarity of announcements, a Temporal Fusion Transformer for forecasting the expected return, a graph convolution network for capturing event relationships, and gradient boosting for predicting the price change. The challenge of the problem lies in inherently different patterns of responses to positive and negative announcements, reflected in a stronger and more pronounced reaction to the negative news. Moreover, such phenomenon as the drop in stocks after the positive announcements affirms the counterintuitiveness of the price behavior. Importantly, we discover two crucial factors that should be considered while working within a predictive framework. The first factor is the drug portfolio size of the company, indicating the greater susceptibility to an announcement in the case of small drug diversification. The second one is the network effect of the events related to the same company or nosology. All findings and insights are gained on the basis of one of the biggest FDA (the Food and Drug Administration) announcement datasets, consisting of 5436 clinical trial announcements from 681 companies over the last five years. △ Less

Submitted 16 August, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

Comments: 17 pages, 14 figures

arXiv:2106.08361 [pdf, other]

Adversarial Attacks on Deep Models for Financial Transaction Records

Authors: Ivan Fursov, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun, Rodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey Zaytsev, Evgeny Burnaev

Abstract: Machine learning models using transaction records as inputs are popular among financial institutions. The most efficient models use deep-learning architectures similar to those in the NLP community, posing a challenge due to their tremendous number of parameters and limited robustness. In particular, deep-learning models are vulnerable to adversarial attacks: a little change in the input harms the… ▽ More Machine learning models using transaction records as inputs are popular among financial institutions. The most efficient models use deep-learning architectures similar to those in the NLP community, posing a challenge due to their tremendous number of parameters and limited robustness. In particular, deep-learning models are vulnerable to adversarial attacks: a little change in the input harms the model's output. In this work, we examine adversarial attacks on transaction records data and defences from these attacks. The transaction records data have a different structure than the canonical NLP or time series data, as neighbouring records are less connected than words in sentences, and each record consists of both discrete merchant code and continuous transaction amount. We consider a black-box attack scenario, where the attack doesn't know the true decision model, and pay special attention to adding transaction tokens to the end of a sequence. These limitations provide more realistic scenario, previously unexplored in NLP world. The proposed adversarial attacks and the respective defences demonstrate remarkable performance using relevant datasets from the financial industry. Our results show that a couple of generated transactions are sufficient to fool a deep-learning model. Further, we improve model robustness via adversarial training or separate adversarial examples detection. This work shows that embedding protection from adversarial attacks improves model robustness, allowing a wider adoption of deep models for transaction records in banking and finance. △ Less

Submitted 15 June, 2021; originally announced June 2021.

Showing 1–7 of 7 results for author: Kovtun, E