Text2TimeSeries: Enhancing Financial Forecasting through Time Series Prediction Updates with Event-Driven Insights from Large Language Models
Abstract
Time series models, typically trained on numerical data, are designed to forecast future values. These models often rely on weighted averaging techniques over time intervals. However, real-world time series data is seldom isolated and is frequently influenced by non-numeric factors. For instance, stock price fluctuations are impacted by daily random events in the broader world, with each event exerting a unique influence on price signals. Previously, forecasts in financial markets have been approached in two main ways: either as time-series problems over price sequence or sentiment analysis tasks. The sentiment analysis tasks aim to determine whether news events will have a positive or negative impact on stock prices, often categorizing them into discrete labels. Recognizing the need for a more comprehensive approach to accurately model time series prediction, we propose a collaborative modeling framework that incorporates textual information about relevant events for predictions. Specifically, we leverage the intuition of large language models about future changes to update real number time series predictions. We evaluated the effectiveness of our approach on financial market data.
1 Introduction
In the rapidly evolving field of global finance, Artificial Intelligence (AI) plays a pivotal role. In an interconnected world characterized by cross-border trade and expanding economies, marked by intricate relationships and interdependencies, AI is essential for navigating these complexities (Cao, 2022). Predicting stock price movements has been a long-standing focus for the AI community, as the stock market is highly sensitive to macroeconomic events, making accurate forecasting a significant challenge. Historically, research has primarily concentrated on forecasting financial markets using univariate time series prediction methods (Wah and Qian, 2002). Some studies have addressed this issue by employing multivariate time series prediction or by considering the interdependence of price series from different companies to forecast price movements (Wu et al., 2013; Xiang et al., 2022). While time series models are effective at predicting cyclical trends and overall market growth (Zhou et al., 2022; Woo et al., 2022), they often fail to capture the impact of sequential financial events. Predictions that do not consider such events tend to be less precise. The current work explores time series prediction of stock prices in a multi-modal setting that incorporates both text and time series data, where the textual description of an event is considered for short-term price prediction.
Event-driven stock sentiment prediction primarily focuses on anticipating how an event will affect stock prices, typically classifying the impact into discrete labels such as increase, decrease, or no noticeable change (Ding et al., 2014, 2016). Some approaches incorporate historical price sequences to forecast whether prices will rise or fall (Sawhney et al., 2020). However, the effects of an event may span several days, with varying rates of price changes. A simple sentiment label may not be sufficient to capture this complexity. Therefore, instead of assigning a limited number of sentiment polarities to an event, we model the effects of the event in terms of change directions with associated real values. Our current work investigates methods to convert market excitement related to events into real-valued stock prices over the subsequent days. We are motivated by the fact that forecasting an event’s influence on stock prices over an extended period is beneficial for devising effective intervention strategies (Pricope, 2021).
By picking up on short-term market excitement, large language models (LLMs) could excel at intuitively predicting future changes based on specific events (Lopez-Lira and Tang, 2023). LLMs like ChatGPT are particularly adept at capturing the finer nuances in stock-specific news texts and accurately predicting daily stock market returns due to their superior language understanding capabilities. Lopez-Lira and Tang (2023) also highlight the limitations of basic models like BERT in natural language understanding. Our objective is to explore the ability of LLMs to anticipate changes across multiple time points and represent these as distinct labels corresponding to different future time spans. By "time span," we refer to the period of short-term excitement in the market. Additionally, we aim to examine how these insights can inform adjustments in predictions within time-series models.
In our current research, we integrate multivariate time series data with textual information from stock-specific news events to forecast how events either enhance or diminish signals in stock prices relative to the overall trend. The scenario we address is depicted in Figure 1. Initially, we train multivariate time series models to predict individual stock prices. Drawing inspiration from state change models, we conceptualize market excitement following an event as shifts in the stock state (Bosselut et al., 2017). To accomplish this, we leverage the event-based insights that a large language model generates as discrete labels describing the price changes for the next time points following the occurrence of an event. We utilize these stock state changes to anticipate the increase or decrease in a stock’s price beyond what is projected by the time series model. Following this, we combine the time series model’s predictions with the event-induced changes predicted by the state change model to refine our forecasts. To the best of our knowledge, we are the first to develop a scheme for predicting short-term excitement in stock price time series, utilizing a large language model to forecast sequences of discrete labels that represent event-induced price changes over time.
2 Related Work
Methods for Time Series Analysis.
Recent advancements in deep learning architectures, such as Long Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber, 1997), Gated Recurrent Units (GRU) (Chung et al., 2014), and transformers (Vaswani et al., 2017), have demonstrated significant capabilities in capturing complex temporal relationships within time series data. Various transformer models have been proposed (Li et al., 2019; Zhou et al., 2021a; Wu et al., 2021; Zhou et al., 2022; Liu et al., 2021) for forecasting time series, often designing novel attention mechanisms to handle longer sequences and using point-wise attention, which can overlook the importance of patches. Although Triformer (Cirstea et al., 2022) introduces patch attention, it does not use patch inputs. Patch Time Series Transformer (Patch TST) (Nie et al., 2022) was the first transformer model to use patches as inputs, capturing the semantic coherence among neighboring patches. However, these techniques cannot be directly adapted to a multimodal setting involving textual information. Our current work investigates time series prediction in a multimodal setting, comprising both time series and textual information.
Time Series Analysis for Stock Prediction.
Several time series analysis methods and machine learning techniques can be applied for stock prediction. These include ARIMA models, Exponential Smoothing State Space models (ETS) (Brown, 1956), and machine learning techniques such as linear regression, decision trees, random forest, SVM, gradient boosting, Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models (Tse and Tsui, 2002; Engle, 2002), and ensemble methods involving multiple models. Hu et al. (2018) developed a hybrid attention mechanism to predict stock market movements using news articles, while BERT representations have been used to encode texts for the FEARS index (Da et al., 2011) in predicting movements in the S&P 500 index (Yang et al., 2019). However, these techniques are typically adapted to handle information derived from a sequence of financial events, which can result in inaccurate predictions during unforeseen events that impact financial decisions. Our approach models time series prediction in a multimodal setting, where predictions are evaluated in the context of specific events.
NLP for Finance.
Financial services have always been tightly regulated by governments due to their pervasive impact on the masses. However, following liberalization and the easing of regulations, financial technology (FinTech) has emerged as one of the top business avenues in the last decade. Chen et al. (2020) highlight the application areas of NLP in the finance domain. Financial institutions use end-to-end transformer models to scan and extract financial events from various news articles and financial announcements (Zheng et al., 2019), evaluating the debt-paying ability of corporate customers. Online forums, blogs, and social media posts are monitored to extract sentiment, which is then used to predict company sales using model-agnostic meta-learning methods (Lin et al., 2019; Finn et al., 2017). Similarly, insurance companies track daily posts from customers to detect and initiate early treatment of diseases (Losada et al., 2019; Burdisso et al., 2019), mitigating the chances of hazards. Social media posts also serve as indicators for stock recommendations (Tsai et al., 2019). Most of these works are formulated as simple sentiment label predictions, which may not fully capture the complexity of financial events. Therefore, instead of assigning a limited number of sentiment polarities to an event, we model the effects of the event in terms of change directions with associated real values. Our current work investigates methods to convert market excitement related to events into real-valued stock prices over the subsequent days.
3 TimeS: Overall Method
Our objective is to forecast the impact of an event on the price signal of a stock for the next time units and adjust the prediction of our time series model accordingly. Let’s break down the task into three steps.
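1. Forecast the future prices of a stock from its historic prices using a time series function.
2. Predict the impact of an event on the price of the stock for the subsequent time units.
3. Update the time series forecast with the predicted impact using an update function U.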
Here, the time series function takes the historic prices of a specific stock over the previous time points as its argument and forecasts the future values for the coming time units. The event-impact function predicts the impact of an event on the price of the stock for the subsequent time units, starting from the point at which the event occurs. Finally, U is an update function that takes the outputs of the two preceding functions and adjusts the time signal for the upcoming time steps by amplifying or attenuating it.
We commence by training a dedicated time series model for each individual stock. This model is designed to project the trajectory and expansion of the stock over the subsequent days, using prices from the preceding days as its input. Central to our approach is the event-impact function, which assesses the influence of a specific event on the market sentiment surrounding the stock. We conceptualize this process as a state transition problem that depicts the stock’s behavior over the days following the occurrence of an event. Within this framework, we quantify the extent of amplification or attenuation in the stock price for each future day, based on its corresponding stock state. The state transition and prediction are guided by the intuition of an LLM regarding the patterns of future price changes of the stock within the context of the event. Following this assessment, the update mechanism U refines the predictions generated by the time series model by integrating the amplification or attenuation derived from the preceding analysis. Notably, while each stock is assigned its own time series model, the other components are shared across all stocks. The rationale behind this design choice will be explained in subsequent discussions.
3.1 Time Series Model
Time series models are trained to predict the values for the next time points from the values at previous time points. Our time series model can be represented as follows.
$$\hat{P}^{s}_{t+1:t+n} = T_{s}\left(X^{s}_{t-k+1:t}\right) \tag{1}$$
where $\hat{P}^{s}_{t+1:t+n}$ is the price of the stock $s$ for the next $n$ time points from the current time $t$, and $X^{s}_{t-k+1:t}$ is a multivariate sequence of historic data for the previous $k$ time points. The multivariate sequence contains parallel sequences such as the price of the stock, different index values, or exchange rates, which can play a role in modeling general market tendencies and their effect on the price of $s$.
![Refer to caption](x1.png)
3.1.1 Stock State Computation Using Indicators Predicted by Large Language Models
LLMs trained on text data could intuitively grasp stock price movements across various future time spans, albeit without predicting exact values. For this purpose, we fine-tune large language models to predict the stock price trend as discrete labels that capture the intuition of the model regarding the price change of the stock for the next days, as follows:
$$l^{s}_{1:n} = \mathrm{LLM}(e, s) \tag{2}$$
where $e$ is the textual description of the event, $s$ is the stock, and $l^{s}_{1:n}$ is the sequence of price change labels for the next $n$ days. The process of fine-tuning to produce these price change labels is explained in Appendix D. We calculate the stock state transition using a Gated Recurrent Unit initialized with the embedding of the stock $s$, which takes the corresponding LLM-predicted label at each time step to produce the temporal state of the stock.
Amplification Prediction Using Temporal Stock State
The time series can be viewed as a random walk in a 2D grid, as shown in Figure 3. At any point in time, it takes one of three directions, namely increase, decrease, or stay steady, which can be represented by the direction indicator values 1, -1, and 0 respectively. We use the stock states to compute the probability that the time series takes each of these directions. The expected value of the direction indicator, computed using these probabilities, represents the amplification/attenuation value, which can subsequently be used to update the time series. With this view in mind, we compute the price amplification/attenuation from the stock state at each time step as follows.
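$$p_{i} = \mathrm{softmax}\left(W h_{i}\right) \tag{3}$$
$$a_{i} = \sum_{d \in \{1, -1, 0\}} d \, p_{i}(d) = p_{i}(1) - p_{i}(-1) \tag{4}$$
where $h_{i}$ is the stock state at time step $i$, $W$ is a parameter matrix, and $p_{i}$ belongs to $\mathbb{R}^{3}$ and contains the probabilities of increase, decrease, and neutral. $a_{i}$ is the amplification or attenuation value. We concatenate $a_{1}$ to $a_{n}$ to form the amplification vector $A$.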
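The expected-direction computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the use of a softmax to turn the projected stock state into probabilities, the function names `softmax` and `amplification`, and the toy state and weight values are all hypothetical and not taken from the paper.

```python
import math

# Direction indicators, ordered as (increase, decrease, neutral).
DIRECTIONS = (1.0, -1.0, 0.0)


def softmax(scores):
    """Numerically stable softmax over a list of floats (assumed normalizer)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def amplification(state, weights):
    """Map one stock state to direction probabilities and their expected value.

    state:   list of d floats (hypothetical stock state at one time step)
    weights: 3 rows of d floats (hypothetical parameter matrix)
    """
    scores = [sum(w * h for w, h in zip(row, state)) for row in weights]
    probs = softmax(scores)
    # Expected direction indicator: 1 * p_inc + (-1) * p_dec + 0 * p_neutral.
    amp = sum(d * p for d, p in zip(DIRECTIONS, probs))
    return probs, amp


# Toy values chosen only for illustration.
state = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.2, 0.2, 0.2]]
probs, amp = amplification(state, weights)
```

Because the neutral direction has indicator 0, the expected value reduces to the probability of increase minus the probability of decrease, so it always lies between -1 and 1.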
3.2 U: Updating Time Series Price Predictions
Once we compute the amplification vector, we use it to update the values predicted by the time series model. We take a simple linear transformation of the concatenated vector to predict the updated price of the stock in the context of the event.
(5) |
is the price predictions by the time series model as represented by the Equation 1 and is a hyper-parameter.
Loss:
We opt for Mean Squared Error (MSE) loss to quantify the disparity between the predicted and actual values. The loss is computed as the MSE between the updated price and the expected price.
4 Experiments
Our primary objective is to enhance time series predictions in response to events using a large language model (LLM). As illustrated in Figure 2, our method integrates several key components: a time series model, an LLM trained to predict stock price changes over various future time spans as discrete labels, and mechanisms for updating the time series based on the LLM’s predictions. This section details the data, settings, and results for the following tasks: 1) Sub Task1: Training the time series models, 2) Sub Task2: Fine-tuning the LLM for price change prediction, and 3) Main Task1: Overall approach for updating the time series using the LLM’s predicted labels, as depicted in Figure 2.
4.1 Datasets
ExtEDT: Extended EDT Dataset with News Events and Time Series Data
Our experimentation utilized the EDT Dataset as the foundational resource (Zhou et al., 2021b). This dataset comprises stock tickers, with each entry corresponding to a specific company’s stock, accompanied by a textual description of a company-related news event and the event’s date of occurrence. To enable a detailed evaluation, we partitioned the dataset into small-cap, mid-cap, and large-cap stocks. In order to tailor the dataset to our task, we retrieved the closing price of each stock for the subsequent days following the event using the Yahoo Finance API (https://python-yahoofinance.readthedocs.io/en/latest/api.html). Additionally, we automatically annotated the price change labels for future days, for each event within every record, adhering to the methodology outlined in Appendix D. The EDT dataset is divided into training, validation, and test sets, containing 46397, 5210, and 5263 samples respectively. To create these partitions, we allocate ticker-wise samples in an 80:10:10 ratio.
Dataset: Training Time Series Models
The focus of the present paper is on updating time series models trained on long-term stock price sequences. As previously stated, we chose to train separate time series models for each stock available in the EDT dataset. To achieve this, we gathered time series data of closing prices for each stock over the past 30 years, along with the corresponding values for the dollar exchange index and the NASDAQ exchange index, using the Yahoo Finance API (https://python-yahoofinance.readthedocs.io/en/latest/api.html). For every stock, we combined these sequences to form a multivariate time series. This multivariate sequence is then divided into source and target sequences with fixed source length, target length, and stride values. The input comprises the NASDAQ index, dollar exchange rate, and stock price sequence, while the output is a univariate sequence of stock prices. More details on training individual time series models can be found in Appendix A.1.
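The split into source and target sequences can be illustrated with a small sliding-window sketch. The function name `make_windows`, the channel order, and the toy lengths below are assumptions made for illustration; the actual source length, target length, and stride are those given in Appendix A.1.

```python
def make_windows(series, source_len, target_len, stride, target_channel=0):
    """Slice a multivariate series into (source, target) training pairs.

    series: list of rows, one row per time step, one value per channel.
    Each source keeps all channels; each target keeps only `target_channel`
    (assumed here to be the stock price).
    """
    pairs = []
    last_start = len(series) - source_len - target_len
    for start in range(0, last_start + 1, stride):
        split = start + source_len
        source = series[start:split]
        target = [row[target_channel] for row in series[split:split + target_len]]
        pairs.append((source, target))
    return pairs


# Toy series: 10 time steps with channels (stock price, NASDAQ index, dollar index).
series = [[100 + t, 1000 + 2 * t, 90 + 0.1 * t] for t in range(10)]
pairs = make_windows(series, source_len=4, target_len=2, stride=2)
```

With these toy settings the windows start at steps 0, 2, and 4, so three pairs are produced, each with a multivariate source of four steps and a univariate target of two stock prices.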
4.2 Fine tuning LLM for Price Change Label Prediction
This task is modeled as a sequence-to-sequence prediction task where the input is a news event about a stock prepended with the ticker’s name and the output is a sequence of price change labels. Each price change label is discrete in nature where we capture the type of the change with its actual value. The type of change can belong to any of two categories: increase (INC) and decrease (DEC). The actual change value is represented in terms of integers instead of real values. For cases where there is no change in the values, we consider that as an increment (INC) with a zero change value. One example from our dataset is shown in Table 7.
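As a rough illustration of the labeling scheme, the sketch below converts a short closing-price sequence into change-type labels with integer values. Several details are assumptions rather than the paper's procedure (which is described in Appendix D): the change is measured as a percentage relative to the event-day price, the integer is obtained by rounding, the `scale` parameter and the `INC_3` string format are hypothetical, and the function name `price_change_labels` is invented for this example.

```python
def price_change_labels(prices, scale=1.0):
    """Turn closing prices into discrete price change labels.

    prices[0] is the event-day closing price; the rest are the following days.
    ASSUMPTION: the integer value is round(|percent change vs. event day| * scale).
    """
    base = prices[0]
    labels = []
    for price in prices[1:]:
        pct = (price - base) / base * 100.0
        # No change is treated as an increment with a zero change value.
        kind = "DEC" if pct < 0 else "INC"
        labels.append(f"{kind}_{round(abs(pct) * scale)}")
    return labels


labels = price_change_labels([100.0, 103.0, 100.0, 96.0])
```

The second following day has the same price as the event day, so it is labeled as an increment with value zero, matching the convention stated above.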
4.2.1 Settings:
We leverage three variants of T5 (Text-To-Text-Transfer-Transformer) (Raffel et al., 2020) models for the price change predictions. T5’s unified framework excels at transferring knowledge from various tasks via pre-training on a massive dataset. We refrain from using newer LLMs (Touvron et al., 2023a, b; Jiang et al., 2023, 2024; Le Scao et al., 2023; Li et al., 2023; Zhang et al., 2022) to avoid the potential effects of data contamination, as these newer models might report overestimated performance on the test sets. We fine-tune three variants of T5: T5-Base, T5-Large, and T5-3B. For the T5-Base model, we fine-tune all its parameters, whereas for the larger models we fine-tune reduced sets of parameters. We freeze all the encoder layers of the T5-Large model, whereas 8-bit low-rank adaptation (Hu et al., 2021) is applied to the T5-3B model.
4.2.2 Evaluation and Results
We evaluate the predictions at two levels. The first deals with how accurately the change type is predicted, whereas the second evaluates the prediction of values. Instead of exactly matching the values, we employ window-based matching: we label a prediction correct if the value lies within a window around the exact value. We use a window of length 5 for the evaluation of values. For a value v, the window of length 5 is represented as the range v-5..v+5. The change type is evaluated using the micro F1 score, and the performance of the different T5 variants is presented in Table 1.
Model | Validation | Test |
---|---|---|
T5-Base | 0.68 | 0.65 |
T5-Large | 0.63 | 0.61 |
T5-3B | 0.64 | 0.61 |
The F1-scores of predicting the actual change values with different window sizes are reported in Table 2.
Model | Validation | Test |
---|---|---|
T5-Base | 0.55 | 0.56 |
T5-Large | 0.55 | 0.56 |
T5-3B | 0.55 | 0.55 |
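The window-based matching used for change values can be sketched as follows. This is a simplified illustration: it reports the plain fraction of predictions that fall inside the window rather than the F1 score used in the tables, and the names `window_match` and `window_hit_rate` as well as the toy values are hypothetical.

```python
def window_match(predicted, actual, window=5):
    """A prediction counts as correct if it lies in actual-window..actual+window."""
    return abs(predicted - actual) <= window


def window_hit_rate(predictions, actuals, window=5):
    """Fraction of predicted change values that fall inside the window."""
    hits = sum(window_match(p, a, window) for p, a in zip(predictions, actuals))
    return hits / len(actuals)


# Toy change values chosen only for illustration.
predictions = [3, 12, 40, 7]
actuals = [5, 20, 38, 13]
rate_w5 = window_hit_rate(predictions, actuals, window=5)
rate_w10 = window_hit_rate(predictions, actuals, window=10)
```

Widening the window can only keep or raise the hit rate, which is why larger windows give higher scores for the same predictions.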
4.3 Main Task: Updating Time Series Prediction with Insights from LLM
4.3.1 Baseline Settings
We compared our approach with several state-of-the-art time series models, including variants of Patch-TST and D-Linear, to assess their effectiveness in updating time series predictions. Specifically, we adapted the Patch-TST+W and D-Linear+W variants for multi-channel input to single-channel output prediction (see Appendix A.1 for more details). Additionally, we explored a class of models based on lightweight natural language processing techniques used for stock sentiment predictions. To facilitate a fair comparison, we modified these models to create a time series-specific version that predicts future time-step values instead of sentiment labels. For more information on these settings, please refer to Appendix B.
Setting | Small-Cap RMSE | Small-Cap MAE | Mid-Cap RMSE | Mid-Cap MAE | Large-Cap RMSE | Large-Cap MAE |
---|---|---|---|---|---|---|
DLinear | 0.13 | 0.30 | 0.141 | 0.270 | 0.122 | 0.261 |
PatchTST/5 | 0.190 | 0.35 | 0.190 | 0.280 | 0.162 | 0.271 |
SentiEvent | 0.180 | 0.37 | 0.171 | 0.370 | 0.172 | 0.392 |
T5-Base+TimeS | 0.120 | 0.205 | 0.101 | 0.206 | 0.108 | 0.190 |
T5-Large+TimeS | 0.120 | 0.225 | 0.110 | 0.230 | 0.106 | 0.210 |
T5-3B+TimeS | 0.121 | 0.227 | 0.113 | 0.231 | 0.124 | 0.216 |
T5-Base+TimeL | 0.123 | 0.270 | 0.135 | 0.25 | 0.127 | 0.25 |
T5-Large+TimeL | 0.127 | 0.290 | 0.136 | 0.28 | 0.120 | 0.270 |
T5-3B+TimeL | 0.126 | 0.293 | 0.137 | 0.27 | 0.123 | 0.270 |
4.3.2 Model Variants
We combined our approach for stock state computation and amplification prediction, depicted in Figure 2, with the different fine-tuned variants of the T5 model mentioned in Section 4.2. For , we set the learning rate to , using the Adam optimization algorithm (Kingma and Ba, 2014). During the training of , the pretrained time series component was frozen. In time series forecasting, simpler models have produced surprisingly strong results, as in the case of D-Linear. Inspired by this, we created a simpler model, TimeL, which omits the stock state computation. This approach re-approximates the original percentage change from the discrete labels predicted by the LLM component, and this sequence of values is used for updating the time series predictions. Details of this setting can be seen in Appendix 3. This approach was also tested with the different variants of T5.
5 Results
We assessed the primary task of updating time series using Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) as metrics. RMSE measures the square root of the average squared differences between predicted and actual values, while MAE represents the average of the absolute differences between predicted and actual values. The results of the updated price prediction, in the context of an event, are presented in Table 3. Updates based on LLM-predicted indicators improve the accuracy of the time series predictions: the best TimeS setting attains lower RMSE and MAE than DLinear and PatchTST/5 across small-cap, mid-cap, and large-cap stocks. In contrast, SentiEvent performed poorly compared to the LLM-based models. This disparity is likely due to the sophisticated background understanding and enhanced text comprehension capabilities of LLMs in the financial domain. The TimeS settings outperformed the TimeL settings. TimeS computes amplification in a probabilistic space, whereas TimeL approximates actual values of amplification from LLM-predicted labels. This approximation limits TimeL’s ability to detect errors in LLM predictions and make the necessary adjustments in amplification computation.
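For reference, the two metrics can be computed as in the following minimal sketch, which follows the standard definitions quoted above. The function names and the toy price values are illustrative only and are not results from the paper.

```python
import math


def rmse(predicted, actual):
    """Square root of the mean squared difference between predictions and targets."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, actual):
    """Mean of the absolute differences between predictions and targets."""
    absolute = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute) / len(absolute)


# Toy values chosen only for illustration.
predicted = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 2.0, 5.0, 8.0]
rmse_value = rmse(predicted, actual)
mae_value = mae(predicted, actual)
```

RMSE penalizes large individual errors more heavily than MAE, so on the same set of predictions RMSE is never smaller than MAE.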
6 Ablation Study
6.1 Ablation Study: Performance of T5 During Increment and Decrement
Model | Validation DEC | Validation INC | Validation Overall | Test DEC | Test INC | Test Overall |
---|---|---|---|---|---|---|
T5-Base | 0.56 | 0.75 | 0.68 | 0.53 | 0.73 | 0.65 |
T5-Large | 0.42 | 0.72 | 0.63 | 0.39 | 0.71 | 0.61 |
T5-3B | 0.5 | 0.71 | 0.64 | 0.47 | 0.7 | 0.61 |
From Table 4, it is evident that all the models perform better in predicting the INC label while DEC label prediction task is challenging for them.
Data | Model | Low #Samp | Low INC | Low DEC | Medium #Samp | Medium INC | Medium DEC | Large #Samp | Large INC | Large DEC |
---|---|---|---|---|---|---|---|---|---|---|
Test | T5-Base | 10811 | 0.71 | 0.49 | 3195 | 0.75 | 0.59 | 1780 | 0.78 | 0.65 |
Test | T5-Large | 10811 | 0.7 | 0.34 | 3195 | 0.72 | 0.44 | 1780 | 0.74 | 0.55 |
Test | T5-3B | 10811 | 0.68 | 0.44 | 3195 | 0.72 | 0.52 | 1780 | 0.73 | 0.58 |
Val | T5-Base | 10453 | 0.73 | 0.5 | 3334 | 0.79 | 0.65 | 1843 | 0.81 | 0.69 |
Val | T5-Large | 10453 | 0.7 | 0.34 | 3334 | 0.74 | 0.5 | 1843 | 0.75 | 0.58 |
Val | T5-3B | 10453 | 0.7 | 0.46 | 3334 | 0.73 | 0.56 | 1843 | 0.75 | 0.62 |
Table 5 shows the performance of change type prediction across different magnitude ranges of change values. We denote change values in the range 0..15 as Low, 16..31 as Medium, and the rest as Large. The performance of all the models in predicting the DEC tag increases as we move from the Low to the Large range of change values, while that of INC rises more modestly. This may result from the low sensitivity of T5 models towards events that lead to minimal changes in the decrement direction. For a detailed analysis, please refer to Appendix E.
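The grouping by magnitude range can be sketched as below. The bucket boundaries follow the ranges stated above; everything else is an assumption for illustration, including the function names, the sample tuple layout, and the use of a plain per-bucket match rate instead of the per-class F1 scores reported in Table 5.

```python
from collections import defaultdict


def change_bucket(value):
    """Assign a change value to Low (0..15), Medium (16..31), or Large (rest)."""
    if value <= 15:
        return "Low"
    if value <= 31:
        return "Medium"
    return "Large"


def per_bucket_match_rate(samples):
    """samples: iterable of (change_value, true_type, predicted_type) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for value, true_type, predicted_type in samples:
        bucket = change_bucket(value)
        totals[bucket] += 1
        hits[bucket] += int(true_type == predicted_type)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Toy samples chosen only for illustration.
samples = [
    (3, "INC", "INC"),
    (10, "DEC", "INC"),
    (20, "DEC", "DEC"),
    (45, "INC", "INC"),
]
rates = per_bucket_match_rate(samples)
```

Splitting the evaluation this way makes it possible to see whether errors concentrate in small or large price moves, which is the comparison the table is designed to support.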
6.2 Ablation Study: Performance During Different Ranges of Price Variations
Dataset | Model | Low, W=5 | Low, W=10 | Low, W=15 | Medium, W=5 | Medium, W=10 | Medium, W=15 | Large, W=5 | Large, W=10 | Large, W=15 |
---|---|---|---|---|---|---|---|---|---|---|
Test | T5-Base | 0.71 | 0.92 | 0.96 | 0.28 | 0.58 | 0.85 | 0.1 | 0.18 | 0.28 |
Test | T5-Large | 0.77 | 0.97 | 0.99 | 0.17 | 0.44 | 0.78 | 0.03 | 0.06 | 0.13 |
Test | T5-3B | 0.72 | 0.92 | 0.96 | 0.26 | 0.55 | 0.82 | 0.09 | 0.15 | 0.26 |
Validation | T5-Base | 0.72 | 0.91 | 0.96 | 0.26 | 0.57 | 0.83 | 0.1 | 0.18 | 0.27 |
Validation | T5-Large | 0.77 | 0.96 | 0.99 | 0.17 | 0.44 | 0.78 | 0.05 | 0.08 | 0.14 |
Validation | T5-3B | 0.72 | 0.92 | 0.96 | 0.26 | 0.54 | 0.82 | 0.08 | 0.16 | 0.26 |
Table 6 reports the prediction accuracies for change values in the Low, Medium, and Large categories defined above, where W denotes the window size. All the models predict smaller change values with high precision but find it challenging to predict large change values accurately, which suggests that T5 models struggle to anticipate price fluctuations during extreme shifts. For case studies on the prediction of price changes and subsequent updates to time series data, please see Appendix E.
7 Limitations
To avoid data contamination, we refrain from using newer LLMs. This results in sub-optimal predictions for change types and actual change values. The test and validation data contain news articles focusing on trading events from the PRNewswire and Businesswire websites in the financial year 2020-21. As the T5 models were released before this period, we can safely assume that the training data of T5 did not overlap with the data considered in this work. However, LLM capabilities have improved tremendously in the recent past.
8 Conclusion
The paper introduces a multi-modal framework for modeling stock price time-series within the context of financial events. This framework integrates insights from large language models (LLMs), using predicted price changes as discrete labels to update the time series. This approach improves the accuracy of stock price forecasts during financial events. The paper also presents various experimental results demonstrating the ability of LLMs to anticipate price changes.
References
- Bosselut et al. (2017) Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. Simulating action dynamics with neural process networks, 2017.
- Brown (1956) Robert G Brown. Exponential smoothing for predicting demand. Little, 1956.
- Burdisso et al. (2019) Sergio G Burdisso, Marcelo Errecalde, and Manuel Montes-y Gómez. A text classification framework for simple and effective early depression detection over social media streams. Expert Systems with Applications, 133:182–197, 2019.
- Cao (2022) Longbing Cao. Ai in finance: challenges, techniques, and opportunities. ACM Computing Surveys (CSUR), 55(3):1–38, 2022.
- Chen et al. (2020) Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. Nlp in fintech applications: past, present and future. arXiv preprint arXiv:2005.01320, 2020.
- Chung et al. (2014) Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
- Cirstea et al. (2022) Razvan-Gabriel Cirstea, Chenjuan Guo, Bin Yang, Tung Kieu, Xuanyi Dong, and Shirui Pan. Triformer: Triangular, variable-specific attentions for long sequence multivariate time series forecasting–full version. arXiv preprint arXiv:2204.13767, 2022.
- Da et al. (2011) Zhi Da, Joseph Engelberg, and Pengjie Gao. In search of attention. The journal of finance, 66(5):1461–1499, 2011.
- Ding et al. (2014) Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. Using structured events to predict stock price movement: An empirical investigation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1415–1425, 2014.
- Ding et al. (2016) Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. Knowledge-driven event embedding for stock prediction. In Proceedings of coling 2016, the 26th international conference on computational linguistics: Technical papers, pages 2133–2142, 2016.
- Engle (2002) Robert Engle. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20(3):339–350, 2002.
- Finn et al. (2017) Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- Hu et al. (2021) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Hu et al. (2018) Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the eleventh ACM international conference on web search and data mining, pages 261–269, 2018.
- Jiang et al. (2023) Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
- Jiang et al. (2024) Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Le Scao et al. (2023) Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, et al. Bloom: A 176b-parameter open-access multilingual language model. 2023.
- Li et al. (2019) Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems, 32, 2019.
- Li et al. (2023) Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, and Yin Tat Lee. Textbooks are all you need ii: phi-1.5 technical report. arXiv preprint arXiv:2309.05463, 2023.
- Lin et al. (2019) Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Zihan Liu, Yan Xu, Cong Gao, and Pascale Fung. Learning to learn sales prediction with social media sentiment. In Proceedings of the First Workshop on Financial Technology and Natural Language Processing, pages 47–53, 2019.
- Liu et al. (2021) Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X Liu, and Schahram Dustdar. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International conference on learning representations, 2021.
- Lopez-Lira and Tang (2023) Alejandro Lopez-Lira and Yuehua Tang. Can chatgpt forecast stock price movements? return predictability and large language models. arXiv preprint arXiv:2304.07619, 2023.
- Losada et al. (2019) David E Losada, Fabio Crestani, and Javier Parapar. Overview of erisk at clef 2019: Early risk prediction on the internet (extended overview). CLEF (Working Notes), 2019.
- Nie et al. (2022) Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2022.
- Pricope (2021) Tidor-Vlad Pricope. Deep reinforcement learning in quantitative algorithmic trading: A review, 2021.
- Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1–67, 2020.
- Sawhney et al. (2020) Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, and Rajiv Shah. Deep attentive learning for stock movement prediction from social media text and company correlations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8415–8426, 2020.
- Touvron et al. (2023a) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023a.
- Touvron et al. (2023b) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023b.
- Tsai et al. (2019) Yu-Che Tsai, Chih-Yao Chen, Shao-Lun Ma, Pei-Chi Wang, You-Jia Chen, Yu-Chieh Chang, and Cheng-Te Li. Finenet: a joint convolutional and recurrent neural network model to forecast and recommend anomalous financial items. In Proceedings of the 13th ACM conference on recommender systems, pages 536–537, 2019.
- Tse and Tsui (2002) Yiu Kuen Tse and Albert K C Tsui. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics, 20(3):351–362, 2002.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
- Wah and Qian (2002) Benjamin W Wah and Minglun Qian. Constrained formulations and algorithms for stock-price predictions using recurrent fir neural networks. In AAAI/IAAI, pages 211–216, 2002.
- Woo et al. (2022) Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, and Steven Hoi. Etsformer: Exponential smoothing transformers for time-series forecasting, 2022.
- Wu et al. (2021) Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in neural information processing systems, 34:22419–22430, 2021.
- Wu et al. (2013) Yue Wu, José Miguel Hernández-Lobato, and Zoubin Ghahramani. Dynamic covariance models for multivariate financial time series. In International Conference on Machine Learning, pages 558–566. PMLR, 2013.
- Xiang et al. (2022) Sheng Xiang, Dawei Cheng, Chencheng Shang, Ying Zhang, and Yuqi Liang. Temporal and heterogeneous graph neural network for financial time series prediction. In Proceedings of the 31st ACM international conference on information & knowledge management, pages 3584–3593, 2022.
- Yang et al. (2019) Linyi Yang, Ruihai Dong, Tin Lok James Ng, and Yang Xu. Leveraging bert to improve the fears index for stock forecasting. In Proceedings of the First Workshop on Financial Technology and Natural Language Processing, pages 54–60, 2019.
- Zeng et al. (2023) Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023.
- Zhang et al. (2022) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. Opt: Open pre-trained transformer language models, 2022.
- Zheng et al. (2019) Shun Zheng, Wei Cao, Wei Xu, and Jiang Bian. Doc2edag: An end-to-end document-level framework for chinese financial event extraction. arXiv preprint arXiv:1904.07535, 2019.
- Zhou et al. (2021a) Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11106–11115, 2021a.
- Zhou et al. (2022) Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, pages 27268–27286. PMLR, 2022.
- Zhou et al. (2021b) Zhihan Zhou, Liqian Ma, and Han Liu. Trade the event: Corporate events detection for news-based event-driven trading. arXiv preprint arXiv:2105.12825, 2021b.
Appendix A Appendix
A.1 Time Series Model
In this section, we describe our adaptations of the PatchTST (Nie et al., 2022) and DLinear (Zeng et al., 2023) time series models for mapping multi-channel input to single-channel output.
A.1.1 PatchTST+W
PatchTST is a Transformer-based model for multivariate time series forecasting and self-supervised representation learning that rests on two methodological components. First, each time series is segmented into subseries-level patches, which serve as input tokens for the Transformer. Second, the model is channel-independent: each channel represents a single univariate time series, and all series share the same embedding and Transformer weights. This design retains local semantic information in the embedding, quadratically reduces the computation and memory usage of the attention maps, and enables the model to attend to longer historical contexts. In our adaptation, the output layers of the individual channels are flattened, concatenated, and projected using a transformation matrix W. We used a patch window of 5 and set the learning rate to , employing the Adam optimization algorithm (Kingma and Ba, 2014).
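The patching and channel-combination steps described above can be sketched as follows. This is a minimal illustration in plain Python rather than the implementation used in the paper: the function names `make_patches` and `project_channels`, the non-overlapping stride, and the toy weights are assumptions introduced here. Only the patch window of 5 and the look-back window of 30 come from the text.

```python
from typing import List


def make_patches(series: List[float], patch_len: int = 5, stride: int = 5) -> List[List[float]]:
    """Split one univariate channel into subseries-level patches (input tokens)."""
    if len(series) < patch_len:
        raise ValueError("series is shorter than one patch")
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]


def project_channels(channel_outputs: List[List[float]], W: List[List[float]]) -> List[float]:
    """Flatten the per-channel outputs, concatenate them, and project with W.

    W has shape (horizon, n_channels * per_channel_len). It stands in for the
    transformation matrix W that maps multi-channel output to a single channel.
    """
    flat = [v for out in channel_outputs for v in out]
    if any(len(row) != len(flat) for row in W):
        raise ValueError("each row of W must match the concatenated output length")
    return [sum(w * x for w, x in zip(row, flat)) for row in W]


# Toy usage: a look-back window of 30 split with a patch window of 5.
lookback = [float(i) for i in range(30)]
patches = make_patches(lookback)       # non-overlapping stride is an assumption

# Hypothetical per-channel forecasts for 2 channels and a horizon of 2.
outs = [[1.0, 2.0], [3.0, 4.0]]
W = [[0.5, 0.0, 0.5, 0.0],             # averages forecast step 1 across channels
     [0.0, 0.5, 0.0, 0.5]]             # averages forecast step 2 across channels
combined = project_channels(outs, W)
```

In a trained model W would be learned jointly with the backbone; the averaging weights here only make the projection easy to follow by hand.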
A.1.2 DLinear+W
Zeng et al. (2023) challenge the effectiveness of Transformer-based solutions for long-term time series forecasting (LTSF). They argue that while Transformers excel at capturing semantic correlations, their permutation-invariant self-attention mechanism leads to a loss of temporal information in time series modeling. They propose a set of simple one-layer linear models, LTSF-Linear, which outperform existing Transformer-based LTSF models across nine real-life datasets, highlighting the importance of preserving temporal relations. These findings suggest a need to reconsider the suitability of Transformer-based approaches for LTSF and other time series analysis tasks. In our adaptation, the output layers of the individual channels are flattened, concatenated, and projected using a transformation matrix W. We set the learning rate to , employing the Adam optimization algorithm (Kingma and Ba, 2014).
Each individual time series model is trained with a look-back window of 30 and a prediction length of 20.
A.1.3 Why Do We Use Different Time Series Models for Different Stocks?
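To make the linear baseline concrete, the sketch below shows the decomposition idea behind DLinear: a moving average extracts the trend, the remainder is treated as the seasonal part, and one linear layer per component maps the look-back window to the forecast horizon. It is an illustration under stated assumptions, not the configuration used in the paper: the kernel size of 5, the helper names, and the untrained averaging weights are all hypothetical. The look-back of 30 and horizon of 20 follow the text.

```python
from typing import List


def moving_average(x: List[float], kernel: int = 5) -> List[float]:
    """Trend component: moving average with edge-value padding."""
    pad = (kernel - 1) // 2
    padded = [x[0]] * pad + x + [x[-1]] * (kernel - 1 - pad)
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]


def linear(x: List[float], weights: List[List[float]]) -> List[float]:
    """One linear layer mapping a look-back window to the forecast horizon."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def dlinear_forecast(x: List[float],
                     w_trend: List[List[float]],
                     w_seasonal: List[List[float]],
                     kernel: int = 5) -> List[float]:
    """Decompose x into trend + remainder, forecast each linearly, and sum."""
    trend = moving_average(x, kernel)
    seasonal = [a - b for a, b in zip(x, trend)]
    t_out = linear(trend, w_trend)
    s_out = linear(seasonal, w_seasonal)
    return [a + b for a, b in zip(t_out, s_out)]


LOOKBACK, HORIZON = 30, 20
x = [100.0 + 0.5 * t for t in range(LOOKBACK)]
# Hypothetical untrained weights: every forecast step averages the window.
w_avg = [[1.0 / LOOKBACK] * LOOKBACK for _ in range(HORIZON)]
forecast = dlinear_forecast(x, w_avg, w_avg)
```

With identical averaging weights on both branches the forecast collapses to the window mean, which is a convenient sanity check before training replaces the weights.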
Different stocks exhibit unique behaviors and patterns over time, requiring the use of different time series models. This diversity arises from several factors. Firstly, volatility levels vary, with some stocks experiencing frequent and significant price fluctuations, while others remain stable. Secondly, stocks may follow distinct trends, whether upward, downward, or sideways. Additionally, seasonal patterns or cyclical trends, influenced by factors such as weather, holidays, or economic cycles, contribute to the diversity of stock behavior. Moreover, the degree of randomness or noise in stock prices varies among stocks. Furthermore, the liquidity of stocks plays a crucial role, with different levels impacting market behavior. Therefore, selecting appropriate time series models tailored to these factors is essential for effective stock analysis and forecasting.
![Refer to caption](x2.png)
Appendix B SentiEvent: Base Model Settings
In this section, we explain how our method calculates the event-induced price amplification levels for a stock over the subsequent time steps using a BERT-based approach. The entire method is depicted in Figure 2.
B.1 Price Amplification Computation Using Temporal Event Embeddings and Stock States
The impact of an event on a stock’s price tends to fade gradually, and this fading effect differs across stocks and event categories. Hence, in our approach, denoted as , we calculate the changes in stock states by considering the temporal representation of the event over the subsequent time units. The remainder of this section explains each step in detail.
Computing Stock-Specific Event Representation
Each event impacts different stocks differently, and the event details relevant to one stock differ from those relevant to another. For this reason, our method computes a stock-specific event representation that captures the relevant information. We encode the event details using a BERT model.
To compute the stock-specific representation of the event, we apply multi-head attention from the stock over the BERT encodings of the event, as follows.
(6) |
where is the embedding of the stock ticker of stock , taken from a lookup table.
Updating Event Representation for Temporal Information
The effect of an event on a stock changes over time, so we incorporate the temporal changes of an event into its representation. We compute the temporal representations of for the next time units as by adding the positional embedding of the corresponding time unit to .
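The temporal update described above can be sketched as adding a position-dependent vector to the stock-specific event representation, once per future time unit. The text does not state which positional embedding is used, so the sinusoidal form from Vaswani et al. (2017) is an assumption here, as are the function names and the toy vector.

```python
import math
from typing import List


def positional_embedding(pos: int, dim: int) -> List[float]:
    """Sinusoidal positional embedding (assumed; a learned table would also fit)."""
    pe = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def temporal_event_representations(event_repr: List[float], n_steps: int) -> List[List[float]]:
    """Return one representation per future time unit 1..n_steps.

    Each is the stock-specific event representation plus the positional
    embedding of that time unit.
    """
    dim = len(event_repr)
    return [[e + p for e, p in zip(event_repr, positional_embedding(t, dim))]
            for t in range(1, n_steps + 1)]


# Toy usage: a hypothetical 4-dimensional event representation, 3 time units.
event_repr = [0.0, 0.0, 0.0, 0.0]
reps = temporal_event_representations(event_repr, n_steps=3)
```

Because the event vector is shared across steps, the only thing distinguishing the representations is the time-unit embedding, which is what lets a downstream model learn a fading effect.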
Stock State Transition Computation and Price Fluctuation Prediction
We compute the stock state transitions using a Gated Recurrent Unit (GRU) that is initialized with and takes the corresponding temporal event representation at each time step . Each state is used to compute the price amplification and the updated prices using Equations 4 and 5. We set the learning rate to , employing the Adam optimization algorithm (Kingma and Ba, 2014).
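The state transition can be illustrated with a single GRU cell unrolled over the temporal event representations. This is a sketch of a standard GRU update in plain Python, not the trained model: the parameter names, the initial state `h0`, and the all-zero weights are placeholders chosen for illustration, and the price update of Equations 4 and 5 is not reproduced here.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(h: Vector, x: Vector, p: Dict[str, Matrix]) -> Vector:
    """One GRU update (biases omitted): new state from previous state h and input x."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], rh))]
    # Convex combination of the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def stock_state_sequence(h0: Vector, event_reprs: List[Vector], p: Dict[str, Matrix]) -> List[Vector]:
    """Unroll the GRU over the temporal event representations, one state per step."""
    states, h = [], h0
    for x in event_reprs:
        h = gru_step(h, x, p)
        states.append(h)
    return states


# Toy usage with zero weights (placeholders): hidden size 2, input size 3.
zeros_in = [[0.0] * 3 for _ in range(2)]
zeros_hid = [[0.0] * 2 for _ in range(2)]
params = {"W_z": zeros_in, "U_z": zeros_hid,
          "W_r": zeros_in, "U_r": zeros_hid,
          "W_h": zeros_in, "U_h": zeros_hid}
states = stock_state_sequence([1.0, -2.0], [[0.1, 0.2, 0.3]] * 2, params)
```

With zero weights both gates evaluate to 0.5 and the candidate state to 0, so the state simply halves at each step; trained weights would instead let the decay depend on the stock and the event.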
Appendix C TimeL: A Simpler Approach without Stock States
Some time series models have achieved state-of-the-art results with embarrassingly simple one-layer linear models. Inspired by this idea, we also include a simple model without temporal stock state computation, which computes the updated price directly from the price change indicator labels predicted by . For this purpose, we invert the computation of Equations 7 and 8, using the LLM-predicted labels to approximate the fractional change in Equation 7. These values for the entire label sequence are combined to form the price amplification sequence. We set the learning rate to , employing the Adam optimization algorithm (Kingma and Ba, 2014).
Appendix D How Do We Train the LLM? Converting Price Change Values to Discrete Labels
For each stock-event pair in our training set, we compute discrete labels of the price change using the available price time series data for the stock , for time steps after the event . At any time step , the label is computed as follows:
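The reverse computation can be sketched as follows: a label such as INC_6 is mapped back to an approximate percentage change by multiplying its level by the fractional value, and the resulting fractions form an amplification sequence. This is an illustration under assumptions, not the paper's exact procedure: the function names are hypothetical, the multiplicative price update is a stand-in for the update equations in the main text, and the fractional value of 0.3 is the one reported for the experiments.

```python
from typing import List


def label_to_percent_change(label: str, f: float = 0.3) -> float:
    """Approximate percentage change encoded by a label such as 'INC_6' or 'DEC_4'.

    Because the label level is a floor, the true change can exceed this value
    by up to f percentage points.
    """
    direction, _, magnitude = label.partition("_")
    if direction not in ("INC", "DEC") or not magnitude.isdigit():
        raise ValueError(f"unrecognized label: {label!r}")
    pct = int(magnitude) * f
    return pct if direction == "INC" else -pct


def amplification_sequence(labels: List[str], f: float = 0.3) -> List[float]:
    """Fractional price change per time step, derived from the label sequence."""
    return [label_to_percent_change(lab, f) / 100.0 for lab in labels]


def apply_amplification(prices: List[float], amps: List[float]) -> List[float]:
    """Assumed multiplicative update of base forecasts by the amplification."""
    return [p * (1.0 + a) for p, a in zip(prices, amps)]


# Toy usage: three base forecasts and three predicted labels.
amps = amplification_sequence(["INC_6", "INC_15", "DEC_4"])
updated = apply_amplification([100.0, 100.0, 100.0], amps)
```

The approximation is lossy by construction, since every change inside one interval of the fractional value collapses to the same level.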
(7) |
(8) |
In Equation 7, is the price of the stock at time step . Equation 7 computes the percentage change in the price of the stock between time steps and , divides it by a fractional value , and takes the floor of the result as . This value can be negative because the absolute value of the price change is not taken during computation. Equation 8 assigns the price change label for time step . Each percentage price change that falls within one interval of the fractional value is therefore projected to a single discrete label. For our experiments, we set the fractional value to 0.3. The ’INC’ and ’DEC’ prefixes indicate whether the percentage change is in the increasing or decreasing direction. Using the automatically computed price change labels for all time steps, an LLM is trained to predict the price change labels for time steps for stock after the event . To improve predictability, we divide the time steps into three windows, and the maximum change value within each window is taken as for every time step within that window, so every time step within a given window receives the same label. Table 7 provides an example of the records used to train the LLM.
Ticker | FNB
---|---
Event |
Label Sequence | INC_6 INC_15 INC_10
Input for TimeS | INC_6 INC_6 INC_6 INC_15 INC_15 INC_15 INC_10 INC_10 INC_10
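The label construction described in this appendix can be sketched in a few lines. Several details are assumptions made for illustration because the equations are not legible here: the percentage change is taken between consecutive time steps, the "maximum change" in a window is taken by absolute magnitude, and all function names are hypothetical. The fractional value of 0.3, the three windows, and the INC/DEC label format follow the text.

```python
import math
from typing import List


def change_levels(prices: List[float], f: float = 0.3) -> List[int]:
    """Floor of (percentage change between consecutive steps) divided by f."""
    levels = []
    for prev, cur in zip(prices, prices[1:]):
        pct = (cur - prev) / prev * 100.0
        levels.append(math.floor(pct / f))
    return levels


def to_label(level: int) -> str:
    """Map a signed level to an 'INC_k' or 'DEC_k' label."""
    return f"INC_{level}" if level >= 0 else f"DEC_{-level}"


def window_labels(levels: List[int], n_windows: int = 3) -> List[str]:
    """One label per window, taken from the largest-magnitude level in it."""
    if len(levels) % n_windows != 0:
        raise ValueError("number of time steps must divide evenly into windows")
    size = len(levels) // n_windows
    return [to_label(max(levels[w * size:(w + 1) * size], key=abs))
            for w in range(n_windows)]


def expand(labels: List[str], size: int) -> List[str]:
    """Repeat each window label so every time step in the window shares it."""
    return [lab for lab in labels for _ in range(size)]


# Toy usage: nine hypothetical per-step levels that reproduce the record above.
levels = [6, 2, 1, 15, 3, -4, 10, 0, 5]
label_sequence = window_labels(levels)          # one label per window
per_step_input = expand(label_sequence, 3)      # one label per time step
```

The short label sequence is what the LLM is asked to predict, while the expanded per-step sequence matches the "Input for TimeS" row of the table.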
Appendix E Case Studies
Case Study 1 (Figure 5) illustrates a scenario of moderate upward price movement. The accompanying news highlights the company’s victory in a competition, which carries clearly positive sentiment, and the time series updates are nearly accurate. Case Study 2 (Figure 6) showcases a pharmaceutical company’s success in a clinical trial. The market’s high level of excitement can be easily inferred by a large language model (LLM), and the time series updates closely approximate the trajectory of the upward movement. Case Studies 3 and 4 (Figures 7 and 8) represent instances of partially accurate market predictions. These involve highly volatile stocks, for which the LLM has no information about volatility during training or inference. Towards the end of the predicted sequence, the updated time series T5+TimeS tends to be biased towards DLinear+W. In Case Study 5 (Figure 9), the stock under consideration is low-valued and highly volatile. The challenge for the LLM lies in accurately identifying the magnitude of the price movement, since it has no knowledge of the stock’s volatility. In Case Study 6, the event concerns operational changes within the company, signaling a potentially risky situation. Consequently, the LLM predicts negative momentum, and the computed updated time series is nearly accurate. In Case Study 7 (Figure 11), the event revolves around a lawsuit against the company. With enough similar instances in the training set, the LLM can readily anticipate the magnitude of the negative trend. Finally, in Case Study 8 (Figure 12), the news relates to the quarterly results of a company. Because the news initially appears positive, the LLM predicts positive labels. However, the company’s performance falls short of previous quarters. The LLM’s limitations become apparent here, as it lacks the context and the capability needed for such numerical comparisons.