-
CiteScore
3.77
Impact Factor
Volume 2, Issue 2, IECE Transactions on Emerging Topics in Artificial Intelligence
Volume 2, Issue 2, 2025
Submit Manuscript Edit a Special Issue
IECE Transactions on Emerging Topics in Artificial Intelligence, Volume 2, Issue 2, 2025: 68-80

Open Access | Research Article | 21 May 2025
Anomaly Detection and Risk Early Warning System for Financial Time Series Based on the WaveLST-Trans Model
1 Meta Platforms Inc., Seattle, WA 98109, United States
2 Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92555, United States
3 College of Computer Sciences, Northeastern University, Cupertino, CA 95014, United States
4 Business Analytics, Washington University in St. Louis, St. Louis, MO 63130, United States
5 Department of Computer Science, University of Southern California, Los Angeles, CA 90007, United States
6 Department of Mathematics, Northeastern University, San Jose, CA 95131, United States
† Tian Su and Runlong Li contributed equally to this work
* Corresponding Author: Tian Su, [email protected]
Received: 08 April 2025, Accepted: 15 May 2025, Published: 21 May 2025  
Abstract
Abnormal fluctuations in financial markets may signal significant risks or market manipulation, so efficient time series anomaly detection methods are crucial for risk management. However, traditional statistical methods (e.g., ARIMA, GARCH) are difficult to adapt to the nonlinear and multi-scale characteristics of financial data, while single deep learning models (e.g., LSTM, Transformer) have limitations in capturing long-term trends and short-term fluctuations. In this paper, we propose WaveLST-Trans, a financial time series anomaly detection model based on the combination of wavelet transform (WT), LSTM and Transformer. The model first uses wavelet transform to perform multi-scale decomposition, extracts low-frequency trend and high-frequency fluctuation features, and feeds them into LSTM (to learn the long-term dependence) and Transformer (to capture local mutations) respectively, and finally integrates the multi-scale information through the feature fusion layer, which improves the detection accuracy and robustness. The experiments are conducted on Binance (cryptocurrency market) and S&P 500 (stock market) datasets, and the results show that WaveLST-Trans mostly outperforms the mainstream models in terms of F1-score, recall, and precision, and improves the detection performance by 3% and 10% in high-frequency market and long-term trend market, respectively. This study provides a more accurate and stable anomaly detection method for financial market risk management, which can be widely used in market regulation, quantitative trading and financial risk control, helping to improve the security and stability of the financial system.

Keywords
financial anomaly
anomaly detection
LSTM
transformer
wavelet transform

1. Introduction

The abnormal volatility in financial markets can have a profound impact on the global economy and financial systems, such as market crashes, irregular trading, and financial fraud [28]. These unexpected events are often accompanied by drastic price fluctuations and changes in trading volume, which pose significant risks to investors, regulatory bodies, and financial institutions [13, 27]. Therefore, accurately identifying abnormal behaviors in financial time series is of paramount importance for risk management and market stability. However, financial market data exhibit high non-linearity, non-stationarity, and multi-scale characteristics, and abnormal events are often hidden within complex time series patterns, making traditional anomaly detection methods insufficient for practical needs [21]. The construction of efficient anomaly detection systems that can identify abnormalities in real time within the dynamic environment of financial markets remains a critical issue in the field of financial technology [2].

In recent years, deep learning has made significant advancements in time series analysis, offering new approaches for financial anomaly detection [6, 14, 20]. Long Short-Term Memory (LSTM) networks have been widely adopted for financial market trend modeling due to their superior performance in handling temporal dependencies [24]. However, LSTM models still face limitations in capturing long-term dependencies and detecting sudden anomalies. In contrast, the self-attention mechanism in Transformer models enables the modeling of global information, making them particularly effective in capturing short-term fluctuations and local anomaly patterns [11]. Therefore, combining LSTM and Transformer models can create a complementary framework for modeling both long- and short-term dependencies [19]. Additionally, wavelet transform (WT), as an effective signal processing tool, enables multi-scale decomposition of time series, facilitating the extraction of features across different frequencies and enhancing the model's capability in identifying both long-term market trends and short-term anomalies [33].

Despite significant progress in financial anomaly detection, several challenges remain [34]. Many traditional statistical models, such as ARIMA and GARCH, struggle to capture the nonlinear and multi-scale characteristics of financial markets, limiting their effectiveness in anomaly detection [3]. Single deep learning models often fail to simultaneously account for both long-term dependencies and short-term fluctuations in time series, making it difficult to accurately characterize the dynamic nature of financial markets [4]. Moreover, financial anomalies are often abrupt and transient, posing challenges in effectively extracting key anomaly features from time series while minimizing false positives and false negatives. To address these issues, this study introduces a model that integrates wavelet transform, LSTM, and Transformer, aiming to enhance the comprehensiveness and accuracy of financial time series anomaly detection.The main contributions of this paper are as follows:

  • Proposing the WaveLST-Trans model, which integrates Wavelet Transform, LSTM, and Transformer to capture multi-scale features, enabling the separate processing of long-term trends and short-term fluctuations, thereby improving anomaly detection accuracy.

  • Designing a feature fusion strategy, where Wavelet Transform decomposes the time series, feeding low-frequency components into LSTM to extract long-term dependencies, and high-frequency components into Transformer to identify localized anomalies, achieving efficient multi-scale feature integration.

  • Conducting experiments on multiple financial market datasets to validate the superiority of WaveLST-Trans over traditional statistical, machine learning, and deep learning models, and performing ablation studies to analyze module contributions, ensuring model effectiveness and stability.

The structure of this paper is organized as follows: Section 2 reviews related research on financial time series anomaly detection, covering traditional statistical methods, the application of deep learning in finance, and the development of hybrid models. Section 3 provides a detailed description of the WaveLST-Trans model, including data preprocessing, wavelet transform, and the LSTM-Transformer module. Section 4 presents the experimental design, covering datasets, experimental setup, and validating the model's performance through comparative and ablation experiments. Section 5 concludes the study and discusses future research directions.

2. Related Work

2.1 Traditional Time Series Anomaly Detection Methods

In the field of financial time series anomaly detection, traditional methods mainly rely on statistical models and machine learning algorithms. These approaches have achieved certain success in early research, but often exhibit limitations when faced with the complex dynamics of financial markets [25]. Among them, the Autoregressive Integrated Moving Average (ARIMA) model is a widely used statistical method for time series forecasting. It detects anomalies by modeling the linear dependencies of data, but its adaptability to nonlinear and abrupt changes is relatively poor [31]. The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model and its variants (such as EGARCH and TGARCH) focus on volatility modeling in financial markets. These models effectively describe the heteroscedasticity of financial time series, but still suffer from a delay in identifying sudden anomalies [31].In addition to statistical models, Isolation Forest, an unsupervised anomaly detection method, identifies outliers by random partitioning of data points. It is suitable for high-dimensional data but does not account for temporal dependencies [29]. The Local Outlier Factor (LOF) calculates local abnormality based on data density distributions. While it is effective for detecting certain patterns of anomalies, it is highly sensitive to noisy data [22]. Support Vector Machines (SVM) and its extended versions, such as One-Class SVM, distinguish between normal and anomalous data by constructing hyperplanes in high-dimensional space. While these methods are suitable for static data, they perform poorly when handling dynamic time series anomalies [18].

In comparison to the aforementioned methods, the WaveLST-Trans model proposed in this paper not only captures the long-term dependencies of time series but also models short-term abrupt anomalies. Furthermore, by using wavelet transform to extract multi-scale features, it improves the accuracy and stability of anomaly detection. By combining the strengths of LSTM and Transformer, WaveLST-Trans complements each other in modeling temporal dependencies and extracting local features, while leveraging wavelet transform to enhance the model's adaptability to different time scales, thus enabling a more comprehensive identification of abnormal events in financial markets.

2.2 Deep Learning in Financial Risk Management

In recent years, deep learning has been widely applied in financial time series anomaly detection and risk management, demonstrating superior performance compared to traditional methods. Long Short-Term Memory (LSTM) networks, which can model long-term dependencies, have shown excellent results in financial market trend prediction and anomaly detection [23]. However, LSTM is susceptible to the vanishing gradient problem when processing long time series and has limited ability to capture short-term abrupt anomalies. Gated Recurrent Units (GRU), as a simplified version of LSTM, reduce computational complexity and provide comparable performance in some financial applications, but they still struggle to effectively model local mutations in time series [23]. Convolutional Neural Networks (CNNs), typically used for computer vision tasks, have also been applied to financial time series feature extraction in recent years, such as learning market patterns through one-dimensional convolutional layers [35]. However, CNNs rely on fixed-size receptive fields, making it difficult to capture long-range dependencies, and they perform poorly with non-stationary time series data. Recently, Transformer models have gained attention due to their self-attention mechanism, which excels at globally modeling the relationships between data points and is particularly adept at capturing local anomaly patterns. However, due to the lack of implicit modeling of temporal information, Transformer models may have limitations in modeling long-term trends [30]. Moreover, some hybrid models have emerged, such as combining LSTM with CNN to use CNN for local feature extraction, followed by LSTM to model temporal dependencies [10], or employing variant Transformers (such as Informer and Time Transformer) to optimize time series modeling efficiency. Nonetheless, these models still face issues such as high computational complexity and limited generalization capabilities. In comparison to the aforementioned deep learning methods, the WaveLST-Trans model proposed in this paper utilizes LSTM to model the long-term dependencies in financial time series, while employing Transformer to enhance the model's ability to capture short-term anomalous fluctuations, overcoming the limitations of a single deep learning method. Additionally, the introduction of wavelet transform allows the model to learn time series features at different scales, thereby improving the model's adaptability to market trends and local mutations, resulting in more accurate and robust anomaly detection.

fig1.jpg
Figure 1 WaveLST Trans model architecture.

3. Methodology

3.1 Overall Model Architecture

The WaveLST-Trans model consists of a wavelet transform preprocessing module, an LSTM long-term dependency modeling module, a Transformer short-term fluctuation capturing module, and a feature fusion layer. Financial time series typically exhibit both long-term trends and short-term fluctuations, making it challenging for a single model to capture these features simultaneously. This limitation affects the accuracy of anomaly detection. To address this issue, WaveLST-Trans first applies wavelet transform for multi-scale decomposition of time series data, breaking down the raw data into low-frequency components (long-term trends) and high-frequency components (short-term fluctuations). The low-frequency components are fed into the LSTM network to capture long-term dependencies, while the high-frequency components are input into the Transformer module, where the self-attention mechanism extracts short-term fluctuation features. The outputs of LSTM and Transformer are integrated in the feature fusion layer, ensuring that the model effectively learns anomaly patterns across different time scales. After passing through a fully connected layer, the anomaly detection module computes anomaly scores, ultimately identifying anomalous time steps. The overall architecture of the model is illustrated in Figure 1.

WaveLST-Trans integrates LSTM and Transformer to achieve precise multi-scale modeling, enhancing the accuracy and robustness of financial time series anomaly detection. LSTM is responsible for capturing long-term trends, leveraging its gating mechanism to model temporal dependencies, making it suitable for detecting structural market changes such as sustained price increases or declines. In contrast, Transformer utilizes self-attention to efficiently extract short-term fluctuations, identifying localized sudden anomalies such as market shocks or short-term trading irregularities. The feature fusion layer combines the outputs of LSTM and Transformer, enabling the model to learn both long-term dependency patterns and short-term mutation features, effectively covering anomalies across different time scales [16]. Compared to traditional methods, WaveLST-Trans offers stronger multi-scale feature representation, overcoming the limitations of ARIMA and GARCH on non-stationary data and surpassing Isolation Forest and Local Outlier Factor (LOF) in modeling time dependencies. Additionally, compared to standalone LSTM or Transformer models, WaveLST-Trans combines long-term memory capabilities with short-term anomaly detection, making it suitable for various market environments.

In the computational process, the wavelet transform module first decomposes the raw time series into different frequency components, where low-frequency signals capture long-term market trends, and high-frequency signals reflect short-term fluctuations and local anomalies [26]. The low-frequency signals are fed into LSTM to learn long-term dependencies, while the high-frequency signals are input into Transformer to extract short-term fluctuation features. In the feature fusion layer, the feature vectors from LSTM and Transformer are either concatenated or adaptively weighted, ensuring the model effectively leverages multi-scale information to enhance anomaly detection accuracy. Finally, the fused features are passed through a fully connected layer, and a predefined threshold is applied to determine anomalies at each time step, enabling precise financial market anomaly detection.

The design of WaveLST-Trans not only enhances financial time series anomaly detection but also strengthens its applicability in risk warning systems. In market regulation, quantitative trading, and credit evaluation, this model can promptly identify potential market risks, providing more precise decision support for financial institutions [1]. Compared with traditional methods, WaveLST-Trans offers greater adaptability to the complexities of financial markets, serving as both a theoretical foundation and practical reference for future advancements in financial risk management tools.

3.2 Data Preprocessing

WaveLST-Trans employs Wavelet Transform for data preprocessing to enhance the multi-scale feature representation of time series data [5]. Financial time series typically exhibit both long-term trends and short-term fluctuations, and directly modeling the raw data may make it difficult for the model to distinguish between these components, thereby affecting the accuracy of anomaly detection. The Wavelet Transform is particularly suitable for this task due to its ability to decompose a time series into multiple frequency bands, allowing for the extraction of both low-frequency components (representing long-term trends) and high-frequency components (representing short-term fluctuations).However, there are some trade-offs associated with this approach. While the Wavelet Transform effectively decomposes the time series into meaningful components, there is a computational cost involved in performing the transform and the subsequent model training. Additionally, the choice of wavelet function and decomposition level can impact the model's performance, and selecting inappropriate parameters may lead to overfitting or underfitting. Despite these challenges, this preprocessing approach ensures that the model effectively learns abnormal patterns across different time scales, while simultaneously reducing noise interference and improving the stability and accuracy of anomaly detection. Figure 2 illustrates the process of the data preprocessing module.

fig2.jpg
Figure 2 Schematic diagram of Wavelet Transform and data preprocessing module.

In data preprocessing, the core idea of wavelet transform is to perform time-frequency decomposition on the time series, transforming it into multi-scale feature representations. Given a discrete time series X={xt,x2,,xT} wavelet transform applies a scaling factor s and a time translation factor τ to the signal. Using a mother wavelet function ψ, the Continuous Wavelet Transform (CWT) representation is obtained as follows:

WX(s,τ)=X(t)ψ(tτs)𝑑t

Due to the high dimensionality and non-stationarity of financial time series, Discrete Wavelet Transform (DWT) is adopted in this paper to improve computational efficiency. The time series is decomposed into low-frequency components XL and high-frequency components XH as follows:

XL=nxng(n2k),XH=nxnh(n2k)

In this process, g(n) represent the low-pass and high-pass filters, respectively, which are used to decompose the time series into different scale-dependent features. The low-frequency component h(n) primarily captures long-term trend information, whereas the high-frequency component XL reflects short-term fluctuations and local anomalies. After the initial decomposition, the low-frequency component can undergo recursive decomposition to extract smoother long-term trends, where XH represents the number of decomposition levels. In this paper, we select Daubechies (db4) wavelet as the mother wavelet, as experimental results demonstrate its ability to effectively adapt to both the smoothness and fluctuation characteristics of financial time series data. Finally, after wavelet transform processing, the low-frequency component XL is fed into the LSTM module to model long-term dependencies, while the high-frequency component XH is input into the Transformer module to learn short-term fluctuation features:

XL(m)=nXL(m1)g(n2k)

In addition to multi-scale feature extraction via wavelet transform, this study applies data normalization to ensure that data across different time steps are processed on a consistent scale, thereby improving model convergence speed and stability. The mean μ and standard deviation σ represent the statistical characteristics of the data. The normalization is performed as follows:

x~t=xtμσ

Considering the non-stationarity of financial market data, this study further employs a Sliding Window mechanism to segment the time series into fixed-size windows. Given a window size W, the model input at time step t is defined as: X(t)={xtW+1,xtW+2,,xt} This approach ensures that the model effectively utilizes historical information, while preventing excessively long time series inputs that may lead to high computational complexity. After sliding window processing, the low-frequency and high-frequency components are separately fed into the LSTM and Transformer modules, where they undergo feature extraction and anomaly detection.

Mathematically, the Wavelet Transform provides a multi-resolution analysis of the data, where each resolution corresponds to a different scale of the time series. This is achieved by convolving the signal with a wavelet function, which is localized in both time and frequency, making it well-suited for capturing time-varying behaviors in financial data. By decomposing the time series into these components, we are able to isolate the long-term trends and short-term anomalies, which enhances the model's ability to detect abnormal patterns at different time scales. The low-frequency components are fed into the LSTM network, which learns the long-term dependencies of the time series, while the high-frequency components are input into the Transformer module, where the self-attention mechanism extracts short-term fluctuation features.

3.3 LSTM-Transformer Hybrid Model for Multi-Scale Temporal Dependency Learning

Financial market time series data often exhibit both long-term trend variations and short-term extreme fluctuations, making it challenging for a single model to effectively capture both characteristics. Traditional LSTM, leveraging its gating mechanism, excels at learning long-term dependencies in time series but is limited in detecting short-term market fluctuations and local anomalies. On the other hand, Transformer, with its self-attention mechanism, can effectively model global information, making it well-suited for short-term anomaly detection. However, using Transformer alone may lead to the neglect of long-term dependencies. This module integrates both LSTM and Transformer, combining their strengths to enhance multi-scale time series modeling. The architecture of this module is illustrated in Figure 3.

fig3.jpg
Figure 3 Modeling of multi-scale temporal features.

LSTM is primarily used to capture long-term dependency patterns in financial markets, making it suitable for detecting price trends, market cyclicality, and other structural features. Traditional statistical methods, such as ARIMA and GARCH, struggle to effectively model the nonlinear dynamics of long-term time series, whereas LSTM, through its gating mechanism, controls information storage and forgetting, enabling the learning of long-term temporal features. In this study, Wavelet Transform (WT) is applied to decompose financial time series into low-frequency and high-frequency components. The low-frequency component XL is fed into the LSTM layer to extract long-term dependency features, which are then passed to the feature fusion layer, where they are integrated with the short-term features extracted by Transformer. LSTM consists of three gates: Forget Gate, Input Gate, and Output Gate, with their computations as follows:

ft=σ(Wf[ht1,xt]+bf)

it=σ(Wi[ht1,xt]+bi)

C~t=tanh(WC[ht1,xt]+bC)

Ct=ftCt1+itC~t

ot=σ(Wo[ht1,xt]+bo)

ht=ottanh(Ct)

The hidden state ht serves as the output feature of the LSTM module, which is then passed to the feature fusion layer to be combined with short-term features extracted by Transformer, forming the final time series representation. By leveraging its gating mechanism, LSTM mitigates the vanishing gradient problem, enabling the model to capture long-term trends and enhancing the stability of anomaly detection.

The Transformer module is primarily used for modeling short-term market fluctuations, making it suitable for detecting sudden market changes caused by policy adjustments, market manipulations, or breaking news. Unlike LSTM, which relies on recursive computations to handle temporal dependencies, Trans former utilizes the self-attention mechanism to model global time dependencies and precisely capture short-term fluctuation features. The input to Transformer is the high-frequency component XH extracted by Wavelet Transform, which is processed through a multi-head self-attention mechanism to compute the weighted relationships between time steps, effectively focusing on short-term anomaly patterns.

The Transformer module first computes Query (Q), Key (K), and Value (V), where WQ, WK, WVd×dk are learnable parameter matrices. It then calculates the Scaled Dot-Product Attention, where dk acts as a scaling factor to keep attention scores within an appropriate range, preventing vanishing or exploding gradients:

Q=XHWQ,K=XHWK,V=XHWV

Attention(Q,K,V)=softmax(QKdk)V

To enhance modeling capability, the Multi-Head Attention mechanism is employed to compute multiple independent attention mappings, allowing the model to capture diverse short-term anomaly patterns more effectively. Finally, the output is passed through a Feed-Forward Network (FFN) for feature transformation, further refining the learned representations

MultiHead(Q,K,V)=Concat(head1,,headh)WO

FFN(X)=max(0,XW1+b1)W2+b2D

Table 1 Experimental environment demonstrated.
Dataset Market Type Time Granularity Features Application
Binance Cryptocurrency Minute-level Price, Volume, Order book High-frequency anomalies, Market manipulation
SP 500 Stock Market Daily-level Price, Volume, Adjusted Close Long-term trend anomalies, Market crashes

Since Transformer lacks inherent sequential modeling capability, this study employs Positional Encoding to explicitly introduce temporal position information. This allows Transformer to not only focus on short-term market fluctuations but also retain the sequential characteristics of time series, thereby enhancing its anomaly detection performance.

PE(t,2i)=sin(t100002i/d)

PE(t,2i+1)=cos(t100002i/d)

The hybrid architecture of WaveLST-Trans integrates LSTM's capability in long-term trend modeling with Transformer's ability to capture short-term fluctuations, while leveraging Wavelet Transform's multi-scale signal decomposition for feature extraction. This design enables the model to efficiently identify complex anomaly patterns in financial time series.

4. Experiments

4.1 Datasets

This study selects two publicly available datasets, Binance cryptocurrency market data [7] and S&P 500 index constituent stock data [15], to comprehensively evaluate the performance of WaveLST-Trans in financial time series anomaly detection. These datasets represent high-frequency trading markets and traditional stock markets, covering financial anomalies across different time scales, allowing the model's applicability to be validated in various market environments. The Binance dataset is sourced from Binance, one of the world's largest cryptocurrency exchanges, and includes minute-level trading data for major cryptocurrencies such as Bitcoin (BTC) and Ethereum (ETH), with features including open price, close price, highest price, lowest price, trading volume, and taker buy volume. The cryptocurrency market is highly volatile, with frequent anomalous trades, making it well-suited for studying short-term extreme fluctuations. This dataset provides the Transformer module with rich short-sequence information, enabling a thorough evaluation of the model's short-term anomaly detection capabilities.

Table 2 Comparison of experimental results between WaveLST-Trans and mainstream models on Binance and S&P 500 datasets.
Model Dataset Accuracy Recall Precision F1-score MAE
WaveLST-Trans Binance 0.92 0.90 0.87 0.88 0.15
S&P 500 0.91 0.89 0.85 0.87 0.18
TFT[9] Binance 0.90 0.88 0.88 0.87 0.14
S&P 500 0.89 0.85 0.84 0.84 0.19
TimeDiff[17] Binance 0.89 0.86 0.85 0.85 0.13
S&P 500 0.88 0.84 0.82 0.83 0.17
STGNN[8] Binance 0.88 0.85 0.86 0.85 0.16
S&P 500 0.87 0.83 0.81 0.82 0.20
TS2Vec[32] Binance 0.86 0.84 0.82 0.83 0.18
S&P 500 0.85 0.80 0.79 0.79 0.21
CNN-LSTM-IF[12] Binance 0.85 0.82 0.80 0.81 0.19
S&P 500 0.84 0.79 0.78 0.78 0.23

In contrast, the S&P 500 dataset reflects long-term trends in traditional stock markets. This study selects daily-level data for the S&P 500 index and its constituent stocks, including open price, close price, highest price, lowest price, trading volume, and adjusted close price, which collectively represent the overall performance of the U.S. stock market. Unlike cryptocurrency markets, where anomalies manifest as short-term extreme fluctuations, stock market anomalies typically occur over longer time horizons, influenced by financial crises, policy changes, or industry shifts. Additionally, historical stock market data is more comprehensive, covering multiple economic cycles, allowing the experiment to assess the model's robustness and generalization across different market conditions. Table 1 summarizes their key characteristics, including data sources, time granularity, feature dimensions, and applicable scenarios.

For the WaveLST-Trans model, we selected a learning rate of 0.001, a batch size of 64, and the Adam optimizer based on extensive preliminary testing, achieving a balance between model complexity and performance for both Binance and S&P 500 datasets. The Binance dataset was chosen for its high-frequency, volatile nature, ideal for testing the model's ability to detect For the WaveLST-Trans model, we selected a learning rate of 0.001, a batch size of 64, and the Adam optimizer based on extensive preliminary testing, achieving a balance between model complexity and performance for both Binance and S&P 500 datasets. The Binance dataset was chosen for its high-frequency, volatile nature, ideal for testing the model's ability to detect

4.2 Comparison Experiments and Analysis

In this experiment, to comprehensively evaluate the performance of WaveLST-Trans in financial time series anomaly detection, we selected TFT (Temporal Fusion Transformer), TimeDiff (Diffusion Models for Time Series), STGNN (Spatio-Temporal Graph Neural Networks), TS2Vec (Self-Supervised Learning for Time Series), and CNN-LSTM-IF (Hybrid Deep Learning Models) as benchmark models. Experiments were conducted on both the Binance cryptocurrency market dataset and the S&P 500 stock market dataset.These benchmark models represent mainstream approaches in recent years for time series modeling, anomaly detection, and forecasting, offering strong capabilities in capturing long-and-short-term dependencies. The evaluation was based on five key metrics: Accuracy, Recall, Precision, F1-score, and Mean Absolute Error (MAE). The experimental results are summarized in Table 2. From the experimental results, it is evident that WaveLST-Trans demonstrates strong stability and superiority across all evaluation metrics. Particularly in terms of accuracy, recall, and F1-score, the model outperforms the benchmark models, indicating its strong generalization capability in financial time series anomaly detection.On the Binance dataset (cryptocurrency market), WaveLST-Trans achieves a 6% improvement in recall compared to TS2Vec and a 8% increase compared to CNN-LSTM-IF, proving its effectiveness in capturing short-term extreme fluctuations. In contrast, TS2Vec, based on self-supervised learning, and CNN-LSTM-IF, a hybrid deep learning model, exhibit weaker performance in detecting short-period anomalies. Additionally, in F1-score, WaveLST-Trans surpasses STGNN by 3%, demonstrating its ability to balance precision and recall effectively in anomaly detection tasks.

fig4.jpg
Figure 4 Performance comparison on Binance and S&P 500 datasets.

On the S&P 500 dataset (traditional stock market), WaveLST-Trans also outperforms other models in precision and F1-score, achieving 3% and 5% higher F1-score than TFT and TimeDiff, respectively. This suggests that the model can effectively reduce false positive rates when detecting long-term trend deviations. Although TFT performs slightly better in some metrics (e.g., precision), its lower recall indicates that while it models long-term trends effectively, it struggles to identify abnormal transactions or market anomalies. Additionally, TimeDiff achieves the best MAE scores (0.13 on the Binance dataset and 0.17 on the S&P 500 dataset), highlighting the advantage of diffusion models in continuous time series forecasting but showing no significant advantage in discrete anomaly detection tasks.

Table 3 Ablation study results on Binance and S&P 500 datasets.
Model Variant Dataset Accuracy Recall Precision F1-score MAE
WaveLST-Trans (Full Model) Binance 0.92 0.90 0.87 0.88 0.15
S&P 500 0.91 0.89 0.85 0.87 0.18
w/o without Wavelet Transform (WT) Binance 0.89 0.86 0.85 0.85 0.17
S&P 500 0.88 0.85 0.83 0.84 0.20
w/o without LSTM Binance 0.87 0.84 0.83 0.83 0.18
S&P 500 0.86 0.82 0.81 0.82 0.21
w/o without Transformer Binance 0.86 0.82 0.84 0.83 0.19
S&P 500 0.85 0.81 0.80 0.80 0.22
w/o without WT + LSTM Binance 0.84 0.80 0.78 0.79 0.22
S&P 500 0.82 0.78 0.76 0.77 0.24
w/o without WT + Transformer Binance 0.83 0.78 0.77 0.78 0.24
S&P 500 0.83 0.76 0.74 0.75 0.26
w/o without LSTM + Transformer Binance 0.82 0.77 0.75 0.76 0.25
S&P 500 0.80 0.75 0.72 0.74 0.27

As illustrated in Figure 4, WaveLST-Trans combines the long-term dependency modeling capability of LSTM with the short-term fluctuation capturing ability of Transformer, allowing it to perform well across different market environments. Compared to traditional deep learning models (e.g., CNN-LSTM-IF), it provides more stable detection results; compared to self-supervised learning methods (e.g., TS2Vec), it offers better adaptability to both short-term and long-term patterns; and compared to time series diffusion models (e.g., TimeDiff), it outperforms in anomaly classification tasks. Overall, WaveLST-Trans consistently delivers superior performance in financial anomaly detection across various market conditions, demonstrating strong applicability and robustness.

4.3 Ablation Study and Analysis

In this experiment, to verify the effectiveness of each module in WaveLST-Trans, we designed a series of ablation experiments by removing different key components of the model and evaluating their impact on performance. The experiments were conducted on the datasets to assess how each module contributes to the overall model performance. The primary objective of the ablation study is to analyze the individual contributions of each component to the model's effectiveness. The experimental results are summarized in Table 3.

The ablation study results presented in Table 3 confirm the effectiveness of each component within the WaveLST-Trans model. On the Binance dataset, removing the Wavelet Transform (WT) led to a drop in accuracy from 0.92 to 0.89 and in F1-score from 0.88 to 0.85, underscoring its role in capturing multi-scale features. Excluding either the LSTM or Transformer module resulted in an F1-score of 0.83 in both cases, which highlights their complementary strengths in modeling long-term trends and short-term fluctuations. When both LSTM and Transformer were removed, the F1-score decreased to 0.76, approaching the performance of traditional machine learning methods. On the S&P 500 dataset, similar patterns were observed. The absence of WT led to a decrease in F1-score from 0.87 to 0.84, while removing LSTM or Transformer caused F1-scores of 0.82 and 0.80, respectively. Notably, removing both WT and LSTM resulted in the lowest F1-score of 0.77, reinforcing their key roles in long-term pattern extraction. Overall, these results demonstrate that the full integration of WT, LSTM, and Transformer is critical to the model's superior performance across different types of financial time series.

fig5.jpg
Figure 5 Ablation study: performance degradation with module removal.

As illustrated in Figure 5, the ablation study results validate the synergy among WT, LSTM, Transformer, and the feature fusion layer in WaveLST-Trans. Compared to single-model approaches, the full architecture excels in capturing short-term anomalies, long-term trend changes, and multi-scale information fusion. Regardless of whether in high-frequency trading markets or long-term investment markets, the complete model outperforms all individual module variants, further verifying the rationality of the proposed model architecture.

5. Conclusion And Discussion

This paper proposes WaveLST-Trans, a financial time series anomaly detection model based on an LSTM-Transformer hybrid architecture with wavelet transform (WT). The model first applies WT for multi-scale decomposition, feeding low-frequency components into LSTM to model long-term dependencies and high-frequency components into Transformer to capture short-term market fluctuations. These extracted features are then integrated in the feature fusion layer, and the anomaly detection module outputs an anomaly score. Compared to existing methods, WaveLST-Trans effectively models anomalies across different time scales, and ablation studies confirm the necessity of WT, LSTM, Transformer, and the feature fusion layer.Experiments on Binance (cryptocurrency market) and S&P 500 (stock market) datasets show that WaveLST-Trans outperforms mainstream models, improving F1-score and recall by 3%–10% in high-frequency markets and accuracy by 5%–9% in long-term trend markets.

The model maintains strong generalization and robustness across different financial environments. Future work will focus on optimizing the Transformer structure, adaptive anomaly thresholds, and integrating external market events to enhance adaptability. Additionally, further studies are needed to assess its performance under extreme financial events to ensure stability in sudden risk management.


Data Availability Statement
Data will be made available on request.

Funding
This work was supported without any funding.

Conflicts of Interest
Tian Su is an employee of Meta Platforms Inc., Seattle, WA 98109, United States.

Ethical Approval and Consent to Participate
Not applicable.

References
  1. Wu, H. S. (2016, December). A survey of research on anomaly detection for time series. In 2016 13th international computer conference on wavelet active media technology and information processing (ICCWAMTIP) (pp. 426-431). IEEE.
    [CrossRef]   [Google Scholar]
  2. Shih, S. Y., Sun, F. K., & Lee, H. Y. (2019). Temporal pattern attention for multivariate time series forecasting. Machine Learning, 108, 1421-1441.
    [CrossRef]   [Google Scholar]
  3. Guo, T., Zhang, T., Lim, E., Lopez-Benitez, M., Ma, F., & Yu, L. (2022). A review of wavelet analysis and its applications: Challenges and opportunities. IEEE Access, 10, 58869–58903.
    [CrossRef]   [Google Scholar]
  4. Bao, W., Yue, J., & Rao, Y. (2017). A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PloS one, 12(7), e0180944.
    [CrossRef]   [Google Scholar]
  5. Li, J., Liu, Y., Gong, H., & Huang, X. (2024). Stock price series forecasting using multi-scale modeling with boruta feature selection and adaptive denoising. Applied Soft Computing, 154, 111365.
    [CrossRef]   [Google Scholar]
  6. Yıldız, K., Dedebek, S., Okay, F. Y., & Şimşek, M. U. (2022, September). Anomaly detection in financial data using deep learning: A comparative analysis. In 2022 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-6). IEEE.
    [CrossRef]   [Google Scholar]
  7. Li, W., Bao, L., Chen, J., Grundy, J., Xia, X., & Yang, X. (2024). Market manipulation of cryptocurrencies: Evidence from social media and transaction data. ACM Transactions on Internet Technology, 24(2), 1–26.
    [CrossRef]   [Google Scholar]
  8. Li, Y., Zhao, W., & Fan, H. (2022). A spatio-temporal graph neural network approach for traffic flow prediction. Mathematics, 10(10), 1754.
    [CrossRef]   [Google Scholar]
  9. Lim, B., Arık, S. Ö., Loeff, N., & Pfister, T. (2021). Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4), 1748–1764.
    [CrossRef]   [Google Scholar]
  10. Liu, J., Li, Q., An, S., Ezard, B., & Li, L. (2023). Edgeconvformer: Dynamic graph cnn and transformer based anomaly detection in multivariate time series. arXiv preprint arXiv:2312.01729.
    [CrossRef]   [Google Scholar]
  11. Liu, R., & Vakharia, V. (2024). Optimizing supply chain management through bo-cnn-lstm for demand forecasting and inventory management. Journal of Organizational and End User Computing (JOEUC), 36(1), 1–25.
    [CrossRef]   [Google Scholar]
  12. Lu, Y. X., Jin, X. B., Chen, J., Liu, D. J., & Geng, G. G. (2024). F-se-lstm: A time series anomaly detection method with frequency domain information. arXiv preprint arXiv:2412.02474.
    [CrossRef]   [Google Scholar]
  13. Ma, X., Wu, J., Xue, S., Yang, J., Zhou, C., Sheng, Q. Z., ... & Akoglu, L. (2021). A comprehensive survey on graph anomaly detection with deep learning. IEEE Transactions on Knowledge and Data Engineering, 35(12), 12012–12038.
    [CrossRef]   [Google Scholar]
  14. Alghofaili, Y., Albattah, A., & Rassam, M. A. (2020). A financial fraud detection model based on LSTM deep learning technique. Journal of Applied Security Research, 15(4), 498-516.
    [CrossRef]   [Google Scholar]
  15. Madani, M. A. (2025). The S&P 500 sectoral indices responses to economic news sentiment. International Journal of Finance & Economics, 30(2), 2042–2060.
    [CrossRef]   [Google Scholar]
  16. Masini, R. P., Medeiros, M. C., & Mendes, E. F. (2023). Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), 76–111.
    [CrossRef]   [Google Scholar]
  17. Mathonsi, T., & Zyl, T. L. V. (2025). Multivariate anomaly detection based on prediction intervals constructed using deep learning. Neural Computing and Applications, 37(2), 707-721.
    [CrossRef]   [Google Scholar]
  18. Qiao, Y., Wu, K., & Jin, P. (2021). Efficient anomaly detection for high-dimensional sensing data with one-class support vector machine. IEEE Transactions on Knowledge and Data Engineering, 35(1), 404–417.
    [CrossRef]   [Google Scholar]
  19. Ran, J., Zou, G., & Niu, Y. (2024). Deep learning in carbon neutrality forecasting: A study on the SSA-attention-BiGRU network. Journal of Organizational and End User Computing (JOEUC), 36(1), 1–23.
    [CrossRef]   [Google Scholar]
  20. Mubalaike, A. M., & Adali, E. (2018, September). Deep learning approach for intelligent financial fraud detection system. In 2018 3rd International Conference on Computer Science and Engineering (UBMK) (pp. 598-603). IEEE.
    [CrossRef]   [Google Scholar]
  21. Ryan, O., Haslbeck, J. M. B., & Waldorp, L. J. (2024). Non-stationarity in time-series analysis: Modeling stochastic and deterministic trends. Multivariate Behavioral Research, 1–33.
    [CrossRef]   [Google Scholar]
  22. Samariya, D., & Thakkar, A. (2023). A comprehensive survey of anomaly detection algorithms. Annals of Data Science, 10(3), 829–850.
    [CrossRef]   [Google Scholar]
  23. Tang, Q., Fan, T., Shi, R., Huang, J., & Ma, Y. (2021). Prediction of financial time series using LSTM and data denoising methods. arXiv preprint arXiv:2103.03505.
    [CrossRef]   [Google Scholar]
  24. Torres, J. F., Hadjout, D., Sebaa, A., Martínez-Álvarez, F., & Troncoso, A. (2021). Deep learning for time series forecasting: A survey. Big Data, 9(1), 3–21.
    [CrossRef]   [Google Scholar]
  25. Wang, X., Liang, A., Sprinkle, J., & Johnson, T. T. (2025). Robustness verification for knowledge-based logic of risky driving scenes. In Future of Information and Communication Conference (pp. 572–585). Springer.
    [CrossRef]   [Google Scholar]
  26. Wu, D., Wang, X., & Wu, S. (2022). A hybrid framework based on extreme learning machine, discrete wavelet transform, and autoencoder with feature penalty for stock prediction. Expert Systems with Applications, 207, 118006.
    [CrossRef]   [Google Scholar]
  27. Xu, J., Wu, H., Wang, J., & Long, M. (2021). Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv preprint arXiv:2110.02642.
    [Google Scholar]
  28. Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European journal of operational research, 270(2), 654-669.
    [CrossRef]   [Google Scholar]
  29. Xu, H., Pang, G., Wang, Y., & Wang, Y. (2023). Deep isolation forest for anomaly detection. IEEE Transactions on Knowledge and Data Engineering, 35(12), 12591–12604.
    [CrossRef]   [Google Scholar]
  30. Xu, J., Wu, H., Wang, J., & Long, M. (2021). Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv preprint arXiv:2110.02642.
    [CrossRef]   [Google Scholar]
  31. Yang, T., Li, A., Xu, J., Su, G., & Wang, J. (2024). Deep learning model-driven financial risk prediction and analysis. Applied and Computational Engineering, 77, 196–202.
    [CrossRef]   [Google Scholar]
  32. Yue, Z., Wang, Y., Duan, J., Yang, T., Huang, C., Tong, Y., & Xu, B. (2022, June). Ts2vec: Towards universal representation of time series. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, No. 8, pp. 8980-8987).
    [CrossRef]   [Google Scholar]
  33. Yu, H., Ming, L. J., Sumei, R., & Shuping, Z. (2020). A hybrid model for financial time series forecasting—integration of EWT, ARIMA with the improved ABC optimized ELM. IEEE Access, 8, 84501-84518.
    [CrossRef]   [Google Scholar]
  34. Ahmed, M., Choudhury, N., & Uddin, S. (2017, July). Anomaly detection on big data in financial markets. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 (pp. 998-1001).
    [CrossRef]   [Google Scholar]
  35. Reddy, N. M., Sharada, K. A., Pilli, D., Paranthaman, R. N., Reddy, K. S., & Chauhan, A. (2023, June). CNN-Bidirectional LSTM based Approach for Financial Fraud Detection and Prevention System. In 2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS) (pp. 541-546). IEEE.
    [CrossRef]   [Google Scholar]

Cite This Article
APA Style
Su, T., Li, R., Liu, B., Liang, X., Yang, X., & Zhou, Y. (2025). Anomaly Detection and Risk Early Warning System for Financial Time Series Based on the WaveLST-Trans Model. IECE Transactions on Emerging Topics in Artificial Intelligence, 2(2), 68–80. https://doi.org/10.62762/TETAI.2025.191759

Article Metrics
Citations:

Crossref

0

Scopus

0

Web of Science

0
Article Access Statistics:
Views: 86
PDF Downloads: 13

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Emerging and Computer Engineers. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
IECE Transactions on Emerging Topics in Artificial Intelligence

IECE Transactions on Emerging Topics in Artificial Intelligence

ISSN: 3066-1676 (Online) | ISSN: 3066-1668 (Print)

Email: [email protected]

Portico

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2025 Institute of Emerging and Computer Engineers Inc.