1. Introduction
The gas turbine engine is the power source of an aircraft, and its reliability directly affects aircraft safety and performance [1]. The engine is prone to faults because of its complex structure and harsh operating conditions. During the course of an engine’s life, various physical failures may occur, such as corrosion, erosion, fouling and foreign object damage [2,3]. These failures cause gas-path performance degradation, either gradual or abrupt, which is recognized as an engine gas-path fault and is highly detrimental to flight safety [4,5]. To enhance operating reliability and reduce the maintenance costs of aircraft propulsion systems, engine gas-path fault diagnosis technology has attracted considerable interest.
Generally speaking, gas turbine fault diagnosis approaches are divided into model-based and data-driven ones [6,7]. Variants of the Kalman filter are the typical model-based diagnostic methods for gas turbine engines [8,9]. They require a reliable engine model built on physical characteristics and aero-thermodynamic theory. Engine modeling uncertainties and operational non-linearity during transient processes degrade the performance of model-based techniques [10]. Engine-to-engine variation also makes it difficult for a general engine model to represent every individual engine. In addition, the observability of general Kalman filters depends on the number and types of sensor measurements, and they cannot recognize fault patterns when fewer sensors are available than there are health parameters.
The data-driven approach is another important method of fault diagnosis for complex non-linear systems, especially in data-rich circumstances [11,12,13]. It relies on collected data of fault modes rather than an accurate mathematical model of the gas turbine engine. It is particularly suitable for complex, strongly non-linear systems, and it is not limited by the number of sensor measurements. In the data-driven field, much attention has been paid to neural network approaches, which are theoretically grounded in empirical risk minimization and have simple mathematical expressions [8]. Bartolini applied neural networks to micro gas turbines, and Amozegar later developed an ensemble of dynamic neural network identifiers for engine fault detection and isolation [14,15]. However, the topological structure of a neural network is usually selected by experience, and its diagnostic performance easily fluctuates with stochastic measurement noise.
Different from the neural network approaches, the hidden Markov model (HMM) is a classic data-driven fault diagnosis method for non-linear stochastic systems [16]. The HMM has a rigorous theoretical derivation and a definite model structure [17]. Its statistical character in modeling and classification gives it an advantage in fault pattern recognition for mechanical systems with pronounced randomness and uncertainty [18]. However, the computational cost of the HMM increases dramatically as the measurement dimension increases. For engine gas-path fault diagnosis, sensor data from the various operating cross-sections are usually complex and physically correlated during transient processes in the flight envelope. Consequently, it is necessary to extract fault features from the raw measurement sequences to decrease the test data dimension and simplify the HMM structure.
Principal component analysis (PCA) has been introduced into the HMM to extract fault features and reduce computational effort. PCA projects the data matrix linearly onto an uncorrelated subspace with little information loss, but its performance degrades for plants with strong non-linearity. Kernel PCA (KPCA) was developed to overcome the restriction of conventional PCA to linear problems [19,20]. Because KPCA uses a kernel matrix (an n × n matrix, where n is the number of samples in the dataset) to map the original dataset into the feature space, its computational complexity in principal component feature extraction is O(n³). Taouali proposed a reduced KPCA (RKPCA) to improve sparsity [21], but it is difficult to balance useful information capacity against data scale to reach an appropriate number of samples.
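The O(n³) cost can be seen in a minimal KPCA sketch (Python with NumPy). It assumes a Gaussian (RBF) kernel with a user-chosen width `sigma`; the kernel choice and the function names are illustrative assumptions, not the formulation used in this paper. The eigendecomposition of the centred n × n kernel matrix is the step whose cost grows as O(n³).

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix, k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) (assumed kernel)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kpca_fit_transform(X, n_components, sigma=1.0):
    """Project the n training samples in X onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)                          # n x n kernel matrix
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # centre the data in feature space
    # Eigendecomposition of the n x n matrix: this is the O(n^3) step.
    eigvals, eigvecs = np.linalg.eigh(Kc)                # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale coefficients so each feature-space principal axis has unit length.
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    return Kc @ alphas, eigvals
```

Because every training sample adds a row and a column to K, halving n cuts the eigendecomposition cost by roughly a factor of eight, which is the saving that reduced-sample variants such as RKPCA aim for.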
To improve fault diagnostic confidence and computational efficiency, a novel fault diagnostic approach combining iterative reduced KPCA (IRKPCA) and HMM is proposed for engine gas-path fault diagnosis. In this paper, IRKPCA is developed with a sample-reduction mechanism designed to reduce the redundant information in the initial observation sequences for gas-path fault feature extraction. The similarity degree and forward kernel inverse are employed to simplify the sample data, and the kernel fault features extracted by IRKPCA are then used by the HMM to perform gas-path fault pattern recognition. Systematic tests are carried out on a two-spool turbofan engine simulation, in steady and transient processes and at various cycle numbers during the engine’s life, to evaluate the fault diagnosis performance of the proposed methodology. The results indicate the superiority of IRKPCA-HMM and support the proposed approach.
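A sample-reduction step of this kind can be sketched as a single forward pass that keeps a sample only when it is not too similar to any already-retained sample. This is a hypothetical illustration: the similarity measure (a Gaussian kernel value in (0, 1]), the greedy rule and the name `reduce_samples` are assumptions rather than the exact IRKPCA definition; the default threshold 0.9 mirrors the similarity degree used in Section 4.1.

```python
import numpy as np

def reduce_samples(X, threshold=0.9, sigma=1.0):
    """Greedy forward pass over the rows of X; returns indices of retained samples.

    Similarity is assumed to be the Gaussian kernel value
    exp(-||x - y||^2 / (2 sigma^2)), which lies in (0, 1].
    """
    kept = [0]                                       # always keep the first sample
    for i in range(1, X.shape[0]):
        d2 = np.sum((X[kept] - X[i]) ** 2, axis=1)   # squared distances to kept samples
        similarity = np.exp(-d2 / (2.0 * sigma**2))
        if similarity.max() < threshold:             # carries sufficiently new information
            kept.append(i)
    return np.array(kept)
```

The retained indices select m ≤ n samples, so the kernel matrix passed to the eigendecomposition shrinks from n × n to m × m.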
The remainder of the paper is organized as follows. In Section 2, IRKPCA is developed from basic KPCA, and the feature extraction performance of the KPCA variants is compared on benchmark datasets. In Section 3, IRKPCA-HMM is presented, combining IRKPCA and HMM to simplify the fault diagnostic model using reduced-kernel fault features. In Section 4, simulations and analyses of gas-path fault diagnosis are given for a turbofan engine in dynamic operation. Section 5 draws conclusions and discusses future research directions.
4. IRKPCA-HMM Based Engine Gas-Path Fault Diagnosis
The proposed IRKPCA-HMM approach to gas-path fault diagnosis is tested on a virtual two-spool turbofan engine built as a component-level engine model [30,31]. The examined turbofan engine is mainly composed of an inlet, fan, compressor, bypass, combustor, high-pressure turbine (HPT), low-pressure turbine (LPT), mixer and nozzle, as illustrated in Figure 3. The inlet supplies airflow to the fan, after which the air is divided into two streams: one flows into the compressor and the other passes through the bypass. Air leaving the compressor moves to the combustor, where fuel is injected and burned to produce hot gas that drives the turbines. The fan and compressor are driven by the LPT and HPT, respectively. Gas from the LPT and air from the bypass mix in the mixer and then leave the engine through the nozzle. A closed-loop spool-speed control strategy with safety protection is applied to the engine [32]. The engine station numbers in Figure 3 are as follows: inlet exit 2, compressor inlet 22, compressor exit 3, HPT entrance 43, LPT entrance 5, and LPT exit 6.
The data are generated from the numerical engine model [33,34] to evaluate the examined methods in steady behavior at maximum power and in transient behavior, including acceleration and deceleration. The engine parameters involved are reported in Table 3. The control variables, fuel flow Wf and nozzle area A8, define the operating point of the engine. The health parameters are unmeasurable and represent engine gas-path health; they comprise fan efficiency SE1, fan flow SW1, compressor efficiency SE2, compressor flow SW2, HPT efficiency SE3, HPT flow SW3, LPT efficiency SE4 and LPT flow SW4. The available measurements, from which the health parameters are calculated, are low-pressure spool speed NL, high-pressure spool speed NH, compressor inlet temperature T22, compressor inlet pressure P22, compressor outlet pressure P3, compressor outlet temperature T3, LPT inlet temperature T43, LPT inlet pressure P43, LPT outlet pressure P5 and mixing chamber inlet temperature T6 [35]. The maximum power point on the ground is defined by a corrected high-pressure spool speed of NHcor = 100%, which corresponds to the corrected normalized control variables Wf = 100% and A8 = 47%. The measurement noise is time-uncorrelated, zero-mean Gaussian noise, with magnitudes as given in [36].
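The noise model can be sketched as adding an independent Gaussian draw for every sensor at every time step. The per-sensor standard deviations are placeholders (the actual magnitudes are those of [36]), and the function name is hypothetical.

```python
import numpy as np

# Column order of the 10 available measurements listed above.
SENSORS = ["NL", "NH", "T22", "P22", "P3", "T3", "T43", "P43", "P5", "T6"]

def add_measurement_noise(Y, noise_std, rng):
    """Add time-uncorrelated, zero-mean Gaussian noise to each sensor channel.

    Y:         (T, 10) array of noise-free measurements, columns ordered as SENSORS.
    noise_std: length-10 per-sensor standard deviations (placeholder values).
    rng:       a numpy.random.Generator.
    """
    noise_std = np.asarray(noise_std, dtype=float)
    # Independent draws per time step and per sensor: no temporal correlation.
    return Y + rng.normal(0.0, 1.0, size=Y.shape) * noise_std
```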
Both gradual and abrupt performance deterioration cause health parameter variations. Gradual degradation changes the health parameters over the long term: all health parameters diverge synchronously from their nominal values as the cycle number increases. The degradation starts from a healthy engine (all health parameters at their nominal values) at cycle number CN = 0, and the deviations grow linearly up to cycle number CN = 6000. The first factory overhaul occurs at one quarter of the engine’s whole lifetime, and three cycle numbers before this overhaul, namely CN = 0, CN = 807 and CN = 1558, are addressed in this paper. Table 4 shows the health parameter deviations under gradual degradation with respect to cycle number.
In gas-path fault scenarios, the health parameters shift abruptly from their nominal values; the shift magnitudes for each fault case are given in Table 5. There are thirteen operating scenarios in total at each cycle number: twelve fault cases and one no-deviation case. Sensor malfunctions, such as bias or drift, are not considered in this study.
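The health-parameter deviations of a test scenario can be sketched as a gradual term that is linear in cycle number plus a constant abrupt shift. Assumptions: the two contributions are simply added, the gradual term grows linearly from zero at CN = 0 to an end-of-life value at CN = 6000, and the numeric arguments are placeholders for the Table 4 and Table 5 values. The function name is hypothetical.

```python
import numpy as np

HEALTH = ["SE1", "SW1", "SE2", "SW2", "SE3", "SW3", "SE4", "SW4"]
CN_END = 6000  # cycle number at which the gradual deviation reaches its end value

def health_parameters(cn, eol_deviation, fault_shift):
    """Relative deviations of the 8 health parameters from nominal at cycle number cn.

    eol_deviation: deviations reached at CN_END (placeholder for Table 4 data).
    fault_shift:   abrupt shift of one fault case (placeholder for Table 5 data);
                   all zeros for the no-deviation case.
    """
    gradual = (cn / CN_END) * np.asarray(eol_deviation, dtype=float)  # linear in cn
    return gradual + np.asarray(fault_shift, dtype=float)             # assumed additive
```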
In the training stage, historical measurement samples are used offline to build the gas-path fault HMM libraries of the engine with the proposed methodology. Each gas-path fault case corresponds to one HMM fault library, and the libraries are independent of each other, so there are K IRKPCA-HMM libraries, where K is the number of fault cases. The available engine measurements in Table 3 are recorded online in sequence, and IRKPCA-HMM uses a left-right model topology [37]. Figure 4 shows the gas-path fault diagnosis framework based on IRKPCA-HMM: the processed data sequence is fed into the HMM libraries, and the log-likelihood LL of each HMM is calculated.
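The recognition step can be sketched as follows: each fault library is a left-right HMM, the log-likelihood LL of an observation sequence is evaluated under every library with the forward algorithm in log space, and the sequence is assigned to the library with the largest LL. Assumptions: diagonal-Gaussian emission densities, a library stored as the tuple (pi, A, means, variances), and a fixed stay probability in the transition matrix; all names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def _logsumexp(a, axis=0):
    """Stable log(sum(exp(a))) along an axis; an all -inf slice yields -inf."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        out = np.log(np.sum(np.exp(a - m), axis=axis))
    return out + np.squeeze(m, axis=axis)

def left_right_transitions(n_states, p_stay=0.9):
    """Left-right transition matrix: stay in a state or move one state to the right."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0
    return A

def gaussian_logpdf(O, means, variances):
    """log N(o_t | mu_j, diag(var_j)) for every time t and state j, shape (T, N)."""
    diff = O[:, None, :] - means[None, :, :]                     # (T, N, D)
    return -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def log_likelihood(O, pi, A, means, variances):
    """LL = log P(O | model) by the forward algorithm in log space."""
    with np.errstate(divide="ignore"):                           # log(0) -> -inf is intended
        log_pi, log_A = np.log(pi), np.log(A)
    log_B = gaussian_logpdf(O, means, variances)
    log_alpha = log_pi + log_B[0]
    for t in range(1, O.shape[0]):
        # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        log_alpha = _logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[t]
    return _logsumexp(log_alpha, axis=0)

def classify(O, libraries):
    """Index of the fault library with the largest LL, plus all LL values."""
    lls = [log_likelihood(O, *lib) for lib in libraries]
    return int(np.argmax(lls)), lls
```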
In the test stage, IRKPCA obtains the optimal kernel samples from the online-sensed sequence. The probabilities associated with the gas-path fault libraries are calculated from the reduced observations by the forward-backward procedure of the Baum-Welch algorithm. The index LL is used to recognize the gas-path fault mode of an observation, which is assigned to the fault library with the largest LL [18]. The examined algorithms, HMM, KPCA-HMM and IRKPCA-HMM, are run on a Windows 10 PC with an Intel Core i5-2450M CPU @ 2.50 GHz (Intel, Santa Clara, CA, USA) and 8 GB RAM using MATLAB R2012b (The MathWorks, Inc., Natick, MA, USA). Monte Carlo simulation is conducted, and the performance indices are obtained from ten trials. The correct diagnostic ratio Acc and its standard deviation Std are used to assess gas-path fault diagnostic accuracy and stability:
where Nc is the number of correctly recognized samples, Nt is the total number of samples in one fault scenario, and U is the number of fault scenarios.
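The two indices can be computed from per-scenario counts as in the sketch below. The definitions are assumptions chosen for illustration: Acc is taken as the mean over the U scenarios of the correct ratio Nc/Nt (in percent), and Std as the population standard deviation of those per-scenario ratios. The function name is hypothetical.

```python
import numpy as np

def accuracy_indices(n_correct, n_total):
    """Return (Acc, Std) in percent from per-scenario counts Nc and Nt.

    Assumed definitions: Acc = mean over the U scenarios of Nc / Nt,
    Std = population standard deviation of those per-scenario ratios.
    """
    ratios = 100.0 * np.asarray(n_correct, dtype=float) / np.asarray(n_total, dtype=float)
    return ratios.mean(), ratios.std()
```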
4.1. Fault Diagnosis in Steady Process
To evaluate the fault diagnosis capability of the examined algorithms in the steady process, tests are conducted at full power on the ground in engine gas-path fault scenarios mixed with gradual degradation. Gas-path faults are separately injected into the nominal performance deterioration at CN = 0, CN = 807 and CN = 1558 (Table 4). The health parameters deviate as the cycle number accumulates over time, and in each gas-path fault scenario they undergo an abrupt shift by a constant bias in the steady process. The available measurements are recorded as listed in Table 3, and the number of hidden states of each HMM is obtained by searching from 2 to 8 in steps of one. After several trials, IRKPCA-HMM training is stopped when the iteration count exceeds 100 or the convergence error (the LL difference between the current and previous steps) falls below 0.01. The similarity degree is set to 0.9, and the training data for stochastic modeling in each scenario is an observation sequence of 100 sampling points. In total, 1300 samples (13 scenarios × 100 points) are used as training data.
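The stopping rule and the hidden-state search can be sketched generically. `em_step` and `train_and_score` are hypothetical callables standing in for one Baum-Welch re-estimation and for training plus scoring an HMM with a given number of states; the selection score is an assumption, since the criterion used in the search is not stated.

```python
def train_until_converged(em_step, max_iter=100, tol=0.01):
    """Repeat em_step() until LL changes by less than tol or max_iter is reached.

    em_step: hypothetical callable doing one re-estimation and returning the new LL.
    Returns (final LL, number of iterations performed).
    """
    prev_ll = em_step()                       # iteration 1
    for it in range(2, max_iter + 1):
        ll = em_step()
        if abs(ll - prev_ll) < tol:           # convergence error below threshold
            return ll, it
        prev_ll = ll
    return prev_ll, max_iter                  # iteration budget exhausted

def select_state_number(train_and_score, candidates=range(2, 9)):
    """Pick the hidden-state number in 2..8 with the highest score.

    train_and_score(n): hypothetical callable that trains an n-state HMM and
    returns a score to maximise (e.g. validation LL or correct ratio).
    """
    scores = {n: train_and_score(n) for n in candidates}
    return max(scores, key=scores.get), scores
```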
Figure 5 shows the gas-path fault features extracted by IRKPCA in steady behavior. As for the benchmark datasets, the first three dimensions of the aero-engine dataset are presented as scatter plots, where points of the same color belong to the same cluster. After feature extraction by IRKPCA, the thirteen engine fault patterns in the steady process are more distinctly separated.
The test data of ten gas-path fault scenarios are different from their training observation sequences.
Table 6 shows the maximum log-likelihood LL* and the number of correctly recognized samples Nc obtained by the HMMs at CN = 1558 in the steady process. Among the examined HMMs, IRKPCA-HMM produces the smallest absolute LL* in every case except XI and the largest Nc in every case except VIII and XI. This implies that IRKPCA-HMM is superior to HMM and KPCA-HMM in terms of confidence and correct ratio at CN = 1558 in the steady process. Performance comparisons of the HMMs at the various cycle numbers in steady behavior are presented in Table 6, which gives the average performance indices of the HMMs over the 13 scenarios.
The number of fault features and the optimal number of hidden states of the basic HMM are the same at all three cycle numbers, and both are larger than those of KPCA-HMM and IRKPCA-HMM. As Table 6 shows, KPCA-HMM and IRKPCA-HMM have a much simpler topological structure than the basic HMM because they use fewer features and fewer hidden states. The confusion matrix and transition matrix of IRKPCA-HMM for fault mode 1 are shown in Table 7.
The fault diagnostic accuracy indices Acc and Std and the execution time ttest of the three HMMs are reported in Table 8. The Acc values of KPCA-HMM and IRKPCA-HMM are clearly larger than that of HMM, and their Std values are smaller than that of HMM at all three cycle numbers. A larger Acc indicates higher confidence in the fault diagnosis result, and a smaller Std indicates better stability. Hence, both KPCA-HMM and IRKPCA-HMM have better fault diagnostic confidence and stability than the basic HMM, with KPCA-HMM slightly worse than IRKPCA-HMM. In terms of execution time ttest, KPCA-HMM and IRKPCA-HMM consume less time than the basic HMM because of their simpler topologies. Moreover, the execution time of IRKPCA-HMM is almost half that of KPCA-HMM.
The performance index Acc decreases slightly as the cycle number accumulates, whereas the computational time of the three HMMs hardly changes across the three cycle numbers. IRKPCA-HMM produces the smallest Std in all cases in Table 8. This implies that IRKPCA-HMM offers better diagnostic accuracy, stability and computational efficiency than HMM and KPCA-HMM, and it is therefore a satisfactory method for gas-path fault diagnosis in the steady behavior of a turbofan engine. In addition, tests are simulated in the steady process of high-altitude operation (H = 5000, Ma = 1, Wf = 100%, A8 = 52%), and the results are shown in Table 9. As seen from Table 9, the performance indices of HMM, KPCA-HMM and IRKPCA-HMM are similar to those in the steady process of ground operation.
4.2. Fault Diagnosis in Transient Process
To further reveal the performance of the proposed methodology for gas-path fault diagnosis, a transient test including acceleration and deceleration in the flight envelope is performed. The engine starts from idle (Wf = 68%, A8 = 100%) and gradually increases to full power (Wf = 100%, A8 = 47%). After dwelling for 0.5 s, it moves sharply back to idle; the whole operation lasts 9.5 s on the ground. The variations of the control variables in the transient behavior are shown in Figure 6. The combined deviations due to gradual degradation and abrupt degradation are added to the health parameters, and the simulations are run at the three cycle numbers.
The sampling interval of the 10-dimensional measurements is 0.1 s, and the length of the training observation sequence is 95 (9.5 s at 0.1 s per sample). The hidden state number and the iteration stop parameters in the dynamics are set to the same values as in Section 4.1. In total, 1235 samples (13 × 95) are used as training data for the 13 fault scenarios, and the average training times of these fault scenarios for HMM, KPCA-HMM and IRKPCA-HMM are 23.27 s, 19.53 s and 16.89 s, respectively. Thus, IRKPCA-HMM requires the least training effort among the examined algorithms. The test data for each gas-path fault scenario consist of 10 observation sequences, each of the same length as the training sequence. The indices LL* and Nc obtained by the examined HMMs at CN = 0 in the transient process of ground operation are given in Table 10.
From Table 10, the LL* and Nc values of IRKPCA-HMM are the largest in most fault cases as the engine moves from idle to full power and back to idle. The topological parameters of the three HMMs and the average performance indices over all fault scenarios at CN = 0, CN = 807 and CN = 1558 are reported in Table 11. The number of fault features and the optimal number of hidden states of the basic HMM are clearly larger than those of the other two HMMs, which means that the topologies of the latter two are simplified. This helps reduce the computational time of gas-path fault diagnosis.
KPCA-HMM and IRKPCA-HMM produce similar Acc and Std values, both better than those of the basic HMM. The confidence and stability of fault diagnosis are thus improved by the fault feature extraction of the KPCAs. In terms of ttest, however, KPCA-HMM differs markedly from IRKPCA-HMM and is no longer better than the basic HMM. Feature extraction dominates the computational time of gas-path fault diagnosis in the transient process. IRKPCA-HMM spends less time on feature extraction owing to the forward kernel inverse and the sample simplification scheme, and it offers the best trade-off between diagnostic accuracy and computational effort.
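One plausible reading of the forward kernel inverse (an assumption, not the authors' stated definition) is a recursive update of K⁻¹ as each retained sample is appended, using the block-matrix inversion formula. Updating an m × m inverse this way costs O(m²) per new sample instead of O(m³) for a fresh inversion. The function name is hypothetical.

```python
import numpy as np

def append_to_inverse(K_inv, k_new, c):
    """Inverse of [[K, k_new], [k_new^T, c]] from K_inv = K^{-1}, in O(m^2).

    k_new: kernel values between the new sample and the m retained samples, shape (m,).
    c:     kernel value of the new sample with itself (1.0 for a Gaussian kernel).
    """
    b = K_inv @ k_new
    s = c - k_new @ b                      # Schur complement of K in the enlarged matrix
    top_left = K_inv + np.outer(b, b) / s
    return np.block([[top_left, -b[:, None] / s],
                     [-b[None, :] / s, np.array([[1.0 / s]])]])
```

The Schur complement s equals the squared feature-space distance between the new sample and the span of the retained samples, so a small s also flags a nearly redundant sample.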
Furthermore, transient performance tests of the proposed methodology are implemented in the flight envelope, where the control variables fuel flow Wf and nozzle area A8 change with flight altitude H and Mach number Ma. The engine starts from the ground point (H = 0, Ma = 0) and climbs to high altitude (H = 5000, Ma = 1); the whole operation lasts 10 s. The engine runs at full power under closed-loop spool-speed control. Figure 7 shows the trajectories of the four input variables during the transient process in the flight envelope. As in the ground operation, the deviations related to each fault scenario are added to the health parameters at the start.
The observation sequence length of the training data is 100, and 1300 samples in total are used for the 13 fault cases. The average training times of the fault cases for HMM, KPCA-HMM and IRKPCA-HMM are 26.75 s, 22.43 s and 17.75 s, respectively, so the training time of IRKPCA-HMM is the least among the examined algorithms. The indices LL* and Nc obtained by the examined HMMs at CN = 0 in the transient process are given in Table 12.
From Table 12, the LL* and Nc values of IRKPCA-HMM at CN = 0 are the largest in most fault cases in the flight envelope. The topological parameters of the three HMMs and the average performance indices over all fault scenarios at CN = 0, CN = 807 and CN = 1558 are reported in Table 12. The number of fault features and the optimal number of hidden states of the basic HMM are clearly larger than those of the others. This indicates that the topological structure of the HMMs is clearly simplified after feature extraction by KPCA and IRKPCA, which helps reduce the computational time of gas-path fault diagnosis.
5. Conclusions
This paper develops a systematic approach to fault feature extraction and pattern recognition, which leads to an improved data-driven fault diagnosis method. The novelty of this methodology lies in developing IRKPCA and combining it with HMM to facilitate gas-path fault diagnosis for turbofan engines. The reduced samples from IRKPCA in feature space decrease the measurement dimension while retaining the principal fault-feature information. IRKPCA is evaluated on general benchmark datasets, and the results reveal that it is superior to plain KPCA in discriminative power, sparsity and dimension-reduction time. The observation sequence simplified by IRKPCA is used by the HMM to form the IRKPCA-HMM algorithm. The goal of this methodology is to increase gas-path fault diagnostic accuracy and reduce computational effort in both steady and transient behavior. The proposed methodology is evaluated in scenarios of abrupt gas-path faults mixed with gradual degradation in the flight envelope, with test data generated from a dual-spool turbofan engine model. The stochastic diagnostic modeling framework is presented and numerically assessed with several performance indices. The advantage of the proposed methodology is that it not only produces more reliable results but also requires less computational effort for fault diagnosis.
This research establishes a new direction in data-driven fault diagnosis by proposing the IRKPCA-HMM technique, which is particularly beneficial for stochastic gas-path fault diagnosis in turbofan engine applications. The methodology developed in this study is not limited to turbofan engines and can be extended to other types of gas turbine engines. Several important topics related to this work merit further study. First, various kernels for mapping the measurement space to the feature space can be investigated. Second, extending the study to cases with more than one abrupt gas-path fault superimposed on gradual degradation, and testing on a semi-physical hardware-in-the-loop platform, are worthy of future exploration.