Asset Monitoring and Predictive Maintenance
Sponsored by FPoliSolutions, LLC
Project details
Large engineered systems such as nuclear power plants consist of thousands of interconnected components.
Components eventually wear out over time, which may lead to anomalies, faults, and system failures.
Failures may force shutdowns and are expensive to repair. Worst-case scenarios endanger public health and safety.
It is therefore critical to monitor components and understand when they must be repaired BEFORE a failure occurs.
Components are monitored in many ways. One common approach is to record vibrations.
Vibrations give insights into the dynamic response of the component.
Vibrations can tell you if the component characteristics change over time.
Certain changes may mean the component is wearing out and should be repaired before it fails.
However, vibrational data are challenging to work with. Examples of vibrational data are shown below.
They are high-frequency time series signals. Patterns are hidden in the signals.
FPoliSolutions finds the patterns within the signals and monitors how those patterns evolve over time.
Certain pattern changes are associated with the component wearing out.
Finding those changes early prevents failure!
Finding those changes requires training MODELS. The models are used to PREDICT if the component has worn out and needs to be replaced.
However, failures do NOT occur all that often. This leads to significant challenges in properly training the models!
The models need to observe failures, but we do NOT want the systems to fail.
You will learn why RARE events are so challenging to model later!
It is therefore difficult to properly collect and assemble training data for predictive maintenance applications.
Computer experiments to study patterns
Computer simulations can help overcome certain challenges because the simulations are based on physical theory and engineering best practices.
Simulations are used to generate supplemental data of possible failure states.
The simulated data can be added to the existing set of real data to help train more accurate models!
The simulated data consist of higher failure rates compared to real data, because the simulations are specifically designed to induce failures.
The simulations generate vibrational data consistent with real vibrational measurements. Thus, the simulations generate high-frequency time series signals! Patterns can be extracted from those high-frequency signals.
How those patterns are extracted from the signals is not discussed here; the patterns are provided to us.
We will work with the simulated patterns. You will train models to CLASSIFY a simulated failure given the simulated patterns.
Data
- The data are provided in the CSV file training_data.csv (a loading sketch is shown after this list).
- The columns correspond to different patterns extracted from the data.
- The column naming convention indicates the feature extraction approach used to generate the variables.
- X – Approach 1 at extracting patterns from the signals
- Z – Approach 2 at extracting patterns from the signals
- V – Approach 3 at extracting patterns from the signals
- The column letter is followed by a number. Each feature extraction approach includes numerous patterns.
- Approach 1 has 25 columns: X01 through X25
- Approach 2 has 9 columns: Z01 through Z09
- Approach 3 has 29 columns: V01 through V29
- The output is named Y and is a binary variable.
- The output is encoded as:
- Y = 1 is a FAILURE
- Y = 0 is NOT a failure
- The models must predict the PROBABILITY of FAILURE given the INPUT patterns (the X, Z, and V columns).
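For orientation, a minimal sketch of loading the file and separating the column groups might look like this; it assumes training_data.csv sits in the working directory and that pandas is available.

# Minimal sketch for loading the data described above; the column selection
# relies on the naming convention X01-X25, Z01-Z09, V01-V29, and Y.
import pandas as pd

df = pd.read_csv('training_data.csv')

x_cols = [c for c in df.columns if c.startswith('X')]
z_cols = [c for c in df.columns if c.startswith('Z')]
v_cols = [c for c in df.columns if c.startswith('V')]

print(len(x_cols), len(z_cols), len(v_cols))   # expect 25, 9, 29
print(df['Y'].value_counts(normalize=True))    # check how rare failures (Y = 1) are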
Project instructions
- This project has 2 primary goals:
- Train a model that accurately classifies failure (Y=1).
- Identify the most important inputs that influence the failure probability.
- We will need to appropriately explore the inputs BEFORE training models.
- Make sure you study the RELATIONSHIPS between the inputs!
- We must use an appropriate validation scheme to select the best model! One such scheme is sketched after this list.
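Because failures (Y = 1) are rare, one reasonable validation scheme is stratified k-fold cross-validation, which keeps the failure rate similar across folds. The snippet below is a minimal sketch with an assumed baseline model, not the project's final scheme.

# Minimal sketch of a stratified validation scheme for the rare-failure target
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

df = pd.read_csv('training_data.csv')
x, y = df.drop(columns='Y'), df['Y']

# Stratification keeps the (rare) failure rate similar across folds
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=202)
baseline = LogisticRegression(max_iter=5000)   # placeholder baseline model
print(cross_val_score(baseline, x, y, cv=skf, scoring='roc_auc').mean())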
Steps
We have divided the project into 6 parts: EDA and Preprocessing, Cluster Analysis, Models, Performance, Prediction, and Bonus. The combined summary is collected in the Mains notebook, and the code is available as Jupyter notebooks and HTML files in the GitHub repository. With the introduction covered, let us start with the EDA.
| EDA and Preprocessing | Cluster Analysis | Models | Performance | Prediction | Bonus |
|---|---|---|---|---|---|
| Exploratory plots of the data, standardization, skew removal, PCA | KMeans, hierarchical clustering | 7 logistic regression models and accuracy on training data | Grid search with lasso, ridge, and elastic net pipelines | Predictions on a manually created input grid | SVC, neural net |
| Python | Python | Python | Python | Python | Python |
EDA and Preprocessing
We can see that the input features are roughly bell-shaped, but some of them are left- or right-skewed: Z07, Z09, and V02 are left-skewed, while V28, V29, and Z08 are right-skewed. We can also see minor bi-modality in X19.
Hence we used a log transformation to remove the skewness, since we will use KMeans later on. We also observed that the input features are correlated; for example, successive V inputs are positively correlated, which sets the stage well for PCA.

- As discussed above, we applied a log transformation to remove skew because we will fit logistic regression models later on (a short sketch of this preprocessing step follows this list).
- Logistic regression does not strictly require normally distributed features, but it tends to behave better when features are roughly symmetric and not dominated by extreme values.
- Algorithms that do not make explicit assumptions about the distribution of the data, such as decision trees and random forests, can also perform better on more symmetric data, because extreme values (which are more common in skewed data) can affect the splits the model chooses and, consequently, its overall performance.
- Highly skewed data have a long range of extreme values that make scaling more difficult. Removing skewness through transformations (such as logarithmic, square root, or Box-Cox transformations) makes feature scaling more effective.
- We also used standardization, because:
  - Gradient descent-based algorithms (used in neural networks, linear regression, logistic regression, etc.) converge faster when the features are standardized.
  - Support Vector Machines (SVMs), k-nearest neighbors (k-NN), and principal component analysis (PCA) are also sensitive to the scale of the data, as they rely on distance or variance calculations that can be skewed if one feature's range dominates the others.
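Here is a minimal sketch of that preprocessing, under the assumption that a log1p transform is applied to the skewed columns identified in the EDA; in practice, the left-skewed columns may need a reflected or power transform instead.

# Minimal preprocessing sketch: log-transform the skewed inputs, then standardize.
# The skewed column list is taken from the EDA discussion above; the use of
# np.log1p assumes the skewed inputs are non-negative (shift first if not).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('training_data.csv')
skewed_cols = ['Z07', 'Z08', 'Z09', 'V02', 'V28', 'V29']

df_transformed = df.copy()
df_transformed[skewed_cols] = np.log1p(df_transformed[skewed_cols])

inputs = df_transformed.drop(columns='Y')
df_standardized_transformed = pd.DataFrame(StandardScaler().fit_transform(inputs),
                                           columns=inputs.columns)
df_standardized_transformed['Y'] = df_transformed['Y']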
Cluster Analysis
We have also observed that the input features are correlated. Applying PCA removed this correlation, and we judged the first 11 principal components to be useful. We then fitted KMeans and chose 2 clusters using the elbow (knee bend) plot. Hierarchical clustering also pointed to 2 clusters.
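A minimal sketch of this clustering workflow is shown below; it reuses the variable names from the preprocessing step, while the specific settings (candidate cluster counts, random seeds) are illustrative assumptions.

# Minimal sketch of the cluster analysis: PCA on the standardized inputs,
# an elbow plot for KMeans, and hierarchical clustering with 2 clusters.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering

x_std = df_standardized_transformed.drop(columns='Y')

# Project onto principal components; the first 11 PCs were judged useful
pca = PCA()
df_pca_transformed = pd.DataFrame(pca.fit_transform(x_std),
                                  columns=[f'pc{i+1:02d}' for i in range(x_std.shape[1])])

# Elbow (knee bend) plot for choosing the number of KMeans clusters
inertias = [KMeans(n_clusters=k, n_init=10, random_state=202).fit(df_pca_transformed.iloc[:, :11]).inertia_
            for k in range(1, 11)]
plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('number of clusters'); plt.ylabel('inertia'); plt.show()

# 2 clusters were selected; hierarchical clustering also suggested 2
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=202).fit_predict(df_pca_transformed.iloc[:, :11])
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(df_pca_transformed.iloc[:, :11])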

Models
We fitted 7 models, ranging from linear additive to interaction formulations, calculated their coefficients, and assessed statistical significance. We compared the models on the number of coefficients, decision threshold, Accuracy, Sensitivity, Specificity, FPR, and ROC_AUC, evaluated on a test dataset.
# Model 3: linear additive logistic regression using all standardized, transformed inputs
import statsmodels.formula.api as smf

formula_linear = 'Y ~ ' + ' + '.join(df_standardized_transformed.drop(columns='Y').columns)
mod_03 = smf.logit(formula=formula_linear, data=df_standardized_transformed).fit()
mod_03.params

# Model 7: apply PCA to the transformed inputs and create all pairwise interactions between the PCs
df_pca_transformed_int = df_pca_transformed.iloc[:, :11].copy()
df_pca_transformed_int['Y'] = df_transformed.Y
formula_int = 'Y ~ (' + ' + '.join(df_pca_transformed_int.drop(columns='Y').columns) + ') ** 2'
mod_07 = smf.logit(formula=formula_int, data=df_pca_transformed_int).fit()
mod_07.params
From these, we chose model 3 (all linear additive features from the original data set) and model 7 (pairwise interactions of the principal components). They have 64 and 67 coefficients, respectively.
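For reference, the comparison metrics listed above could be computed along the following lines. This is a minimal sketch with an assumed 0.5 threshold and a hypothetical classification_summary helper, not the project's exact code.

# Minimal sketch of the model comparison metrics (0.5 threshold assumed)
from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score

def classification_summary(y_true, pred_prob, threshold=0.5):
    y_pred = (pred_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {'Accuracy': accuracy_score(y_true, y_pred),
            'Sensitivity': tp / (tp + fn),
            'Specificity': tn / (tn + fp),
            'FPR': fp / (fp + tn),
            'ROC_AUC': roc_auc_score(y_true, pred_prob)}

# In-sample summaries for the two selected models
classification_summary(df_standardized_transformed['Y'], mod_03.predict())
classification_summary(df_pca_transformed_int['Y'], mod_07.predict())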
Prediction
Recall that we only had training data. We therefore chose model 3 and model 7 and checked them on a test dataset that we created manually as a grid over the inputs X01, Z01, and Z04. For model 3 we then drew line prediction plots with x='X01', y='pred_probability_03', hue='Z01', and col='Z04'.

Next, for model 7 we drew line prediction plots with x='pc01', y='pred_probability_07', hue='pc04', and col='pc11'. A sketch of how such a prediction grid and plot can be built is shown below.
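As an illustration, here is a minimal sketch of building the manual grid and the model 3 prediction plot; the grid ranges are illustrative assumptions, and all other inputs are held at their standardized means.

# Minimal sketch: build a manual input grid over X01, Z01, and Z04, predict
# with model 3, and draw the line prediction plots described above.
import numpy as np
import pandas as pd
import seaborn as sns

inputs = df_standardized_transformed.drop(columns='Y')

grid = pd.DataFrame([(x, z1, z4)
                     for x in np.linspace(-3, 3, 101)
                     for z1 in np.linspace(-2, 2, 5)
                     for z4 in np.linspace(-2, 2, 3)],
                    columns=['X01', 'Z01', 'Z04'])

# Hold every other input at its (standardized) mean of zero
test_grid = pd.DataFrame(0.0, index=grid.index, columns=inputs.columns)
test_grid[['X01', 'Z01', 'Z04']] = grid

test_grid['pred_probability_03'] = mod_03.predict(test_grid)
sns.relplot(data=test_grid, x='X01', y='pred_probability_03',
            hue='Z01', col='Z04', kind='line')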

Performance
Next, we evaluated performance using Pipelines that fit logistic regression with lasso, ridge, and elastic net regularization. Rather than restricting ourselves to lasso or ridge, we also tuned an elastic net; its selected l1 ratio leaned toward lasso. We therefore evaluated the lasso search and obtained a best score of about 84%.
# Model 7 (PCA on the transformed inputs with all pairwise PC interactions),
# tuned with a lasso-penalized logistic regression: best cross-validated score
pc_interact_lasso_search_grid.best_score_
0.8387878787878786
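Only the best score of the lasso search is shown above; the search object itself is not defined in this snippet. A minimal sketch of how it could be constructed (analogous to the elastic net pipeline shown next, with the interaction step assumed to be a PolynomialFeatures transformer) is:

# Assumed construction of the lasso (L1) grid search referenced above;
# x_train_transformed and y_train_transformed are the transformed training data.
import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

lasso_wflow = Pipeline(steps=[('std_inputs', StandardScaler()),
                              ('pca', PCA()),
                              ('make_pairs', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)),
                              ('lasso', LogisticRegression(penalty='l1', solver='saga', max_iter=25001, random_state=202))])
lasso_grid = {'pca__n_components': [3, 5, 7, 9, 11, 13, 15, 17],
              'lasso__C': np.exp(np.linspace(-10, 10, num=17))}
pc_interact_lasso_search_grid = GridSearchCV(lasso_wflow, param_grid=lasso_grid, cv=5).fit(x_train_transformed, y_train_transformed)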
Finally, we forced the elastic net grid search to consider intermediate l1 ratios (including 0.5). This also achieved about 84% accuracy, with 31 coefficients set to zero, so we selected it as the best model.
import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold

# Elastic net logistic regression (the saga solver supports the elasticnet penalty)
enet_to_fit = LogisticRegression(penalty='elasticnet', solver='saga',
                                 random_state=202, max_iter=25001, fit_intercept=True)

# make_pairs (not defined in the original snippet): assumed here to be a
# PolynomialFeatures transformer that creates all pairwise PC interactions
make_pairs = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

pc_interact_enet_wflow = Pipeline(steps=[('std_inputs', StandardScaler()),
                                         ('pca', PCA()),
                                         ('make_pairs', make_pairs),
                                         ('enet', enet_to_fit)])

enet_grid = {'pca__n_components': [3, 5, 7, 9, 11, 13, 15, 17],
             'enet__C': np.exp(np.linspace(-10, 10, num=17)),
             'enet__l1_ratio': np.linspace(0, 1, num=3)}

# kf: cross-validation splitter (assumed 5-fold)
kf = KFold(n_splits=5, shuffle=True, random_state=202)
pc_df_enet_search = GridSearchCV(pc_interact_enet_wflow, param_grid=enet_grid, cv=kf)
pc_df__enet_search_results = pc_df_enet_search.fit(x_train_transformed, y_train_transformed)

# The optimal value for C and the number of PCA components
pc_df__enet_search_results.best_params_
pc_df__enet_search_results.best_score_
0.8387878787878786

# Count the coefficients driven to zero by the elastic net penalty
coef = pc_df__enet_search_results.best_estimator_.named_steps['enet'].coef_
empty_elements = coef[coef == 0]
empty_elements.size
31
Bonus
We also fitted an SVC and a neural net. The neural net reached 91% to 100% accuracy across cross-validation folds, while the SVC achieved 100% accuracy on every fold.
SVC
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

svm_model = SVC()
svm_param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto']
}

# Grid search over the SVC hyperparameters (5-fold, accuracy scoring assumed)
svm_grid_search = GridSearchCV(svm_model, svm_param_grid, cv=5, scoring='accuracy')
svm_result = svm_grid_search.fit(x_train_transformed, y_train_transformed)
svm_result.best_params_
svm_result.best_score_

# Cross-validate the best SVC
svm_cross_val_scores = cross_val_score(svm_grid_search.best_estimator_, x_train_transformed, y_train_transformed, cv=5, scoring='accuracy')
print("SVM Cross-Validation Scores:", svm_cross_val_scores)
print("SVM Mean Cross-Validation Score:", svm_cross_val_scores.mean())
SVM Cross-Validation Scores: [1. 1. 1. 1. 1.]
SVM Mean Cross-Validation Score: 1.0
Neural Net
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# The classifier tuned in this snippet is a RandomForestClassifier
model = RandomForestClassifier()

# Define the parameter grid for tuning
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create the GridSearchCV object (5-fold cross-validation, accuracy scoring)
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')

# Fit the grid search to the training data
grid_search.fit(x_train_transformed, y_train_transformed)

# Get the best parameters
best_params = grid_search.best_params_
print("Best Parameters:", best_params)

# Assess performance of the best estimator with 5-fold cross-validation
cross_val_scores = cross_val_score(grid_search.best_estimator_, x_train_transformed, y_train_transformed, cv=5, scoring='accuracy')
print("Cross-Validation Scores:", cross_val_scores)
print("Mean Cross-Validation Score:", cross_val_scores.mean())
Cross-Validation Scores: [1. 1. 0.95555556 0.90909091 1. ]
Mean Cross-Validation Score: 0.972929292929293
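For the neural net itself, a comparable tuning sketch with scikit-learn's MLPClassifier is shown below; the hidden layer sizes and other settings are illustrative assumptions, not the project's exact configuration.

# Minimal sketch of tuning a neural net (MLPClassifier); the parameter grid
# below is an illustrative assumption, not the project's exact configuration.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

mlp = MLPClassifier(max_iter=5000, random_state=202)
mlp_param_grid = {
    'hidden_layer_sizes': [(32,), (64,), (64, 32)],
    'alpha': [1e-4, 1e-3, 1e-2],
    'activation': ['relu', 'tanh']
}

mlp_grid_search = GridSearchCV(mlp, mlp_param_grid, cv=5, scoring='accuracy')
mlp_grid_search.fit(x_train_transformed, y_train_transformed)

mlp_scores = cross_val_score(mlp_grid_search.best_estimator_, x_train_transformed,
                             y_train_transformed, cv=5, scoring='accuracy')
print("MLP Cross-Validation Scores:", mlp_scores)
print("MLP Mean Cross-Validation Score:", mlp_scores.mean())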
Summary
In EDA we saw that the inputs are highly correlated, which is why individual inputs are not very good at separating Y = 0 from Y = 1. KMeans with 2 clusters worked well: it not only provided a useful hue for the scatter plots but also matched the Y = 0/1 labels closely.
We can see that V07, V15, and X10 are statistically significant features. It seems the third approach to extracting patterns from the signals (the V inputs) is the most useful.
Because the inputs are correlated with each other, we relied on PCA to construct effective features; in the end, we saw that 11 to 13 principal components separate the data well.
The best logistic regression model turns out to be the elastic net, which mixes the ridge and lasso penalties and sets 31 coefficients to zero, giving 83-84% accuracy. The best model in training turned out to be the best in prediction as well. In the end we saw that the SVC achieves 100% accuracy, and the neural net included in the supporting document reaches about 97% accuracy.
Things to answer and to be updated next
This was my second project, and more things are yet to be learned and improved; data science and machine learning are a journey, like life.
- Removing skew with a data-independent approach
- How to choose the optimal number of principal components
- More advanced methods after logistic regression.
References
- University of Pittsburgh course CMPINF 2100
- VSCode, Python
💻
