Vaibhav Attre.boot

TurboFan RUL Prediction

Remaining useful life prediction for turbofan engines using sliding-window features, GRU sequence modeling, random forests, and regularized linear baselines on NASA N-CMAPSS data.

Turbofan RUL prediction workflow
ml systems

TurboFan RUL Prediction is a machine learning project on predictive maintenance for aircraft engines, built on NASA N-CMAPSS multivariate time-series data. It studies how sensor trajectories, operating conditions, and recent degradation history can be used to estimate remaining useful life (RUL) before failure. Rather than relying on a single model family, it compares three: a regularized linear baseline, a random forest regressor, and a GRU-based sequence model, the last in both a baseline and an Optuna-tuned configuration. The pipeline covers HDF5 data loading, exploratory analysis, engine-level train/validation/test splitting, feature selection, sliding-window sequence construction, and evaluation with RMSE, MAE, and NASA’s asymmetric scoring metric. The Optuna-tuned GRU was the strongest model on held-out engines, which highlights the value of temporal modeling for degradation prediction.

date: 2026
status: active
overview

Predicting remaining useful life from aircraft engine telemetry is challenging because the relationship between sensor measurements and failure is noisy, nonlinear, and history-dependent. This project explores how much model choice and temporal context matter when estimating engine degradation from real multivariate sequence data.

implementation
Loaded and processed NASA N-CMAPSS HDF5 data, combining operating conditions, auxiliary variables, and physical sensor measurements into modeling-ready dataframes.
Performed exploratory analysis on RUL distributions, cycle lengths, and sensor trajectories to understand degradation behavior and guide modeling choices.
Split data by engine identity into train, validation, and test sets so entire trajectories stayed isolated across splits.
Built feature-selection pipelines using correlation with RUL and random-forest feature importance to prioritize informative sensor and flight variables.
Constructed sliding-window inputs over engine histories so models could use recent temporal context when predicting current RUL.
Trained and evaluated a Ridge-regularized linear regressor, a tuned random forest regressor, a baseline GRU, and an Optuna-tuned GRU using RMSE, MAE, and NASA score.
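
The sliding-window step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the window length, and the choice to label each window with the RUL at its last time step are all assumptions. Windows are built one engine at a time so that no window spans two engines.

```python
import numpy as np

def make_windows(features, rul, window):
    """Build sliding windows over ONE engine's time-ordered history.

    features: (n_steps, n_features); rul: (n_steps,).
    Returns X of shape (n_steps - window + 1, window, n_features) and
    y where y[i] is the RUL at the last step of window i (an assumed
    labeling convention, not confirmed by the source).
    """
    features = np.asarray(features, dtype=np.float32)
    rul = np.asarray(rul, dtype=np.float32)
    n_steps, n_features = features.shape
    if n_steps < window:
        # History shorter than one window: emit nothing for this engine.
        return (np.empty((0, window, n_features), dtype=np.float32),
                np.empty((0,), dtype=np.float32))
    X = np.stack([features[i:i + window] for i in range(n_steps - window + 1)])
    y = rul[window - 1:]
    return X, y

def windows_by_engine(engine_ids, features, rul, window):
    """Apply make_windows per engine and concatenate the results."""
    engine_ids = np.asarray(engine_ids)
    features = np.asarray(features, dtype=np.float32)
    rul = np.asarray(rul, dtype=np.float32)
    xs, ys = [], []
    for eid in np.unique(engine_ids):
        mask = engine_ids == eid  # rows belonging to this engine only
        X, y = make_windows(features[mask], rul[mask], window)
        xs.append(X)
        ys.append(y)
    return np.concatenate(xs), np.concatenate(ys)
```

The resulting `X` has the `(batch, time, features)` layout that a GRU with batch-first inputs would expect; flattening the last two axes gives a fixed-length vector for the linear and random-forest models.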
challenges
RUL prediction depends on degradation history, so snapshot-based modeling alone can miss important temporal patterns.
The dataset contains noisy, highly correlated sensor channels, making feature selection and model capacity important design choices.
Generalization had to be measured at the engine level, not the sample level, to avoid leakage between train and test trajectories.
Balancing interpretability, computational cost, and predictive accuracy required comparing simpler baselines against more expressive sequence models.
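
The leakage concern above comes down to assigning whole engines, not individual rows, to each split. A minimal sketch using only the standard library is shown here; the split fractions, the seed, and the helper names are illustrative assumptions, since the source does not state the proportions used.

```python
import random

def split_engines(engine_ids, val_frac=0.15, test_frac=0.15, seed=0):
    """Assign each engine unit to exactly one of train/val/test.

    engine_ids: one engine identifier per row of the dataset.
    Fractions are hypothetical defaults, not the project's values.
    """
    units = sorted(set(engine_ids))
    random.Random(seed).shuffle(units)  # reproducible shuffle of unit ids
    n_test = max(1, round(len(units) * test_frac))
    n_val = max(1, round(len(units) * val_frac))
    test = set(units[:n_test])
    val = set(units[n_test:n_test + n_val])
    train = set(units[n_test + n_val:])
    return train, val, test

def rows_for(engine_ids, units):
    """Row indices whose engine belongs to the given set of units."""
    return [i for i, eid in enumerate(engine_ids) if eid in units]
```

Because membership is decided per engine, every time step of a given trajectory lands in the same split, so test metrics reflect engines the model has never seen.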
outcomes
The tuned GRU delivered the best overall performance on held-out test engines (5.19 RMSE, 3.92 MAE), substantially outperforming the linear and random-forest baselines.
The project showed that sequence-aware modeling is well suited to turbofan degradation forecasting, where recent trajectory matters more than isolated measurements.
The comparison framework created a strong baseline for future work on larger N-CMAPSS subsets, alternate recurrent architectures, and richer predictive-maintenance pipelines.
architecture notes
The data pipeline loads multivariate HDF5 engine telemetry, merges operating-condition and sensor channels, and constructs engine-specific trajectories for modeling.
Feature engineering combines domain-relevant variables, correlation analysis, random-forest-based importance ranking, standardization, and sliding-window construction over engine history.
The modeling stack spans interpretable baselines and temporal deep learning, allowing direct comparison between static engineered-feature regressors and sequence-aware recurrent models.
Evaluation uses held-out engines rather than random row-level splits so the reported metrics better reflect generalization to unseen engine trajectories.
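
Two of the feature-engineering steps named above, correlation-based ranking and standardization, can be sketched with NumPy alone. The function names and the channel names in the usage note are hypothetical, and fitting the scaler on training engines only is an assumption that follows from the held-out evaluation design, not something the source states. The random-forest importance ranking is not shown.

```python
import numpy as np

def rank_by_correlation(X, y, names):
    """Rank feature columns by absolute Pearson correlation with RUL."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # Constant channels have zero variance; report correlation 0 for them.
        corr = np.where(denom > 0, Xc.T @ yc / denom, 0.0)
    order = np.argsort(-np.abs(corr))
    return [(names[i], float(corr[i])) for i in order]

def fit_standardizer(X_train):
    """Per-feature mean and std, computed on training rows only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant channels unscaled
    return mean, std

def standardize(X, mean, std):
    """Apply training-set statistics to any split."""
    return (np.asarray(X, dtype=float) - mean) / std
```

A call such as `rank_by_correlation(X_train, rul_train, ["s0", "s1", "s2"])` returns `(name, correlation)` pairs, strongest first; the same `mean` and `std` are then reused for the validation and test engines.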
stack
Python, PyTorch, scikit-learn, Optuna, pandas, NumPy, Matplotlib, h5py
highlights
Built an end-to-end RUL prediction pipeline on NASA N-CMAPSS turbofan engine data using regression and sequence models.
Engineered sliding-window features and sequence inputs to capture short-term degradation history instead of only per-cycle snapshots.
Compared regularized linear regression, random forest, baseline GRU, and Optuna-tuned GRU under a common evaluation pipeline.
Achieved best test performance with the tuned GRU, reaching 5.19 RMSE and 3.92 MAE on held-out engines.
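
The three evaluation metrics used for the comparison can be written in a few lines. The sketch below assumes the commonly cited form of NASA's asymmetric score, with time constants 13 for early predictions and 10 for late ones; the source does not say which variant or constants the project used, so treat both as assumptions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n

def nasa_score(y_true, y_pred, a_early=13.0, a_late=10.0):
    """Asymmetric score: late predictions cost more than early ones.

    d = predicted - true. d < 0 means the model predicted failure too
    early; d > 0 means it overestimated remaining life (too late).
    The constants 13 and 10 are assumed, not taken from the source.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        total += math.expm1(-d / a_early) if d < 0 else math.expm1(d / a_late)
    return total
```

Under this form, overestimating RUL by a given margin is penalized more heavily than underestimating it by the same margin, which matches the maintenance setting: a late warning is costlier than an early one.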
metrics
best test RMSE: 5.19
best test MAE: 3.92
models compared: 4
media
Turbofan exploratory analysis plots
Exploratory analysis of RUL distributions, engine cycle lengths, and sensor degradation trajectories across engine units.
Research paper style essay
Model comparison across linear, random forest, baseline GRU, and Optuna-tuned GRU using RMSE, MAE, and NASA scoring.