Master’s thesis - Fault Detection by Adaptive Process Modeling for Nuclear Power Plant (2007)
Master’s thesis that explores fault detection in nuclear power plants (NPPs) using adaptive process modeling. The author, Jaakko Talonen, from Helsinki University of Technology, details a methodology involving data mining (DM), principal component analysis (PCA), and weighted recursive least squares (WRLS). The core of the work is the analysis of NPP process data to identify subtle, slowly developing anomalies, such as leakages, that traditional methods might miss. The thesis also describes the development of a Data Management Tool (DMT) to facilitate this analysis, illustrating its application with a simulated leakage scenario to demonstrate the effectiveness of a derived leakage index.
Summary (Note)
This master’s thesis, conducted by Jaakko Talonen in collaboration with Teollisuuden Voima Oy (TVO) as part of the NoTeS project, focuses on developing methods for detecting slow-developing abnormal events in Nuclear Power Plants (NPPs). The core approach involves adaptive process modeling using recorded process data from NPPs, specifically Olkiluoto reactors in Finland. The research emphasizes data mining techniques, including Principal Component Analysis (PCA) for variable selection and feature extraction, and Weighted Recursive Least Squares (WRLS) for adaptive modeling. A key outcome is the development of a leakage detection method based on model estimation error, implemented within a custom-built Data Management Tool (DMT) in Matlab. The work addresses the critical need for early detection of faults like leakages or fouling to prevent more severe incidents and enhance NPP safety.
FAQ
What is the primary problem addressed by the research?
The main problem addressed by this research is the detection of slow-developing abnormal events in nuclear power plants (NPPs), such as leakages, heat-exchanger fouling, or calibration errors in flow indicators. Detecting these issues early is crucial to prevent more severe consequences.
What methodology is employed for fault detection in nuclear power plants?
The methodology for fault detection involves adaptive process modeling using recorded process data from nuclear power plants. Specifically, the Weighted Recursive Least Squares (WRLS) method is utilized for adaptive modeling. A leakage detection method is then based on the model estimation error, where a fault is declared if the residual error exceeds a certain threshold.
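As a concrete anchor, here is a minimal Python sketch of the standard RLS-with-forgetting recursion that WRLS builds on (the thesis implementation was in Matlab, and its exact weighting scheme may differ):

```python
import numpy as np

def wrls_step(theta, P, x, y, lam=0.99):
    """One recursive least squares update with forgetting factor lam.

    theta: parameter estimate (n,), P: inverse correlation matrix (n, n),
    x: regressor vector (n,), y: measured output (scalar).
    """
    e = y - x @ theta                    # a priori estimation error (residual)
    k = P @ x / (lam + x @ P @ x)        # gain vector
    theta = theta + k * e                # correct parameters toward new sample
    P = (P - np.outer(k, x @ P)) / lam   # update and inflate P (forgetting)
    return theta, P, e
```

A fault would then be declared when the residual `e` stays above a chosen threshold; the threshold and any persistence requirement are tuning parameters, not values from the thesis.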
What data mining techniques are crucial to this fault detection system?
Several data mining techniques are crucial to this system:
- Data understanding and preparation: This involves exploring data quality and reliability, filtering (e.g., Moving Average, Exponentially Weighted Moving Average, Median filtering), differencing to measure the rate of change, and normalization (zero mean, unit variance, or range scaling); see the sketch after this list.
- Variable selection and feature extraction: Principal Component Analysis (PCA) is a key tool for dimensionality reduction and identifying significant variables. Hotelling’s T² statistics help in selecting statistically significant variables, while cross-correlation functions are used to identify delays between process variables. Dynamically similar variables are also identified.
- Modeling: The Weighted Recursive Least Squares (WRLS) method is used for adaptive modeling of nonstationary processes, with a time-varying forgetting factor to adapt to system changes.
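A small Python sketch of the preparation steps named above (filter, difference, normalize); the parameter values are illustrative, not taken from the thesis:

```python
import numpy as np

def ewma(x, alpha=0.1):
    """Exponentially weighted moving average filter (noise smoothing)."""
    y = np.empty(len(x))
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y

def preprocess(x):
    """Smooth, difference (rate of change), and scale to zero mean, unit variance."""
    s = ewma(np.asarray(x, dtype=float))
    d = np.diff(s, prepend=s[0])           # differencing measures rate of change
    return (d - d.mean()) / (d.std() + 1e-12)
```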
What is the purpose of the Data Management Tool (DMT) developed in this research?
The Data Management Tool (DMT) is a software programmed in Matlab that facilitates all data mining tasks for the project. It is structured into three main phases: preprocessing, variable selection, and modeling. The DMT allows users to select data, preprocess it, select and extract features, perform WRLS modeling, visualize results, and even manipulate data to simulate faults like leakages for testing the detection method. It also helps manage and utilize a database of statistical properties from various data sets.
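To illustrate the fault-simulation feature, here is a hedged Python sketch of one way to superimpose a slowly developing leakage on a recorded flow signal (the DMT itself is a Matlab tool; the function and parameter names here are hypothetical):

```python
import numpy as np

def inject_leak(flow, start, rate):
    """Subtract a slowly growing ramp from a flow signal from index `start`
    onward, mimicking a developing leakage for testing the detector."""
    t = np.arange(len(flow))
    ramp = np.maximum(0, t - start) * rate
    return np.asarray(flow, dtype=float) - ramp
```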
How are “slow developing” faults, like leakages, identified?
Slow-developing faults, such as leakages, are identified by monitoring the model estimation error generated by the adaptive Weighted Recursive Least Squares (WRLS) model. When a fault occurs, the process variables’ behavior changes, causing the WRLS model’s estimate to drift, leading to an increasing estimation error. A “leakage index” is then created based on this cumulative estimation error, which rises if a real leakage situation is detected, indicating an abnormal system state.
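One plausible construction of such an index is a CUSUM-style accumulation of the estimation error; a Python sketch under that assumption (the thesis's exact index may be formed differently):

```python
import numpy as np

def leakage_index(errors, allowance=0.0):
    """CUSUM-style index over model estimation errors: it climbs while errors
    persistently exceed `allowance` and drifts back toward zero otherwise."""
    idx = np.zeros(len(errors))
    for t in range(1, len(errors)):
        idx[t] = max(0.0, idx[t - 1] + abs(errors[t]) - allowance)
    return idx
```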
What is the significance of “adaptive modeling” and “time-varying forgetting factor” in this context?
Adaptive modeling, specifically using Weighted Recursive Least Squares (WRLS), is significant because it allows the model to continuously update its parameters based on new data, making it suitable for nonstationary processes like those in nuclear power plants. The “time-varying forgetting factor” (λ) further enhances this adaptivity. It determines how quickly older data points are “forgotten,” allowing the model to respond more rapidly to abrupt or sudden changes in the system (e.g., a developing fault), as a larger error causes older data to be forgotten faster. This ensures the model remains robust and sensitive to deviations from normal operation.
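A common error-driven scheme from the adaptive-filtering literature, sketched in Python; the constants and the exact mapping are assumptions, not the thesis's rule:

```python
import numpy as np

def forgetting_factor(e, lam_min=0.95, lam_max=0.999, rho=50.0):
    """Large estimation error e -> lambda near lam_min (forget old data fast);
    small error -> lambda near lam_max (long memory)."""
    return lam_min + (lam_max - lam_min) * np.exp(-rho * e * e)
```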
What types of data were used in the experiments and where did they originate?
The experiments primarily used high-frequency process data recorded from the Olkiluoto nuclear power plant, specifically from reactor 2. Data sets were delivered via compact discs and represented time series of numerous signals. A total of 37 data sets from both reactors were used to create a database of statistical properties for preprocessing, but simulator data was excluded from this database. The specific case analyzed in detail involved a pump stopping in the primary circulation system of Olkiluoto 2 reactor.
What are some of the potential challenges and future developments discussed for this fault detection system?
Challenges include dealing with noise in raw data, the risk of losing relevant information with excessive smoothing, and ensuring that statistically significant variables truly relate to a fault. Problems in data mining and modeling also arise if preprocessing parameters are not optimal. Future developments include refining the leakage index for clearer user representation, improving data mining of all available data sets by storing more comprehensive information like delays between variables, and exploring other modeling techniques such as Self-Organizing Maps (SOM) or Kalman filters.
In Finnish
The Master's thesis deals with fault detection in nuclear power plants (NPPs) using adaptive process modeling. It centers on Jaakko Talonen's thesis, which presents data mining (DM) methods and the weighted recursive least squares (WRLS) method for identifying leakages and other slowly developing anomalies. The work uses process data collected from the Olkiluoto nuclear power plant and describes the development of a Data Management Tool (DMT) for off-line analysis. The goal is to improve fault diagnostics and support operators' decision making in order to increase safety and efficiency in nuclear power plants.
Thesis - articles
All articles available locally: (talon\OneDrive\TiedostotMore\thesis - articles)
Leakage Detection by Adaptive Process Modeling (2008)
This source describes a method for detecting steam line leakages in a boiling water reactor (BWR) nuclear power plant using adaptive process modeling. The authors propose an adaptive linear approach for time series modeling, employing the Weighted Recursive Least Squares (WRLS) method. To ensure a robust model, they utilize Principal Component Analysis (PCA) to select linearly correlated interpretive variables, examining eigenvalues and eigenvectors. The developed leakage detection index is based on the model estimation error, proving particularly effective for small pipe flows and offering an alternative to other sensors for early detection.
- Further development: this article builds on the Master’s thesis (2007).
Why is adaptive modeling preferred over static models for industrial processes like nuclear power plants?
Industrial processes are inherently nonstationary, meaning their statistical properties and dependencies between variables can change over time due to various factors like different process states or external conditions (e.g., seasonal variations). Static models, while useful for training simulators or understanding plant dynamics, are not accurate enough for fault detection in such dynamic environments. Adaptive modeling, on the other hand, allows the model to continuously update its coefficients, enabling it to recognize abnormal events in these dynamic processes.
Leakage Detection by Adaptive Process Modeling
The study deals with the use of adaptive process modeling for detecting steam line leakages in a boiling water reactor type nuclear power plant such as Olkiluoto. It presents the weighted recursive least squares (WRLS) method for time series modeling and the development of a leakage detection index based on the model estimation error. Principal component analysis (PCA) is used to select interpretive variables, ensuring that the variables of the adaptive model are linearly correlated, which is necessary for a robust leakage detection model. The method proves particularly effective at detecting leakages in small pipe flows, offering an alternative to other detection systems.
Publication II: “Abnormal Process State Detection by Cluster Center Point Monitoring in BWR Nuclear Power Plant”
This paper introduces a novel approach for abnormal process state detection in nuclear power plants, specifically demonstrated using data from the Olkiluoto Boiling Water Reactor. The method focuses on monitoring the movement of cluster center points in real-time, classifying process signals into “slow” and “fast” categories using the K-means algorithm. By extracting statistical features like absolute difference and moving standard deviation, the system aims to provide early detection of potential faults, thereby enhancing plant safety and reducing operational costs. The proposed technique is presented as an improvement over traditional monitoring methods and even other multivariate statistical approaches like Hotelling’s T² statistic, offering a simpler and more effective tool for operators to identify pre-stages of process anomalies.
Core Problem & Motivation
- High Dimensionality and Manual Selection Burden: NPPs like Olkiluoto measure and monitor thousands of signals, making manual selection and monitoring “arduous” due to the “high dimensionality of the system.”
- Alarm Overload: Control room operators at Olkiluoto are “overloaded with alarms and notification which make it difficult for the operator to make discerning decision.” This necessitates “alarm sanitation.”
- Need for Early Fault Detection: Existing monitoring systems often detect faults after they have developed, leading to potential significant impacts on large-scale systems. The goal is to detect “the pre-stages of a process fault” to enable proactive intervention.
- Limitations of Traditional Methods: Traditional methods like Shewhart charts and limit value checking struggle to find “correct target value and limit values for each signal” due to the variety of signal types in industrial processes.
Proposed Method: Cluster Center Point Monitoring
- An “unsteadiness” index is introduced, focusing specifically on monitoring the “cluster center point coordinates of the slow signals” (see the sketch at the end of this section).
- The rationale for focusing on “slow” signals is that “there are remarkably large changes in some measurements in their normal operating state” which can temporarily increase “the center point coordinates of fast signals” (e.g., control flow changes, watering).
- The “unsteadiness limit can be adjusted to produce an automatic alarm.” This alarm is based on the moving average of the cluster center point of the fast signals, with the current alarm limit being its minimum value.
- Simplicity for Operators: The “unsteadiness” index simplifies monitoring; “a notification system using the unsteadiness index works without operator monitoring.”
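A rough Python sketch of the idea, using the features and the slow/fast split described above; scikit-learn's KMeans stands in for the paper's K-means step, and the window length is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

def signal_features(X, w=60):
    """Per-signal features over the last w samples of X (time x signals):
    mean absolute difference and moving standard deviation."""
    W = X[-w:]
    abs_diff = np.abs(np.diff(W, axis=0)).mean(axis=0)
    mov_std = W.std(axis=0)
    return np.column_stack([abs_diff, mov_std])

def slow_cluster_center(X, w=60):
    """Split signals into 'slow' and 'fast' with K-means (k=2) and return the
    slow cluster's center; its movement over successive windows gives an
    unsteadiness-style index."""
    F = signal_features(X, w)
    km = KMeans(n_clusters=2, n_init=10).fit(F)
    slow = int(np.argmin(km.cluster_centers_.sum(axis=1)))  # least activity
    return km.cluster_centers_[slow]
```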
Publication III: Generated Control Limits as a Basis of Operator-Friendly Process Monitoring (2009)
This academic paper from the IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems introduces a novel method for process monitoring and fault detection in complex industrial environments, specifically nuclear power plants. Authors Jaakko Talonen and Miki Sirola propose using data-generated control limits and a new feature called “alarm balance” to address the problem of excessive alarms that often overwhelm operators. Their research, utilizing real and simulator data from the Olkiluoto nuclear power reactors in Finland, aims to provide operator-friendly visualizations that simplify the identification of abnormal process states, thereby improving safety and reducing downtime. The core of their method involves dynamically adjusting alarm thresholds based on historical signal deviations and statistical properties, rather than relying on static, pre-set limits.
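A hedged Python sketch of the two ingredients as summarized above; the factor `k` and the exact balance definition are illustrative choices, not the paper's formulas:

```python
import numpy as np

def control_limits(history, k=3.0):
    """Generate per-signal limits from historical data (history: time x signals)
    instead of hand-set static limits."""
    mu, sd = history.mean(axis=0), history.std(axis=0)
    return mu - k * sd, mu + k * sd

def alarm_balance(X, lo, hi):
    """Alarm-balance-style aggregate: signals above their upper limit count +1,
    signals below their lower limit count -1, so one curve summarizes many
    signals and opposite deviations show up as imbalance."""
    return (X > hi).sum(axis=1) - (X < lo).sum(axis=1)
```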
What are the main advantages and limitations of this new method?
- Early and Accurate Fault Detection: The method can detect faults or their pre-stages, improving safety and reducing downtime.
- Operator-Friendly: It addresses the problem of alarm overload by reducing the perceived dimensionality of monitored signals through aggregated features like alarm balance.
- Data-Driven and Applicable to Large Systems: Unlike knowledge-based methods, it is easily applicable to large-scale, complex systems as it is based on readily available process data.
- Robust and Simple Implementation: The model is robust to different parameter values and can be implemented relatively easily by tuning a few parameters, contrasting with the one-by-one adjustment needed in conventional alarm systems.
- Computational Efficiency: The method is computationally light, allowing for frequent data analysis.
- Improved Fault Diagnosis: Visualizations like alarm balance by unit and localized time stamps aid in determining the type and location of faults.
- Podbean jaakko.talonen
Publication IV: Modelling Power Output at Nuclear Power Plant by Neural Networks (2010)
This article proposes two distinct neural network (NN) methodologies for industrial process signal forecasting, specifically focusing on predicting power output at a boiling water reactor (BWR) type nuclear power plant in Olkiluoto, Finland. The research utilizes real data from the plant and explores methods for input signal selection, emphasizing the importance of cross-correlation analysis to detect delays between signals. The paper compares the performance of Feed-Forward (FF) and Elman Neural Network (ENN) architectures, finding that ENN generally yields better prediction results for one-step ahead idle power forecasting. Model evaluation is conducted using criteria like Normalized Mean Square Error (NMSE) and Median Absolute Error (MAE), with the authors noting the need for further research on nonstationary processes before real-world implementation.
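Two of the paper's building blocks sketched in Python: lag selection by cross-correlation and the NMSE criterion. This assumes the series are longer than `max_lag`; the paper's exact normalizations may differ:

```python
import numpy as np

def best_lag(x, y, max_lag=50):
    """Return the delay (in samples) at which x correlates most strongly
    with y, searching lags in [-max_lag, max_lag]."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = list(range(-max_lag, max_lag + 1))
    cc = [np.corrcoef(x[max(0, -l):len(x) - max(0, l)],
                      y[max(0, l):len(y) - max(0, -l)])[0, 1] for l in lags]
    return lags[int(np.argmax(np.abs(cc)))]

def nmse(y_true, y_pred):
    """Normalized mean square error: MSE divided by the variance of the target."""
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))
```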
“Convert to podcast introduction, max 500 characters”
“Welcome to ‘Nuclear Insights,’ where we explore cutting-edge advancements in nuclear technology. In today’s episode, we dive into neural network innovations for industrial forecasting. Using real data from a boiling water reactor in Olkiluoto, Finland, we compare Feed-Forward and Elman Neural Networks for power output prediction. With techniques like cross-correlation analysis and metrics such as NMSE, discover why ENN stands out and the challenges of tackling nonstationary processes. Stay tuned!”
Publication V: Analyzing Parliamentary Elections Based on Voting Advice Application Data (2011)
This article, written by Jaakko Talonen and Mika Sulkava, describes a methodology for modeling the values of Finnish citizens and Members of Parliament (MPs) by combining voting advice application (VAA) data with the results of the 2011 parliamentary elections. The authors preprocess the qualitative VAA data into a high-dimensional matrix, which is then reduced to two principal components using Principal Component Analysis (PCA) for visualization. They employ kernel density estimation to create “value grids” representing the distribution of opinions for candidates, parties, and the electorate, even approximating missing data based on party affiliations. The paper visualizes these value distributions, indicating left/right economic views and liberal/conservative social stances as the primary axes, offering insights into voter behavior, party positions, and potential government coalition formations.
How was the data collected and preprocessed for analysis?
The data originated from two main sources: a VAA published by Helsingin Sanomat (HS), Finland’s largest newspaper, and the official results of the 2011 parliamentary elections. Candidates answered 31 questions across various topics (e.g., economy, taxes, defense). This qualitative VAA data was then converted into a high-dimensional quantitative matrix, where each answer option became a variable. Candidate-assigned importance levels for questions were also incorporated into a weight matrix. For analysis, candidates who provided more than 20 answers were included, and the data matrix was mean-centered.
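A compact Python sketch of the projection and “value grid” steps, with SVD-based PCA and scipy's gaussian_kde as stand-ins; the grid size and default bandwidth are assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

def pca_2d(X):
    """Mean-center the answer matrix and project it onto its first two
    principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def value_grid(points, grid_size=100):
    """Kernel density estimate of candidate positions on the 2-D value plane,
    evaluated on a regular grid."""
    kde = gaussian_kde(points.T)                      # points: n x 2
    g = np.linspace(points.min(), points.max(), grid_size)
    xx, yy = np.meshgrid(g, g)
    return kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(grid_size, grid_size)
```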
What are the broader implications and future possibilities of this research?
This study opened up numerous possibilities for political data mining. The methodology could be used to analyze how well citizens’ values align with their MPs’ values, explore the impact of different electoral systems, and examine value distribution differences between demographic groups (e.g., young vs. old candidates). Furthermore, by incorporating areal information (electoral districts, cities, or even polling stations), the research could visualize citizens’ values on maps and correlate them with other socio-economic indicators like housing prices or unemployment rates. The authors emphasize that this is a dynamic field of research with ongoing data production that can lead to more complex future analyses.
Publication VI: Network Visualization of Car Inspection Data using Graph Layout (2012)
This academic paper, authored by Jaakko Talonen, Miki Sirola, and Mika Sulkava, introduces a network visualization method for car inspection data, specifically focusing on rejection reasons. The researchers utilized data from A-Katsastus, a major vehicle inspection provider in Northern Europe, to aggregate extensive statistical tables into a single visual network. They compare this novel network visualization, implemented using the Gephi platform and ForceAtlas2 algorithm, with a Principal Component Analysis (PCA) approach previously explored. The core objective is to enhance the analysis of dependencies between various rejection reasons and car models, ultimately providing a more efficient and readable way to interpret complex vehicle inspection statistics than traditional tables.
What is the primary goal of the research presented in “Network Visualization of Car Inspection Data using Graph Layout”?
The primary goal is to aggregate and visualize extensive car inspection data, specifically rejection statistics, into a single network visualization. This aims to overcome the limitations of traditional table-based reporting, which typically involves dozens of separate tables based on year, make, or model. By visualizing this information as a network, the researchers intend to make it easier for users to study dependencies between different rejection reasons and car models, and to draw their own conclusions.
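A small Python sketch of the aggregation idea using networkx, where spring_layout stands in for Gephi's ForceAtlas2; the input format is hypothetical:

```python
import networkx as nx

def rejection_graph(counts):
    """Build a weighted model-reason graph from {(model, reason): count};
    a force-directed layout then pulls strongly linked nodes together."""
    G = nx.Graph()
    for (model, reason), n in counts.items():
        G.add_edge(model, reason, weight=n)
    pos = nx.spring_layout(G, weight="weight", seed=0)
    return G, pos
```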
Publication VII: The Finnish Car Rejection Reasons Shown in an Interactive SOM Visualization Tool. In Workshop on Self Organizing Maps, Chile, Santiago, 325–334, December 2012
The authors highlight the integration of Collaborative Filtering (CF) as a preprocessing step to address missing values and filter discrete data, enhancing the effectiveness of SOM training. This combined approach allows for deeper insights into car rejection reasons, including their temporal relationships and dependencies. The interactive nature of the SOM visualization tool facilitates the exploration of complex datasets, enabling users to analyze car differences through component planes and filter out driver-dependent factors. The paper concludes by demonstrating the tool’s utility in making car inspection data more informative and outlines future work to further classify car performance.
How is Collaborative Filtering (CF) utilized in this study, and why is it important?
Collaborative Filtering (CF) is used as a preprocessing method before SOM training to address missing values and filter the discrete data. Inspired by recommender systems, CF predicts the probability of specific rejection reasons for each car, effectively filling in the “missing” information (both zero and one values) in the large data matrix. This “collaboration” among car data helps in obtaining more reliable estimates of rejection reason probabilities, which is crucial for effective SOM visualization. Without this filtering, a large number of car labels would cluster in the same SOM nodes, making differentiation difficult.
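An item-based collaborative-filtering sketch in Python, under the assumption of a binary car x rejection-reason matrix; the paper's actual CF formulation may differ:

```python
import numpy as np

def cf_probabilities(R):
    """Re-estimate each entry of a binary car x reason matrix R from similar
    reasons (cosine similarity), turning hard 0/1 values into probabilities
    suitable for SOM training."""
    norms = np.linalg.norm(R, axis=0) + 1e-9
    S = (R.T @ R) / np.outer(norms, norms)    # reason-reason cosine similarity
    P = (R @ S) / (S.sum(axis=0) + 1e-9)      # similarity-weighted average
    return np.clip(P, 0.0, 1.0)
```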
What is a Self-Organizing Map (SOM), and how does it contribute to the visualization goal?
A Self-Organizing Map (SOM) is a type of artificial neural network that visualizes high-dimensional data in a low-dimensional view (e.g., 2D). It preserves the topological structure of the input data, meaning that similar data points are mapped to neighboring neurons on the map. In this study, SOMs are used to visually explore the preprocessed car inspection data, allowing for the identification of relationships and patterns among different cars and their rejection reasons that would be difficult to discern from raw data or traditional tables.
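For orientation, a minimal online-SOM training loop in Python with illustrative hyperparameters; published studies like this one typically rely on a dedicated SOM toolbox rather than such a bare-bones loop:

```python
import numpy as np

def train_som(X, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0):
    """Minimal online SOM: each sample pulls its best-matching unit (BMU) and
    the BMU's map neighbors toward it; learning rate and neighborhood width
    shrink linearly over training. X: samples x features."""
    rng = np.random.default_rng(0)
    W = rng.normal(size=(rows * cols, X.shape[1]))          # codebook vectors
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    steps, t = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))
            frac = 1 - t / steps
            sigma = sigma0 * frac + 1e-3
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma**2))
            W += lr0 * frac * h[:, None] * (x - W)
            t += 1
    return W.reshape(rows, cols, -1)
```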
Publication VIII: Self-organizing map based visualization techniques and their assessment. International Journal of Computing, Vol. 11, issue 2, pages 96–103, http://www.computingonline.net, September 2012.
This article, “Self-Organizing Map Based Visualization Techniques and Their Assessment” by Miki Sirola and Jaakko Talonen, focuses on data-analysis based visualization techniques for decision support, specifically using the Self-Organizing Map (SOM) method. The authors explore the application of SOM in dynamic systems, particularly within the context of a Finnish nuclear power plant, Olkiluoto, using both plant and training simulator data. They discuss various user interface and visualization assessment criteria and present a case study demonstrating the SOM method’s information value in process visualization. The paper concludes by highlighting the challenges in measuring this information value quantitatively and suggests that a combination of methodologies might yield the best results.
Thesis 2015
Word cloud - Thesis: Advances in Methods of Anomaly Detection and Visualization of Multivariate Data
The thesis was published in 2015, and this word cloud app was created a few months later.
A simple test of an R Shiny application running on shinyapps: Thesis word cloud
No surprise: the most common word is “data”, so it gets the largest font size.
References
- Shinyapps
- Advances in Methods of Anomaly Detection and Visualization of Multivariate Data
- personal notes, private - talonendm - preparing document 2013: (Thesis-2013-2015-printeista)