Appendix: Analysis of a Dynamic Linear Model (DLM)
The formulas of the DLM are described below.

Example DLM Metric

In this example the following data will be used:

Insp Date     Reading (mm)   Reading (inch)
1-Jan-1989    18.0           0.71
1-Jan-1990    18.0           0.71
1-Jan-1991    17.8           0.701
1-Jan-1992    16.8           0.661
1-Jan-1993    16.0           0.63

In this example a reading is taken every year, but that is not necessary; the algorithm also works for other time intervals.

Let’s assume the initial estimated corrosion rate = 0.1 mm/y.
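For reference in the sketches that follow, the data above can be written down in code. A minimal sketch in Python; the variable names are illustrative assumptions, not part of any product API:

```python
import numpy as np

# Inspection data from the table above.
dates = ["1-Jan-1989", "1-Jan-1990", "1-Jan-1991", "1-Jan-1992", "1-Jan-1993"]
readings_mm = np.array([18.0, 18.0, 17.8, 16.8, 16.0])  # wall thickness readings
cr_initial = 0.1    # initial estimated corrosion rate, mm/y
t_min = 13.8        # Tmin (renewal thickness), mm
```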

Step 1

The first reading at 1-Jan-1989 is 18 mm.

The first reading is the blue dot in the graph at 18 mm. In this graph we also see:

  • The Wall Thickness (WT) data on the left y-axis.

  • The dates on the x-axis.

  • The Tmin (the renewal thickness line), which is the blue line at 13.8 mm.

Step 2

With an initial Corrosion Rate (CR) of 0.1 mm/y the best predicted wall thickness for 1-Jan-1990 is 17.9 mm (0.1 mm lost over 1 year). So, the program predicts 17.9 mm with an uncertainty band. The width of the band depends on the uncertainty in the readings and the uncertainty in the model (which can be Model 1: linear, Model 2: outlier, Model 3: change in level, or Model 4: change in corrosion rate).

Step 3

The reading in 1990 is 18 mm.

From this reading onwards the DLM starts updating the Corrosion Rate. Because the reading is higher than predicted, the updated CR is 0.08 mm/y, and with the 0.08 mm/y the model predicts a wall thickness of 17.8 mm for 1991. So, the model predicts a Wall Thickness and, after a reading has been taken, it updates the model parameters using Bayesian statistics.
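A minimal sketch of this predict-and-update cycle for a two-component state (wall thickness, corrosion rate), with assumed noise variances; the exact updated CR depends on the variances configured in the model, so the 0.08 mm/y above should not be read off this sketch literally:

```python
import numpy as np

# State vector theta = (wall thickness WT in mm, corrosion rate CR in mm/y).
F = np.array([1.0, 0.0])             # only the wall thickness is observed
G = np.array([[1.0, -1.0],           # WT_t = WT_{t-1} - CR_{t-1} (one year step)
              [0.0,  1.0]])          # CR_t = CR_{t-1}
V = 0.05**2                          # assumed observation variance (mm^2)
W = np.diag([0.01**2, 0.02**2])      # assumed evolution covariance

m = np.array([18.0, 0.1])            # state after the 1989 reading
C = np.diag([0.05**2, 0.05**2])      # assumed state covariance

# Evolution: prior for 1-Jan-1990 (Step 2).
a = G @ m                            # predicted state: WT = 17.9 mm, CR = 0.1 mm/y
R = G @ C @ G.T + W

# Forecast and Bayesian update with the 1990 reading (Step 3).
y = 18.0
f = F @ a                            # one step ahead forecast: 17.9 mm
Q = F @ R @ F + V                    # forecast variance -> uncertainty band width
A = R @ F / Q                        # adaptive factor
e = y - f                            # forecast error: +0.1 mm
m = a + A * e                        # posterior mean: WT pulled up, CR revised down
C = R - np.outer(A, A) * Q           # posterior covariance
print(m)                             # CR falls below 0.1 mm/y; the exact value
                                     # depends on the assumed variances
```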

Step 4

The reading in 1991 is 17.8 mm, exactly what the model predicted. The model then continues with the 0.08 mm/y and predicts a wall thickness of 17.7 mm for 1992. Because the model gains more confidence in Model 1 (the linear model), the uncertainty band becomes smaller.

Step 5

The reading in 1992 is 16.8 mm.

So, a reading around 17.6 mm is expected, but the reading is 16.8 mm. This reading is now flagged as an Anomaly in the data, which is displayed in the graph as a red circle. Question: what should the model predict for 1993?

Step 6

Step 6a

If the reading in 1992 is a wrong measurement (outlier, Model 2), then in 1993 a reading around 17.5 mm is expected.

Step 6b

If the reading in 1992 is a change in level (Model 3), then in 1993 a reading around 16.5 mm is expected.

Step 6c

If the reading in 1992 is a change in Corrosion Rate (Model 4), then in 1993 a reading around 15.8 mm is expected.

Step 6d

Because the model doesn’t know whether the 1992 reading is an outlier, a change in level, or a change in Corrosion Rate, a large uncertainty band will be displayed in the graph. The estimated CR is a weighted average over the four models but must be verified.
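A sketch of that weighted average; the model probabilities and per-model CR estimates below are illustrative assumptions:

```python
import numpy as np

# Hypothetical model probabilities and per-model CR estimates after 1992.
p_model = np.array([0.10, 0.30, 0.30, 0.30])    # P(Model 1..4 | data so far)
cr_model = np.array([0.08, 0.08, 0.08, 0.80])   # CR estimate under each model, mm/y

cr_estimate = float(p_model @ cr_model)         # weighted average over the models
print(f"estimated CR = {cr_estimate:.2f} mm/y") # still needs field verification
```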

Step 7

The reading in 1993 is 16.0 mm, which confirms a change in Corrosion Rate.

Step 8

After the reading, the model parameters are updated and a CR of 0.8 mm/y will be used to predict the Wall Thickness for 1994.

Analysis of a Single DLM

The DLM is a system of equations describing how observations of a process are stochastically dependent on the current process parameters, and how these parameters evolve in time.

The general form of a DLM is:

Observation equation: $Y_t = F_t^T \theta_t + \nu_t, \qquad \nu_t \sim N(0, V_t)$

System equation: $\theta_t = G_t \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W_t)$

where

  • $Y_t$ denotes the observation series at time t

  • $F_t$ is a vector of known constants (the regression vector)

  • $\theta_t$ denotes the vector of model state parameters

  • $\nu_t$ is a stochastic error term having a normal distribution with zero mean and variance $V_t$

  • $G_t$ is a matrix of known coefficients that defines the systematic evolution of the state vector across time

  • $\omega_t$ is a stochastic error term having a normal distribution with zero mean and covariance matrix $W_t$

The observation equation defines the sampling distribution for $Y_t$ conditional on the quantity $\theta_t$. The system equation defines the time evolution of the state vector.

A DLM is characterized by a set of quadruples $\{F_t, G_t, V_t, W_t\}$ for each time t.
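For the wall-thickness example, a natural quadruple is a local linear trend with state (WT, CR). A minimal sketch, with assumed noise values:

```python
import numpy as np

# Quadruple {F_t, G_t, V_t, W_t} for a local linear trend in wall thickness,
# with state theta_t = (WT_t, CR_t) and yearly time steps.
F = np.array([1.0, 0.0])            # regression vector: only WT is observed
G = np.array([[1.0, -1.0],          # WT_t = WT_{t-1} - CR_{t-1}
              [0.0,  1.0]])         # CR_t = CR_{t-1}
V = 0.05**2                         # assumed observation variance
W = np.diag([0.01**2, 0.02**2])     # assumed evolution covariance (level, slope)
```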

Updating: Prior to Posterior Analysis

Bayesian learning proceeds by combining information from observations expressed through the likelihood function with the engineer’s existing state of knowledge before the observations are made. The mechanism of combination is Bayes’ theorem.

Bayes’ theorem

For two quantities X and Y for which probabilistic beliefs are given, Bayes’ theorem states

$p(X \mid Y) = \dfrac{p(Y \mid X)\, p(X)}{p(Y)}$

where the notation $p(\cdot)$ denotes a probability density function. The vertical bar '|' means 'given', so that all items to the right of this conditioning symbol are taken as being true.

The various terms in Bayes' theorem have formal names. The quantity $p(X)$ is called the prior probability; it represents our state of knowledge about the truth of the hypothesis before we have analyzed the current data. This is modified by the experimental measurements through the likelihood function, $p(Y \mid X)$, and yields the posterior probability, $p(X \mid Y)$, representing our state of knowledge about the truth of the hypothesis in the light of the data. The term $p(Y)$ is called the evidence; it is simply a normalization constant and has been omitted in the analysis, thus

posterior $\propto$ likelihood $\times$ prior

Bayes’ theorem enables us to make statements about (say) a level following an observation, given (i) a quantification of what we believed prior to making the observation, and (ii) a model for the system generating the observation series. This will be seen in the derivation of the posterior information.
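A small numerical illustration of this proportionality for a single level parameter, evaluated on a grid; the prior, reading, and standard deviations are illustrative assumptions:

```python
import numpy as np

# Posterior proportional to likelihood * prior for a single level, on a grid.
level = np.linspace(16.0, 19.0, 601)                     # candidate levels (mm)
prior = np.exp(-0.5 * ((level - 17.9) / 0.2) ** 2)       # belief before the reading
likelihood = np.exp(-0.5 * ((18.0 - level) / 0.1) ** 2)  # reading of 18.0, sd 0.1
posterior = prior * likelihood                           # unnormalized posterior
posterior /= posterior.sum()                             # normalization (evidence)
print(level[np.argmax(posterior)])                       # mode between 17.9 and 18.0
```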

Prior information

Prior information on the state vector for time (t+1) is summarized as a normal distribution with mean $a_{t+1}$ and covariance $R_{t+1}$,

$(\theta_{t+1} \mid D_t) \sim N(a_{t+1}, R_{t+1})$

where $D_t$ denotes the state of knowledge at time t.

Forecasting one step ahead

From the prior information, forecasts are generated using the observation equation. The forecast quantity $Y_{t+1}$ is a linear combination of normally distributed variables, $\theta_{t+1}$ and $\nu_{t+1}$, and is therefore also normally distributed. The forecast mean and variance are:

$f_{t+1} = F_{t+1}^T a_{t+1}, \qquad Q_{t+1} = F_{t+1}^T R_{t+1} F_{t+1} + V_{t+1}$

The forecast distribution for one step ahead therefore has the normal form

$(Y_{t+1} \mid D_t) \sim N(f_{t+1}, Q_{t+1})$

Likelihood

The model likelihood, a function of the model parameters, is the conditional forecast distribution evaluated at the observed value. It has the normal form

$p(Y_t \mid \theta_t, D_{t-1}) = N(Y_t;\, F_t^T \theta_t,\, V_t)$

Posterior information

The prior information is combined with information in the observation (the likelihood) using Bayes’ theorem to yield the posterior distribution on the state

$p(\theta_t \mid D_t) \propto p(Y_t \mid \theta_t, D_{t-1})\, p(\theta_t \mid D_{t-1})$

For the dynamic linear model the state posterior is the product of two normal density functions, yielding another normal density:

$(\theta_t \mid D_t) \sim N(m_t, C_t)$

where the moments are obtained as

$m_t = a_t + A_t e_t, \qquad C_t = R_t - A_t Q_t A_t^T$

with $A_t = R_t F_t Q_t^{-1}$ and the one step ahead forecast error $e_t = Y_t - f_t$.

The posterior mean is adjusted from the prior value by a multiple of the one step ahead forecast error. The amount of that adjustment is determined by the quantity $A_t$, which is the regression matrix of the state vector on the observation conditional upon the history $D_{t-1}$. This regression matrix, or adaptive factor as it is called, is determined by the relative sizes of the state prior variance and the observation variance. This means that the larger the observation variance compared with the state prior variance, the smaller will be the adaptive factor.
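A scalar sketch of this relationship; the variances are illustrative assumptions:

```python
# Adaptive factor A = R F / Q for a scalar level model: the larger the
# observation variance V relative to the state prior variance R, the
# smaller the adaptive factor.
R = 0.04                        # state prior variance
F = 1.0
for V in (0.01, 0.04, 0.16):    # increasing observation variance
    Q = F * R * F + V           # one step ahead forecast variance
    A = R * F / Q               # adaptive factor
    print(f"V={V:.2f} -> A={A:.2f}")   # prints 0.80, 0.50, 0.20
```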

Evolution

Once an observation is made, and posterior descriptions calculated, concern moves to consideration of the next time. Given the posterior distribution for the state at time t-1 as normally distributed with mean $m_{t-1}$ and covariance $C_{t-1}$, direct application of the system evolution equation leads to the prior for time t. Once again a linear combination of normally distributed quantities yields a normal distribution,

$(\theta_t \mid D_{t-1}) \sim N(a_t, R_t)$

where

$a_t = G_t m_{t-1}, \qquad R_t = G_t C_{t-1} G_t^T + W_t$

This completes the cycle of prior to forecast to posterior to next prior. These stages characterize the routine on-line updating analysis of the DLM.
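Putting the evolution, forecast, and updating steps together, a self-contained sketch of the on-line cycle run over the example readings; all noise settings are illustrative assumptions rather than the product's actual configuration:

```python
import numpy as np

# On-line cycle: evolve -> forecast -> update, over the example readings.
F = np.array([1.0, 0.0])
G = np.array([[1.0, -1.0], [0.0, 1.0]])    # state (WT, CR), yearly steps
V = 0.05**2                                # assumed observation variance
W = np.diag([0.01**2, 0.05**2])            # assumed evolution covariance

m = np.array([18.0, 0.1])                  # initial mean: 1989 reading, CR guess
C = np.diag([0.05**2, 0.05**2])            # assumed initial covariance

for year, y in zip(range(1990, 1994), [18.0, 17.8, 16.8, 16.0]):
    a = G @ m                              # evolution: prior mean a_t
    R = G @ C @ G.T + W                    # prior covariance R_t
    f = F @ a                              # one step ahead forecast f_t
    Q = F @ R @ F + V                      # forecast variance Q_t
    A = R @ F / Q                          # adaptive factor A_t
    e = y - f                              # forecast error e_t
    m = a + A * e                          # posterior mean m_t
    C = R - np.outer(A, A) * Q             # posterior covariance C_t
    print(f"{year}: forecast {f:.2f} mm, read {y:.1f} mm, CR -> {m[1]:.3f} mm/y")
```

Note that a single linear DLM adapts only gradually to the kind of change seen in 1992-1993; handling outliers and abrupt changes explicitly is the purpose of the multi-process extension below.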

Multi-Process DLM

To predict the ultrasonic wall thickness in the future and to detect anomalies, four models have been implemented; the combined model is therefore named the multi-process DLM. The model consists of four states, and the assumption is that the observation at time t belongs to state j:

$Y_t = F^T \theta_t + \nu_t, \qquad \nu_t \sim N(0, V_t(j))$

$\theta_t = G \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W_t(j))$

$F$ is a vector of known constants (the regression vector); $G$ is a matrix of known coefficients that defines the systematic evolution of the state vector across time; and $V_t(j)$ and $W_t(j)$ are known variance matrices.

The four different states of the model can be characterized by the values for the observation noise $V(j)$ and the two components of the system noise, $W_\mu(j)$ for changes in level and $W_\beta(j)$ for changes in slope. The four possible states of j are numbered as follows:

  • j=1, static linear growth: $V(1)$ = normal observation noise, normal system noise.

  • j=2, outlier: inflated observation noise $V(2)$, normal system noise.

  • j=3, change-in-level: normal observation noise, inflated level component $W_\mu(3)$.

  • j=4, change-in-slope: normal observation noise, inflated slope component $W_\beta(4)$.

For each state j it is assumed that model $M_t(j)$ applies at time t with fixed and pre-specified probability $\pi_j$. Thus, at time t, the model is defined by observation and evolution equations:

$Y_t = F^T \theta_t + \nu_t, \qquad \nu_t \sim N(0, V(j))$

$\theta_t = G \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, W(j))$

Assume also that at t=0, the initial prior for the state vector is the usual normal form

$(\theta_0 \mid D_0) \sim N(m_0, C_0)$

where $m_0$ and $C_0$ are known.
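A sketch of how the four states might be configured in code; the inflation factors and fixed probabilities $\pi_j$ are illustrative assumptions:

```python
import numpy as np

# Four-state configuration: same F and G, per-state noise V(j) and W(j).
F = np.array([1.0, 0.0])
G = np.array([[1.0, -1.0], [0.0, 1.0]])
V_base = 0.05**2                       # normal observation variance
W_mu, W_beta = 0.01**2, 0.01**2        # normal level / slope evolution variances

V = {1: V_base, 2: 100 * V_base,       # j=2 outlier: inflated observation noise
     3: V_base, 4: V_base}
W = {1: np.diag([W_mu, W_beta]),               # j=1 static linear growth
     2: np.diag([W_mu, W_beta]),               # j=2 outlier: system unchanged
     3: np.diag([100 * W_mu, W_beta]),         # j=3 change-in-level
     4: np.diag([W_mu, 100 * W_beta])}         # j=4 change-in-slope
pi = {1: 0.85, 2: 0.05, 3: 0.05, 4: 0.05}      # fixed probabilities pi_j
```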

Historical information is summarized in terms of a 4-component mixture posterior distribution for $\theta_{t-1}$, the mixture being with respect to the four possible models obtaining at time t-1. Within each component, the posterior distributions have the usual conjugate normal forms. Thus,

(a) For j = 1 to 4, model $M_{t-1}(j)$ applied at t-1 with posterior probability $p_{t-1}(j)$. These probabilities are now known and fixed.

(b) Given $M_{t-1}(j)$ and $D_{t-1}$, the state posterior is

$(\theta_{t-1} \mid M_{t-1}(j), D_{t-1}) \sim N(m_{t-1}(j), C_{t-1}(j))$

Evolving to time t, statements about $\theta_t$ and $Y_t$ depend on the combinations of possible models applying at both t-1 and t.

(c) Thus, for each i and j,

$(\theta_t \mid M_t(j), M_{t-1}(i), D_{t-1}) \sim N(a_t(i,j), R_t(i,j)), \qquad$ the prior distribution

where

$a_t(i,j) = G\, m_{t-1}(i), \qquad R_t(i,j) = G\, C_{t-1}(i)\, G^T + W(j)$

(d) Similarly, the one-step ahead forecast distribution is given, for each possible combination of models, by

$(Y_t \mid M_t(j), M_{t-1}(i), D_{t-1}) \sim N(f_t(i,j), Q_t(i,j))$

where

$f_t(i,j) = F^T a_t(i,j), \qquad Q_t(i,j) = F^T R_t(i,j)\, F + V(j)$

(e) Now consider updating the prior distributions in (c) to posteriors when $Y_t$ is observed. Given i and j, the standard updating equations apply within each of the sixteen combinations, the posterior means, variances, etc. obviously varying across combinations:

$(\theta_t \mid M_t(j), M_{t-1}(i), D_t) \sim N(m_t(i,j), C_t(i,j))$

where

$m_t(i,j) = a_t(i,j) + A_t(i,j)\, e_t(i,j), \qquad C_t(i,j) = R_t(i,j) - A_t(i,j)\, Q_t(i,j)\, A_t(i,j)^T$

with $A_t(i,j) = R_t(i,j)\, F\, Q_t(i,j)^{-1}$ and $e_t(i,j) = Y_t - f_t(i,j)$.

(f) Posterior probabilities across the sixteen possible models:

$p_t(i,j) = \Pr(M_t(j), M_{t-1}(i) \mid D_t) \propto \pi_j\, p_{t-1}(i)\, p(Y_t \mid M_t(j), M_{t-1}(i), D_{t-1})$

The second term is the observed value for the predictive density, providing the model likelihood, and so the probabilities are given by

$p_t(i,j) = c\, \pi_j\, p_{t-1}(i)\, N(Y_t;\, f_t(i,j), Q_t(i,j))$

where $c$ is the constant of normalization such that

$\sum_{i=1}^{4} \sum_{j=1}^{4} p_t(i,j) = 1$
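A sketch of step (f) in code, computing and normalizing the sixteen posterior model probabilities; the forecast moments and probabilities used here are illustrative assumptions:

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

pi = np.array([0.85, 0.05, 0.05, 0.05])        # fixed model probabilities pi_j
p_prev = np.array([0.90, 0.02, 0.04, 0.04])    # p_{t-1}(i) from the last cycle
f = np.full((4, 4), 17.6)                      # forecast means f_t(i, j)
Q = np.tile([0.05, 0.30, 0.15, 0.15], (4, 1))  # variances Q_t(i, j), wide for j>1
y = 16.8                                       # the anomalous 1992 reading

p = np.empty((4, 4))
for i in range(4):                             # model at t-1
    for j in range(4):                         # model at t
        p[i, j] = pi[j] * p_prev[i] * normal_pdf(y, f[i, j], Q[i, j])
p /= p.sum()                                   # normalize: all sixteen sum to 1
print(p.sum(axis=0))                           # p_t(j): probability per model at t
```

With a reading far from the forecast, as in 1992, most of the posterior probability shifts to the wide-variance models despite their small prior probabilities, which is how the anomaly flag arises.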


These calculations essentially complete the evolution and updating steps at time t. In moving to t+1 the sixteen-component mixture will be reduced, or collapsed, over the possible models at t-1. For each j=1 to 4 it follows that

$p_t(j) = \sum_{i=1}^{4} p_t(i,j)$

This equation gives the current model probabilities at time t.

The equation

$p_t(i \mid j) = p_t(i,j) / p_t(j)$

gives the posterior probabilities of the various models at time t-1, conditional on model j applying at time t.

We can now write, for each j, the appropriate mean vectors $m_t(j)$ and the corresponding variance matrices $C_t(j)$:

$m_t(j) = \sum_{i=1}^{4} p_t(i \mid j)\, m_t(i,j)$

$C_t(j) = \sum_{i=1}^{4} p_t(i \mid j) \left[ C_t(i,j) + (m_t(i,j) - m_t(j))(m_t(i,j) - m_t(j))^T \right]$

Then

$p(\theta_t \mid D_t) \approx \sum_{j=1}^{4} p_t(j)\, N(\theta_t;\, m_t(j), C_t(j))$

This distribution has four normal components. So we complete the cycle of evolution, updating and collapsing; the resulting four-component mixture is analogous to the starting four-component mixture defined by the components in (b), with the time index updated from t-1 to t.
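A sketch of the collapse step as a helper function; the function and its argument layout are illustrative assumptions, not product code:

```python
import numpy as np

def collapse(p, m_ij, C_ij):
    """Collapse the sixteen-component mixture to four components.

    p:    (4, 4) probabilities p_t(i, j)
    m_ij: (4, 4, d) posterior means m_t(i, j)
    C_ij: (4, 4, d, d) posterior covariances C_t(i, j)
    """
    p_j = p.sum(axis=0)                        # p_t(j) = sum_i p_t(i, j)
    d = m_ij.shape[-1]
    m_j = np.empty((4, d))
    C_j = np.empty((4, d, d))
    for j in range(4):
        w = p[:, j] / p_j[j]                   # p_t(i | j) = p_t(i, j) / p_t(j)
        m_j[j] = w @ m_ij[:, j]                # collapsed mean vector
        dev = m_ij[:, j] - m_j[j]              # component-mean deviations
        C_j[j] = sum(w[i] * (C_ij[i, j] + np.outer(dev[i], dev[i]))
                     for i in range(4))        # moment-matched covariance
    return p_j, m_j, C_j
```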

