IOW Analysis Tab

In the Analyze tab of the IOW (Integrity Operating Window), you can view the process data in more detail and calculate basic statistics. The first step is to set the From and To dates:

Set the From and To date in the Analyze tab to see more detail of the process data and calculate basic stats

There are thirteen tabs with information available:

Raw Data
Detected Data
Clean Data
Time (S)
Time (M)
Histogram
Scatter Chart
Correlation
Box Plot
Data Cleaning
Virtual Tags
Basic Stats
Basic Stats Selection

Raw Data, Detected Data and Clean Data

The Raw Data grid

In the Raw Data tab, the raw PI data is shown, so spikes can be present.
In the Detected Data tab, the data is shown after you run anomaly detection in the Data Cleaning tab. An anomaly is, for example, a spike in the data.
In the Clean Data tab, the data is shown after you run the data cleaning step on the detected data. Here a spike is replaced by, for example, an interpolated value.

In the example below, the anomaly detection step has found an anomaly on the second row, which will be coloured green in the Detected Data tab. In the Clean Data tab, the anomaly has been replaced by interpolating the values from rows 1 and 3.

Detected anomaly shown in the Raw Data, Detected Data and Clean Data Tabs.
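
The replacement of a detected spike by an interpolated value can be sketched as follows. This is only an illustration: the function name, the flag list and the sample values are hypothetical, evenly spaced samples are assumed, and the cleaning method the tool applies may differ.

```python
def interpolate_anomalies(values, is_anomaly):
    """Replace flagged points by linear interpolation between the nearest
    unflagged neighbours (assumes evenly spaced samples)."""
    cleaned = list(values)
    n = len(values)
    for i in range(n):
        if not is_anomaly[i]:
            continue
        # Find the nearest unflagged neighbour on each side.
        left = i - 1
        while left >= 0 and is_anomaly[left]:
            left -= 1
        right = i + 1
        while right < n and is_anomaly[right]:
            right += 1
        if left >= 0 and right < n:
            frac = (i - left) / (right - left)
            cleaned[i] = values[left] + frac * (values[right] - values[left])
        elif left >= 0:
            cleaned[i] = values[left]   # anomaly at the end: hold last good value
        elif right < n:
            cleaned[i] = values[right]  # anomaly at the start: use first good value
    return cleaned


# Hypothetical three-row example: the second row is a spike.
raw = [10.0, 95.0, 12.0]
flags = [False, True, False]
clean = interpolate_anomalies(raw, flags)
print(clean)
```

With these sample values, the spike on the second row is replaced by the midpoint of rows 1 and 3.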

Time (S)

In this tab, the time series plot of a single Variable is displayed. The chart also shows the Min, Mean and Max lines. Clicking the Previous or Next button shows the previous or the next Variable.

Time series plot of a single Variable with the Max, Mean and Min lines.

Time (M)

In this tab, the time series of multiple Variables are shown, which makes it possible to compare the behavior of Variables over time. In the example below, clicking the name of the Variable opens a drop-down menu where you can select one or more Variables.

The Normalize checkbox can be used to compare two Variables whose data ranges differ greatly. In the example below, one Variable is around 10 (the orange curve) and the other is around 1800 (the blue curve). To compare their trends, tick the Normalize checkbox:

Tick the Normalize checkbox to compare two Variables with big differences in the data range
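
Why normalizing helps can be illustrated with a small sketch. It assumes min-max scaling to the range 0 to 1; the documentation does not state which normalization the Normalize checkbox applies, so the function and the sample values are illustrative only.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; map every point to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical Variables: one around 10, the other around 1800.
low_range = [9.0, 10.0, 11.0, 10.0]
high_range = [1700.0, 1800.0, 1900.0, 1800.0]

norm_low = min_max_normalize(low_range)
norm_high = min_max_normalize(high_range)
print(norm_low)
print(norm_high)
```

After scaling, both series share the same axis range, so their trends can be compared directly even though the raw values differ by two orders of magnitude.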

The Grid view option shows all the Variables in one go:

Grid View showing all the Variables present in IOW.

Histogram

A Histogram is a graphical representation of the frequency distribution of a dataset. The data is grouped into a fixed number of ten intervals (bins). The height of each column displays the frequency for that interval. A Histogram provides insight into the distribution of a Variable. The Histogram displays Raw (orange) and Clean (blue) Data of a single Variable; an example is displayed below:

Histogram displaying the frequency distribution of Raw Data of a variable.
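
The grouping into ten bins can be sketched as below. The sketch assumes equal-width bins spanning the minimum to the maximum of the data; how the tool places its bin edges is not documented here, and the function name and sample data are hypothetical.

```python
def histogram_counts(values, bins=10):
    """Count values per equal-width bin between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # All values identical: put everything in the first bin.
        counts[0] = len(values)
        return [lo] * (bins + 1), counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum value falls on the last edge; keep it in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


values = list(range(100))  # hypothetical data: 0, 1, ..., 99
edges, counts = histogram_counts(values)
print(counts)
```

Each entry of `counts` corresponds to the height of one column in the Histogram.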

Scatter Chart

A scatter diagram is a visualization of the relationship between two variables: one variable on the y-axis versus one variable on the x-axis. A scatter diagram makes it particularly easy to spot trends and correlations between two Variables.

Scatter diagrams with different correlation coefficients

The scatter diagram displays the Raw or Clean Data; an example is displayed below:

Scatter diagram displaying the Raw Data

Correlation

A correlation map is a graphical representation of a correlation matrix. Element c_ij in the matrix represents the correlation between Variable i and Variable j. As c_ij is by definition equal to c_ji, the matrix is symmetric. The strength of the correlation is depicted by the correlation coefficients and colours. The correlation coefficient is in the range of -1 (perfect negative correlation) to +1 (perfect positive correlation). In Statstools, red implies a positive correlation, blue implies a negative correlation, and white implies a zero correlation (no correlation between the variables).

Examples of a negative, zero, and positive correlation coefficient

Two types of correlation maps are available. A Spearman correlation map looks similar to a Pearson correlation map. The difference in the calculation of the coefficients is described below. The correlation maps display Raw or Clean Data; an example of a Pearson correlation map is displayed below:

Pearson's correlation map
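
The matrix behind a correlation map can be sketched as follows, using the standard Pearson coefficient. The Variable names and values are hypothetical, and the sketch does not handle constant columns, for which the coefficient is undefined.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return the matrix of pairwise correlations, in column order."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical Variables.
data = {
    "flow": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pressure": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with flow
    "level": [5.0, 4.0, 3.0, 2.0, 1.0],      # falls as flow rises
}
matrix = correlation_matrix(data)
for name, row in zip(data, matrix):
    print(name, [round(c, 2) for c in row])
```

The diagonal holds the correlation of each Variable with itself, and the entries above and below the diagonal mirror each other, which is why the map is symmetric.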

The Pearson Correlation

The Pearson correlation coefficient, also called the linear or product-moment correlation coefficient, determines the extent to which the values of two Variables are ‘proportional’ to each other, that is, linearly related. The Pearson correlation coefficient for Variables x and y, r_xy, is calculated as:

Pearson's correlation coefficient formula
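
For reference, the standard textbook definition of the Pearson coefficient is given below. It is assumed, not confirmed, that the formula image in the original shows this same expression; here x̄ and ȳ denote the means of x and y, and N is the number of observations.

```latex
r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}
```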

The Spearman Correlation

The Spearman correlation coefficient can be thought of as the regular Pearson product moment correlation coefficient, except that the Spearman correlation coefficient is computed from ranks instead of from the original variable values. The Spearman rank correlation coefficient is a measure of monotone association that is used when the distribution of the data makes Pearson's correlation coefficient undesirable or misleading. The Spearman rank correlation coefficient for variables x and y, r_xy, is defined as:

Spearman rank correlation coefficient formula
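
For reference, the commonly used form of the Spearman coefficient for data without tied ranks is shown below. It is assumed, not confirmed, that the formula image in the original shows this expression.

```latex
r_{xy} = 1 - \frac{6 \sum_{i=1}^{N} d_i^{2}}{N \left( N^{2} - 1 \right)}
```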

Here d is the difference between the statistical ranks of x and y for each observation, and N is the number of observations.

Example to illustrate the difference between Pearson and Spearman correlation: in the table below, 10 observations of Variables x and y are displayed. The two Variables follow an almost perfect linear relationship: y = x + 2. Only one observation, y7, does not follow this rule; it is an outlier:

The Pearson correlation coefficient, calculated as the covariance of Variables x and y (227.3) divided by the product of the square root of the population variance of x (14.36) and the square root of the population variance of y (17.86), equals 0.88 in this example. The Spearman correlation coefficient, calculated from the squared differences in statistical rank, d², of the Variables x and y, equals 0.93 in this example.

Now consider a change in the dataset: y7 = 200, which increases the size of the outlier. The Pearson correlation coefficient then decreases to 0.44, whereas the Spearman correlation coefficient does not change, because the ranking did not change.
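
The effect can be reproduced with a short sketch. The data below is hypothetical and is not the dataset from the example above, so the Pearson values differ from those quoted; only the qualitative behaviour is the same. The rank function assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) to N (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman coefficient from squared rank differences (no ties)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: y = x + 2 except for the seventh observation.
x = list(range(1, 11))
y_small = [v + 2 for v in x]
y_small[6] = 30            # moderate outlier at y7
y_large = list(y_small)
y_large[6] = 200           # larger outlier at y7, same rank as before

print("Pearson :", pearson(x, y_small), "->", pearson(x, y_large))
print("Spearman:", spearman(x, y_small), "->", spearman(x, y_large))
```

Enlarging the outlier lowers the Pearson coefficient, while the Spearman coefficient stays the same because y7 already had the highest rank.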

Box Plot

A box plot is a graphical representation of data through their five-number summaries: the smallest observation, the lower quartile, the median, the upper quartile, and the largest observation. A box plot provides insight into the distribution of a variable.

The box plot displays Raw (orange) and Clean (blue) Data of a single Variable; an example is displayed below.

Box plot displaying Raw and Clean Data of a single Variable
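
The five-number summary behind a box plot can be sketched as below. Quartile definitions vary between tools; this sketch uses the "inclusive" method of Python's `statistics.quantiles`, which is an assumption and may not match the quartiles the tool computes. The sample data is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # quantiles(..., n=4) returns the three cut points between the quarters.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


summary = five_number_summary([1, 2, 3, 4, 5])
print(summary)
```

The box spans the lower to the upper quartile with the median marked inside it, and the smallest and largest observations give the ends of the whiskers.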

Data Cleaning

See Cleaning Data.

Virtual Tags

See Creating Virtual Tags for Analysis.