In the Analyze tab of the IOW (Integrity Operating Window), the Process Data can be visualized in more detail and basic statistics can be calculated. The first step is to set the From and To date:
There are thirteen tabs with information available:
- Raw Data
- Detected Data
- Clean Data
- Time (S)
- Time (M)
- Histogram
- Scatter Chart
- Correlation
- Box Plot
- Data Cleaning
- Virtual Tags
- Basic Stats
- Basic Stats Selection
Raw Data, Detected Data and Clean Data
- In the tab Raw Data the raw PI data is shown (so spikes can be present)
- In the tab Detected Data, data will be shown after you perform Data Detection of anomalies in the tab Data Cleaning. An anomaly is for example a spike in the data.
- In the tab Clean Data, the data is shown after you perform the data cleansing step on the detected data. Here, a spike in the data is replaced by, for example, an interpolated value.
In the example below, the anomaly detection step has found an anomaly on the second row, which will be coloured green in the Detected Data tab. In the Clean Data tab, the anomaly has been replaced by interpolating the values from rows 1 and 3.
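The interpolation described above can be sketched in a few lines of Python. This is only an illustration: the function name `interpolate_flagged`, the sample readings, and the choice of linear interpolation between the nearest unflagged neighbours are assumptions, not the tool's documented algorithm. The anomaly detection step itself is not shown; the flagged row is simply given.

```python
def interpolate_flagged(values, flagged):
    """Replace each flagged value by linear interpolation between the
    nearest unflagged neighbours on either side (hypothetical helper)."""
    cleaned = list(values)
    good = [i for i in range(len(values)) if i not in flagged]
    for i in sorted(flagged):
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None or right is None:
            continue  # no neighbour on one side: leave the value untouched
        weight = (i - left) / (right - left)
        cleaned[i] = values[left] + weight * (values[right] - values[left])
    return cleaned


# Hypothetical readings; the second row (index 1) is the detected spike.
raw = [10.0, 250.0, 12.0, 11.0]
clean = interpolate_flagged(raw, flagged={1})
print(clean)  # [10.0, 11.0, 12.0, 11.0]
```

The spike in the second row is replaced by the midpoint of rows 1 and 3, which is what the Clean Data tab shows in the example.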
Time (S)
In this tab, the time series plot of one Variable is displayed. The chart also shows the Min, Mean and Max lines. Clicking on the Previous or Next button shows the previous or the next Variable.
Time (M)
In this tab, the time series of multiple Variables are shown, which makes it possible to compare the behavior of Variables over time. In the example given below, clicking on the name of the Variable opens a drop-down menu where you can select one or more Variables.
The checkbox Normalize can be used to compare two Variables which have a big difference in the data range. In the example below one Variable is around 10 (the orange curve), and the other variable is around 1800 (the blue curve). If you want to compare the trend then the Normalize checkbox should be ticked:
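Why normalizing helps can be illustrated with a small Python sketch. The article does not state which scaling the Normalize checkbox applies; min-max scaling to the range 0 to 1 is assumed here, and the function name and sample values are hypothetical.

```python
def min_max_normalize(series):
    """Rescale a series to the range 0..1 (assumed normalization method)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # a constant series has no spread to scale
    return [(v - lo) / (hi - lo) for v in series]


# Hypothetical Variables: one around 10, one around 1800.
orange = [9.0, 10.0, 11.0, 10.0]
blue = [1700.0, 1800.0, 1900.0, 1800.0]

print(min_max_normalize(orange))  # [0.0, 0.5, 1.0, 0.5]
print(min_max_normalize(blue))    # [0.0, 0.5, 1.0, 0.5]
```

After scaling, both Variables share the same axis range, so their trends can be compared directly even though the raw values differ by two orders of magnitude.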
The Grid view option shows all the Variables in one go:
Histogram
A Histogram is a graphical representation of the frequency distribution of a dataset. The data is grouped into a fixed number of ten intervals (bins). The height of each column displays the frequency within that interval. A Histogram provides insight into the distribution of a Variable. The Histogram displays Raw (orange) and Clean (blue) Data of a single Variable; an example is displayed below:
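The grouping into ten bins can be sketched as follows. Equal-width bins spanning the minimum to the maximum of the data are an assumption, since the article only states that the number of intervals is fixed at ten; the function name and data are hypothetical.

```python
def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals
    between min(values) and max(values) (assumed binning rule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value would index one past the end, so clamp it.
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


values = list(range(100))  # hypothetical data: 0, 1, ..., 99
counts = histogram_counts(values)
print(counts)
```

Each entry of `counts` corresponds to the height of one column in the Histogram.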
Scatter Chart
A scatter diagram is a visualization of the relationship between two variables: one variable on the y-axis versus one variable on the x-axis. A scatter diagram makes it particularly easy to spot trends and correlations between two Variables.
The scatter diagram displays the Raw or Clean Data; an example is displayed below:
Correlation
A correlation map is a graphical representation of a correlation matrix. Element c_ij in the matrix represents the correlation between Variable i and Variable j. As c_ij is by definition equal to c_ji, the matrix is symmetric. The strength of the correlation is depicted by the correlation coefficients and colours. The correlation coefficient is in the range of -1 (perfect negative correlation) to +1 (perfect positive correlation). In Statstools, red implies a positive correlation, blue implies a negative correlation, and white implies a zero correlation (no correlation between the variables).
Two types of correlation maps are available. A Spearman correlation map looks similar to a Pearson correlation map. The difference in the calculation of the coefficients is described below. The correlation maps display Raw or Clean Data; an example of a Pearson correlation map is displayed below:
The Pearson Correlation
The Pearson correlation coefficient, also called the linear or product-moment correlation coefficient, determines the extent to which values of two Variables are ‘proportional’ to each other. The Pearson correlation coefficient for Variables x and y, r_xy, is calculated as:
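In standard notation, with x̄ and ȳ the means of x and y and N the number of observations (symbols assumed here, following the usual textbook definition of covariance divided by the product of the standard deviations), this reads:

```latex
\[
r_{xy} \;=\; \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
                 {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\;
                  \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}
\]
```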
The Spearman Correlation
The Spearman correlation coefficient can be thought of as the regular Pearson product-moment correlation coefficient, except that the Spearman correlation coefficient is computed from ranks instead of from the original variable values. The Spearman rank correlation coefficient is a measure of monotone association that is used when the distribution of the data makes Pearson's correlation coefficient undesirable or misleading. The Spearman rank correlation coefficient for variables x and y, r_xy, is defined as:
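The usual rank-difference form of the coefficient, which is exact when no ranks are tied, is assumed here to be the intended formula:

```latex
\[
r_{xy} \;=\; 1 - \frac{6 \sum_{i=1}^{N} d_i^{2}}{N\,(N^{2} - 1)}
\]
```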
Where d is the difference between the statistical ranks of corresponding values of the two Variables and N is the number of observations.
Example to illustrate the difference between Pearson and Spearman correlation: In the table below, 10 observations of variables x and y are displayed. The two Variables follow an almost perfectly linear relationship: y = x + 2. Only one observation, y7, does not follow this rule; it is an outlier:
The Pearson correlation coefficient, calculated as the covariance of Variables x and y (227.3) divided by the product of the population standard deviation of x (14.36) and the population standard deviation of y (17.86), equals 0.88 in this example. The Spearman correlation coefficient, calculated from the squared differences in statistical rank, d², of the variables x and y, equals 0.93 in this example.
Now consider a change in the dataset that increases the size of the outlier: y7 = 200. The Pearson correlation coefficient then decreases to 0.44, whereas the Spearman correlation coefficient does not change, because the ranking of the observations did not change.
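The same effect can be reproduced with a short Python sketch. The data below is hypothetical (x = 1 to 10, y = x + 2 with one outlier at the seventh observation) and is not the table from this article, so the coefficients differ from the values quoted above. The function names are illustrative, and the Spearman function assumes there are no tied ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson coefficient: covariance over the product of standard deviations
    (the 1/N factors cancel, so plain sums are used)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman coefficient from squared rank differences (no ties assumed)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = list(range(1, 11))
y_small = [v + 2 for v in x]
y_small[6] = 30            # hypothetical outlier at the seventh observation
y_large = list(y_small)
y_large[6] = 200           # same observation, much larger outlier

for label, y in (("outlier = 30", y_small), ("outlier = 200", y_large)):
    print(label, round(pearson(x, y), 2), round(spearman(x, y), 2))
```

Enlarging the outlier lowers the Pearson coefficient, while the Spearman coefficient stays the same because the outlier was already the largest y value and the ranks are unchanged.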
Box Plot
A box plot is a graphical representation of data through their five-number summaries: the smallest observation, the lower quartile, the median, the upper quartile, and the largest observation. A box plot provides insight into the distribution of a variable.
The box plot displays Raw (orange) and Clean (blue) Data of a single Variable; an example is displayed below.
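The five-number summary behind a box plot can be computed with Python's standard library. The quartile method used here (`method="inclusive"` of `statistics.quantiles`) is an assumption; the article does not state which quartile definition the tool uses, and different definitions give slightly different quartiles on small datasets. The function name and data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations
summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```

The box spans the lower to the upper quartile with a line at the median, and the whiskers extend to the smallest and largest observations.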
Data Cleaning
See Cleaning Data.
Virtual Tags
See Creating Virtual Tags for Analysis.