pdCluster: Partial Discharges Clustering
Partial discharge measurements analysis may determine the existence of defects. This package provides several tools for feature generation, exploratory graphical analysis, clustering and variable importance quantification for partial discharge signals.
The development pages of pdCluster
are here.
It can be installed with:
install.packages("pdCluster", repos="http://R-Forge.R-project.org") install.packages(c("hexbin", "RColorBrewer"))
Along this webpage you will find some examples using some real datasets.
library(pdCluster)
Table of Contents
1 The Prony's method
A clean partial discharge signal can be regarded as a finite combination of damped complex exponentials. Under this assumption, the so-called Prony's method allows for the estimation of frequency, amplitude, phase and damping components of the signal.
We have a collection of signals in a list
named signalList
(download).
load('signalList.RData')
The signals contain zeros at the beginning and at the
end. The no0
function can remove these parts.
xyplot(signalList, y.same=NA, FUN=function(x){xyplot(ts(no0(x)))})
With these cleaned signals the Prony's method can provide their components.
signal <- signalList[[3]] pr <- prony(signal, M=10) xyplot(pr)
Since the number of components must be fixed \a priori\,
the function compProny
allows the comparison of different numbers:
compProny(signal, M=c(10, 20, 30, 40))
2 Feature generation
pdCluster
includes several functions for feature
generation. The analysis
function comprises all of them. The
results for our example signal are:
analysis(signal)
This function can be used with a list of signals in order to obtain a matrix of features:
analysisList <- lapply(signalList[1:10], analysis) pdData <- do.call(rbind, analysisList)
Now we need the angle and reflection information, available from
another different dataset (named pdSummary
, download).
load('pdSummary.RData')
In order to safely share the information, both data frames must be reordered by their energy values:
idxOrderSummary=order(pdSummary$sumaCuadrados) idxOrderData=order(pdData$energy) pdDataOrdered=cbind(pdData[idxOrderData,], pdSummary[idxOrderSummary,c('angulo', 'separacionOriginal')])
Later, the data frame to be used with the clustering algorithm has to
ordered by time. Thus the samples of the clara
method will
be random.
idx <- do.call(order, pdSummary[idxOrderSummary, c('segundo', 'inicio')]) pdDataOrdered <- pdDataOrdered[idx,]
We can now construct a PD
object. (The
pdCluster
package is designed with S4 classes and
methods. Two classes have been defined: PD
and PDCluster
).
pd <- df2PD(pdDataOrdered)
The results of analysis
to the whole dataset are available here.
load('dfHibr.RData') dfHibr <- df2PD(dfHibr)
3 Transformations
Prior to the clustering algorithm, the feature matrix has to be filtered:
dfFilter <- filterPD(dfHibr)
and transformed:
dfTrans <- transformPD(dfFilter)
The next figure compares the datasets after and before of the transformations:
nZCbefore <- as.data.frame(dfFilter)$nZC nZCafter <- as.data.frame(dfTrans)$nZC comp <- data.frame(After=nZCafter, Before=nZCbefore)
h <- histogram(~After+Before, data=comp, scales=list(x=list(relation='free'), y=list(relation='free', draw=FALSE)), breaks=100, col='gray', xlab='', strip.names=c(TRUE, TRUE), bg='gray', fg='darkblue')
The filterPD
method is a wrapper for the general
subset
method. With subset
it is possible to extract
a group of samples based on a condition and select only certain
columns.
dfTransSubset <- subset(dfTrans,
subset=(angle >= 90 & angle <=180),
select=c(energy, W1, nZC))
dfTransSubset
4 Graphical tools
The pdCluster
packages includes a set of graphical exploratory
tools, such as a scatterplot matrices with hexagonal binning, density
plots histograms or phase resolved partial discharge patterns, both
with partial transparency or hexagonal binning.
splom(dfTrans)
densityplot(dfTrans)
histogram(dfTrans)
xyplot(dfTrans)
hexbinplot(dfTrans)
5 Clustering
The filtered and transformed object can now be used with the clustering algorithm. The results are displayed with a phase resolved pattern with clusters in separate panels in the . The colors encode the distance of each point to the medoid of its cluster. The displays the same pattern with superposed clusters. Here the colors encode the membership to a certain cluster, and transparency is used to denote density of points in a region.
The results can be easily understood with the density plots of each cluster and feature or with the histograms .
dfTransCluster <- claraPD(dfTrans, noise.level=0.7, noise.rm=TRUE)
xyplot(dfTransCluster)
xyplot(dfTransCluster, panelClust=FALSE)
histogram(dfTransCluster)
densityplot(dfTransCluster)